A student works on math problems as part of a trial run of a new state assessment test on Feb. 12 at Annapolis Middle School in Annapolis, Md. (Patrick Semansky/AP)

Julie Campbell is a fifth-grade teacher in Dobbs Ferry, N.Y. She recently completed “Scorer Leader Training” for the English Language Arts Common Core test given in her state to fifth graders, and she says that what she discovered shocked her. Because she signed a confidentiality agreement regarding the current test, she can’t discuss it, but she did take what she learned from her training and applied it to last year’s publicly released fifth-grade English Language questions and “anchor papers” that were released by the New York Department of Education. This is her report.

By Julie Campbell

What do I do as a teacher when asked to perform an act that goes against my conscience?

Let me begin by saying that I am not against Common Core. I think that there are plenty of good ideas inherent in these standards. I am in favor of the rigor, of the push for critical thinking skills. Certainly I have a few qualms. I wish that educators had had more of a say in their development. I do worry about how the focus on English Language Arts and math negatively impacts the arts and humanities, particularly the teaching of social studies. I am a bit queasy about the emphasis on nonfiction texts at the expense of literature, and I also have some concerns about the developmental appropriateness of the Common Core Learning Standards as they apply to our youngest learners in grades K-2. All that being said, as a fifth-grade teacher, I find the standards a great starting point for high-quality, deep, meaningful instruction. I am not anti-Core.

Additionally, I am not opposed to standardized testing. I think that a good deal of information can be garnered through standardized tests. I started my teaching career working for a nationally renowned SAT prep company, and I learned a whole lot working in the testing business. Standardized tests certainly have their place in education, but they also have their limits. The marriage between standardized testing and Common Core is not a happy one. If the Common Core standards are truly about deep learning and critical thinking, these qualities are some of the most difficult to assess using a standardized measure like a multiple-choice test.

While there has been a great deal of "buzz" in the community, in the state, and on social media about testing lately, it is not my intention to rehash the surfeit of issues already in play. I don't want to talk about Common Core, the opt-out movement, the long hours kids spend sitting for tests (particularly special education students), the reading level of the passages, Race to the Top, No Child Left Behind, Gov. [Andrew] Cuomo, or the inefficacy of VAM (the value-added model), which uses student standardized test scores in questionable ways to evaluate teachers. I don't want to talk about unions or politics, Michelle Rhee or Eva Moskowitz.

I want to talk about the test itself. It is a fundamentally flawed tool that will only debase the good work we teachers do in the classroom, the work that districts do in designing and implementing quality curriculum, and the work our students do in learning to become enlightened critical thinkers.

Read the newspapers, read your Twitter feed, and you will find lots of people talking about testing but not about the actual test. There's a reason for that, of course: security. Since the new Common Core tests were put into place, heightened test security, to the point of paranoia, has become the new normal. Teachers sign gag orders before they administer exams and sign confidentiality agreements when tests are scored.

This raises another critical issue: scoring. Sending tests away to outside scoring agencies has become the default for most schools. It used to be that most districts sent their teachers to Scorer Leader Training through BOCES and graded tests exclusively "in house." This was the status quo. This week I attended Scorer Leader Training where my school was the only one represented. We used to have teachers from dozens of schools filling enormous conference rooms. On Tuesday I sat in a room with an instructor and a fellow teacher from my district. Grading your own tests has become a prohibitive process, with rules ranging from no food or drink in the testing room (except sucking candy) to no cellphones, lockdown procedures for every sheet of paper that rival the protocols of the National Security Agency, and a small army of scorers reviewing every test. With fewer teachers grading these materials themselves, fewer people are talking about what I believe to be the most fundamental problem with testing in the state of New York today: the tests themselves.

Because I have signed a confidentiality agreement, I am not at liberty to disclose information about the ELA tests that I have been trained to score this coming Monday. But I do feel comfortable making a few general statements about my experience at Scorer Leader Training, and when possible, I will try to reference questions from last year's fifth-grade ELA test, released in 2014 by NYS and publicly available on the EngageNY Web page. I will make the best of the 2014 examples, but I assure you that the problems I am about to describe are glaring and ubiquitous in this year's exam as well.

First things first: one of the most disturbing trends I have found in examining this year's and last year's (released) tests is a shift toward a kind of intellectual relativism. In other words, any claim that a student makes is correct if he or she substantiates it with some evidence. On the surface this doesn't sound terribly problematic, but when you start to examine some of the anchor papers, the dilemma with this vein of thinking becomes shockingly apparent. The truth is, not all claims are correct and not all evidence is created equal. Making a feeble claim and using evidence out of context to support it is an all too common occurrence on these tests.

I cannot give you specific examples from this year’s test, but I ask you to read the paired passages on pages 59-60 and 68-69 from last year’s test (How to Be A Smart Risk Taker and The Young Man and the Sea). https://www.engageny.org/resource/test-guides-for-english-language-arts-and-mathematics [1]

The four-point extended-response question is troubling in and of itself because it instructs students to: explain how Zac Sunderland from "The Young Man and the Sea" demonstrates the ideas described in "How to be a Smart Risk-Taker." After reading both passages, one might find it difficult to argue that Zac Sunderland demonstrates the ideas found in "How to be a Smart Risk-Taker," because sailing solo around the world as a teenager is a pretty outrageous risk! But the question does not allow students to evaluate Zac as a risk taker and decide for themselves whether he demonstrates the ideas in the risk-taker passage. Such a question, in fact, could be a good critical thinking exercise in line with the Common Core standards! Rather, students are essentially given a thesis that they must defend: they MUST prove that Zac demonstrates competency in his risk/reward analysis.

So one can hardly be surprised to find an answer like this:

 One idea described in “How to be a Smart Risk-taker” is evaluating risks. It is smart to take a risk only when the potential upside outweighs the potential downside. Zac took the risk because the downside “dying” was outweighed by the upside (adventure, experience, record, and showing that young people can do way more than expected from them). (pg 87)

Do you find this to be a valid claim? Is the downside of “dying” really outweighed by the upside, “adventure”? Is this example indicative of Zac Sunderland being a “Smart Risk Taker”? I think most reasonable people would argue against this notion and surmise that the student has a flawed understanding of risk/reward based on the passage. According to Pearson and New York State, however, this response is exemplary. It gets a 4.

This is just one example that I am allowed to share with you because it has been released by the state. I saw so many variations of this kind of broken thinking when examining the anchor papers at Scorer Leader Training that it made my head hurt.

So where does this anything-goes scoring philosophy come from? I have a guess. In the old days of standardized testing, scorers were provided with a bulleted list of plausible answers for each question. Most student answers fell neatly within range, and the oddball answer that just maybe could be right warranted a call to the state. Now, ALL answers are plausible! I suspect this was Pearson's way of proving that its higher-level, critical-thinking-inducing, Common Core-based test questions are so intellectually rigorous that it needn't provide lists of right answers for any of them. Pearson can't be accused of forcing kids to be "widgets" if it allows for every possible contingent answer. After all, such a rationale supports outside-the-box "thinking." The problem with this fallacy is that not every answer is right; not every claim is valid. Part of being a true critical thinker, part of being an intelligent reader and writer, is the ability to make sound and valid claims that are supported with appropriate textual evidence.

In some cases, it seems that acceptable answers with flimsy claims are being employed as a remedy for vague or seemingly absurd questions. Take, for example, the two-point question associated with "How to be a Smart Risk Taker," which reads: "According to the author, what is the value of being a smart risk-taker? Use two details from the article to support your answer." How does one classify a "value" question? According to the NYS manual, this is really a "main idea" question in disguise. Here is an answer that got full credit:

 “The value of being a smart risk taker is you choose what you think is right by making a list of upsides and downsides. Also, if you take risks you can try something like joining a club at school and really end up enjoying it.” (pg 64)

Here’s the breakdown for why this answer got full credit. According to Pearson, “you choose what you think is right” is the first inference. The list of upsides and downsides is one detail. The student then uses an unrelated second detail about joining clubs at school and makes a second inference that you may really end up enjoying it. Formulaically speaking: inference + 2 details will always yield a correct answer[2]. What we have here is a confusing and clumsy answer to a confusing and clumsy question.

One might argue that this way of scoring allows students to scrape up extra points and is actually a boon to teachers and students alike. It boosts scores! Hurrah!

But in fact, it creates a terrifyingly slippery slope. I think about climate change deniers, the Creationist Museum in Kentucky that shows humans and dinosaurs roaming Earth side-by-side, 9-11 conspiracy theorists, and the Holocaust itself! Throughout history, people have made misguided claims and have supported their thinking with spurious details and evidence. Don’t our children deserve better?

Another disturbing pattern that emerges as one reads the anchor responses for the ELA is what I call “The Easter Egg Hunt.” When it comes to short answer questions in particular, the question that is actually being posed rarely matches the answer required. The wordier the written response, the more likely it is that the student will stumble upon the correct answer, find the decorative egg. (Strategy!) Time after time there is a clandestine condition that must be met in order for an answer to get full credit – “Magic Words.” As my scoring instructor illustrated, it’s kind of like tossing all of the words into a bucket and looking for certain key phrases or ideas to float up to the top.

Often this involves making an “inference.” Pearson seems terribly confused about what is and is not an inference, and what is and is not a detail. There is no consistency across questions or anchor papers with regard to inferring (but more on this later). In order for a student to get full credit (two points) on a short answer question, they must make an inference even if the answer is clearly stated in the passage and doesn’t require any inferring. And of course, the inference can be dead wrong, and that is okay. In real life, if someone asks you a straightforward question, you most likely answer directly and succinctly with correct and appropriate details. In “test life” the best this can get you is partial credit.

 

For example:

What time is it? “12:00” (partial credit)

What time is it? “12:00, which is noon, when we eat lunch” (full credit)

 

Another serious flaw is that the test’s questions do not assess the Common Core standards in meaningful, authentic, or even accurate ways. After reading the story “Deep” (pgs 42-44), students were given this gem:

 “How does the narrator’s point of view contribute to the mood of the story? Use two details from the story to support your response.”

When I teach about point of view in literature, I like to examine the motives and deep inner workings of characters with my students. For example: How does Gilly’s[3] point of view (as a throw-away child who’s been at the mercy of the foster care system her whole life) affect the way she treats Mrs. Trotter (her new foster mother)? How does your understanding of the character Mongoose[4] change when the third-person lens shifts to Weasel in the second half of the story? How is it possible that Crash[5] and Abby, brother and sister, have such different points of view when it comes to Penn Webb and to their family situation? These are genuine, authentic literary questions that have to do with point of view.

I have learned that when it comes to state testing, point of view has strictly to do with whether the story is told from a first-person or a third-person perspective. That is all. If the story is told in third person, you are expected to write something about how a third-person point of view allows you to see the thoughts and actions of the character. This is a vapid and shallow bastardization of the very concept of point of view. It is evident in the question above. How does point of view contribute to mood? The very question doesn’t make any sense! An author doesn’t use point of view to develop mood!

Apparently, Pearson was trying to hit Standard 5.6: Describe how a narrator’s or speaker’s point of view influences how events are described. In “The Great Gilly Hopkins,” I would argue that Gilly’s jaded point of view influences the condescending way that she describes Trotter’s home and family and the interactions with them at dinner. But the question above is in no way close to hitting the mark! This is just one example of a standard that has been butchered by the Pearson test-making machine.

I would love to talk about writing and language standards on this test. One of the saddest things I’ve witnessed, having surveyed more than 300 anchor papers, is the proliferation of test-prep language and “brainwashing” jargon in student answers. Students relied on (and misused) catch phrases, almost always to the benefit of their score. Empty phrases and flowery transitions were sprinkled everywhere: meaningless drivel. Clear, concise writing was scarcely seen, and when it was, it was detrimental to the child’s overall score. It seems to me that the convoluted language of the test questions is being translated into the convoluted double-speak that has become the written language of our children. It breaks my heart.

Wordy block quotes that may or may not support the main idea? Points. An overdone introduction with a dubious thesis? Points. A lengthy conclusion? Points. Quality, thoughtful, succinct, correct information? Partial credit.

The last point I want to make about the test itself is the inconsistency in how it is scored. One gets the distinct impression that there were “too many cooks” employed in the making of the test questions and scoring materials. There is rampant inconsistency across questions, constant and disorienting confusion about what is a detail and what is an inference, maddening contradictions, and a reliance on guide papers that must be accepted as “correct” despite blatantly incorrect explanations. Just as we are forced to accept incorrect claims from our students, we must also accept the incorrect claims made by Pearson with absolutely no recourse for complaint.

This year’s Scorer Leader Training was my line in the sand. I can no longer be complicit, nor can I be complacent. If these tests are the cornerstone of our state’s education plan, the house we are building cannot stand. We cannot build education policy on a shoddy foundation. We cannot pretend that these tests have genuine educational value, that they are an adequate metric for our Common Core standards. We cannot pretend that these tests can meaningfully and accurately assess our students, our teachers, our principals, and our districts at large. It is a farce. The emperor is wearing no clothes!

What’s more nefarious is that the true crisis in American education meant to be addressed by the implementation of these standardized assessments – the shameful achievement gap between children across New York state and across the country – cannot possibly be bridged using a fatally flawed exam. As poet and activist Audre Lorde once said, “For the master’s tools will never dismantle the master’s house. They may allow us temporarily to beat him at his own game, but they will never enable us to bring about genuine change.” Such is the case with standardized testing here; the seeds of progress cannot germinate in barren soil.

Genuine change must begin with us.

I am passionate about teaching, an advocate for public education, and a firm believer in the fact that “independent thinkers can change worlds.” I’m ready to lift my hand from the buzzer. Are you?

 

[1] All of the passages referenced in this essay are from the 2014 Grade 5 NYS ELA and can be found on the above hyperlinked site.

[2] This is not always true. Throughout my training, there was a clear and arbitrary bias when it came to which details were permitted and which were forbidden. There was no rhyme or reason. This was when I cried.

[3] “The Great Gilly Hopkins,” by Katherine Paterson

[4] “The Library Card,” by Jerry Spinelli

[5] “Crash,” by Jerry Spinelli