Many of the guest writers on this blog are opposed to the Common Core State Standards – not because they object to standards per se, but because they find fault with this particular set for one reason or another. This post was written by someone who supports the Core but takes a critical look at New York State’s Common Core-aligned tests, specifically the eighth-grade math exam designed by Pearson. Grant Wiggins is the co-author of “Understanding by Design” and the author of “Educative Assessment,” as well as numerous articles on education. A high school teacher for 14 years, he is the president of Authentic Education, in Hopewell, New Jersey, which provides professional development and other services to schools aimed at improving student learning. Wiggins gave me permission to republish this post, which first appeared on his blog, *Granted, and…*. It should be noted that these tests are different from the Common Core exams being developed by two multi-state consortia, known as PARCC and SBAC, which are being given to students for the first time this school year – though educators have already raised concerns that those two assessments will not be the “groundbreaking” next-generation exams that Education Secretary Arne Duncan once said they would be.

By Grant Wiggins

The Common Core State Standards are just common sense – but the devil is in the implementation details. And in light of the excessive secrecy surrounding the test items and their months-later analysis, educators are in the unfortunate and absurd position of having to guess what the opaque results mean for instruction. It might be amusing if there weren’t high personal stakes of teacher accountability attached to the results.

Using the sample of released items in the New York Common Core tests, I recently spent some time looking over the eighth-grade math results and items to see what was to be learned – and I came away appalled at what I found.

Readers will recall that the whole point of the standards is that they be assessed through complex problems that require both content and practice standards together. But what were the hardest questions on the eighth-grade test? *Picayune, isolated, and needlessly complex calculations of numbers using scientific notation*. And in one case, an item is patently invalid in its convoluted use of the English language to set up the prompt, as we shall see.

As I have long written, there is a sorry record in mass testing of sacrificing validity for reliability. This test seems like a prime example. Score what is easy to score, regardless of the intent of the Common Core Standards. There are 28 eighth-grade math standards. Why do such arguably less important standards have at least five items related to them? (Who decided which standards were most important? Who decided to test the standards in complete isolation from one another simply because that is psychometrically cleaner?)

Here are the released items related to scientific notation:

It is this last item that put me over the edge.

**The item analysis.** Here are the results from the Board of Cooperative Educational Services report to one school on the item analysis for the questions related to scientific notation. The first number, cast as a decimal, reflects the percentage of correct answers statewide in New York. So, only 26 percent of students in New York got the first item, Question #8, right. The following decimals reflect the district and school percentages: in this district, 37 percent got the right answer, and in this school, 36 percent got it right. The two remaining numbers reflect the difference between the state score and the district and school scores (.11 and .10, respectively).
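In other words, the two difference columns are simply the local proportions minus the statewide proportion:

```
district − state:  0.37 − 0.26 = 0.11
school − state:    0.36 − 0.26 = 0.10
```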

Notice that, on average, only **36%** of New York State eighth-graders got these five questions right, pulling down their overall scores considerably.

Now ask yourself: given the poor results on all five questions – questions that involve isolated and annoying computations, hardly central to the import of the Standards – would you be willing to consider this a valid measure of the Content and Process Standards in action? And would you be happy if your accountability scores went down as a teacher of eighth-grade math, based on these results? Neither would I.

There are 28 standards in eighth-grade math, and scientific notation accounts for four of them. Surely, from an intellectual point of view, the many standards on linear relationships and the Pythagorean theorem are of greater importance than scientific notation. But the released items and the math suggest each standard was assessed three to four times in isolation prior to the few constructed-response items. Why five items for this standard?

It gets worse. In the introduction to the released tests, the following reassuring comments are made about how items will be analyzed and discussed:

Fair enough: you cannot read the student’s mind. At least you DO promise me helpful commentary on each item. But note the third sentence: “The rationales describe why the wrong answer choices are plausible but incorrect and are based on common errors in computation.” (Why only computation? Is this an editorial oversight?) Let’s look at an example for arguably the least valid question of the five:

Oh. It is a valid test of understanding because you say it is valid. Your proof of validity comes from simply reciting the standard and saying this item assesses that.

Wait, it gets even worse. Here is the “rationale” for the scoring, with commentary:

Note the difference in the rationales provided for wrong answers B and C: “may have limited understanding” vs. “may have some understanding … but may have made an error when obtaining the final result.”

This raises a key question unanswered in the item analysis and in the test specs: does computational error = lack of understanding? Should Answers B and C be scored equally? (I think not, given the intent of the standards.) The student “may have some understanding” of the standard – or may not. Were Answers B and C treated equally? We do not know; we can’t know, given the test security.

So, all you are really saying is: wrong answer.

“Answers A, B and C are plausible but incorrect. They represent common student errors made when subtracting numbers expressed in scientific notation.” Huh? Are we measuring subtraction here or understanding of scientific notation? (Look back at the standard.)
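To see why the distinction matters, consider a subtraction of the kind these items apparently require (the numbers here are illustrative, not from the actual test). A student who understands the notation first rewrites the terms over a common power of ten:

```
(4.1 × 10⁵) − (2.3 × 10⁴)
  = (41 × 10⁴) − (2.3 × 10⁴)
  = 38.7 × 10⁴
  = 3.87 × 10⁵
```

A student who subtracts the coefficients and the exponents separately gets something like 1.8 × 10¹ – a conceptual error – while a student who aligns the powers correctly but slips on 41 − 2.3 has made a merely computational one. The published rationales collapse that distinction.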

Not once does the report suggest an equally plausible analysis: students were unable to figure out what this question was asking!!! The English is so convoluted, it took me a few minutes to check and double-check whether I parsed the language properly:

Plausible but incorrect. The wrong answers are “plausible but incorrect.” Hey, wait a minute: that language sounds familiar. That’s what it says under every other item! For example:

All they are doing is copying and pasting the SAME sentence, item after item, and then substituting in the standard being assessed!! Aren’t you then merely saying: We like all our distractors equally because they are all “plausible” but wrong?

Understanding vs. computation. Let’s look more closely at another set of rationales for a similar problem, to see if we see the same jumbling together of conceptual misunderstanding and minor computational error. Indeed, we do:

Look at the rationale for B, the correct answer: it makes no sense. Yes, the answer is 4 squared, which is an equivalent expression to the prompt. But then they say: “The student may have correctly added the exponents.” That very insecure conclusion is then followed, inexplicably, by great confidence: a student who selects this response “understands the properties of integer exponents…” – which is, of course, just the standard, re-stated. Was this blind recall of a rule, or is it evidence of real understanding? We’ll never know from this item and this analysis.
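The rule at issue – the product rule for integer exponents – lets a prompt of this shape (my hypothetical, not the actual item) be answered by mechanically adding exponents:

```
aᵐ · aⁿ = aᵐ⁺ⁿ
4⁶ · 4⁻⁴ = 4⁶⁺⁽⁻⁴⁾ = 4²
```

Selecting 4² is consistent with understanding why the rule holds, but it is equally consistent with blind recall of it – which is exactly the point.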

In other words, all the rationales are doing, really, is claiming that the item design is valid – without evidence. We are in fact learning nothing about student understanding, the focus of the standard.

Hardly the item analysis trumpeted at the outset.

Not what we were promised. More fundamentally, these are not the kinds of questions the Common Core promised us. Merely making the computations trickier is cheap psychometrics, not an insight into student understanding. They are testing what is easy to test, not necessarily what is most important.

By contrast, here is an item from the test that assesses for genuine understanding:

This is a challenging item – perfectly suited to the standard and the spirit of the standards. It requires understanding the hallmarks of linear and nonlinear relations and doing the needed calculations based on that understanding to determine the answer. But this is a rare question on the test.

Why should the point value of this question be the same as the scientific notation ones?

In sum: questionable. This patchwork of released items, bogus “analysis” and copy-and-paste “commentary” gives us little insight into the key questions: Where are my kids in terms of the standards? What must we do to improve performance against these standards?

My analysis, albeit informal, gives me little faith in the operational understanding of the standards in this design, absent further data on how item validity was established, whether any attempt was made to carefully distinguish computational from conceptual errors in the design and scoring – and whether the test-makers even understand the difference between computation and understanding.

It is thus inexcusable for such tests to remain secure, with item analysis and released items dribbled out at the whim of the Department of Education and the vendor. We need a robust discussion as to whether this kind of test measures what the standards call for, a discussion that can only occur if the first few years of testing lead to a release of the whole test after it is taken.

New York State teachers deserve better.