The school “accountability movement” has relied in large part on standardized test scores to evaluate students, schools, teachers, principals and districts. It started under the No Child Left Behind Act, which went into effect in 2002 under President George W. Bush, grew during the Obama administration and is continuing with somewhat less fervor today.
The movement led to classrooms dominated by test prep and severely narrowed the curriculum to focus primarily on the tested subjects: reading and math. More and more tests were piled on during the school year, eventually sparking a nationwide grass-roots resistance in which parents opted their children out of tests. Even some supporters of using high-stakes tests as a key assessment tool came to realize that the movement had gone too far.
Big questions remain about the test-based accountability movement, including who allowed it to happen. Below is a Q&A with Daniel Koretz, a professor at the Harvard Graduate School of Education, about his book and what he calls a “testing charade.”
Q) The title of your book is provocative: “The Testing Charade: Pretending to Make Schools Better.” That sounds like a slam on the entire standardized testing accountability movement that started with George W. Bush and moved into the Obama administration. Is it?
A) Yes, it is a critique of the entire test-based accountability movement, which in fact goes back years before George Bush and NCLB [No Child Left Behind]. If you line up the effects of this approach, the answer is clear: It has been a failure. The improvements it has produced have been limited, and they are greatly outweighed by the serious damage it has done. Of course, in many places the improvements appeared to be big, but most often they were just inflated test scores.
Q) First, please explain how the movement came about and what it looked like.
A) It had several sources. One was dissatisfaction with the achievement of American students, particularly those in the bottom of the distribution. Another was a widespread view that schools didn’t hold educators accountable in a useful way — an opinion that I share. Yet another was a simplistic notion of how tests could be used to provide more accountability. The resulting policies and practices took many different forms over the years, but the basic principle has stayed the same: Give a few tests and mete out punishments and rewards based on scores.
Q) Why do you call it a charade?
A) Because many people have been pretending that test-based accountability has been working as promised. Faced with pressure to raise scores, many educators cut corners, and one result was badly inflated score gains. Some cheated, but much of the fake improvement has been produced by bad test prep that isn’t considered cheating. Evidence of score inflation and bad test prep has been accumulating for more than a quarter century, but many people have turned a blind eye to this. So time after time, we have had proclamations of success, but it’s often a sham.
Q) What about the tests themselves? What do they really tell us about what individual kids actually learn?
A) Charade is not an anti-testing book. The problem is not tests. The problem is the misuse of tests. Tests can be a useful tool, but policymakers have demanded far more of them than is reasonable, and this has backfired.
Used appropriately, standardized tests are a valuable source of information, sometimes an irreplaceable one. For example, how do we know that the achievement gap between minority and majority students has been slowly narrowing, while the gap between rich and poor students has been growing? Or that students in the United States know less math than their peers in Japan? Standardized tests. Standardized tests can also be very useful to help diagnose the strengths and gaps in individual students’ learning.
But in our educational system, the use of tests has been anything but appropriate. Policymakers have ignored the fact that tests capture only some of what we want students to accomplish and even less of what we want schools to do. And they created perverse incentives that led educators to cut corners and inflate scores. Ironically, this made test scores less valuable than they would otherwise have been. Inflated scores don’t provide a trustworthy indicator of what students actually learn.
Q) Statisticians warned against using test results for high-stakes decisions, yet very smart people in policymaking positions did it anyway. Is there a way to explain that?
A) That is a troubling question that I raise in “The Testing Charade.” None of the problems I describe in the book are new. The warning flags have been up for a very long time. For well over 60 years, testing experts have warned educators that pressure to raise scores would cause score inflation and that test scores by themselves are not sufficient to evaluate schools. Over 40 years ago, in one of the most cited papers in the social sciences, Don Campbell repeated the warning about score inflation and the corruption of instruction. As I note in “Charade,” studies documenting bad test prep and score inflation in response to high-stakes testing started appearing almost 30 years ago, and the first study documenting more severe score inflation among disadvantaged students, and hence illusory improvements in achievement gaps, was published more than 15 years ago. And very consistent evidence of these problems continued to accumulate over the years.
So your question is right on target: Why did people persist with this approach despite all of those warnings and all of the evidence? Just based on my own experience, I think it was for several reasons. Some policymakers simply didn’t know; most don’t read social science; and many had no experts on hand to warn them. Others were confronted with the evidence but blew it off — like the superintendent I mention in “The Testing Charade” who told me that there couldn’t be a problem with score inflation in his district because the gains were too rapid! Some did understand the problem but were hamstrung by political constraints, particularly after the enactment of No Child Left Behind. A few tried to confront the problems head-on, but, given the constraints they faced, there wasn’t a great deal they could do. And, finally, I have to point the finger at some social scientists and testing experts. Many chose to downplay or simply ignore the problem. In fact, many still do, which has given policymakers cover to continue with these failed policies.
(Correction: The study documenting more severe score inflation among disadvantaged students — and, hence, illusory improvements in achievement gaps — was published more than 15 years ago, not more than 25 years ago, as an earlier version said.)