This was written by Todd Farley, author of “Making the Grades: My Misadventures in the Standardized Testing Industry.” Farley worked for 15 years in the K-12 testing business for many of the biggest players (Pearson Education, Educational Testing Service, American Institutes for Research, etc.) on many of the biggest tests (National Assessment of Educational Progress, California High School Exit Exam, Florida Comprehensive Assessment, Virginia Standards of Learning, etc.). A version of this appeared on The Huffington Post.

By Todd Farley

As school reformers try to push American education towards a place where standardized tests become more important than teachers, I have to wonder why. What exactly has occurred in standardized testing over the last decade that justifies such belief in large-scale assessments, or such blind faith in the completely unregulated, massively profitable industry that writes and scores NCLB tests?

We know that testing data can be manipulated to tell any story.

We know that a school administration—by making test questions easier or lowering cut scores—can portray improvement in its classrooms even when such improvement doesn’t really exist, as happened most recently in 2009 in the New York City schools.

We know that “rogue” teachers or administrators—by erasing incorrect student answers and changing them to correct ones—can show student achievement even if there is no such achievement, as scandals in Atlanta and Detroit during 2010 both revealed and the current erasure investigation in Washington, D.C. suggests.

And we know, as I describe in my book, that the testing companies fudge numbers all the time, whether reliability numbers (to show the industry is doing a more “standardized” job than it really is); validity numbers (to show the industry is doing a more accurate job than it really is); or score distribution numbers (when test scoring companies work to ensure student results match the predictions of their own psychometricians).

Psychometricians, of course, are the rock stars of the testing world, omniscient statisticians doing a job virtually no one comprehends. While I don’t claim to understand their mysterious math, I do find it odd that during my long career writing and scoring tests, I only once laid eyes on a psychometrician, and that was during a pick-up soccer game at the Educational Testing Service. Never when I wrote tests, or scored tests, or met with teachers to discuss those tests, did I see a psychometrician, meaning the most important people in the testing industry are people who don’t often know what the tests look like and don’t usually see the students’ answers to them.

We also know the testing industry regularly messes up.

In the last decade or so, scoring errors have occurred on tests returned to students in Arizona (1999-2000), Washington (2000), Virginia (2005), Florida (2006), South Carolina (2008), and Minnesota (2010), not to mention Indiana, Illinois, Connecticut, etc.

In 2000, a scoring error by NCS-Pearson (now Pearson Educational Measurement) led to 8,000 Minnesota students being told they failed a state math test when they did not, in fact, fail it (some of those students weren’t able to graduate from high school on time).

In 2004, ETS erroneously informed over 4,000 teachers they had failed a PRAXIS exam that they had actually passed, leading to lost jobs and lawsuits aplenty. In 2006, Pearson again erred, giving lower scores than were deserved to more than 4,000 students taking the SAT, with the company making the excuse (apparently with a straight face) that its blunder resulted from “abnormally high moisture content” in that year’s score sheets. Also, most of those errors were discovered only after a test-taker complained about a score, not when any company voluntarily disclosed the problem, raising questions about the legitimacy of every other test administered over the last 10 years.

Even without such obvious errors, in my career there seemed to be a major disconnect between the profit motive of the testing industry’s major players (Pearson Education, McGraw-Hill, Riverside Publishing, ETS K-12, DRC…) and any altruistic goals for American education. For the many years I scored students’ tests, I saw an industry primarily focused on meeting deadlines and completing contracts, with the importance of the correct scores being put on tests coming in second to the rush to get any score put on them. My work in test development was no different, with the companies who employed me willing to take huge shortcuts in developing tests because meeting a contract’s deadline was clearly more important than the quality of any assessment.

Last year I was amazed to see the management of a publishing company giving its test developers only four weeks to produce K-12 assessments for the Detroit Public Schools (a school system now bankrupt but then willing to pay millions to a testing company); later, however, that short time-frame looked like a leisurely vacation compared to the breakneck pace at which the company next worked its employees, when the staff was required to pound out more than 200 Common Core Standard tests over the next two months.

Two hundred tests is probably more than a not-for-profit like ETS has developed in its entire history, but in a rush to address the new CCS market that company worked its employees nearly to exhaustion and seemed willing to go to any length to write tests: They recycled items used many times on previous assessments, re-aligned items to link them to academic standards they were barely linked to, hired people with neither teaching nor testing experience to work as full-time test developers, and re-hired testing vendors previously fired for the poor quality of their work (one of those vendors celebrated its renewed contract by immediately advertising on Craigslist, hoping to find anyone at all willing to write test questions for $8 each).

It’s not like questions about the efficacy of the testing industry haven’t long been raised, since even before the dramatic increase in testing that has recently resulted from NCLB. In 2001, a New York Times story about testing errors quoted various employees at a test-scoring factory in Iowa City who doubted the quality of the work being done: “There was a lack of personnel, a lack of time, too many projects, too few people,” one said, while another noted the surfeit of work she faced meant she was concerned about “[her] ability, and the ability of the scorers, to continue making sound decisions and keeping the best interest of the students in mind.”

In 2002, Amy Weivoda raised similar concerns at Salon.com, noting her experience scoring tests “led [her] to believe with absolute certainty that standardized tests are an utter waste of money and valuable teaching time, and that they measure nothing more than a state government’s willingness to waste money.” She pointed out that some of her test-scoring colleagues “were so spectacularly sociopathic they could find no other work. Some of the scorers had earned their degrees in prison.” In 2009, a test scorer in Jacksonville, Florida wrote a two-act play about his career, a drama he said highlights the “silliness” of standardized testing.

In 2010, Dan DiMaggio cited many of the same issues in The Monthly Review, writing of test scoring being standardized only in its “mystifying training process, supervisors who are often more confused than the scorers themselves, and a pervasive inability of these tests to foster creativity and competent writing.” That dismay with the test scoring process was found again in a 2011 article in the Minneapolis City Pages, a story that concludes with one of the scorers commenting that the limitations of large-scale assessment were obvious to all: “Nobody is saying, ‘I’m doing good work, I’m helping society,’” she says. “Everyone is saying, ‘This isn’t right.’”

The City Pages story ends with a quote from a Pearson Education spokesman, in which the company man notes the complaining scorers were “people who have a very limited exposure and narrow point of view on what is truly a science.”

Lest anyone buy too heavily into the “science” of standardized testing, remember that 2009 audits performed by the U.S. Department of Education of tests in Tennessee and Florida found problems identical to those the scorers detailed (not to mention other problems as well). While all the complaints above are about test scoring and not test development, it’s important to remember that the open-ended questions on tests that are scored by humans are the sort of “next generation” assessments the Obama administration is moving towards.

With even the president recently deriding the emphasis on “filling in bubbles” that results from multiple-choice tests, the current education reform agenda instead seems to be aimed at tests that address critical thinking skills, including “students’ ability to read complex text, complete research projects, excel at classroom speaking and listening assignments, and work with digital media.” Impressive jargon indeed, but every one of those tasks needs to be assessed by a living, breathing human being, and there’s more than enough evidence that the living, breathing human beings currently doing that job either don’t do it very well or don’t think it can be done very well.

If we concede from the evidence above that the testing industry seems ill-prepared to score all the student responses filled with incredibly complex thinking that the “next generation” of tests will surely generate, I can imagine only two other ways those students’ answers will be assessed.

Either classroom teachers will be hired to score the answers to those tests, or the student responses will be “read” and scored by the new automated scoring technologies powered by artificial intelligence.

The problem with teachers scoring the tests, of course, is that the education reformers believe this country’s teachers can’t be trusted to make decisions about kids, so that option seems unlikely. More likely is that those new tests will be scored by those vaunted automated scoring technologies, machines that can assess student answers to open-ended questions without being able to actually read them. Of note, no one is claiming those computers can read, only that it has been proven statistically that those automated systems can score student tests as accurately as do the temporary employees currently doing the job.

That argument, of course, is presented as a defense of the new technologies, although I’m not sure how much solace we should find in knowing that a computer that can’t read a student’s answer understands it just as well as does some bored slacker being paid slave wages to give only fleeting glances to the work.

For the last 10 years in this country we’ve regularly seen standardized test results that can’t be believed. Still, the United States seems to be heading towards taking the decisions about American education out of the hands of American educators and instead placing that sacred trust in the welcoming arms of an industry run entirely without oversight and populated completely with for-profit companies chasing billions of dollars in business.

When next some standardized test scores are found to be incorrect or fraudulent (because they will), or some standardized testing company commits or tries to cover up another egregious error (because they will), perhaps then we can admit large-scale assessment isn’t the panacea it’s often been touted to be.

Perhaps then we can concede that an educational philosophy based on a system of national standardized tests isn’t any Brave New World of American education; it’s just a bad idea that even the Chinese are already turning away from as being too inefficient and antiquated.

Caveat emptor, America. Buyer beware.


Follow The Answer Sheet every day by bookmarking http://www.washingtonpost.com/blogs/answer-sheet.