You know things are going very badly for public school teachers when The New York Times editorial board calls a bad teacher evaluation system a “sensible policy change.”

The Times ran an editorial on Wednesday that smacked Chicago teachers for striking against a school reform package pushed by Mayor Rahm Emanuel, a former chief of staff of President Obama. It says in part:

Teachers’ strikes, because they hurt children and their families, are never a good idea. The strike that has roiled the civic climate in Chicago— and left 350,000 children without classes — seems particularly senseless because it is partly a product of a personality clash between the blunt mayor, Rahm Emanuel, and the tough Chicago Teachers Union president, Karen Lewis. Beyond that, the strike is based on union discontent with sensible policy changes — including the teacher evaluation system required by Illinois law — that are increasingly popular across the country and are unlikely to be rolled back, no matter how long the union stays out.

The Washington Post editorial board has also supported test-based teacher evaluation, including in this piece on the strike.

The Post editorial says that “the system developed by Chicago officials, on which they offered to work with the union, is careful to measure student growth. That means teachers aren’t blamed if their students start out behind but instead are evaluated on their ability to make progress during the year they have responsibility.” Well, a single test is hardly the way to tell whether a child has made progress. How many of you have had a headache or been sick or emotionally upset and bombed a test? There are ways to measure how students are achieving without a standardized test.

Think what you want about the Chicago teachers strike. But that doesn’t change this:

The Times can say that using standardized test scores to evaluate teachers is a sensible policy and Obama can say it and Education Secretary Arne Duncan can say it and Emanuel can say it and so can Bill Gates (who has spent hundreds of millions of dollars to develop it) and governors and mayors from both parties, and heck, anybody can go ahead and shout it out as loud as they can.

It doesn’t make it true.

Can all these very smart people be wrong? Yes, according to many experts on assessment who have done extensive research on the subject.

These experts have said over and over and over that the methods by which test scores are factored into an evaluation of a teacher’s effectiveness are dramatically unreliable and unfair. Some say these methods will destroy the teaching profession because they will identify effective teachers as ineffective and ineffective teachers as effective. Some bad teachers will be fired but some good ones will too. Others will leave in disgust.

That’s what happened, for example, in New York City when Carolyn Abbott, who teaches mathematics to seventh- and eighth-graders at the Anderson School, a citywide gifted-and-talented school on the Upper West Side of Manhattan, learned that her “value-added” score made her the worst eighth-grade teacher in the entire city. The score, of course, didn’t reflect that her students already scored near 100 percent proficiency and were doing advanced math, which left them almost no room to beat a prediction based on the prior year’s results. The formula didn’t care.

The value-added formulas actually compare how students are predicted to perform on the state ELA and math tests, based on their prior year’s performance, with their actual performance, as Teachers College Professor Aaron Pallas wrote here. Teachers whose students do better than predicted are said to have “added value”; those whose students do worse than predicted are “subtracting value.” By definition, he wrote, about half of all teachers will add value, and the other half will not.
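For readers who want to see the arithmetic, here is a minimal sketch of the predicted-versus-actual comparison Pallas describes. It is an illustration under loud assumptions, not New York's actual formula: the scores are made up, the prediction is a simple straight-line fit on the prior year's score alone, and the names `fit_line` and `value_added` are hypothetical. Real value-added models use many more variables.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x (assumed stand-in for the real prediction model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def value_added(records):
    """records: list of (teacher, prior_year_score, current_year_score).

    Returns each teacher's average gap between actual and predicted scores.
    """
    a, b = fit_line([r[1] for r in records], [r[2] for r in records])
    gaps = {}
    for teacher, prior, current in records:
        predicted = a + b * prior          # what the model expects from this student
        gaps.setdefault(teacher, []).append(current - predicted)
    return {teacher: mean(g) for teacher, g in gaps.items()}

# Invented scores for two hypothetical teachers with three students each.
records = [
    ("Teacher A", 60, 70), ("Teacher A", 65, 72), ("Teacher A", 70, 78),
    ("Teacher B", 60, 62), ("Teacher B", 65, 66), ("Teacher B", 70, 71),
]
scores = value_added(records)
for teacher, score in scores.items():
    label = "added value" if score > 0 else "subtracted value"
    print(f"{teacher}: {score:+.2f} ({label})")
```

In this sketch the prediction is fitted to the same students it judges, so the gaps average out to zero across everyone. With equal-sized classes, one teacher's plus is necessarily another teacher's minus, which is the sense in which about half of all teachers will "add value" by definition.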

No, Abbott’s case wasn’t an aberration. Lots of scores are wrong. Yet state after state insists on foisting this on teachers and even principals.

This isn’t just about the adults. Kids suffer when good teachers are said to be bad and bad teachers are said to be good, and especially when standardized tests have such high stakes that teachers feel forced to tailor their teaching to the test.

And think about this: If teachers are evaluated on test scores, there has to be a standardized test for every class. What happens when tests have such high stakes? Kids learn how to pass tests rather than how to solve problems and think creatively.

Even the former education commissioner of Texas, Republican Robert Scott, recognized this and said earlier this year that all of this testing is a “perversion” of what a quality education should be.

Of course teachers should be evaluated, and evaluated better than they have been in most places for decades. Yes, bad teachers should be removed from the classroom, sooner rather than later. There are ways to do this that are fair, and they are already in use in places such as the high-achieving Montgomery County Public School system in Maryland.

In fact, The New York Times’ own columnist Michael Winerip wrote about that evaluation system last year in this story, which noted that then-Superintendent Jerry Weast had rejected $12 million in Race to the Top money because it required districts to use test scores to evaluate teachers. Weast was quoted as saying: “We don’t believe the tests are reliable. You don’t want to turn your system into a test factory.” Most Washington D.C.-area superintendents still think it’s a bad idea, including Weast’s successor, Joshua Starr. (Unfortunately Winerip doesn’t write about education anymore for The Times.)

In a Q & A with area superintendents, I quoted Loudoun County Public Schools Superintendent Edgar Hatrick as saying, “It is troubling that we will now take tests we’re not sure are good measures of student performance and extrapolate teacher performance from student scores.”

The Illinois law calls for an evaluation system in which at least 20 percent is based on student standardized test scores. Emanuel wants fully half of the evaluation to be based on the test scores.

(Incidentally, former Washington D.C. schools chancellor Michelle Rhee instituted a teacher evaluation system a few years ago that had 50 percent of individual assessments linked to student test scores — in courses where standardized tests were given — but her successor, Kaya Henderson, just dropped it down to 35 percent because of problems with the system.)

Read this from a letter that scores of researchers from 16 universities throughout the Chicago metropolitan area sent to Emanuel warning against the “value-added” system of teacher evaluation, which uses complicated formulas to factor test scores into an evaluation:

As university professors and researchers who specialize in educational research, we recognize that change is an essential component of school improvement. We are very concerned, however, at a continuing pattern of changes imposed rapidly without high-quality evidentiary support.

The new evaluation system for teachers and principals centers on misconceptions about student growth, with potentially negative impact on the education of Chicago’s children. We believe it is our ethical obligation to raise awareness about how the proposed changes not only lack a sound research basis, but in some instances, have already proven to be harmful.

Professors in Georgia sent a letter to their governor against value-added evaluations. More than 1,500 New York principals and more than 5,400 teachers, parents, professors, administrators and citizens have signed an open letter blasting that state’s value-added evaluation system; the letter can be found here.

The National Research Council, the research arm of the National Academies, which include the National Academy of Sciences, the National Academy of Engineering and the Institute of Medicine, issued a major report last year on this issue that said:

The standardized test scores that have been trumpeted to show improvement in the schools provide limited information about the causes of improvements or variability in student performance. This would be true, presumably, for any school system that uses standardized tests as a measure of achievement.

Mathematicians have said that value-added models are hardly ready for prime time as teacher evaluation tools, and that includes John Ewing, president of Math for America, a nonprofit organization dedicated to improving mathematics education in U.S. public high schools. In this post he said:

The most common misuse of mathematics is simpler, more pervasive, and (alas) more insidious: mathematics employed as a rhetorical weapon—an intellectual credential to convince the public that an idea or a process is “objective” and hence better than other competing ideas or processes. This is mathematical intimidation. It is especially persuasive because so many people are awed by mathematics and yet do not understand it—a dangerous combination.

The latest instance of the phenomenon is value-added modeling (VAM), used to interpret test data. Value-added modeling pops up everywhere today, from newspapers to television to political campaigns. VAM is heavily promoted with unbridled and uncritical enthusiasm by the press, by politicians, and even by (some) educational experts, and it is touted as the modern, “scientific” way to measure educational success in everything from charter schools to individual teachers.

Yet most of those promoting value-added modeling are ill-equipped to judge either its effectiveness or its limitations. Some of those who are equipped make extravagant claims without much detail, reassuring us that someone has checked into our concerns and we shouldn’t worry. Value-added modeling is promoted because it has the right pedigree — because it is based on “sophisticated mathematics.” As a consequence, mathematics that ought to be used to illuminate ends up being used to intimidate.

That’s what value-added is doing: Intimidating teachers, and many of them believe that it will be the end of the teaching profession. Who would want to go into a profession where a big part of the evaluation system is faulty?
