The New York Times published a story on Hanna Skandera, New Mexico’s highly controversial education secretary-designate (she’s been “designate” for three years because Democrats refuse to hold a hearing to confirm her) under this online headline: “A Push for Teacher Accountability Meets Resistance in New Mexico.”

What’s wrong with this headline speaks to what’s wrong with a lot of the debate about school reform today. The problem is the indiscriminate and inaccurate use of the word “accountability.”

Accountability, according to the Merriam-Webster dictionary, is

the quality or state of being accountable; especially : an obligation or willingness to accept responsibility or to account for one’s actions <public officials lacking accountability>

The headline says that there is a push in New Mexico for “teacher accountability,” but somebody or something (read: teachers) is resisting. There go those pesky teachers again, turning themselves inside out to avoid being “held accountable.”

Undoubtedly there are some teachers who don’t want to be evaluated because they aren’t good at their jobs. But even if that were true of every single teacher in New Mexico, which of course it isn’t, what Skandera is pushing isn’t a reasonable “accountability” system for teachers. It is an unreasonable imitation of a real evaluation system, one that uses student standardized test scores for up to half of a teacher’s assessment. Evaluating teachers by test scores has been the rage for some time, despite warnings from assessment experts that it’s a bad idea. What makes Skandera’s system worse than most of the others is that in those systems test scores count for a smaller percentage of the evaluation.

There are other ways to fairly evaluate teachers, and some school districts have done it for years. It’s not magic. High-stakes evaluation by test score is a terrible idea. Michael Feuer, dean of George Washington University’s Graduate School of Education and Human Development, explained in a speech that I published in 2012:

 We want to trust that teachers will work hard on behalf of our kids. But we maintain the right to evaluate their performance, just as we trust police officers to handle crime while we hold them accountable for process and results. We try to balance teachers’ professional norms, i.e., their rights to work with at least limited autonomy in their classrooms, against our rights as citizens to know how the kids are doing. No profession grants automatic or complete autonomy to its members or exempts them from external evaluation and accountability; the challenge is finding the right balance between blind trust at one extreme and stifling control at the other.

Choice of metrics, therefore, and how they are used and understood, is the central problem of accountability. Americans like numbers (just open the sports section of any newspaper), so it’s not surprising that test scores have long been the darling of reform. Although it is easy to demonize the testing industry, the history is more complex. Their flaws notwithstanding, tests used properly can provide at least a crude approximation of the performance of students and the effects of teachers.

The good news is that a century of investment in the science of measurement has resulted in significant progress in the validity and fairness of inferences derived from test scores. Still, test scores are always estimates, based on statistical sampling of complex domains, and as any respectable psychometrician will attest, the estimates come with nontrivial error terms.

Which leads to the bad news: over time, and despite the periodic outcry of the professional measurement community itself, policymakers have tended to ignore the error term. They have developed exaggerated notions of the precision of the measures, and have come to rely on tests for too many purposes. And by attaching so-called “high stakes” consequences to test results, reformers with even good intentions have unwittingly created incentives for gaming the system and distracting students from mastering the relevant material. So the approximations get cruder, to the point where we no longer have confidence in the validity of the information. Remember, the basic question — how are our kids doing? — is a legitimate expression of our accountability rights. If the answer is blurred and misleading — untrustworthy — we feel cheated, or at least uncomfortable.

And the plot thickens. For not only does excessive use of tests lead to public confusion, it saps the morale and effectiveness of the very professionals in whom we have entrusted the education of our children. According to recent polling data, morale of teachers in the United States has hit an all-time low, in part because of an accountability system that seems to have run amok.


Run amok it has. Teachers in New Mexico have protested by holding rallies, wearing black clothing and filing lawsuits, but Skandera, a member of former Florida governor Jeb Bush’s Chiefs for Change group, continues to impose the system.

It is an old and tired notion that “reformers” in the Jeb Bush mold — those who believe test scores should be the chief measure of accountability — somehow hold the banner of “accountability” in education, or, for that matter, the very term “reform.” Soon it will be a new year, and it’s time we all start calling things by their real names. “Accountability” systems based on test scores are really unworkable and punitive evaluation schemes. Let’s stop giving them the legitimacy that the word “accountability” suggests.