The Washington Post

Arne Duncan’s reaction to new research slamming teacher evaluation method he favors

Education Secretary Arne Duncan has been a proponent of using students’ scores on standardized tests to evaluate teachers, even as a growing mountain of evidence has shown that the method now used in most states, known as “value-added measures,” is not reliable. With two recent reports released on VAM adding to warnings long given by assessment experts, I asked the Education Department whether Duncan’s position had changed.

VAM purports to be able to take student standardized test scores, plug them into a complicated formula and measure the “value” a teacher adds to student learning. The method has been adopted as part of teacher evaluations in most states — with varying weights put on the results — but for years researchers have said the results aren’t close to being accurate enough to use for decisions that matter.

The American Statistical Association, the largest organization in the United States representing statisticians and related professionals, said in an April report that value-added scores “do not directly measure potential teacher contributions toward other student outcomes” and that they “typically measure correlation, not causation,” noting that “effects — positive or negative — attributed to a teacher may actually be caused by other factors that are not captured in the model.” This month, two researchers reported that they had found little or no correlation between quality teaching and the appraisals that teachers received using VAM.

For years, many prominent researchers have warned against using VAM. Those warnings include a 2009 statement by the Board on Testing and Assessment of the National Research Council of the National Academy of Sciences, which said that “VAM estimates of teacher effectiveness should not be used to make operational decisions because such estimates are far too unstable to be considered fair or reliable.” The Educational Testing Service’s Policy Information Center has said there are “too many pitfalls to making causal attributions of teacher effectiveness on the basis of the kinds of data available from typical school districts,” and Rand Corp. researchers have said that VAM results “will often be too imprecise to support some of the desired inferences.”

These are just a few of the concerns that have emerged in recent years over this method. Still, there are economists enthusiastic about VAM, and their work has been embraced by school reformers who have opted to use it as part of teacher and even principal evaluation and who have chosen to ignore the much larger body of evidence warning against using VAM for high-stakes purposes. In fact, when Michelle Rhee was chancellor of the D.C. public school system (from 2007 to 2010), she liked VAM so much that she instituted an evaluation system in the district in which nearly every adult in a school building, including the custodial staff, was evaluated in some part by student standardized test scores. The share of a teacher’s evaluation linked to VAM varies by state, from a small percentage up to 50 percent.

Such reliance on standardized tests for “accountability” purposes is not what many supporters of President Obama had expected from his administration. For six years, the administration has pursued school reform policies that have made standardized tests the chief metric for evaluation of students, teachers, principals and schools even in the face of growing resistance inside and outside public schools.

When I asked the Education Department whether Duncan was aware of the latest research on VAM and whether it had changed his opinion, I received this response in an e-mail from his press secretary, Dorie Nolt, reflecting Duncan’s position:

“Including measures of how well students are learning as part of multiple indicators of educator effectiveness is part of a set of long-needed changes that will improve classroom learning for kids. Growth measures are a significant improvement over the system that existed before, which failed to produce useful distinctions in teacher performance. Growth measures — including value-added measures — focus attention on student learning and show progress. While these measures are better than what existed before, educators will continue to improve them, and sharp, critical attention from the research community can help.”

As to whether Duncan is aware of the latest research, she said:

“We keep track of all major research on this topic.”

So if you are wondering whether Duncan and his team have been affected by the new research (or even the old VAM research), the answer seems to be a resounding “no.”