The fundamental flaws of ‘value added’ teacher evaluation

Evaluating teachers by the test scores of their students has been perhaps the most controversial education reform of the year: while it has been pushed in a majority of states with the support of the Obama administration, assessment experts have warned against the practice for a variety of reasons. Here Jack Jennings, founder and former president of the non-profit Center on Education Policy, explains the problem. This appeared on Huffington Post.

By Jack Jennings

American tourists are often amused when traveling on the London “tube” to hear the announcement at each station to “mind the gap.” This attention-getting advice is meant to warn passengers exiting the subway car to step over the space between the car and the platform.

American education has its own gap, and it might be helpful if we repeatedly heard public announcements to “mind” it. This gap is the distance between what policymakers are putting in place and what research has found.

The most controversial example of this gap between policy and research relates to the current fight to change the methods used to evaluate teachers’ classroom performance. The Obama Administration, charitable foundations, conservative critics of teacher unions, and various others are encouraging state governments to revise teacher evaluation systems to consider the impact of individual teachers on their students’ achievement. The states are responding. According to the National Council on Teacher Quality, 36 states and the District of Columbia have made policy changes in teacher evaluation since 2009, and 30 states now require these evaluations to include objective evidence of student learning.

The stakes are high because teachers could lose their jobs if they have low ratings on these new evaluations. Their salaries and promotions could also be affected, as well as their standing among their peers.

Proponents of change rightfully argue that current teacher evaluation systems are inadequate. Often, these involve a short “walk in” visit by the principal or another type of cursory review. Clearly, better ways of evaluating teachers must be found.

The problem comes with many of the alternatives being proposed or implemented, especially those that rely heavily on tests. In particular, the Obama administration, through its Race to the Top competitive grants and its waivers of No Child Left Behind Act (NCLB) requirements, is putting pressure on states to incorporate student test scores as a significant component of any new teacher evaluation system.

To fulfill the conditions for receiving Race to the Top grants or No Child Left Behind waivers, states are often turning to the reading and math tests used for NCLB accountability. Since those tests are already administered to almost all students in grades 3 through 8 and once in high school, it’s understandable that states would choose to use them for teacher evaluation purposes.

The common sense rationale for linking teacher evaluations to student test scores is to hold teachers accountable for how much their students are learning. The favorite way of measuring gains, or lack thereof, in student learning is through “value-added” models, which seek to determine what each teacher has added to the educational achievement of each of his or her students.

Even though it seemingly makes sense to look at individual gains attributable to particular teachers, this method is fundamentally flawed because of the nature of current state tests, the methods used to assign students to teachers, and other factors. These tests were not designed to be used in that way, and various aspects of their administration make this use improper.

In a briefing paper prepared for the National Academy of Education (NAE) and the American Educational Research Association, Linda Darling-Hammond and three other distinguished authors reached the following conclusion: “With respect to value-added measures of student achievement tied to individual teachers, current research suggests that high-stakes, individual-level decisions, as well as comparisons across highly dissimilar schools or student populations should be avoided.” The paper goes on to say that “in general, such measures should be used only in a low-stakes fashion when they are part of an integrated analysis of what the teacher is doing and who is being taught.” (Disclaimer: Although I am a member of NAE, I did not research or write that paper.)

The paper highlights three specific problems with using value-added models to evaluate teacher effectiveness, especially for such important decisions as teacher employment or compensation:

  1. Value-added models of teacher effectiveness are highly unstable. Teachers’ ratings differ substantially from class to class and from year to year, as well as from one test to another.
  2. Teachers’ value-added ratings are significantly affected by differences in the students who are assigned to them, even when models try to control for prior achievement and student demographic variables. In particular, teachers with large numbers of new English learners and other students with special needs have been found to show lower gains than when the same teachers are teaching other students.
  3. Value-added ratings cannot disentangle the many influences on student progress. These include home, school and student factors that influence student learning gains and that matter more than the individual teacher in explaining changes in test scores.

Cautions about value-added testing have also been expressed by a group of testing and policy experts assembled by the Economic Policy Institute. This group concluded that “[w]hile there are good reasons for concern about the current system of teacher evaluation, there are also good reasons to be concerned about claims that measuring teachers’ effectiveness largely by student test scores will lead to improved student achievement.”

In a similar vein, W. James Popham, professor emeritus at UCLA and test design expert, has concluded that the use of students’ test scores to evaluate teachers “runs counter to the most important commandment of educational testing — the need for sufficient validity evidence.”

Despite this strong advice based on research, states are pushing ahead to incorporate value-added models into their teacher evaluation systems. In Louisiana and Florida, two states that have made sweeping changes, the state legislatures eliminated teacher tenure and instituted systems that rely to a substantial degree on test scores to determine employment and salary. Many other states are not far behind.

Why have politicians and others decided to ignore the research and use defective systems to make major decisions about retaining teachers or determining their pay? Why are we not “minding the gap”?

Possibly, proponents of change felt they had to push hard to eliminate a defective system. In addition, some research, including an ongoing study of measures of effective teaching supported by the Gates Foundation, gives credence to the use of student achievement measures when combined with other measures, such as teacher observations and student feedback, as part of an effective teacher evaluation system. It is also possible that the researchers who raise serious concerns about the emphasis being placed on test score measures have not effectively stated or publicized their objections. Regardless of the reason, there is trouble ahead.

In Florida, an administrative order has already temporarily halted that state's system. There and elsewhere, groups are planning to litigate as soon as a teacher is fired or a teacher's salary is lowered based on the results of a value-added model. Research findings will likely be used to discredit the value-added approach.

The shame of all this is that there is another way. As the National Academy paper points out, some other tools to measure teacher effectiveness are more stable and sophisticated. These include assessments of the actual performance of teachers and on-the-job evaluations, both of which rely on professional judgments.

Are states not fully embracing such options because they are more complex and have higher training costs than simply using test scores? If so, they are being very short-sighted.

We have ignored the advice to mind the gap for too long. The way we educate our children and treat our teachers should be based on facts, not on impulses.
