
Even as the Obama administration keeps extending its support for using standardized test scores in high-stakes decisions — see its new draft proposals to rate colleges of education based on the test scores of their graduates’ students — a national principals group is taking a stand against the practice.

The Board of Directors of the National Association of Secondary School Principals has given preliminary approval to a statement that rejects linking educators’ jobs and pay to standardized test scores that are plopped into a formula that can supposedly determine exactly how much “value” an individual educator has added to students’ academic growth.

The method of linking test scores to job status and pay is known as “value-added measurement,” or VAM, and it has been embraced by school reformers, including the Obama administration, as a prime way to hold educators “accountable” for how well their students do in class. In some states, up to 50 percent of an educator’s evaluation is based on VAM systems. The problem is that numerous assessment and statistics experts — including the American Statistical Association — have said that VAM formulas (there are many) are not a valid or reliable method of assessing the value of individual teachers and principals.

Last April, the American Statistical Association, the largest organization in the United States representing statisticians and related professionals, said in a report that value-added scores “do not directly measure potential teacher contributions toward other student outcomes” and that they “typically measure correlation, not causation,” noting that “effects — positive or negative — attributed to a teacher may actually be caused by other factors that are not captured in the model.” After the report’s release, I asked the Education Department if Education Secretary Arne Duncan was reconsidering his support for value-added measures, and the answer was no.

The NASSP issued a release saying that its governing body has given preliminary approval and that the organization will meet early next year, at Conference Ignite ’15 in San Diego, CA, to finalize the decision. That release quotes Mel Riddile, a former National Principal of the Year and chief architect of the NASSP statement, as saying:

“We are using value-added measurement in a way that the science does not yet support. We have to make it very clear to policymakers that using a flawed measurement both misrepresents student growth and does a disservice to the educators who live the work each day.”

Here’s the NASSP statement that has received preliminary approval:

Purpose

To determine the efficacy of using data from student test scores, particularly in the form of value-added measures (VAMs), to evaluate classroom teachers and to make key personnel decisions about them.

Issue

Currently, a number of states either are adopting or have adopted new or revamped teacher evaluation systems, which are based in part on data from student test scores in the form of value-added measures (VAM). Some states mandate that up to fifty percent of the teacher evaluation must be based on data from student test scores. States and school districts are using the evaluation systems to make key personnel decisions about retention, dismissal and compensation of teachers and principals.

At the same time, states have also adopted and are implementing new, more rigorous college- and career-ready standards. These new standards are intended to raise the bar from having every student earn a high school diploma to the much more ambitious goal of having every student be on target for success in post-secondary education and training.

The assessments accompanying these new standards depart from the old, much less expensive multiple-choice tests to assessments that include constructed responses. These new assessments demand higher-order thinking and up to a two-year increase in expected reading and writing skills. Not surprisingly, the newness of the assessments combined with their increased rigor has resulted in significant drops in the number of students reaching “proficient” levels on assessments aligned to the new standards.

Herein lies the challenge for principals and school leaders. New teacher evaluation systems demand the inclusion of student data at a time when scores on new assessments are dropping. The fears accompanying any new evaluation system have been magnified by the inclusion of data that will get worse before they get better. Principals are concerned that the new evaluation systems are eroding trust and are detrimental to building the culture of collaboration and continuous improvement necessary to raise student performance to college- and career-ready levels.

Specific questions have arisen about using value-added measurement (VAM) to retain, dismiss, and compensate teachers. VAMs are statistical measures of student growth. They employ complex algorithms to figure out how much teachers contribute to their students’ learning, holding constant factors such as demographics. And so, at first glance, it would appear reasonable to use VAMs to gauge teacher effectiveness. Unfortunately, policy makers have acted on that impression over the consistent objections of researchers who have cautioned against this inappropriate use of VAM.
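To make the mechanics concrete, here is a deliberately simplified sketch, in Python, of the value-added idea described above. It is not any state’s actual model (real systems use far more elaborate statistical specifications), and every name and number in it is invented for illustration: predict each student’s score from a prior score and a demographic control, then treat each classroom’s average residual as the teacher’s “value added.”

import numpy as np

# Simplified sketch of the value-added idea; real VAM systems use far
# more elaborate models. All data here are simulated for illustration.
rng = np.random.default_rng(0)
n_students = 200
prior_score = rng.normal(500, 50, n_students)   # last year's test score
low_income = rng.integers(0, 2, n_students)     # one demographic control
teacher_id = rng.integers(0, 10, n_students)    # 10 hypothetical teachers

# Simulated outcome: depends on prior score and background plus noise.
# Note that NO true teacher effect is built in at all.
current_score = (50 + 0.9 * prior_score - 10 * low_income
                 + rng.normal(0, 30, n_students))

# Fit the prediction model (ordinary least squares on the controls).
X = np.column_stack([np.ones(n_students), prior_score, low_income])
coef, *_ = np.linalg.lstsq(X, current_score, rcond=None)
residual = current_score - X @ coef

# Each teacher's "value-added" estimate is their class's mean residual.
for t in range(10):
    vam = residual[teacher_id == t].mean()
    print(f"teacher {t}: estimated value-added = {vam:+.1f}")

Because the simulated data contain no teacher effect at all, every nonzero estimate the loop prints is pure noise from small classes and a noisy test, which is precisely the reliability concern the researchers quoted below describe.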

In a 2014 report, the American Statistical Association urged states and school districts against using VAM systems to make personnel decisions. A statement accompanying the report cautioned that value-added scores “typically measure correlation, not causation,” that they “do not directly measure potential teacher contributions toward other student outcomes,” and that effects attributed to a teacher may actually be caused by other factors not captured in the model.

Another peer-reviewed study funded by the Gates Foundation and published by the American Educational Research Association (AERA) stated emphatically, “Value-Added Performance Measures Do Not Reflect the Content or Quality of Teachers’ Instruction.” The study found that “state tests and these measures of evaluating teachers don’t really seem to be associated with the things we think of as defining good teaching.” It further found that some teachers who were highly rated on student surveys, classroom observations by principals and other indicators of quality had students who scored poorly on tests. The opposite also was true. “We need to slow down or ease off completely for the stakes for teachers, at least in the first few years, so we can get a sense of what do these things measure, what does it mean,” the researchers admonished. “We’re moving these systems forward way ahead of the science in terms of the quality of the measures.”

Researcher Bruce Baker cautions against using VAMs even when test scores count for less than fifty percent of a teacher’s final evaluation. Using VAM estimates in a parallel weighting system with other measures like student surveys and principal observations “requires that VAM be considered even in the presence of a likely false positive. NY legislation prohibits a teacher from being rated highly if their test-based effectiveness estimate is low. Further, where VAM estimates vary more than other components, they will quite often be the tipping point – nearly 100% of the decision even if only 20% of the weight.”
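Baker’s “tipping point” claim is easy to demonstrate with a small simulation; the numbers below are hypothetical and not drawn from any real evaluation system. When the VAM component varies far more than the observation and survey components, redrawing only the VAM score is often enough to flip a teacher across a rating cutoff, even at a 20 percent weight.

import random

random.seed(42)

def composite(observation, survey, vam):
    """Weighted evaluation score: 40% observation, 40% survey, 20% VAM."""
    return 0.4 * observation + 0.4 * survey + 0.2 * vam

# Illustrative spreads: observations and surveys cluster tightly, while
# VAM estimates are noisy. All scales and cutoffs are made up.
flips = 0
trials = 10_000
for _ in range(trials):
    obs = random.gauss(3.0, 0.3)
    survey = random.gauss(3.0, 0.3)
    vam_a = random.gauss(3.0, 1.5)  # one year's noisy VAM draw
    vam_b = random.gauss(3.0, 1.5)  # a second draw for the same teacher
    # Does redrawing only the VAM component move this teacher across
    # a rating cutoff of 3.0?
    a_high = composite(obs, survey, vam_a) >= 3.0
    b_high = composite(obs, survey, vam_b) >= 3.0
    if a_high != b_high:
        flips += 1

print(f"VAM noise alone flipped the rating in {flips / trials:.0%} of trials")

The exact percentage depends on the spreads chosen, but the qualitative point tracks Baker’s: near a cutoff, the noisiest component dominates the decision regardless of its nominal weight.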

Stanford’s Edward Haertel takes the objection to using VAMs for personnel decisions one step further: “Teacher VAM scores should emphatically not be included as a substantial factor with a fixed weight in consequential teacher personnel decisions. The information they provide is simply not good enough to use in that way. It is not just that the information is noisy. Much more serious is the fact that the scores may be systematically biased for some teachers and against others, and major potential sources of bias stem from the way our school system is organized. No statistical manipulation can assure fair comparisons of teachers working in very different schools, with very different students, under very different conditions.”

Still other researchers believe that VAM is flawed at its very foundation. Linda Darling-Hammond et al. point out that the use of test scores via VAMs assumes “that student learning is measured by a given test, is influenced by the teacher alone, and is independent from the growth of classmates and other aspects of the classroom context. None of these assumptions is well supported by current evidence.” Other factors, including class size, instructional time, home support, peer culture, and summer learning loss, also affect student achievement. Darling-Hammond further notes that VAM ratings are inconsistent from class to class and year to year, that VAMs rest on the false assumption that students are randomly assigned to teachers, and that VAMs cannot account for the fact that “some teachers may be more effective at some forms of instruction…and less effective in others.”

Guiding Principles

Recommendations

References
