How teachers are evaluated has become one of the big issues in the ongoing strike by Chicago public school teachers as well as in the many debates on school reform being conducted around the country.

Assessment experts say that using students’ standardized test scores to gauge a teacher’s effectiveness is unreliable, but reformers still insist on this “value-added” method of evaluation. Some reformers, such as Chicago Mayor Rahm Emanuel, want as much as half of a teacher’s evaluation to be linked to student test scores.

“Value added” scores sometimes label very effective teachers as ineffective, and vice versa. How can that happen? Here’s a case that tells you how an excellent teacher got a low value-added score. This story is not an aberration.

It was written by Sean C. Feeney, principal of The Wheatley School in New York State and president of the Nassau County High School Principals’ Association. He is the co-author of an open letter of concern about New York state’s new test-based educator evaluation system that has been signed by thousands of people.

By Sean C. Feeney


These state-supplied scores were the missing piece in a teacher’s final end-of-year score, potentially determining whether or not a teacher is deemed Ineffective and therefore required to have a Teacher Improvement Plan (TIP) within 10 days of the start of the school year. These scores were not available to schools until the third week of August. So there you have it: high-stakes information that can potentially have a serious impact on a teacher’s career being supplied well past any sort of reasonable timeframe. Welcome to New York’s APPR system!

As a principal, I sat with each of the teachers who received a score from the state and tried to explain how the state arrived at these scores out of 20 points. One of the first teachers with whom I did this was Ashley.

Ashley is the type of teacher that all parents want for their child: smart in her content area and committed to making a difference in her students’ lives. Ashley works incessantly with her students, both inside and outside of the classroom.

During her free time, Ashley can always be found working with small groups of students in the hallways or any free space in the area. She has taken our school’s math teams on weekend trips as our mathematics team has found success in various competitions. Over the past four years, 91% of her 179 Algebra 1, Geometry or Algebra 2/Trigonometry students have passed the corresponding Regents examination on their first attempt.

At the end of every year, students and parents send in countless notes of thanks to Ashley for her tireless efforts. Ashley has worked with our highest achieving students as well as many of those who struggle with mathematical understanding. For those who struggle, Ashley has a well-deserved reputation for making them more confident, successful and comfortable with the material. Last spring, Ashley was recognized as the Parent Teacher Organization teacher of the year.

So what score did the state assign Ashley? Well, she earned a score of 7 out of 20 points. According to the state’s guidelines, this makes Ashley a Developing teacher. Goodness. To those of us who know Ashley and have had the pleasure of working with her over the years, this is a jaw-dropping result. Ashley’s score defies all understanding of who she is as an educator. Her score flies in the face of how she is valued in our school and what she has done for students in our school. Her score contradicts the thoughtful evaluations given to her over the past five years.

How, then, is one to understand this score?

Officials at our State Education Department have certainly spent countless hours putting together guides explaining the scores. These documents describe what they call an objective teacher evaluation process that is based on student test scores, takes into account students’ prior performance, and arrives at a score that is able to measure teacher effectiveness. Along the way, the guides are careful to walk the reader through their explanations of Student Growth Percentiles (SGPs) and a teacher’s Mean Growth Percentile (MGP), impressing the reader with discussions and charts of confidence ranges and the need to be transparent about the data. It all seems so thoughtful and convincing! After all, how could such numbers fail to paint an accurate picture of a teacher’s effectiveness?

(One of the more audacious claims of this document is that the development of this evaluative model is the result of the collaborative efforts of the Regents Task Force on Teacher and Principal Effectiveness. Those of us who know people who served on this committee are well aware that the recommendations of the committee were either rejected or ignored by State Education officials.)

One of the items missing from this presentation, however, is an explanation of how State officials translated SGPs and MGPs into a number from 1 to 20. In order to find out how the State went from MGPs to a teacher effectiveness score out of 20 points, one needs to refer to the 2010-11 Beta Growth Model for Educator Evaluation Technical Report. Why a separate document for explaining these scores? Most likely because there are few State officials who are fluent in the psychometrics necessary to explain how this part of our APPR system works.

It is incredible that the State feels that it is perfectly fine to use a statistical model still in a beta phase to arrive at these amorphous teacher effectiveness scores. I make it a point not to use beta software on my computer, for I do not want something untested and filled with bugs to contaminate the programs that are working fine on my machine. It is a shame that the State does not have the same opinion regarding its reform initiatives.

As explained in the technical paper, the SGP model championed by New York State claims to account for students who are English Language Learners (ELL), students with disabilities (SWD) and even economically disadvantaged students as it determines a teacher’s adjusted mean growth percentile. While the statistical explanation underlying the SGP model is carefully developed, nowhere do the statisticians justify the underlying cause for any change in student score measured. In other words, what is the research basis for attributing any change in score from year to year to the singular variable of a teacher? This is never explained because there is virtually no research that justifies treating the teacher as the sole cause of a change in student score from year to year.

So if it is not solely the teacher who caused the change in score, to what should one attribute a change in student score? Well, that is a question that continues to challenge statisticians and educational researchers. Despite the hopes and declarations of so many of our present-day “reformers,” we simply do not have the tools necessary to quantify the impact a single teacher has on an individual student’s test score over the course of time. Derek Briggs presented a critique of the use of SGPs in this paper.

How can one explain Ashley’s shockingly low score, however? As a principal who has always availed himself of data when evaluating teachers, I would sit down and have a conversation about the test results so that I could put them in context. Here is what we know about the context of Ashley’s score:

* This year, Ashley’s score was based on her two eighth-grade classes, not the results of her Regents-level classes.

* The two eighth-grade classes followed different curricula: one was an Algebra course and the other was a Math 8 course.

* The Algebra 8 course is geared towards the Regents exam, which is a high-school level assessment that is beyond the mathematical level of the NYS Math 8 examination. Ninety-one percent of Ashley’s students in this class passed the Regents Algebra 1 examination. The Math 8 exam covers different content, which can make it a challenge for some of our weaker Algebra students. In fact, of the students who took the Algebra course, one-quarter passed the Regents examination but scored below proficiency on the Math 8 exam.

* In the two weeks prior to the three-day administration of the Math 8 exam in April 2012, students in Ashley’s class had one week of vacation followed by three days of English testing. In the two weeks leading to the beginning of the Math 8 exam, Ashley saw her class only three times.

Rather than place the student results in context, the State issued a blind judgment based on data that was developed through unproven and invalid calculations. These scores are then distributed with an authority and “scientific objectivity” that is simply unwarranted. Along the way, teacher reputations and careers will be destroyed.

Despite the judgment of the New York State Education Department, Ashley remains a model teacher in our school: beloved by students and parents; respected by colleagues and supervisors. She continues to work on perfecting her practice and helping her students gain confidence and skills. My hope, of course, is that she will continue to feel that she is part of a profession that respects teachers and students alike, not one that reduces them to a poorly conceived and incoherent number.
