The information below comes from Elizabeth Phillips, principal of P.S. 321 in Park Slope, N.Y., about how badly the newly released rankings of New York City public school teachers reflect the reality at her school. Phillips wrote that she is “absolutely sick” about the public release of the Teacher Data Reports (TDR) of some 18,000 teachers based entirely on student standardized test scores. And, she said, the amount of data that is wrong is “staggering.” This same information was posted earlier on the New York City Public School Parents blog.

From an email sent by Principal Elizabeth Phillips:

Having seen the TDRs [Teacher Data Reports] when they first came out, I can say that they are extremely inaccurate, both in terms of actual mistakes and in how data is interpreted, particularly for teachers of high performing children. Here is some more detail on that:

1. The amount of data that is simply wrong is staggering. In my school alone, in the first year of the TDRs, for just two grades (since of course those are the only grades we have getting TDRs), 4-6 teachers have inaccurate data as follows:

* One teacher who taught in 08-09 but was on child care leave for years before that time has data for a previous year; it must be data from someone who was in that same room the previous year.

* For both of my upper grade CTT [Collaborative Team Teaching] classes, the special education teacher has a data report that is for all 29 kids; the general education teachers in those classes have no TDRs. (This does appear to be corrected for 2010.)

* In one case, a teacher who has taught 4th grade for 5 years has no data for previous years.

2. Even in cases where the data is correct, I believe the conclusions are arbitrary and often flawed. I do not believe that because the average scale score went from 3.97 (taken from students’ third grade test scores) to 3.92 (their fourth grade scores) a teacher is necessarily a poor teacher, but in this case, she ended up in the 6th percentile for this particular year due to a statistically insignificant change. In fact, the particular teacher in question is an exceptionally strong teacher by any other measure (parent feedback, colleagues’ opinions, my observations over many years). As in the School Progress Reports, particularly when you get to high levels of achievement, these small differences are not meaningful.

I realize that some would argue that since teachers are compared to “like” teachers, the data is accurate. However, when you actually know the teachers, it just isn’t so. We have great teachers with very high scores, very low scores, and middle scores. In this example, whether we are saying 3.97 or 3.92 (with a perfect score equating to 4.5) the class average is extremely high. Once you get to this high level, small changes are meaningless in terms of tracking children’s progress. The assistant principals and I have often debated one or two ambiguous questions, since the answers are not always clear!

It is wrong to call a great teacher a failing teacher because a few kids got 3-4 questions wrong one year rather than 2-3 questions wrong the year before. It is particularly problematic given that the 3rd grade test in the past was very different from the 4th grade test. It could be that the children in a particular class were always weaker in writing, but the 3rd grade test for the years the TDRs are being released had very little writing compared to the 4th grade test, so the children may not actually do worse; it may be that they are just tested on different material.

3. Related to the above, there are many reasons why the data may not truly be comparable from class to class. Some of it has to do with the differing tests from grade to grade (which will be improved at least in terms of the format of the test as of the 10-11 year), but there are other factors too. Even though we work hard to make all of our classes equal in terms of academic level, behavior, etc, there is no question that in certain situations where children have recently gone through a very traumatic experience, I will hand pick a teacher who is strong academically and also nurturing. So, one teacher may have some “harder to teach” children, even if they are children with the same test score. There is also the issue of who the AIS (Academic Intervention Services) teacher is for each class. She may have a big impact on the test scores in some classes, yet that isn’t taken into account. Or, it could be that in some schools we decide that with kids scoring high 3s and 4s we devote more time to non-tested subjects--art, music, dance, drama--while in some schools more test prep is done.

4. We know from the School Progress Reports how inaccurate grading based on minute differences in test scores can be. One example: PS 321 was in the 83rd percentile for 2010-11, the 95th percentile in 2009-10, the 59th percentile in 08-09; the 36th percentile in 07-08 and the 57th percentile in 06-07. Basically, there is no way that our school has changed that dramatically year to year.

In fact, the difference in grades wasn’t great (B, B, A, A, A), but the percentile is ... and that is with 550 tested kids in the sample. When we’re looking at a class of 29 kids, a couple of kids having a bad day can make a huge difference. I know that the “average” range on the TDRs is huge because of the DOE’s [Department of Education’s] awareness of the inaccuracy of looking at small differences, but there are two problems with this. First of all, whether the DOE says it is average or not (which it is according to the DOE), parents seeing a teacher rated in the 30th percentile are going to be upset! And, as I note above, it doesn’t even really seem to work for the very top or bottom. As many statisticians have written, there is no data that supports using “value added” in the way that these TDRs do.

Just FYI, here is a chart I prepared that shows how the percentiles fluctuate wildly from year to year, even though the % of children getting 3s and 4s in ELA [English-Language Arts] and Math stays fairly constant and with over 550 tested children! The fluctuations for a sample size of 29 children (a class) will be even greater and the percentiles meaningless:


Year: school’s percentile of achievement based on School Progress Report / % of students at levels 3 & 4 in ELA / % of students at levels 3 & 4 in math / School Progress Report grade

2006-07: 57th percentile / 88.7% / 94% / B

2007-08: 36th percentile / 86.3% / 92.3% / B

2008-09: 59th percentile / 91.1% / 95.3% / A

2009-10: 95th percentile / 85.3% / 86.9% / A

2010-11: 83rd percentile / 86.7% / 91.9% / A
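
[As a rough illustration of the sample-size point, here is a minimal Python sketch. It is not the DOE’s value-added model, and the score mean (3.9) and spread (0.4) are hypothetical values chosen only for illustration; only the group sizes of 29 and 550 come from the email. It compares how much an average score moves by chance alone for a class versus a whole school.]

```python
import math
import random
import statistics

CLASS_SIZE = 29    # one class, as cited in the email
SCHOOL_SIZE = 550  # tested children in the school, as cited in the email


def standard_error(sd, n):
    """Standard error of a mean of n independent scores with spread sd."""
    return sd / math.sqrt(n)


def simulate_mean_spread(n, trials=500, mean=3.9, sd=0.4, seed=0):
    """Draw `trials` random groups of n scores and return how much
    the group averages vary (their standard deviation).

    mean and sd are hypothetical, illustrative values only.
    """
    rng = random.Random(seed)
    group_means = []
    for _ in range(trials):
        scores = [rng.gauss(mean, sd) for _ in range(n)]
        group_means.append(statistics.fmean(scores))
    return statistics.stdev(group_means)


# How much noisier is a class average than a school average?
# The ratio does not depend on sd, so sd=1.0 is used as a placeholder.
ratio = standard_error(1.0, CLASS_SIZE) / standard_error(1.0, SCHOOL_SIZE)

class_spread = simulate_mean_spread(CLASS_SIZE)
school_spread = simulate_mean_spread(SCHOOL_SIZE)

print(f"Noise ratio, class vs. school: {ratio:.2f}")
print(f"Simulated spread of class averages:  {class_spread:.3f}")
print(f"Simulated spread of school averages: {school_spread:.3f}")
```

[Under these assumptions, chance variation in a 29-student class average is more than four times that of a 550-student school average, which is consistent with the point that percentiles built on single classes will swing even more than the school-level figures above.]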

5. The idea of the TDRs being publicly released with names attached is incredibly demoralizing to teachers--and this includes ones who scored above average. Because they understand that some of their well respected colleagues scored low, there is the feeling that this is arbitrary, and that “this could be me next year.” I worry that some of the best teachers, who are the ones who have options for jobs elsewhere, will leave the system. The timing of this is particularly problematic given that four years of budget cuts have made teachers’ jobs that much harder, with higher class size and fewer support services. I think it is clear that when teachers are demoralized they cannot do as good a job teaching, so it is the children who will suffer.

6. To improve the quality of education in a significant way, we need to get thoughtful, high performing, dedicated young people to enter the profession. Treating teachers disrespectfully, which is what I believe the public release of TDRs with names attached does, will make teaching--at least in public schools--a less and less appealing option for high performing college graduates with other choices. This is partly Bill Gates’ argument in his New York Times op-ed piece.

7. I honestly cannot understand how public ranking of teachers by percentile will have anything but a negative effect on teaching and learning. Particularly in middle school, I can imagine teachers losing control as children and parents take the position, “why should I listen to you, you’re a below average teacher.”

There are many other very serious problems with the TDRs. There is no question that as testing becomes more and more high stakes, with teachers’ jobs dependent on student test scores, in many many schools, the curriculum will be narrowed.

I believe it will lead to a widening of the “achievement gap” since it will be much easier for high performing middle class or upper middle class schools with very involved parents to resist the impulse to narrow the curriculum.

With low performing schools, the temptation will be greater as they face state and city sanctions that can result in school closure. In all elementary schools, it will be harder to get senior teachers to teach grades 4-5 ... until of course everyone is tested in every grade, which will just make it hard to get knowledgeable people to teach in public schools at all!

