New problems with New York’s teacher evaluation plan found

Here’s a new post from award-winning Principal Carol Burris of South Side High School in New York about the state’s controversial new educator evaluation system. Burris has for more than a year chronicled on this blog (she calls it Star Wars here, and other things here and here and here, for example) the implementation of the system, which ignores research by using student standardized test scores to assess teachers and which has already started to negatively impact young people.

Burris was named New York’s 2013 High School Principal of the Year by the School Administrators Association of New York State and the National Association of Secondary School Principals, and in 2010 was tapped as the New York State Outstanding Educator by the School Administrators Association of New York State. She is the co-author of the New York Principals letter of concern regarding the evaluation of teachers by student test scores. It has been signed by more than 1,535 New York principals and more than 6,500 teachers, parents, professors, administrators and citizens. You can read the letter by clicking here.

By Carol Burris

This weekend, I took a look at the teacher evaluation plan Commissioner John King imposed on the teachers and principals of New York City. I was taken aback by how different the scoring bands were from the 3012c cut scores for New York State teachers. Why would John King dramatically change the bands for New York City from those written into 3012c, the teacher evaluation law, which is now in effect across the state?

In order to understand the answer, we need to look at what the legislature created. Below are the scoring bands for 2011-13.

 

3012c APPR Cut Scores for NYS

| Rating | Growth or Comparable Measures | Locally-selected Measures of Growth or Achievement | Other Measures of Effectiveness (60 points) | Overall Composite Score |
|---|---|---|---|---|
| Ineffective | 0-2 | 0-2 | (locally developed) | 0-64 |
| Developing | 3-8 | 3-8 | (locally developed) | 65-74 |
| Effective | 9-17 | 9-17 | (locally developed) | 75-90 |
| Highly Effective | 18-20 | 18-20 | (locally developed) | 91-100 |

 

It is important to keep two things in mind. First, the 3012c scoring bands were prepared so that teachers who were found ‘ineffective’ in the two growth measures would be ‘ineffective’ overall: 2 + 2, even with a perfect 60 in the ‘other measures’, results in a score of 64, which is ‘ineffective’ overall.

This point system also created other problems, which I wrote about last year here. For example, if a teacher is effective in the first two columns with scores of 9 and 9, she needs at least 57 points in the ‘other’ 60 to be ‘effective’ overall (9 + 9 + 57 = 75). That is a lot of points. So, many districts made sure that ‘effective’ translated into at least 57/60 points, even though that made the ‘other measures’ score band lopsided by leaving the majority of points in the ineffective range. This assured that a teacher who is ‘effective’ in all three components would be ‘effective’ overall.

But a more serious problem remained: the problem of the developing teacher. Suppose you are a developing teacher with scores of 3 and 3 in the two growth measures. If you do not get 59 out of 60 points, you are rated ‘ineffective’ overall, which can lead to your termination. This is a problem that cannot be fixed with the 3012c scoring band. Even if your district negotiated 60/60 points for highly effective teachers, 59/60 points for effective teachers, and 58/60 points for developing teachers, the teacher who is at the low end of developing in growth scores is doomed to be rated ‘ineffective’: 3 + 3 + 58 = 64.
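The arithmetic in these examples can be checked with a short sketch. Only the composite band boundaries come from the 3012c table above; the function name `overall_rating` and the layout of the code are assumptions made for illustration, not part of the law or of any state tool.

```python
# Overall composite bands from the 3012c table: (lowest score, highest score, rating).
COMPOSITE_BANDS = [
    (0, 64, "Ineffective"),
    (65, 74, "Developing"),
    (75, 90, "Effective"),
    (91, 100, "Highly Effective"),
]


def overall_rating(growth, local, other):
    """Add the three component scores and look up the overall band.

    growth and local are each scored out of 20; other is scored out of 60.
    This is a hypothetical helper written only to illustrate the arithmetic.
    """
    total = growth + local + other
    for low, high, rating in COMPOSITE_BANDS:
        if low <= total <= high:
            return total, rating
    raise ValueError(f"composite score {total} is outside 0-100")


# Ineffective in both growth measures, perfect score in 'other measures'.
print(overall_rating(2, 2, 60))   # (64, 'Ineffective')

# Lowest 'effective' growth scores: 57 of 60 is the minimum needed.
print(overall_rating(9, 9, 57))   # (75, 'Effective')
print(overall_rating(9, 9, 56))   # (74, 'Developing')

# Lowest 'developing' growth scores: anything under 59 of 60 is 'ineffective'.
print(overall_rating(3, 3, 59))   # (65, 'Developing')
print(overall_rating(3, 3, 58))   # (64, 'Ineffective')
```

Because the composite is a plain sum, a teacher at the bottom of a growth band has to make up the difference almost entirely in the 60-point column, which is why no negotiated ‘other measures’ band can rescue the 3 + 3 case.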

Apparently, the UFT rightly insisted that the ‘developing teacher problem’ be fixed.  The commissioner gave them entirely different score bands in the first two measures, as he filled in the blanks in ‘other measures’, which would normally be done through negotiations. The resulting bands are given below.

Commissioner Imposed Cut Scores for New York City

| Rating | Growth or Comparable Measures | Locally-selected Measures of Growth or Achievement | Other Measures of Effectiveness (60 points) | Overall Composite Score |
|---|---|---|---|---|
| Ineffective | 0-12 | 0-12 | 0-38 | 0-64 |
| Developing | 13-14 | 13-14 | 39-44 | 65-74 |
| Effective | 15-17 | 15-17 | 45-54 | 75-90 |
| Highly Effective | 18-20 | 18-20 | 55-60 | 91-100 |
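One way to see what these bands change is to add up the lowest score in each band. The sketch below does that with the numbers from the New York City table; the names `NYC_BAND_FLOORS` and `lowest_totals` are hypothetical and exist only for this illustration.

```python
# Lowest score in each band of the commissioner-imposed New York City table:
# (growth, locally selected, other measures, lowest composite score for the rating).
NYC_BAND_FLOORS = {
    "Developing": (13, 13, 39, 65),
    "Effective": (15, 15, 45, 75),
    "Highly Effective": (18, 18, 55, 91),
}


def lowest_totals(floors):
    """Sum the three lowest component scores for each rating (illustrative helper)."""
    return {
        rating: growth + local + other
        for rating, (growth, local, other, _composite_floor) in floors.items()
    }


for rating, total in lowest_totals(NYC_BAND_FLOORS).items():
    composite_floor = NYC_BAND_FLOORS[rating][3]
    # Prints the rating, the summed component floors, and the composite floor.
    print(rating, total, composite_floor)
```

Under these bands a teacher at the very bottom of ‘developing’ in all three components totals 13 + 13 + 39 = 65, which is exactly the lowest ‘developing’ composite score. The same alignment holds for ‘effective’ (75) and ‘highly effective’ (91).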

 

If you get an SLO (SLOs are absurd measures that I wrote about here) and if you create your local measure right, you can just shift targets around to make these very odd bands work.

But teachers of Grades 3-8 get a “growth score” based on the 3-8 tests. So how can it be that a teacher in Scarsdale who gets a 12 is effective, while a teacher in New York City who gets a 12 is ineffective? The commissioner assures everyone that he will have a “fix” to make it all work out. I wouldn’t bet my SLO on it. When you look at the plans side by side the differences are glaring.

Perhaps he will impose his new and improved score band on all of New York State teachers so that they will have only 3 scores (15, 16, 17) to show they are effective in student growth, but a whopping 13 ways to show that they are ineffective. It is absurd to think you can create a credible point system to evaluate educators.
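The side-by-side comparison can be sketched in a few lines. The band boundaries are copied from the two tables in this post; the dictionary and function names (`STATE_3012C`, `NYC_IMPOSED`, `growth_rating`, `scores_per_band`) are assumptions introduced only for illustration.

```python
# Growth-subscore bands (0-20 points) taken from the two tables in this post.
STATE_3012C = {
    "Ineffective": (0, 2),
    "Developing": (3, 8),
    "Effective": (9, 17),
    "Highly Effective": (18, 20),
}
NYC_IMPOSED = {
    "Ineffective": (0, 12),
    "Developing": (13, 14),
    "Effective": (15, 17),
    "Highly Effective": (18, 20),
}


def growth_rating(score, bands):
    """Return the rating whose band contains the given growth score."""
    for rating, (low, high) in bands.items():
        if low <= score <= high:
            return rating
    raise ValueError(f"growth score {score} is outside 0-20")


def scores_per_band(bands):
    """Count how many whole-number scores fall in each band."""
    return {rating: high - low + 1 for rating, (low, high) in bands.items()}


# The same growth score of 12 under each set of bands.
print(growth_rating(12, STATE_3012C))  # Effective
print(growth_rating(12, NYC_IMPOSED))  # Ineffective

# How many scores land in each band.
print(scores_per_band(STATE_3012C))
print(scores_per_band(NYC_IMPOSED))
```

Counting the whole-number scores in each band gives 3 ineffective and 9 effective scores under 3012c, against 13 ineffective and 3 effective scores under the New York City bands.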

And if he does impose the New York City bands on the entire state, what will happen to all of the 2013-14 negotiated agreements we needed to finish by June before our teachers leave for the summer?

And what of all of the teachers in Buffalo, Albany and Long Island who believe that they are rated ‘developing’ only to find that they will be ‘ineffective’ overall? That foolish inequity, with real-life consequences, will be shared with parents next fall, because Mayor Bloomberg, along with the governor and the legislature who put 3012c in effect, thought that parents had the “right to know” a teacher’s score.

So what should be done?  First, get rid of the points and move to a professional evaluation rubric system, like the one adopted by Massachusetts, which does not insist that test scores trump all.  That system was approved by Race to the Top. Their system is designed to improve teachers, not fire them.

Second, follow the advice of every New York State advocacy group, from the Superintendents to SAANYS to NYSUT, all of which have pleaded that this year’s APPR scores be considered a pilot program only.

Third, admit and explain the problems that have been caused by the scoring bands and assure the teachers and principals of New York that you are seeking to develop a thoughtful evaluation plan that involves ALL stakeholders in an authentic way, which was not how 3012c was designed.

Doing the above would show New York educators and parents leadership in which they can have confidence. The Board of Regents is showing extraordinary courage in refusing to approve SED’s recommendation to move to a 25% value-added model. Perhaps they will show leadership on this issue as well.

The New York State Education Department found all of this amusing in 2011, likening it to a ‘plane being built in the air’.

Well, that certainly has been an accurate description. I do not find it amusing. As a principal who always took the supervision and evaluation of teachers seriously, I find all of this very sad.
