In the following post, a New York City teacher relates his experience of being evaluated by the scores of students he doesn’t have. Jake Jacobs is in his eighth year as an art teacher, currently in a high-needs public school in New York City.
In the piece, he refers to APPR, which is New York State’s educator evaluation system. He also refers to VAM, or “value-added” measurement, which purports to be able to determine the “value” a teacher brings to student learning by dropping test scores into complicated formulas that can supposedly strip out all other factors, including the conditions in which a student lives. Assessment experts have warned policymakers not to use VAM for high-stakes purposes, but it is being used anyway to evaluate teachers.
By Jake Jacobs
I’m a New York City art teacher whose “effective” rating last year dropped to “developing” because of student standardized test scores — in math, a subject I don’t teach.
Yes, New York City takes Common Core math and English Language Arts test scores and attributes them to teachers who teach different subjects, even though they are not certified to teach those subjects, and even though they may never have met the tested students. Tens of thousands of teachers of science, social studies, all the arts, physical education, foreign language, technology and other subjects have at least 20 percent of their evaluation based on math or English Language Arts test results. (Because I am now required to have an “improvement plan,” I am curious to hear how teachers can improve the scores of kids we don’t teach.)
Another set of “local” tests, also out-of-subject, count for another 20 percent in thousands more cases, doubling the impact.
So I was stunned to read Governor Andrew Cuomo’s recent op-ed in Newsday, where he seems not to know how many teachers’ evaluations hinge on state test scores, even when they teach non-tested subjects. The governor wrote:
“Interestingly, whatever percent is assigned to standardized testing will only affect a small minority of teacher evaluations as only 20 percent of teachers are in subjects and grades that have state testing.”
In New York City (and presumably most everywhere else) out-of-test-subject teachers are the majority. New York State’s district policies are all different, but New York City alone accounts for thousands of cases of teachers who don’t teach math or English but are judged by them anyway. In middle schools, it’s been estimated that over 60 percent of New York City teacher evaluations are out-of-subject.
The online education publication Chalkbeat tried to remind New Yorkers in January that New York City teachers have their ratings “lumped in” together, making the evaluations rest on low-quality and arbitrary data, unrelated to the professional practice in the teacher’s license area:
“Most of the state’s teachers are rated in large part based on test scores of subjects and students that they do not teach because there is no state test for their students or subject area. That leaves districts and schools to decide how physical education, arts, and foreign language teachers, among others, will be measured. New York City has filled those gaps by using school-wide scores on math and English tests, and in some cases using city-created tests in other subjects.”
Implausibly, the governor also asserted in his Newsday op-ed:
“Virtually everyone also agrees that New York’s teacher evaluation system is not accurate and is skewed in its construction to provide favorable results for teachers.”
This, after the governor imposed it on teachers and parents over sustained objection, without input or buy-in from teachers.
Teachers have long cited research warning about the unreliability of using student test scores to evaluate teachers. By looking at the actual technical document describing the metrics of New York’s APPR and VAM, even a layman can see the authors describe their models as “estimates” and “predictions” subject to wild variance.
Policymakers seem not to know about this policy
The governor doesn’t seem to understand the policy, but he isn’t alone. In recent weeks, I have pop-quizzed various New York State Senate and Assembly members on out-of-subject testing. I haven’t found any lawmakers who were aware of the policy, but after learning about it, most agreed it sounds like it needs discussion.
How It Works in NYC
Because we cannot bog kids down with standardized tests in every single subject, schools pick their poison, deciding which out-of-subject teachers will be assessed with students’ math scores and which will be assessed with English Language Arts scores. Social studies teachers are usually attached to ELA scores. Science teachers are attached to math scores.
Next to be decided is which student group will be measured. Will it be the kids we teach (but what if we teach them for only a part of the year)?
Or an entire grade? Or will schoolwide test results be used as an evaluation metric? Across New York, more than 700 districts had to submit their individual evaluation plans — based on dozens of variables — for approval to state education authorities, with almost half needing to re-apply a second time.
The other 20 percent
“Local Measures” — which generally means scores from different math and English Language Arts tests created by education companies rather than teachers — account for another 20 percent of all teacher evaluations. These are not the state-mandated standardized tests, which are aligned to the Common Core, but rather tests purchased by individual districts. For arts teachers, math or English Language Arts scores on these exams are used for the “local measure,” meaning that test scores account for a full 40 percent of my assessment.
Does this mean that officials want me to replace my student-centered arts instruction with math instruction? They say we need only to “integrate” math and English into the non-core subjects we teach, but having 40 percent accountability to math scores sends me a very direct message that I should be teaching much more math (or looking for a school with more proficient students).
Threat of Dismissal
There are thousands of teachers in New York City who already saw their APPR ranking drop just because of the portion of the evaluations that is taken from the state-mandated standardized tests (which, again, count for 20 percent of evaluations). I’m one example. My “effective” rating dropped to “developing” solely because of the 20 percent based on Common Core math scores — because I did my job, teaching art.
I specifically took a position teaching art in a high-needs school because it is supposed to be the antidote to the drudgery and high pressure of academics. That was right before this policy was announced. Now, I’m evaluated on a metric beyond my control.
Teachers whose ratings dropped to “ineffective” will be just one more ineffective rating away from dismissal, under changes to the evaluation system proposed by Cuomo. This is because the governor vetoed his own “safety net” bill, reneging on the campaign promise of a two-year moratorium for consequences on ineffective ratings.
Not really taking the tests
I see students who are discouraged by the difficulty of the first few pages of the tests give up quickly and fill in bubbles at random just before they are collected. Written portions have one sentence or are left blank. Remember, the governor announced on TV last year that “the tests don’t count” for students. Some debate whether they should fill circles all down one row or not. A colleague told me one of her students by chance improved her grade by filling in the bubbles to spell her name in the answer grid last year.
Immediate negative impact
In practice, this evaluation scheme has caused an immediate hiring crisis in schools like mine. Experienced teachers now avoid schools with low-performing students. The very best math teachers in my school left for higher performing schools in the summer of 2011, as soon as this policy was announced. Last year, we had to hire four of eight teachers who have never taught before. Now, kids in struggling schools are less likely to see a seasoned teacher, as we penalize all teachers of kids with social-emotional obstacles to learning.
This also affects the composition of classes. Think about the disruptive students who need counseling or intervention. This testing regime makes them toxic, less likely to find support from teachers because they are highly unlikely to score “proficient” on the standardized tests. But this policy also helps bad teachers escape detection if their school has adequate math and/or English Language Arts scores. The validity of APPR for math and ELA teachers is hotly contested, but when it comes to out-of-subject testing, it doesn’t pass the laugh test.
Currently, New Rochelle is debating this policy for next September. Earlier this month, a lawsuit was announced in Tennessee challenging the use of the practice. A recent article in the Washington Post reported on the pending legal challenges to out-of-subject testing in evaluations, updating a Valerie Strauss piece from the previous year. This story flies under the radar perhaps because it’s awkward that the union leadership quietly okayed this policy without going to the membership. But then, just a few weeks ago, the UFT’s Mike Mulgrew publicized this policy on his blog page, saying:
“Most teachers don’t even work in grades or subjects that are covered by a state exam, which means they will be evaluated based on student work they had no hand in. For those who do, basing 50 percent of evaluations on state test scores will force them to spend 100 percent of their time on test prep.”
How did it get this way?
If out-of-subject testing seems illogical, it may be because it was left unplanned, “to be worked out later.” Chalkbeat first described the policy as “the best option available until more credible alternatives can be developed.” But they also asked Shael Polakow-Suransky, then the Department of Education’s chief academic officer, why we were rating teachers by their colleagues’ scores. His reply:
“If the legislature had wanted us to be fully compliant at the outset, they would have put in place a massive funding program to support assessments to support every single subject…[b]ut they decided to have a statewide evaluation system in place and then to build it from there.”
After contacting New York State Regents Chancellor Emeritus Robert Bennett, I received a call from the state Education Department. I was initially invited to speak to the assistant state education commissioner but ended up talking to someone from the Office of Teacher and Leader Effectiveness. The 30-minute call only raised more questions about the policy. They said New York City and individual schools did have other options, such as using alternative “performance based assessments,” but chose out-of-subject evaluations to encourage “team teaching,” that is, jointly planning lessons in all subjects to support growth on math and ELA tests.
But what of these “performance based assessments?” Implemented only in a handful of schools, this cumbersome, reinvent-the-wheel option was not offered at my school — and I suspect others. It’s also important to note the hastily crafted “emergency” policy was sprung on teachers in 2013 just two days before school started, requiring an extremely involved process for forming a Measures of Student Learning committee in compliance with a 57-page PDF under a daunting deadline. Not surprisingly, failure to comply meant everything defaults to the state plan, which counts math or ELA scores for every teacher.
As we debate teacher accountability, it would help to know how many teachers statewide are evaluated out-of-subject.
In late February, the New York City Department of Education revealed they will finally be developing assessments for out-of-subject teachers, recruiting 100 non-core subject teachers for preliminary focus testing. But this only opens up another potential nightmare scenario. Will they create standardized assessments in arts instruction – based on age instead of ability – and compel every teacher to focus on the same things? NYC art teachers have standards today that allow individual teachers the flexibility to adapt curriculum to meet student need holistically. I can’t imagine what the test-makers might come up with, but I suspect Picasso, Warhol and Pollock would be deemed not proficient.