The aim was to create teacher evaluation systems that depended on student standardized test scores and observations by “peer evaluators.” These systems could identify teachers who were most effective in improving student academic performance. That, along with new professional development and other factors, it was conjectured, would then improve classroom staffing decisions and get more highly effective teachers working with the neediest students.
The method by which the test scores were used is called value-added modeling (VAM), and it involves plugging the scores into formulas that supposedly can tell how much a teacher has contributed to the growth a student is making academically. VAM — which supposedly can factor out every other influence on a child, such as being hungry — has been panned by assessment experts for high-stakes decisions, but the education world embraced it during the Obama administration, anyway.
The 526-page report, titled “Improving Teaching Effectiveness: Final Report,” was the work of the Rand Corporation and the American Institutes for Research and was funded by the Gates Foundation.
This post was written by Carol Burris, a former award-winning New York high school principal who is executive director of the Network for Public Education, a nonprofit group that advocates for traditional public schools. Burris was named the 2010 Educator of the Year by the School Administrators Association of New York State, and in 2013 the same organization named her the New York State High School Principal of the Year. Burris has been chronicling problems with modern school reform and school choice for years on this blog.
By Carol Burris
The New York Times called it “the first principals’ revolt in history.” During the fall of 2011, 658 New York state principals signed onto a document voicing their strong objections to the state’s new teacher evaluation system. During the next few months, the number of objecting principals would swell to 1,555. Thousands of educators, researchers and parents joined in. The message that united them was simple — evaluating teachers using student test scores was a terrible idea.
With a disregard that bordered on contempt for school leaders, Gov. Andrew M. Cuomo, the self-proclaimed “lobbyist for students,” pushed himself into the debate by insisting that teachers who were rated ineffective on the student score component of the new evaluation be rated ineffective overall.
By January 2015, he called the 2011 plan he crafted “baloney” and championed a new plan that further increased the test score component, while demanding that districts call in outside observers to evaluate teachers in an attempt to make sure that teachers got the low scores he believed they deserved. A few months later, his approval rating dropped to under 50 percent due primarily to the public’s disapproval of his education policies. A divided New York State legislature is still battling over how to fix the mess produced by an evaluation system that became known as the plane being built in the air.
Was it worth all the political and financial capital it took to create a broken system that few, if any, believe works?
Apparently not, according to the final report of a longitudinal study by the RAND Corporation and the American Institutes for Research (AIR), which was funded by the Bill & Melinda Gates Foundation.
The study examined the effects over six years of the Gates Foundation’s Intensive Partnerships for Effective Teaching (IP) initiative, which included, as a key feature, teacher evaluation systems similar to New York’s. It concluded that the IP project did not improve either student achievement or the quality of teachers. In fact, it did more harm than good.
Beginning in 2009, three school districts and four charter school management organizations (CMOs) participated in the Intensive Partnerships for Effective Teaching initiative. Its purpose was, in large part, to align the participants’ evaluation systems with a model designed by the Gates Foundation’s Measures of Effective Teaching (MET) Project.
The evaluation model was part of a bigger package of teacher reforms that were supposed to result in student test-score improvement and better access to high-quality teachers for low-income minority students. Components of IP included the metric-driven evaluation system, mentoring, hiring practices, individualized professional development based on teacher evaluations, the redistribution of “effective teachers” and merit pay. Its assumption was that the application of business principles such as financial rewards, standardized inspections and measurement would transform the teaching profession and increase student learning.
The cost was astronomical. Across the seven sites over half a billion dollars were spent — $574.7 million between November 2009 and June 2016. While many believed that the Gates Foundation paid the bill, overall the foundation paid less than 37 percent — $212.3 million. Taxpayers paid most of the costs via local or federal tax dollars.
Florida’s Hillsborough County Public Schools was one of the participants. Its program alone cost $262.2 million. Federal, state and local taxpayers paid $178.8 million, far more than the Gates Foundation’s contribution of $81 million. Gates used his money as a lever to open the public treasury to fund his foundation’s idea. The taxpayers picked up the lion’s share of costs.
There were indirect costs as well. According to the study, the average principal spent 25 percent of her time administering the complicated evaluation system and teachers spent hours every month on their own evaluations.
The report estimated that “IP costs for teacher-evaluation activities totaled nearly $100 million across the seven sites in 2014–2015 … the value of teacher and SL [school leader] time devoted to evaluation to be about $73 million, and the direct expenditures on evaluation constituted an additional $26 million.” According to Business Insider, the total cost of IP was nearly $1 billion.
When President Barack Obama’s education secretary, Arne Duncan, decided to include compliance with similar models of evaluation in order for states to receive Race to the Top funds, billions of federal taxpayer dollars were put in play. States and local school districts were forced to ante up for data-collection systems, new tests designed to produce metrics of student growth, training seminars that infantilized experienced principals, and pages upon pages of rubrics designed to turn the art and science of teaching behaviors into a numerical score.
Nowhere was the allegiance to the teacher evaluation model stronger than it was in the seven sites that participated in the IP project. A careful reading of the nearly 600-page report reveals what a shocking waste of taxpayer dollars and professional time the project was. Below is a summary of its most important findings.
One of the goals of IP was to help districts recruit better teachers and to assign the most effective teachers to classrooms with low-income minority students. This was to be accomplished through revised recruitment practices as well as financial incentives for teachers to work in high-needs schools.
One participating district, Shelby County Schools in Tennessee, turned over its teacher recruitment efforts to the New Teacher Project (TNTP). TNTP was founded by Michelle Rhee, the controversial former chancellor of D.C. public schools who was a leader in corporate-style education reform. The Gates Foundation gave TNTP $7 million in 2009, the year that it published a report entitled “The Widget Effect,” which was highly critical of the teacher evaluation systems that the foundation was so anxious to replace.
Shelby County Schools allowed TNTP to run its human-resources department, resulting in a strained relationship between TNTP and the existing staff. Other participating districts and CMOs used some TNTP services and, following the advice of TNTP, sought teachers from alternative preparation programs, most notably Teach for America (TFA).
This, according to the report, resulted in increased teacher turnover, since many TFAers “intended to remain in teaching for only a few years.” The report found no evidence that the quality of the teachers recruited improved.
Access to ‘effective’ teachers for disadvantaged students
A related goal of the project was to move “effective” teachers into schools with the most disadvantaged kids. Not only was this goal not realized, there was evidence that in one district access to more effective teachers declined.
Even with a cash incentive, teachers were reluctant to transfer to schools with high needs because they believed that doing so would lower their VAM score, which was now part of their evaluation. VAM refers to value-added modeling, which in this case feeds student standardized test scores into a complicated computer model that supposedly determines a teacher’s “value” in the growth of student achievement, in part by factoring out all other influences.
There was statistically significant evidence that the project decreased low-income minority students’ access to effective teachers in Hillsborough County Public Schools, both between schools and within the same school, as teachers sought to flee to the honors classes to avoid low VAM scores, which, under the new evaluation system, could cost them their jobs.
Although the report notes that some reformers hoped that the new evaluation system would result in teacher dismissals in the range of 20 percent, the actual rate of dismissal based on performance was similar to the rate under the former system — around 1 percent.
The report, to its credit, notes the dissonance that is created when a summative evaluation system intended to have high-stakes consequences is also used for coaching and teaching improvement, as it was in this project. It simply cannot meet both objectives, and as principals evaluated teachers to help them improve, they hesitated to give them punitive scores.
Regarding student achievement, the report admits that the project failed to meet its improvement goal. The conclusion is, in my opinion, understated. The graphs and charts in the report tell a story of the participating public districts’ and charter chains’ students often falling behind students in similar schools that did not engage in the project.
Take a look at Tables 13.1-13.5, found between pages 462 and 480 of the report. The red boxes indicate where scores were significantly lower than in matched schools not involved in the project. The green boxes indicate where scores were significantly higher than in comparison schools. Gray boxes indicate no statistically significant difference either way. In every table, the number of red boxes exceeds the number of green boxes. In the case of the Tennessee school district (Table 13.3), across every grade level in every year of the project, the project had negative effects or no effects on student achievement.
Overall conclusions of the report
The report ends by acknowledging that the project fell flat and speculating about why that occurred.
“The IP initiative might have failed to achieve its goals because it succeeded more at measuring teaching effectiveness than at using the information to improve student outcomes,” it says.
Six hundred pages and sadly it still misses the point.
The project failed because evaluating teachers by test scores is a dumb idea that carries all kinds of negative consequences for achieving the goal we all want — improved teaching and learning. Every good principal knows that improvement in teaching requires coaching built on a relationship of trust and mutual respect — not boxes and metrics intended to determine whom to punish and whom to reward.
None of the above should have come as a surprise. The New York State principals’ letter of 2011 predicted what would happen: teachers who did not want to teach the most disadvantaged students; no improvement in student achievement; inordinate demands on teacher and principal time; and a waste of the limited resources of the school.
The Bill & Melinda Gates Foundation could have saved itself millions and taxpayers billions if it had had the humility to heed the rebel principals’ advice. However, even after the findings of the final report, the foundation has no regrets and instead blames the victim. In a statement to Business Insider, Allan Golston, who is in charge of the foundation’s education initiatives, said, “this work, which originated in ideas that came from the field, led to critical conversations and drove change.”
Change for the sake of change — the nearly $1 billion project in disruption. I wonder what “growth score” Golston will get in his evaluation this year.