Teach for America, the nonprofit organization that places high-achieving college graduates in school districts in underserved areas of the country, hasn't lacked for evaluations over the years. As I explained back in April, the majority of evaluations have shown either that TFA teachers are as effective as their peers, or that they are even better than traditional teachers in some categories. A vocal minority resists this conclusion, but the best data we have suggests that TFA either does no harm or does active good.

The best evidence we had before today was a randomized evaluation conducted by Mathematica Policy Research between 2001 and 2003, which found that TFA teachers bested other teachers at teaching math — with gains for students equal to about a month of additional instruction — and were not significantly different from them on teaching reading.

A follow-up using the same data showed that that result held for students across the math score distribution, not just the average student. "These results suggest that allowing highly qualified teachers, who in the absence of TFA would not have taught in these disadvantaged neighborhoods, should have a positive influence not just on students at the top of the achievement distribution but across the entire math test score distribution,” the authors concluded.

That consensus was bolstered in a big way Tuesday by the release of a new Mathematica evaluation of both TFA and the Teaching Fellows program, which runs highly selective, city-specific teacher placement programs somewhat akin to TFA but targeted at both kids just out of college and at professionals looking for a career change (think Prez).

The report, which was sponsored by the Institute of Education Sciences, the research arm of the Department of Education, compares TFA and Teaching Fellow participants teaching secondary math (that is, math at both the middle and high school levels) to their peer teachers, who either came in through traditional routes or through a less selective alternative program.

The Teaching Fellow math teachers were no more or less effective than the comparison group, but the TFA teachers produced gains "equivalent to an additional 2.6 months of school for the average student nationwide."

**Who, what, where, when?**

The study looked at both the 2009-2010 and 2010-2011 school years, and looked at thousands of students and hundreds of teachers in a variety of states. "The final TFA study sample consisted of 4,573 students, 111 classroom matches, 136 math teachers, 45 schools, and 11 districts in 8 states," the authors* write. "The final Teaching Fellows study sample consisted of 4,116 students, 118 classroom matches, 153 math teachers, 44 schools, and 9 districts in 8 states."

The characteristics of the schools in the TFA study, compared to all secondary schools with TFA teachers and all secondary schools period, are summarized in this table:

Urban areas were slightly overrepresented and suburban schools underrepresented, but rural schools were represented fairly accurately. The racial demographics of the study sample were very, very close to those for all TFA schools, and the mean percentage of students eligible for free lunches was roughly the same too. The study sample overrepresented the South and underrepresented the West, but the Northeast and Midwest had roughly the same shares of study-sample schools as they do of TFA schools in general. The schools were, on average, slightly larger than the mean size of all TFA schools.

Interestingly, there were no charter schools in the TFA sample. Melissa Clark at Mathematica says this is due to the requirements of carrying out randomized experiments. "We could only choose schools that could support this kind of random study, and charter schools tended to be smaller, and did not have at least two teachers teaching the same course at the same time," she explains. And if you don't have at least two teachers in a school teaching the same course, you can't compare teachers in the school to each other, which is what the study needed to do. Magnet schools, however, were slightly overrepresented relative to all TFA schools.

The demographics of the studied TFA teachers, and of the teachers they were compared to, differed markedly as well, as the following table shows.

TFA teachers are younger than both secondary school teachers nationwide and the comparison group teachers. They're pretty similar in racial and gender terms to the full sample of secondary school teachers, but the comparison group is much less white, more black, and more female than the TFA group.

The TFA group is likelier to have gone to selective schools than the comparison group, but less likely to have majored in math or to have an advanced degree:

They also took fewer college math courses than the comparison group:

But if you look at the TFA and comparison teachers' scores on the Praxis II Mathematics Content Knowledge Test or the Praxis II Middle School Mathematics Test, two standardized tests meant to evaluate math knowledge among secondary math teachers, TFA teachers do better:

As would be suggested by the entire point of the program, TFA teachers are less experienced at teaching:

The results on Teaching Fellows are broadly similar, with the exception that they're older, male-er, and have taken more college math classes than TFA teachers. They're still younger and less math-educated than their comparison group, however.

**The results - TFA**

Before we get into results, I should note that most of the numbers here are expressed in standard deviations, which can be an impenetrable metric for the layperson. Luckily, it's easy to convert them into a more meaningful number. Hanley Chiang, a coauthor of the report, describes the process:

According to previous research by Carolyn Hill and her colleagues, students’ average growth in math achievement over a single school year is about 0.27 standard deviations in the middle and high school grades. Therefore, if a gain is expressed in standard deviations, then dividing that gain by 0.27 gives you the fraction of the school year that is equivalent to that particular gain. For instance, a gain of 0.07 standard deviations (the achievement difference between students of TFA teachers and students of comparison teachers) is equivalent to about one-fourth of a school year (0.07 / 0.27 = 26 percent). If you assume that a school year consists of roughly 10 calendar months, then 26 percent of a school year translates into about 2.6 months of math instruction.

So, in our report, there are two simple steps to converting any standard deviation gain into months of learning: First, divide by 0.27. Second, multiply by 10.
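Chiang's two steps are simple enough to sketch in a few lines of code (the function name is mine; the constants are the ones he cites):

```python
# Sketch of the report's conversion from standard-deviation gains to
# "months of learning," per coauthor Hanley Chiang's two-step rule.
# Assumes one school year of math growth is about 0.27 standard
# deviations (Hill et al.) and a school year is roughly 10 months.

YEARLY_GROWTH_SD = 0.27   # average yearly math growth, middle/high school
MONTHS_PER_YEAR = 10      # rough length of a school year in months

def sd_gain_to_months(gain_sd: float) -> float:
    """Step one: divide by 0.27. Step two: multiply by 10."""
    return gain_sd / YEARLY_GROWTH_SD * MONTHS_PER_YEAR

# The study's headline TFA effect of 0.07 standard deviations:
print(round(sd_gain_to_months(0.07), 1))  # → 2.6
```

Plug in the study's 0.07 standard deviation gain and you get the 2.6 extra months of math instruction quoted above.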

With that in mind, here's the nickel summary for TFA:

These are, frankly, devastating results for many critics of past positive TFA studies, who have relied heavily on the fact that those studies compared TFA teachers against a pool mixing experienced veteran teachers with teachers who entered through less selective alternative routes, a mix that, they argue, stacks the deck in TFA's favor.

“Let’s say you go to Reagan airport, and Delta says you have three options: one pilot who has had 30 hours of training, another who’s had five weeks of training, and another who’s been piloting for five years and has been piloting this plane for a whole year. Which pilot do you want?” Julian Vasquez Heilig, an associate professor at the University of Texas in Austin and author of some critical research on TFA, said in April. “When they compared the TFA teachers to the certified teachers, they weren’t better. There’s no significant result. So they’re comparing five weeks to 30 hours.”

That was true when Heilig said it, but it's not true anymore. TFA teachers beat not only the alternative entry teachers but certified teachers too. What's more, they beat both novice and experienced teachers:

Again, the 0.07 standard deviation gain compared to experienced teachers is equivalent to 2.6 extra months, or a 26 percent increase in the length of the school year. That's huge. TFA teachers do especially well among high school students, though middle-schoolers see gains from TFA teachers too.

Critics could still argue that this could mean TFA teachers are just better at "teaching to the test," rather than teaching real math skills. But as the Mathematica researchers note, this concern is misplaced. "At the middle school level, we measured performance on state math tests, high stakes tests. We knew they were taking them seriously," Clark explains. "But the flip side is that they might have been teaching to the test. At the high school level, since students are assessed at every grade level, we instead administered a test which was subject-specific, for algebra I and algebra II, geometry, and general high school math. The teachers had never seen it before and could not have been teaching to the test, and we also found effects at the high school level." Indeed, the effects at the high school level are *stronger* than at the middle-school level. If TFA teachers were teaching to the test, they weren't doing a great job of it.

**The results - Teaching Fellows**

The results for Teaching Fellows programs were less rosy. Here's the summary:

So Teaching Fellows do solidly better than those coming from non-selective alternative routes, but a bit worse than traditional teachers. Similarly, they do better than novice comparison teachers but substantially worse than experienced ones.

In a reversal of the TFA findings, the Teaching Fellows do better with middle schoolers than high schoolers:

Because of the study's design, you can't make any definitive statements about whether TFA works better or worse than the Teaching Fellows' programs. The samples come from different schools, the comparison groups looked different, etc. But this does give a more complex — though not wholly negative — picture than the one the study gives for TFA.

**What explains all this?**

Perhaps the most interesting part of the study is the end, where the researchers try to isolate what it is about TFA that makes it so effective. Interestingly, they find that the most obvious factor — the selectivity of the schools TFA participants went to — doesn't actually matter much. Higher Praxis II scores and the presence of TFA participants who'd used math in a non-teaching setting gave the program an edge, but not a big enough one to offset the experience advantage that the comparison teachers had.

"Overall, the observed characteristics could not account for why TFA teachers were actually more effective than comparison teachers," the authors write. "TFA teachers’ lower experience levels suggested that they would be less effective than comparison teachers to an extent that would more than offset the other observed characteristics, such as Praxis scores, on which they had an advantage. On net, based on all teacher characteristics in the analysis, students of TFA teachers would have been predicted to score 0.028 standard deviations lower than students of comparison teachers. In fact, students of TFA teachers actually scored 0.075 standard deviations higher than students of comparison teachers."
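A quick back-of-the-envelope calculation (my arithmetic, not the authors', reusing the report's 0.27-per-year conversion) shows just how big the gap between prediction and reality was:

```python
# Illustrating the gap the authors describe: observed teacher
# characteristics predicted a small TFA *deficit*, but the actual
# result was a gain. Figures are from the report; the conversion
# to months uses the same 0.27 SD-per-year rule quoted earlier.
PREDICTED_SD = -0.028  # predicted from characteristics (experience, Praxis, etc.)
ACTUAL_SD = 0.075      # what students of TFA teachers actually scored

unexplained = ACTUAL_SD - PREDICTED_SD      # gap characteristics can't explain
months = unexplained / 0.27 * 10            # converted to months of learning
print(round(unexplained, 3), round(months, 1))  # → 0.103 3.8
```

In other words, something about TFA beyond its teachers' measurable characteristics is worth roughly 0.1 standard deviations, or close to four months of math learning.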

So while this is easily the most rigorous evaluation of TFA to date, more study is needed to know what exactly it is about the organization that promotes such solid math gains. But it does confirm that, as Andrew Rotherham writes, when it comes to TFA, "Rigorous studies *consistently* show modest or significant positive effects – and perhaps more importantly given the context of the advocacy debate, they don’t show harm." That doesn't mean every TFA teacher is great, just as it doesn't mean every traditional secondary math teacher is bad; I certainly know my share of burnt-out TFAers who regret doing it. And it doesn't mean there aren't solid normative complaints to lodge against the organization.

But the claim that it doesn't promote learning in math just doesn't hold up to scrutiny anymore. "It is a great day," Steve Mancini, a spokesman at Teach for America, emails. "We are really proud of our teachers — and their students."

* The full listing is Melissa A. Clark, Hanley S. Chiang, Tim Silva, Sheena McConnell, Kathy Sonnenfeld, and Anastasia Erbe (all at Mathematica) and Michael Puma (at Chesapeake Research Associates).