Answer Sheet
Posted at 05:00 AM ET, 06/28/2011

The problem with how we evaluate success in ed reform


This was written by Matthew Di Carlo, senior fellow at the non-profit Albert Shanker Institute, located in Washington, D.C. This post originally appeared on the institute’s blog.

By Matthew Di Carlo

In the mid-1990s, after a long and contentious debate, the U.S. Congress passed the Personal Responsibility and Work Opportunity Reconciliation Act of 1996, which President Clinton signed into law. It is usually called the “Welfare Reform Act,” as it effectively ended the Aid to Families with Dependent Children (AFDC) program (which is what most people mean when they say “welfare,” even though it was [and its successor is] only a tiny part of our welfare state). Established during the New Deal, AFDC was mostly designed to give assistance to needy young children (it was later expanded to include support for their parents/caretakers as well).

In place of AFDC was a new program – Temporary Assistance for Needy Families (TANF). TANF gave block grants to states, which were directed to design their own “welfare” programs. Although the states were given considerable leeway, their new programs were to have two basic features: first, for welfare recipients to receive benefits, they had to be working; and second, there was to be a time limit on benefits, usually 3-5 years over a lifetime, after which individuals were no longer eligible for cash assistance (states could exempt a proportion of their caseload from these requirements). The general idea was that time limits and work requirements would “break the cycle of poverty”; recipients would be motivated (read: forced) to work, and in doing so, would acquire the experience and confidence necessary for a bootstrap-esque transformation.

There are several similarities between the bipartisan welfare reform movement of the 1990s and the general thrust of the education reform movement happening today. For example, there is the reliance on market-based mechanisms to “cure” longstanding problems, and the unusually strong liberal-conservative alliance of the proponents. Nevertheless, while calling education reform “the new welfare reform” might be a good sound bite, it would also take the analogy way too far.

My intention here is not to draw a direct parallel between the two movements in terms of how they approach their respective problems (poverty/unemployment and student achievement), but rather in how we evaluate their success in doing so. In other words, I am concerned that the manner in which we assess the success or failure of education reform in our public debate will proceed using the same flawed and misguided methods that were used by many for welfare reform.

I often hear policymakers and pundits of both parties assert, without a trace of doubt, that welfare reform was a success. For years after the law’s enactment, their evidence seemed to consist of one thing: The number of people on welfare rolls was declining. The interpretation of these data was that the law was helping people move from “welfare to work.” As the years went on, supporters of welfare reform also pointed to declining poverty rates as evidence of success.

Both inferences are, by themselves, unsupported at best. Sure, it’s conceivable that declining welfare rolls might signal that most recipients are moving off public assistance and into gainful employment. But, given the fact that the law was specifically designed to cut the welfare rolls (by imposing work requirements and time limits on benefits), an equally plausible interpretation is that people were being forced off welfare into deeper poverty. Similarly, it is entirely possible that the coincidence of shrinking welfare rolls and declining poverty rates reflects a causal connection, but – given the fact that we were undergoing an economic boom at the time – it’s more likely that the decline in poverty was a function of overall economic conditions and not of a relatively small program affecting poor single mothers.

To truly evaluate the effects of welfare reform, it is necessary to go far beyond improper causal inferences drawn from deficient data. When you do, you get a somewhat different picture. In fact, it has been shown that welfare reform, while no doubt successful in many cases, actually acted to weaken the safety net for the poorest Americans, while a decline in the overall poverty rate masked an increase in the nation’s rate of “deep poverty.” There were unintended consequences, such as decreased college attendance among poor mothers who had to work for their benefits. And, finally, much of the alleged success was due more to the strong economy of the mid- to late-1990s than to TANF itself, as is evident in lower success rates among recent welfare “leavers” compared with the late 1990s, as well as the failures of many states’ TANF programs to respond adequately to the current recession.

We can debate whether the welfare reform law was successful – and there are many studies on this topic, covering a wide variety of outcomes – but the point is that the “conventional wisdom” that it was a smashing success, fueled by mostly positive media coverage, was based on crude interpretations of inadequate evidence.

I’m concerned that we’re headed down this same road in education policy. For example, let’s say, hypothetically, that Florida, which just enacted huge changes to teacher personnel policy, makes a significant gain on its 2013 NAEP results. You can already hear the exultant cries: The reforms worked! Other states must follow suit!

First, as is the case with declining welfare rolls, an increase in cross-sectional test scores does not necessarily signal actual progress (it could be, for instance, the result of cohort differences), and even if the gains are “real,” they may not always signal greater readiness for college and career. Moreover, just as with poverty rates and welfare reform, it would be hard, absent rigorous evaluation, to say whether it is Florida’s teacher-focused reforms themselves that were responsible, or general conditions such as higher standards and more effective administration. Finally, in much the same way that overall poverty rates can decline while “deep poverty” rates go up, overall test score averages mask underlying distributional differences – by income, language ability, race, achievement level, and a dozen other things – that are rarely accorded any attention in our public discourse.

But I don’t need hypothetical examples. Our welfare reform-style evidentiary standards are all around us. Rising test scores are reflexively attributed to the radical changes in New Orleans public schools after Hurricane Katrina, ignoring the possibility that the largest displacement since the Civil War had changed the composition of the city’s public school students. Author Richard Whitmire has spent the past year or so traveling the country, offering simplistic misinterpretations of testing data as “evidence” of Michelle Rhee’s effectiveness (as has Rhee herself).

Our entire charter school debate is a never-ending battle between proponents and opponents determined to show (often improperly) that charters do or do not boost test scores – with almost nobody asking the important question: Why? In fact, to whatever degree NCLB was spurred along by the “Texas miracle” (for example, “miracle” graduation rates in Houston were actually due to unrecorded dropouts), our primary federal education policy was enabled by a misreading of flawed data.

This needs to stop, on both “sides” of the debate. It took over a decade for pundits and policymakers to start asking serious questions about the true effects of welfare reform, and, even today, very few people beyond the research community have seen or examined the evidence (which is, by the way, still growing -- you cannot evaluate a policy intervention solely in the short term). Instead, declining rolls and poverty rates were used in much the same way as many use test scores: Incorrectly.

We can and do disagree about the merits of the new policies now being enacted throughout the nation. But there should be no disagreement as to the necessity of monitoring their impacts rigorously and continuously. Let’s point out when data are interpreted improperly, even when doing so doesn’t support our own point of view. Let’s look at a range of outcomes, not just test scores, and certainly not just proficiency rates. And let’s not make causal arguments based on any analysis that is not specifically designed to identify a causal impact.

Some policies work, others don’t, and there’s nothing wrong with trial and error, so long as you know which is which.


    © 2011 The Washington Post Company