By W. Steven Barnett
Preschool has become a higher-profile political issue this year, both as a ballot and budget issue and as a key debating point in a fair number of Governors’ races. It is therefore not surprising that the libertarian Cato Institute would publish an attack on public support for preschool education shortly before the election. The aim seems to be to raise a cloud of uncertainty regarding preschool’s benefits that is difficult to dispel in the time before the election, the think tank version of an “October Surprise.” Moreover, the review by David J. Armor contains so many inconsistencies, errors, omissions, and misleading arguments that it is hard to know where to begin in critiquing the paper. And so much of it has been said before that it is difficult to justify a lengthy new rebuttal. Fortunately, another economist, Tim Bartik, has already produced a blog post here that explains some of the key errors in detail. In what follows, I briefly summarize Bartik’s response, supplemented with a few points of my own, and then go on to address several additional shortcomings of the review in depth.
Armor’s primary arguments are: (1) the Perry and Abecedarian programs that produced large long-term benefits and high rates of return can’t be generalized to programs today; (2) the recent regression discontinuity studies finding strong positive effects for state pre-K are biased by methodological flaws; and (3) “gold standard” experiments (randomized trials) with Head Start and the Tennessee pre-K program find that effects fade out. Bartik debunks these claims in detail. I restate the highlights below with a few additions of my own.
First, if one really believes that today’s preschool programs are much less effective than the Perry Preschool and Abecedarian programs because those programs were so much more costly and intensive, and started earlier, then the logical conclusion is that today’s programs should be better funded, more intensive, and start earlier. I would agree. Head Start needs to be put on steroids. New Jersey’s Abbott pre-K model (discussed later) starts at age 3 and provides a guide, as it has been found to have solid long-term effects on achievement and school success. Given the high rates of return estimated for the Perry and Abecedarian programs, it is economically foolish not to move ahead with stronger programs.
Second, Armor’s claims regarding flaws in the regression discontinuity (RD) studies of pre-K programs in New Jersey, Tulsa, Boston, and elsewhere are purely hypothetical and unsubstantiated. Every research study has limitations and potential weaknesses, including experiments. It is not enough to simply speculate about possible flaws; one must assess how likely they are to matter. For example, invisible aliens could supply the children who attended pre-K with answers to the tests. I can’t disprove it, but no evidence supports such a claim. Bartik reports evidence contradicting Armor’s speculation that the RD data collection method is biased in favor of the preschool group. In addition, Armor reports that RD studies find some state pre-K programs have no positive effects while others find that a program’s effects vary by outcome. Somehow he fails to connect the dots. If the results of RD studies were driven primarily by the biases Armor alleges, then the RD results would be positive across the board for all state pre-K programs.
Third, the evidence that Armor relies on to argue that Head Start and Tennessee pre-K have no long-term effects is not experimental. It’s akin to the evidence from the Chicago Longitudinal Study and other quasi-experimental studies that he disregards when they find persistent impacts. Bartik points to serious methodological concerns with this research. Even more disconcerting is Armor’s failure to recognize the import of all the evidence he cites from the Tennessee study. Tennessee has both a larger experimental study and a smaller quasi-experimental substudy. The larger experiment finds that pre-K reduces subsequent grade retention, from 8% to 4%. The smaller quasi-experimental substudy Armor cites as proof of fade-out finds a much smaller reduction from 6% to 4%. Armor fails to grasp that this indicates serious downward bias in the quasi-experimental substudy or that both approaches find a large subsequent impact on grade retention, contradicting his claim of fade-out.
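The grade-retention comparison above comes down to simple arithmetic, which a short sketch can make explicit. The percentages are those reported in the text (experimental study: 8% to 4%; quasi-experimental substudy: 6% to 4%); everything else is just the calculation.

```python
# Grade-retention rates reported for the two Tennessee pre-K studies.
experiment = {"control": 0.08, "prek": 0.04}  # larger randomized study
quasi = {"control": 0.06, "prek": 0.04}       # smaller quasi-experimental substudy

def reduction(study):
    """Return the absolute and relative reduction in grade retention."""
    absolute = study["control"] - study["prek"]
    relative = absolute / study["control"]
    return absolute, relative

exp_abs, exp_rel = reduction(experiment)  # 4 percentage points, a 50% relative drop
q_abs, q_rel = reduction(quasi)           # 2 percentage points, a ~33% relative drop

# The quasi-experimental estimate is half the experimental one in absolute terms,
# consistent with downward bias -- yet both designs show a substantial lasting effect.
print(f"Experiment: {exp_abs:.0%} absolute, {exp_rel:.0%} relative reduction")
print(f"Substudy:   {q_abs:.0%} absolute, {q_rel:.0%} relative reduction")
```

Either way the numbers are read, both designs find a meaningful later impact on grade retention, which is the opposite of fade-out.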
Among the many additional errors in Armor’s review I address three that I find particularly egregious. First, he miscalculates cost. Second, he misses much of the most rigorous evidence. And, third, he misrepresents the New Jersey Abbott pre-K program (note to the Cato fact checkers: repeatedly misspelling a study’s name suggests you have not actually read it) and its impacts.
Armor ballparks the annual cost of public funding for pre-K using the average expenditure for K-12 education of $12,000 per child. He then assumes 100 percent enrollment to arrive at a national cost of $50 billion. Of course, enrollment in public pre-K will never reach 100 percent, as some children will continue to stay home or attend private schools, but the bigger problem is that Armor confuses marginal and average cost. His figure includes all the costs of special education, but government already pays for preschool special education in every state. New York alone spends an estimated $2 billion annually on these services. Many other children currently attend publicly funded or subsidized services including state and local public pre-K, Head Start, and childcare. All of this should be subtracted from total cost to figure out the added cost. In addition, while costs paid by parents should not be subtracted from the taxpayer costs, cost savings to parents should be counted on the benefit side. The real added cost to our society of offering high quality pre-K to all 4-year-olds is far, far less than Armor contends.
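The average-versus-marginal distinction can be made concrete with a back-of-envelope sketch. The $12,000 per-child figure and the roughly $50 billion total are from the review itself; the cohort size is what that total implies. The offsets for existing spending and the enrollment rate below are illustrative placeholders, not actual program totals, so the point is the structure of the calculation rather than any particular bottom line.

```python
# Back-of-envelope: average-cost vs. marginal-cost estimates of universal pre-K.
# Values marked "assumed" are illustrative placeholders, not real budget figures.

K12_AVG_COST = 12_000       # Armor's per-child figure (average K-12 expenditure)
FOUR_YEAR_OLDS = 4_000_000  # approximate cohort size implied by his ~$50B total

# Armor's approach: average cost times 100% enrollment.
naive_total = K12_AVG_COST * FOUR_YEAR_OLDS  # about $48 billion

# Marginal-cost approach: assume partial take-up and subtract spending that
# already occurs (preschool special education, Head Start, state pre-K, etc.).
enrollment_rate = 0.80        # assumed; some families opt out
existing_special_ed = 8e9     # assumed; NY alone spends ~$2B on these services
existing_head_start = 7e9     # assumed placeholder
existing_state_prek = 6e9     # assumed placeholder

marginal_total = (K12_AVG_COST * FOUR_YEAR_OLDS * enrollment_rate
                  - existing_special_ed - existing_head_start - existing_state_prek)

print(f"Average-cost estimate:  ${naive_total / 1e9:.1f}B")
print(f"Marginal-cost estimate: ${marginal_total / 1e9:.1f}B")
```

Whatever the true offsets are, any plausible values drive the added cost well below the headline figure, which is the substance of the marginal-versus-average error.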
When a reviewer calls for policy makers to hold off on a policy decision because more research is needed, one might assume that he had considered all the relevant research. However, Armor’s review omits much of the relevant research. For example, he fails to include studies that compare one preschool approach to another, even when they are gold standard experiments with long-term follow-up. A number of these find larger lasting effects for one approach, adding strong evidence that preschool per se can produce lasting benefits. It would take many pages to review what Armor omits, but it can be found in other readily accessible reviews (including reviews focused on domestic studies, on international studies, and on both). Particularly noteworthy is the nonpartisan Washington State Institute for Public Policy’s statistical review of 49 studies. A major finding is that state pre-K programs outperform Head Start, which directly contradicts Armor’s suggestion to the contrary.
To better understand what Armor omitted, consider just one of the better-known studies he failed to review. The Institute for Developmental Studies (IDS) conducted a true experiment with 402 children randomly assigned to attend one year of preschool at age 4 or to a control group. The preschool program was offered in the public schools. One teacher and an aide staffed each classroom of 17 children. Estimated effects were about .40 standard deviations in language and math at the end of the year and persisted at .20 standard deviations through at least 3rd grade. These results are much larger than those found in the Head Start impact study and remarkably similar to those from one year of the similarly resourced Abbott preschool program, discussed below. The results refute Armor’s key claims about what the research shows, with a large sample and a program comparable to some in large-scale operation today. Longer-term follow-up found benefits in educational outcomes and employment persisting into adulthood, but because of considerable attrition in the adult follow-up, these do not inspire as much confidence as the earlier results.
Finally, the remarkable story of New Jersey’s Abbott Pre-K program is largely lost in translation as reviewed by Armor. Perhaps the closest of today’s large-scale programs to the Perry Preschool in design, this program staffs each classroom of no more than 15 children with a teacher and aide. Teachers have four-year degrees and early childhood certification. Teachers and aides are paid on a public school scale, though two-thirds work in private providers contracted by school districts. There is a system of teacher coaches to assist with continuous improvement of practice and family workers to engage parents. A state Supreme Court order mandated the program in 31 school districts with about 25 percent of the state’s population. The program is open to all children in those districts beginning at age 3.
Unfortunately, Armor misreports some basic facts about the Abbott pre-K program and its estimated impacts, trying to suggest that it is no better than typical Head Start. This is untrue. In the first large scale evaluation of Abbott pre-K the average classroom score on the ECERS-R rating scale was above good (5.2), even though the program was not fully mature. Although ratings of Head Start from the National Impact Study indicated that the vast majority of Head Start classrooms scored at least this well, independent data on Head Start found that only 40% scored 5 or higher. Furthermore, Head Start teachers in the national study had much lower educational qualifications and received far less in-class coaching compared to teachers in Abbott pre-K. Looking at outcomes, estimated initial impacts of one year of the Abbott pre-K program at age 4 were roughly 3 times as large as those in the Head Start impact study, not the same size as claimed by Armor.
As the Abbott program increased enrollments on its way to universal coverage, the National Institute for Early Education Research (NIEER) at Rutgers University employed both the RD approach maligned by Armor and an alternative approach that identified children in the same kindergarten classrooms who had not attended Abbott pre-K and used them, with statistical adjustments, as a comparison group. Note that the biases alleged by Armor to favor the preschool group in the RD study would favor the control group in the second approach. The second approach produces noticeably smaller estimates of the program’s initial impacts (though still about twice as large as in the Head Start impact study for language and math), which Armor might interpret as evidence that the RD estimates are biased upward. My view is that the second approach underestimates impacts. In any case, it is the second approach that provides estimates of later effects in kindergarten, second, fourth, and fifth grades. At second grade, these are quite comparable in size to the 3rd grade effects reported by the IDS randomized trial, and comparable, if slightly smaller, at grades 4 and 5.
The information provided above amply demonstrates that Armor’s review is far from rigorous and his conclusions are unwarranted. Those who want an even more comprehensive assessment of the flaws in Armor’s review can turn to Tim Bartik’s blog post and a paper NIEER released last year, as little of Armor’s argument is new. For a more thorough review of the evidence regarding the benefits of preschool, I recommend the NIEER and WSIPP papers already cited and a recent review by an array of distinguished researchers in child development policy. If all the evidence is taken into account, I believe that policy makers from across the political spectrum will come to the conclusion that high-quality pre-K is indeed a sound public investment.