This is the fourth post in a series on the effectiveness of faith-based prison programs, based on my recent Alabama Law Review article, Do Faith-Based Prisons Work? (Short answer: no.) Monday’s post introduced the issue, Tuesday’s post surveyed some of the least valid studies, and Wednesday’s post critiqued the studies that used propensity score matching and discussed other possible empirical strategies. Throughout, I’ve been putting the faith-based prison research side-by-side with the private schools research, because evaluations of each raise similar methodological problems. The fact that both are voluntary means that they can attract fundamentally different sorts of people, so their good results might be attributable to the higher-quality participants they attract.

Today, I’ll be discussing the most valid studies — those that use rejected volunteers as the control group. When a program has fewer spots than the number of applicants, and when participants are chosen off the waiting list randomly, and when we only compare participants with rejected applicants (entirely ignoring the ones who didn’t apply), we can be more confident that our control group is truly comparable to our treatment group.

Unsuccessful applicants seem like the best possible control group, but in fact they’re not completely ideal. For instance, there may be nonrandom attrition from the study sample. In the voucher context, “[I]f the more motivated parents among the unsuccessful applicants were more likely to enroll their child in a private school outside of the choice program” (where statistics aren’t being kept), the unsuccessful-applicants group would look worse, and the estimate of the effect of getting a voucher would be inflated.
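To see why this kind of attrition inflates the estimate, here’s a toy simulation. Every number in it is made up for illustration: the voucher has, by construction, no effect at all, and I assume (this is not data from any study) that the most motivated unsuccessful applicants leave for private schools outside the program and so vanish from the data.

```python
import random
from statistics import mean

def simulate_attrition(n=20_000, leave_above=0.7, seed=1):
    """Toy voucher lottery in which the voucher has NO true effect.

    Hypothetical assumptions, for illustration only:
    - each applicant's motivation is drawn uniformly from [0, 1];
    - test score = 50 + 20 * motivation, whether or not they win a voucher;
    - unsuccessful applicants with motivation above `leave_above` enroll in
      a private school outside the program and drop out of the data set.
    """
    rng = random.Random(seed)
    winners, losers_all, losers_observed = [], [], []
    for _ in range(n):
        motivation = rng.random()
        score = 50 + 20 * motivation  # no voucher effect by construction
        if rng.random() < 0.5:  # lottery: 50% chance of winning a voucher
            winners.append(score)
        else:
            losers_all.append(score)
            if motivation <= leave_above:  # motivated losers leave the data
                losers_observed.append(score)
    full_estimate = mean(winners) - mean(losers_all)
    attrited_estimate = mean(winners) - mean(losers_observed)
    return full_estimate, attrited_estimate

full, attrited = simulate_attrition()
print(f"estimate with complete control group: {full:.2f}")
print(f"estimate after nonrandom attrition:   {attrited:.2f}")
```

With the complete control group, the estimated effect comes out near zero, as it should. Once the motivated losers drop out, the same toy data show a gain of roughly 3 points (the expected value under these assumptions), which is pure artifact.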

There may be other issues, such as exceptions to random assignment (a “sibling” rule for schools, or allowing some rejected students to enter from waiting lists after the beginning of the year) or simply a lack of oversight of the random selection process. Some analysts, like John F. Witte, have therefore concluded that the rejected-applicants approach is worse than instrumental variables or even than standard approaches that don’t control for selection.

But even if one concludes that rejected-applicant groups aren’t ideal for schools, the problems seem much less serious in prisons. Attrition of the kind seen with vouchers, where rejected applicants leave for schools outside the system, seems much less problematic in the prison context, since both the successful and the rejected applicants are, so to speak, a captive audience. The same goes for sibling rules. Oversight of the random selection process is still important, but overall, rejected-applicant studies of faith-based prisons seem substantially better than the other studies to date. (And other methods that take selection on unobservables into account, like instrumental variables or exogenous policy shocks, simply haven’t been attempted for faith-based prisons.)

The first few studies below find no positive effect of faith-based programs; the next few do find some effect.

1. The Texas InnerChange Studies

Byron Johnson and David Larson conducted a preliminary evaluation of a Texas-based InnerChange Freedom Initiative program (IFI). (This report was based on data in an earlier report by Brittani Trusty and Michael Eisenberg.) They compared the 177 IFI participants against three different groups: (1) a “match group” of 1,754 inmates who “met IFI selection criteria but did not participate in the program,” (2) a “screened group” of 1,083 inmates who “were screened as eligible for the program but did not volunteer or were not selected for program participation,” and (3) a “volunteer group” of 560 inmates who “actually volunteered for the IFI program, but did not participate, either because they were not classified as minimum-out custody, their remaining sentence length was either too short or too long to be considered, or they were not returning to the Houston area following release.” Of these three groups, only the third avoids selection bias.

IFI participants did no better than the other groups in either two-year re-arrest or reincarceration rates. Two-year re-arrest rates were 36.2% for the IFI group, compared to 35% for the match group, 34.9% for the screened group, and 29.3% for the volunteer group. Two-year reincarceration rates were 24.3% for the IFI group, compared to 20.3% for the match group, 22.3% for the screened group, and 19.1% for the volunteer group.

It’s true that IFI graduates had lower re-arrest (17.3%) and reincarceration (8.0%) rates. But IFI’s definition of “graduation” is “quite restrictive”: it includes completing 16 months in the IFI program, completing 6 months in aftercare, and holding a job and being an active church member for the 3 months before graduation. Inmates could be removed from the program “for disciplinary purposes,” “at the request of IFI staff,” “for medical problems,” and “at the voluntary request of the applicant.” The set of inmates who “graduated” from the program is thus tainted by self-selection (the decision to participate), selection by the program staff (the decision not to expel), and “success bias” (the decision to finish the program, which in this case even includes a post-release component).
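The “success bias” point can be made concrete with another toy simulation, again resting on invented assumptions: a program that changes nobody’s risk of reoffending, and a completion rate that is higher for lower-risk participants. Under these assumptions the expected recidivism rates are 1/3 for graduates, 2/3 for non-graduates, and 1/2 overall, so the graduates look twice as good as the dropouts even though the program did nothing.

```python
import random

def simulate_success_bias(n=100_000, seed=0):
    """Program with NO effect; completion depends on underlying risk.

    Hypothetical assumptions, for illustration only:
    - each participant's risk of recidivism is uniform on [0, 1];
    - the program does not change anyone's risk;
    - lower-risk participants are more likely to finish
      (probability of graduating = 1 - risk).
    Returns recidivism rates for graduates, non-graduates, and everyone.
    """
    rng = random.Random(seed)
    grads = dropouts = 0
    grad_recid = dropout_recid = 0
    for _ in range(n):
        risk = rng.random()
        reoffends = rng.random() < risk  # outcome depends only on risk
        if rng.random() < 1 - risk:      # low-risk people tend to graduate
            grads += 1
            grad_recid += reoffends
        else:
            dropouts += 1
            dropout_recid += reoffends
    return (grad_recid / grads,
            dropout_recid / dropouts,
            (grad_recid + dropout_recid) / n)

grad_rate, dropout_rate, overall_rate = simulate_success_bias()
print(f"graduates: {grad_rate:.2f}, dropouts: {dropout_rate:.2f}, all: {overall_rate:.2f}")
```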

2. OPPAGA’s FCBI Study

Florida’s Office of Program Policy Analysis and Government Accountability (OPPAGA) published a report on several “faith- and character-based programs” in Florida prisons.

Some of these programs were institution-wide, “offered to all inmates,” and “incorporated into the facility’s mission.” These programs included Bible study groups, Native American prayer, parenting skills, and yoga classes, so they really don’t count as “faith-based prisons” as we are using the term here.

Other programs were dorm-based; the dorms were “established as . . . enclave communit[ies] within the prison compound.” The dorm-based programs “provide a more intensive experience than the prison-wide programs” and look more like the faith-based prisons that we have been discussing.

The authors compared 1,293 inmates released from a faith- and character-based institution with 2,283 inmates who had requested transfer to such an institution but weren’t placed there before their release. They also compared 1,311 inmates released from a faith- and character-based dorm with 9,988 inmates who had requested transfer to such a dorm but weren’t placed there before their release. (The study doesn’t say why the comparison inmates weren’t accepted.)

For the institution-wide programs, the study found that, compared with the comparison group, inmates’ relative risk of reoffending ranged from 0.85 to 0.95, depending on the institution. The authors found no positive effect of the dorm-based programs: on the contrary, the relative risk of reoffending for inmates released from such dorms was 1.03.
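For readers less used to relative risks: a relative risk is the treated group’s reoffense rate divided by the comparison group’s rate, so it converts directly into a percent change in risk. The two small helpers below (the names are mine, purely for illustration) do that conversion for the figures reported above.

```python
def relative_risk(rate_treated: float, rate_comparison: float) -> float:
    """Treated group's reoffense rate divided by the comparison group's rate."""
    return rate_treated / rate_comparison

def percent_change_in_risk(rr: float) -> float:
    """Express a relative risk as a percent change versus the comparison group."""
    return (rr - 1.0) * 100.0

# Relative risks as reported for the OPPAGA institutions and dorms.
for rr in (0.85, 0.95, 1.03):
    print(f"RR = {rr:.2f} -> {percent_change_in_risk(rr):+.0f}% risk vs. comparison group")
```

So the institution-wide figures correspond to a reoffense risk 5% to 15% lower than the comparison group’s, and the dorm figure to a risk 3% higher.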

3. Hall’s Putnamville Study

Stephen Hall examined the effect of the Biblical Correctives to Thinking Errors program on in-prison infraction rates of inmates at the Putnamville Correctional Facility in Indiana. The study was open to volunteers who weren’t participating in other treatment programs, who regularly participated in chapel programs, and who had graduated from the chapel’s Christian twelve-step program. After 46 inmates responded and 8 of these were transferred or discharged, the remaining 38 were divided into a treatment group of 10 and a control group of 28.

There were no infractions in the treatment group, and 17 infractions in the control group (all from 6 of the 28 members). The difference was significant, but Hall wrote that “the sample size in this study is too small to make a case for validity.”
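To see how little 38 inmates can tell us, here is one alternative check. It is not the test Hall used (I don’t report which test that was). It counts inmates with at least one infraction (0 of 10 versus 6 of 28) rather than infractions, since the 17 infractions came from only 6 people and arguably aren’t independent observations. A one-sided Fisher exact test on those counts gives p ≈ 0.14, which wouldn’t be significant at conventional levels.

```python
from math import comb

def p_value_zero_treated_events(treated_n: int, control_n: int, control_events: int) -> float:
    """One-sided Fisher exact p-value when the treated group has zero events.

    Under the null hypothesis that treatment doesn't matter, the inmates with
    an event are spread at random across both groups; the chance that none
    of them lands in the treated group is C(control_n, k) / C(total, k).
    Only valid for the special case of zero events among the treated.
    """
    total = treated_n + control_n
    return comb(control_n, control_events) / comb(total, control_events)

# Counting inmates rather than infractions: 0 of 10 treated vs. 6 of 28 controls.
p = p_value_zero_treated_events(treated_n=10, control_n=28, control_events=6)
print(f"one-sided p = {p:.3f}")
```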

4. Hercik et al.’s Kairos Horizon Study

Jeanette Hercik and her coauthors evaluated the effect of participation in the Kairos Horizon Communities in Prison program at Florida’s Tomoka Prison.

The authors considered 413 inmates who participated in any of the first five classes of the program. (Class One ran from November 1999 to October 2000; Class Two ran from May 2000 to April 2001; and so on.)

First, participants were compared against their previous selves. After the treatment started, the proportion of participants with at least one discipline report dropped from 24.4% to 12.3%, and this proportion remained in the 12–17% range through three years after the start of treatment (two years after the end of treatment). Similarly, the proportion of participants with at least one segregation stay dropped from 20.6% to 10.6%, and this proportion hovered around 15–16% through three years after the start of treatment, with a blip up to 18.2% in the 25–30-month range.

Next, the 157 participants in Classes Four and Five were compared against two different groups: a “Matched Comparison” group of 157 inmates who were eligible but didn’t apply, and a “Waiting List Comparison” group of 248 inmates who were eligible and did apply. From the start of treatment, the proportion of the treatment sample with at least one discipline report was lower than for either of the comparison samples (14% versus 25% and 31%, respectively), and the proportion stayed lower through two years after the start of treatment, though this difference wasn’t significant past the 12-month mark. Similarly, the proportion of the treatment sample with at least one segregation stay was lower than for either of the comparison samples (13% versus 26% and 25%, respectively), and the proportion stayed lower through two years after the start of treatment; these differences were all significant.

The probability of re-arrest of participants during the follow-up period (19.0% among those released during the study period) was greater than that of the matched comparison group (15.2%) and basically the same as that of the waiting list group (19.6%). Program participation may be associated with a somewhat longer time to re-arrest (3.5 months for the treatment sample, 1.4 months for the matched comparison group, and 3.2 months for the waiting list comparison group), but the standard deviations are so large that I doubt these differences are significant.

The matched comparison sample is subject to self-selection bias, and the comparison of participants to their previous selves is probably also biased because those who choose to participate probably have a greater responsiveness to the material. So the waiting list comparison group is the most valid control group. For this group, while the difference in discipline reports and segregation stays may be significant, participation seems to confer no significant advantage in the probability of re-arrest.

5. Wilson et al.’s Detroit TOP Study

Leon Wilson and his coauthors prepared an unpublished report on an ex-prisoner aftercare program, the Detroit Transition of Prisoners (TOP) program.

A group of 135 former inmates who participated was compared to a 139-member designated control group, mainly composed of former inmates who applied but were turned down because they didn’t meet the inclusion criteria. The TOP program was trying to take people it believed to be high risks, so the treatment group was actually estimated to be at higher risk for recidivism than the non-treatment group.

The recidivism rate was 18% for graduates, as compared to 57% for the control group. However, the set of “graduates” is the result of a significant weeding-out process. Of the 124 initial participants, only 66 remained in the program for six months and only 47 remained after a year. Only 40 graduated from the program; others didn’t complete the one-month probationary period, were terminated for rule violations, didn’t participate, or just lost contact with the program after applying. These groups all had recidivism rates much higher than 18%, and even mostly higher than the 57% of the control group.
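The weeding-out is easy to quantify from the counts above. Measured against the 124 initial participants, the retention and graduation figures work out as follows.

```python
def share_remaining(count: int, initial: int) -> float:
    """Fraction of the initial participants still counted at a given stage."""
    return count / initial

# Counts reported for the Detroit TOP program.
initial = 124
stages = {
    "still in program at 6 months": 66,
    "still in program at 12 months": 47,
    "graduated": 40,
}
for label, count in stages.items():
    print(f"{label}: {count}/{initial} = {share_remaining(count, initial):.0%}")
```

So the 18% figure describes only about a third (32%) of the people who started the program.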

The study doesn’t give the recidivism rate for the entire population of participants. Using the data the authors report for their regressions, we can estimate the unadjusted recidivism rate at roughly 52% for participants and 57% for the control group, which isn’t a significant difference. But once we use the adjusted recidivism rate, which the authors obtain by controlling for risk rating, age, and education (so that the treatment and control groups are more comparable), the rate comes out at roughly 54% for participants and 68% for the control group, which is a significant difference.

6. O’Connor et al.’s Detroit TOP Study

Tom O’Connor and his coauthors examined the same Detroit TOP program. They compared the 60 men who applied for and were accepted into TOP with two control groups: a set of 109 rejected applicants and a random sample of 174 non-applicants who were at the pre-release centers involved in the program. The applicants were rejected for various reasons: some because they wouldn’t be living in Detroit, some because they had insufficient prior church involvement, and some because they had too much time left to serve at the time they applied. Demographic data suggested that the participating group had the highest risk of recidivism, the rejected volunteers the next highest, and the random sample of non-applicants the lowest of the three groups.

First, the authors looked at the likelihood of being returned to prison for escaping from the pre-release center. At least when looking at those with three or more prior felonies, participants did better than rejected volunteers, who did better than the random non-applicants. (On the other hand, the participants had, on average, more church involvement than the rejected applicants.) However, participants with fewer than three prior felonies did worse than the rejected volunteers and no better than non-applicants.

Next, the authors looked at the likelihood of being returned to prison for a parole violation or a new crime. Unfortunately, at this point the authors divided participants into those who stayed with the program and those who were discharged, whether for lack of participation, inappropriate conduct, or escape. This reintroduces selection. We don’t know what the results would have been if the group hadn’t been subdivided. But even with the subdivision of the participating group into those who continued and those who didn’t, the continuing group and the rejected volunteers group were both “two times less likely to have a parole violation or new crime than the general population of ex-offenders.” Thus, if the group hadn’t been subdivided, we probably would have found that participation conferred no benefit over the rejected volunteers.

7. Education Studies

Private school studies have also been able to use control groups of rejected applicants, thanks to the advent of small-scale voucher programs with a limited number of spots.

Some voucher programs distribute vouchers on a first-come, first-served basis, so the rejected applicants—the ones who applied too late—likely differ systematically from those who were accepted. (Some of the faith-based prison programs above, which don’t say how people made it off the wait list, are potentially vulnerable to this problem.)
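A toy simulation shows why the selection rule matters. Its assumptions are invented for illustration: applicants’ motivation is uniform between 0 and 1, and more motivated applicants tend to apply earlier. With first-come, first-served admission, the admitted group ends up markedly more motivated than the rejected group; with a lottery, the two groups are alike on average, even though motivation is never observed.

```python
import random
from statistics import mean

def assignment_gap(rule: str, n_applicants=10_000, n_spots=2_000, seed=2):
    """Mean motivation of admitted minus rejected applicants under a rule.

    Hypothetical assumptions: motivation is uniform on [0, 1], and more
    motivated applicants tend to apply earlier
    (application time = random noise - motivation; lower means earlier).
    """
    rng = random.Random(seed)
    applicants = []
    for _ in range(n_applicants):
        motivation = rng.random()
        apply_time = rng.random() - motivation
        applicants.append((apply_time, motivation))
    if rule == "first-come":
        applicants.sort()  # earliest applications get the spots
    elif rule == "lottery":
        rng.shuffle(applicants)  # random selection off the waiting list
    else:
        raise ValueError(f"unknown rule: {rule}")
    admitted, rejected = applicants[:n_spots], applicants[n_spots:]
    return mean(m for _, m in admitted) - mean(m for _, m in rejected)

for rule in ("first-come", "lottery"):
    print(f"{rule}: admitted minus rejected motivation = {assignment_gap(rule):.3f}")
```

Under these assumptions the first-come gap is about 0.36 on the 0-to-1 motivation scale in expectation, while the lottery gap is close to zero.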

The most recent studies use data from school voucher programs with limited slots and random selection of students off the waiting list. In principle, voucher programs could be problematic ways of testing private versus public school effectiveness. If voucher programs, through the threat of competition, encouraged public schools to improve, a comparative analysis would understate any positive effect of private schools. Fortunately (for private school researchers), voucher programs have, for political purposes, tended to be extremely limited. Some studies have argued that vouchers improve public schools, but clearly the extent of any improvement is much less than it would be if vouchers were more widely adopted. So this methodological concern shouldn’t worry us much.

Several papers have analyzed the Milwaukee Choice program, using unsuccessful applicants as their comparison group. Jay Greene and his coauthors found that private schools produced significant gains in math scores in students’ third and fourth years in the program, though no significant effects for reading. (It’s plausible that school reforms would improve math more than reading, since math is learned primarily in school while reading is also practiced outside of school.) John F. Witte found no significant effects for reading and weak effects for math. Cecilia Rouse found no consistent effects for reading; for math, there seemed to be some effects, but not until two years after application, and some other specifications yielded no significant differences until the fourth year. However, all three of these papers were apparently based on inaccurate test score data. Greene and his coauthors, using a corrected data set, found significant effects on math scores starting three years in and significant effects on reading scores three or four years in.

Paul Peterson and his coauthors analyzed the New York City School Choice Scholarships program. They found that being offered a voucher had a positive and significant effect on both math and reading scores, at least in grades four and five.

Outside of the public-private school debate, Alan Krueger used a rejected-applicants approach in concluding that smaller class sizes increased average performance on standardized tests.

A few studies have merged the unsuccessful-applicants approach and the instrumental-variables approach. Not all successful applicants enroll in choice schools, so if one uses a rejected-applicants approach, one shouldn’t compare the rejected applicants with people who actually use the program; that would reintroduce self-selection. Rather, one should compare the rejected applicants with the successful applicants, regardless of whether they used the program. The measured effect isn’t the true effect of actually attending a choice school; it should be interpreted as an “intention-to-treat” effect rather than a “treatment” effect.

This is a good approach, since offering the voucher is “the only policy instrument available to policy makers,” who after all can’t force parents to remove their children from public schools. (This point also applies to faith-based prisons.) Still, one may be interested in the actual effect of attending a choice school, particularly if one is a parent. To solve the self-selection problem in the choice whether to attend a choice school, one can use an instrumental-variables approach, using whether one gets a voucher to predict whether one attends a choice school.
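The two estimators can be sketched in a few lines. The data below are hypothetical, and this is the bare-bones Wald form of the instrumental-variables estimator, not the specification of any study discussed here. The intention-to-treat effect is the gap in outcomes between applicants offered and not offered a voucher; the IV estimate scales that gap by how much the offer actually changed attendance. Under the usual IV assumptions (the offer affects scores only through attendance, and nobody attends a choice school because they were turned down), the ratio estimates the effect of attending for families who attend if and only if they are offered a voucher.

```python
from statistics import mean

def intention_to_treat(outcome, offered):
    """Mean outcome of applicants offered a voucher minus those not offered."""
    return (mean(y for y, z in zip(outcome, offered) if z)
            - mean(y for y, z in zip(outcome, offered) if not z))

def wald_iv(outcome, attended, offered):
    """IV (Wald) estimate: ITT divided by the offer's effect on attendance."""
    first_stage = (mean(d for d, z in zip(attended, offered) if z)
                   - mean(d for d, z in zip(attended, offered) if not z))
    return intention_to_treat(outcome, offered) / first_stage

# Hypothetical applicants: 1 = offered / attended, 0 = not.
offered  = [1, 1, 1, 1, 0, 0, 0, 0]
attended = [1, 1, 1, 0, 0, 0, 0, 0]
scores   = [60, 60, 60, 50, 50, 50, 50, 50]

print("ITT:", intention_to_treat(scores, offered))
print("IV: ", wald_iv(scores, attended, offered))
```

In this made-up example three of four offered families use the voucher and none of the others attend, so an intention-to-treat gap of 7.5 points implies an effect of attendance of 10 points (7.5 / 0.75).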

Cecilia Rouse took this approach with the Milwaukee voucher program and found that attending a voucher school raised math scores by about 3 percentile points (an estimate she thought overstated the true effect of the program) and had no effect on reading scores. Paul Peterson and William Howell took the same approach with the voucher programs in New York City, Dayton, and Washington, D.C., and found significant achievement gains among African-Americans, immediately in the case of New York City and in the second year in the case of Dayton and Washington, D.C. In other work, Peterson and his coauthors found that switching to a private school had a significant effect, at least after the first year, for African-Americans, but no significant effect for other ethnic groups.

So, what’s the bottom line? After discarding the faith-based prison studies tainted by self-selection bias, we’re left with two studies that find no effect of faith-based programs, one study that’s too small to be meaningful, and three studies that find some effect, even if the effect that a few of these find is quite weak. And of those three, two aren’t about prisoners at all, but about after-care of released prisoners, and the remaining one shows no significant effect once the prisoners have been released. So we have no study that actually finds a significant effect of an in-prison faith-based program on recidivism.

Tomorrow’s post will conclude this series, with a discussion of what to do about all this research and whether there’s any hope for the future.