Marvin Olasky, editor-in-chief of the Christian biweekly World magazine (and author of various books, including The Tragedy of American Compassion), has recently written about faith-based prisons (here and here), citing my work: my Alabama Law Review article, which you might have seen blogged here (here's the fifth post in the series, containing links to the previous posts).

I read several issues of World in the mid-’90s and enjoyed them, particularly Olasky’s editorials, so I was pleased that my work had come to the attention of this particular audience. As you may recall, one of my points was that comparing graduates of a faith-based program with non-participants is misleading, because what about the participants who dropped out? If the graduates were the “best” of the participants while the drop-outs were the “worst”, we should compare the best of the participants with the best of the non-participants. But unfortunately, we don’t know who the best of the non-participants are, because they haven’t bothered to identify themselves to us. So at the very least, we should compare total participants with non-participants.

Now I’ve also argued that this, too, is misleading, because there’s a fundamental difference between participants and non-participants. Participants are the ones who wanted to join for some reason; they had some sort of motivation. And this motivation is likely to be a crucial determinant in whether they reoffend later. (Not that all participants are like that, but all that’s necessary for this argument to work is that there be some correlation.) So there’s some reason to think that participation already predicts lower reoffense rates, even if the program itself has absolutely no effect (and, in fact, even if the program is somewhat harmful). Essentially, it just serves to identify the better inmates for us.

So even comparing participants with non-participants isn’t very good because the results are subject to self-selection bias. But comparing graduates with non-participants is even worse because you’re now piling on a second layer of selection bias. (This may be self-selection bias if the drop-outs drop out voluntarily, or it may be selection by the program staff, if they kick out those who are likely to fail or who don’t seem like they’re “with the program”.)
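Both layers of selection can be seen in a toy simulation. All the numbers below are invented for illustration, not drawn from any study: a single "motivation" trait drives volunteering, finishing the program, and staying out of prison, while the program itself has zero effect. Each added layer of selection still makes the selected group look better.

```python
import random

random.seed(0)

def simulate(n=200_000):
    # Invented parameters: motivated inmates volunteer more, finish more,
    # and reoffend less. The program itself does NOTHING to reoffense.
    groups = {"non-participant": [], "participant": [], "graduate": []}
    for _ in range(n):
        motivated = random.random() < 0.3
        volunteers = random.random() < (0.6 if motivated else 0.1)
        reoffends = random.random() < (0.2 if motivated else 0.5)
        if volunteers:
            groups["participant"].append(reoffends)
            # Motivation also predicts finishing the program (second layer).
            if random.random() < (0.8 if motivated else 0.2):
                groups["graduate"].append(reoffends)
        else:
            groups["non-participant"].append(reoffends)
    return {name: sum(xs) / len(xs) for name, xs in groups.items()}

rates = simulate()
for name, r in rates.items():
    print(f"{name}s reoffend: {r:.1%}")
```

Despite a null program effect, graduates beat participants, who in turn beat non-participants: exactly the two layers of bias described above.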

Olasky disagrees; here’s an excerpt, but you can read the whole thing:

A faith-based program—OK, let’s cut the euphemisms and say (90 percent of the time) “Christian program”—is more like communion than magic. It’s not pixie dust. Rather, it intensifies the life or death, God or Satan decision we all have to make at some point during our lives.

Prisoners who enter a Christian program often want to because they’ve had failure after failure: They are not more likely to succeed than others, just more desperate. Christian programs lead prisoners to water, but only God can make them drink. Those who turn aside miss their great opportunity, and may even increase their commitment to evil. That’s why it’s important to compare program graduates with the general prison population. By doing so we see that faith-based prisons (and other faith-based programs) do work for some but not all—and that’s a surprise only to those who believe in magic.

I just want to say one more thing, related to a quote by Byron Johnson, the author of a number of important faith-based prison studies, that appears in one of the posts:

[A]s Baylor professor Byron Johnson, an exceptionally clear-headed social scientist, pointed out when we discussed the matter, “The only way to totally eliminate selection bias is by utilizing a true classical experimental design—these studies randomly assign individuals into treatment and control groups. I’d love to do such a study with a faith-based program, but, of course, we cannot randomly assign individuals into a faith-based program—they must participate voluntarily. Therein lies the real problem. Only the results of a true experiment would apparently satisfy Volokh and other critics, and that won’t happen.”

Johnson is right that random assignment is great for methodological validity. (Not that that’s the only thing that would satisfy me: in my Alabama Law Review article, I suggest other possibilities that have been successfully tried in the literature evaluating public vs. private schools (sometimes public vs. Catholic schools): the instrumental variables method and the exogenous policy shock method.)

But what I find somewhat mystifying is Johnson’s statement that he’d “love to do such a study” but that, “of course, we cannot randomly assign individuals”. Why do I find this mystifying? Because, in my article, I discuss a random-assignment method that has been used successfully in evaluating faith-based prisons. Of course people have to participate voluntarily, but you can have programs that are oversubscribed, and you can let people in off the wait list randomly. Suppose you have 200 people signing up for a program with 100 spaces. You let 100 in randomly, and the other 100 stay in the general population. Then, instead of comparing participants with non-participants, you compare participants with rejected volunteers only.

I love this method, because if the usual difference between participants and non-participants is one of motivation, the rejected-volunteers method basically equalizes that factor between the control group and the treatment group. Obviously, it’s important that the assignment method be completely random, or else the groups aren’t truly comparable. And, as described above, it’s important that you keep the drop-outs in the evaluation, rather than just comparing graduates against rejected volunteers.
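The lottery itself is simple to sketch. The inmate IDs and group sizes below are hypothetical, matching the 200-volunteers, 100-slots example above:

```python
import random

random.seed(42)

# 200 inmates signed up; everyone in this list is a volunteer.
volunteers = list(range(200))
random.shuffle(volunteers)  # the random draw decides admission

admitted = volunteers[:100]   # treatment group (includes future drop-outs)
rejected = volunteers[100:]   # control group: rejected volunteers

# Because the draw, not motivation, decides who gets in, the two groups
# are comparable on average. The later comparison must be ALL admitted
# inmates (drop-outs included) against ALL rejected volunteers.
print(len(admitted), "admitted;", len(rejected), "rejected")
```

The key is the two caveats in the text: the draw must be genuinely random, and drop-outs must stay in the admitted group when outcomes are tallied.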

The rejected-volunteers method has been used in a number of education studies, as I discussed in my fourth post on faith-based prisons:

Several papers have analyzed the Milwaukee Choice program, using unsuccessful applicants as their comparison group. Jay Greene and his coauthors found that private schools produced significant gains in math scores in students’ third and fourth years in the program, though no significant effects for reading. (It’s plausible that school reforms would improve math more than reading, since math is learned primarily in school while reading is also practiced outside of school.) John F. Witte found no significant effects for reading and weak effects for math. Cecilia Rouse found no consistent effects for reading; for math, there seemed to be some effects, but not until two years after application, and some other specifications yielded no significant differences until the fourth year. However, all three of these papers were apparently based on inaccurate test score data. Greene and his coauthors, using a corrected data set, found significant effects on math scores starting three years in and significant effects on reading scores three or four years in.

Paul Peterson and his coauthors analyzed the New York City School Choice Scholarships program. They found that being offered a voucher had a positive and significant effect on both math and reading scores, at least in grades four and five.

Outside of the public-private school debate, Alan Krueger used a rejected-applicants approach in concluding that smaller class sizes increased average performance on standardized tests.

A few studies have merged the unsuccessful-applicants approach and the instrumental-variables approach. Not all successful applicants enroll in choice schools, so if one uses a rejected-applicants approach, one shouldn’t compare the rejected applicants with people who actually use the program—that would reintroduce self-selection. Rather, one should compare the rejected applicants with the successful applicants, regardless of whether they used the program. The measured effect isn’t the true effect of actually attending a choice school. Instead, one should interpret the estimate as an “intention-to-treat” effect rather than as a “treatment” effect.

This is a good approach, since offering the voucher is “the only policy instrument available to policy makers,” who after all can’t force parents to remove their children from public schools. (This point also applies to faith-based prisons.) Still, one may be interested in the actual effect of attending a choice school, particularly if one is a parent. To solve the self-selection problem in the choice whether to attend a choice school, one can use an instrumental-variables approach, using whether one gets a voucher to predict whether one attends a choice school.

Cecilia Rouse took this approach with the Milwaukee voucher program and found that attending a voucher school raised math scores by about 3 percentile points (an estimate she thought overstated the true effect of the program) and had no effect on reading scores. Paul Peterson and William Howell took the same approach with the voucher programs in New York City, Dayton, and Washington, D.C., and found significant achievement gains among African-Americans, immediately in the case of New York City and in the second year in the case of Dayton and Washington, D.C. In other work, Peterson and his coauthors found that switching to a private school had a significant effect, at least after the first year, for African-Americans, but no significant effect for other ethnic groups.
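The intention-to-treat and instrumental-variables logic in the excerpt above can be sketched with invented numbers. The voucher offer is random (a valid instrument), but whether a family actually attends is self-selected; the Wald ratio, offer effect divided by take-up, recovers the true effect of attending:

```python
import random

random.seed(1)

def simulate(n=200_000):
    # Invented parameters: the true effect of attending is +3 points,
    # motivation adds +5 and also makes offered families likelier to attend.
    offered_out, unoffered_out = [], []
    offered_att, unoffered_att = [], []
    for _ in range(n):
        offered = random.random() < 0.5
        motivated = random.random() < 0.4
        # Attendance requires an offer and is then self-selected.
        attends = offered and (random.random() < (0.9 if motivated else 0.5))
        score = 50 + (5 if motivated else 0) + (3 if attends else 0) + random.gauss(0, 5)
        if offered:
            offered_out.append(score); offered_att.append(attends)
        else:
            unoffered_out.append(score); unoffered_att.append(attends)
    mean = lambda xs: sum(xs) / len(xs)
    itt = mean(offered_out) - mean(unoffered_out)      # intention-to-treat effect
    take_up = mean(offered_att) - mean(unoffered_att)  # first stage: extra attendance
    iv = itt / take_up                                 # Wald / IV estimate of attending
    return itt, iv

itt, iv = simulate()
print(f"intention-to-treat effect of the offer: {itt:.2f} points")
print(f"IV estimate of actually attending:      {iv:.2f} points")
```

The ITT effect is diluted by the families who never take up the offer; dividing by the take-up rate scales it back to the per-attendee effect, without ever comparing self-selected attenders against non-attenders.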

Anyway, the first rejected-volunteers study that I discuss in my Alabama Law Review article is called The InnerChange Freedom Initiative: A Preliminary Evaluation of a Faith-Based Prison Program. Here’s what I say about it:

[The authors] compared the 177 IFI participants against three different groups: (1) a “match group” of 1,754 inmates who “met IFI selection criteria but did not participate in the program,” (2) a “screened group” of 1,083 inmates who “were screened as eligible for the program but did not volunteer or were not selected for program participation,” and (3) a “volunteer group” of 560 inmates who “actually volunteered for the IFI program, but did not participate, either because they were not classified as minimum-out custody, their remaining sentence length was either too short or too long to be considered, or they were not returning to the Houston area following release.” Of these three groups, only the third avoids selection bias.

IFI participants did no better than the other groups in either two-year re-arrest or reincarceration rates. Two-year re-arrest rates were 36.2% for the IFI group, compared to 35% for the match group, 34.9% for the screened group, and 29.3% for the volunteer group. Two-year reincarceration rates were 24.3% for the IFI group, compared to 20.3% for the match group, 22.3% for the screened group, and 19.1% for the volunteer group.

It’s true that IFI graduates had lower re-arrest (17.3%) and reincarceration (8.0%) rates. But IFI’s definition of “graduation” is “quite restrictive” and includes completing 16 months in the IFI program, completing 6 months in aftercare, and holding a job and having been an active member in church for the 3 months before graduation. Inmates could be removed from the program “for disciplinary purposes,” “at the request of IFI staff,” “for medical problems,” and “at the voluntary request of the applicant.” The set of inmates who “graduated” from the program is thus tainted by self-selection (the decision to participate), selection by the program staff (the decision not to expel), and “success bias” (the decision to finish the program, which in this case even includes a post-release component).

So I conclude that, properly interpreted, we don’t have evidence that the program was a success. But (at least a portion of) the methodology was great: it used the rejected-volunteers method, which allows us to bypass the self-selection problem that plagues some of these other studies.

There are still some problems. One, which I discuss in my article, is the “resource effect”: the rejected volunteers don’t get a comparably funded secular program, but rather the “status quo” of whatever happens to be available in their prison. So there are still problems of comparability; if we find a positive effect overall, we can’t be sure whether the faith aspect was what worked or just the fact that these guys got resources. But, I argue, this is a less serious problem. The self-selection problem meant we didn’t even know whether the program worked. (We didn’t even know if it was better than nothing.) The resource problem means that if we get a positive effect, the program at least looks like it works relative to the status quo, and we often aren’t in the position of having comparably funded secular programs anyway. So “better than nothing” might be the best we can hope for.

Who are the authors of this methodologically (partly) great study? Byron Johnson and David Larson.

So Johnson’s assertion that random assignment is great but we can’t do it is somewhat mystifying to me.