This is the fifth post in a series about my new article, Prison Accountability and Performance Measures, which is in the current issue of the Emory Law Journal. In Monday’s post, I introduced the issue and advocated greater use of performance measures, which I’ll come back to later this week. In Tuesday’s post, I discussed why we don’t know much about the comparative cost or quality of public vs. private prisons. In Wednesday’s post, I talked about why introducing performance measures would be a good idea. In Thursday’s post, I discussed some of the normative issues involved in what measures to choose and started on some potential concerns about, and critiques of, implementing performance measures. Today, I’ll finish up that discussion and talk about what sorts of undesirable strategic behavior performance measures might engender.

*     *     *

Undesirable Strategic Behavior

Perhaps the biggest disadvantage of using performance-based compensation is the strategic behavior it may spawn. This strategic behavior may come in several flavors. First, there is the possibility of manipulating the performance goals themselves. Second, effort may be distorted away from some dimensions and toward others. Third, effort may be distorted away from some groups of inmates and toward others. And fourth, performance measures may simply be falsified.

1. Manipulating the Goals

The Government Performance and Results Act of 1993 is one example of a recent effort to inject performance measures into government agencies that hasn’t lived up to the hopes of its supporters.

One of the problems was that setting the performance goals was left to the agencies that were to be evaluated. Agencies “tr[ied] to protect themselves by devising euphemistic performance goals in order to ensure that they [could] ‘pass’ their own grading criteria.” The Patent and Trademark Office, faced with rising backlogs, set itself progressively longer targets of “average total pendency” from year to year, rising from 27.7 months in fiscal year 2003 to 29.8 months in 2004, 31.0 months in 2005, and 31.3 months in 2006. John DiIulio had warned of a similar danger: “that measurement-driven government workers will, so to speak, ‘set up the target in order to facilitate shooting.’” A similar problem was observed in the U.K., where “Next Steps agencies,” a type of performance-based organization, set their own targets, which often reflected merely an incremental improvement rather than an assessment of what was possible.

Why would agencies set goals in such unambitious ways? Perhaps because agencies feared being punished for bad performance with budget cuts. Various politicians have indeed suggested that agencies’ funding be tied to their performance results, and agencies’ performance results have indeed been relevant to the administration’s budget proposals, so this fear may have been reasonable—though it’s also possible that performance scores have merely given political cover for cuts to programs that the administration wanted to defund for other reasons. On the other hand, the link between funding and performance results isn’t that tight, so agencies’ concern to look good may also have been a matter of good public relations.

The problems here are threefold: agencies were allowed to think up their own performance goals; they weren’t required to meet those goals (and indeed, often the performance information simply wasn’t used in decisionmaking); and the goals were binary rather than continuous outcome measures (for example, that the EPA “will ‘achieve and maintain at least 95 percent of the maximum score on readiness evaluation criteria in each region’” or “‘complete an additional 975 Superfund-lead hazardous substance removal actions’”).

These problems have easy fixes, though perhaps they weren’t so easy in the context of the GPRA, where the problem was primarily giving performance incentives to public agencies. Prison contracts—or merit pay systems for public prison wardens—should be set by the Department of Corrections or the relevant contracting authority; goals shouldn’t be set by those who we want to comply with them. No one should be “required” to meet any performance standard, but compensation should be tied to these measures; providers’ self-interest should take care of the rest. And adopting continuous outcome measures, rather than binary goals, reduces the ability to choose easy goals: one can game “achieve x% recidivism” by setting an appropriately high level of x, but it’s harder to game the general effort of reducing recidivism where additional reductions are met with additional rewards.
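To make the binary-versus-continuous point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names, the dollar amounts, the 55% target, and the 50% baseline are all made up, and a real contract would be far more involved. The sketch only shows that a generously chosen binary target pays the same no matter how the provider performs, while a per-point payment keeps rewarding each additional reduction.

```python
def binary_payment(recidivism_rate, target, bonus):
    """Pay a fixed bonus only if the rate is at or below the target."""
    return bonus if recidivism_rate <= target else 0.0


def continuous_payment(recidivism_rate, baseline, rate_per_point):
    """Pay for each percentage point below a baseline (negative above it)."""
    return (baseline - recidivism_rate) * rate_per_point


# Hypothetical recidivism rates, in percent, from worst to best performance.
rates = [50.0, 45.0, 40.0, 35.0]

# An easy target of 55% is met at every performance level.
binary = [binary_payment(r, target=55.0, bonus=100_000.0) for r in rates]

# A per-point payment rises as recidivism falls.
continuous = [
    continuous_payment(r, baseline=50.0, rate_per_point=10_000.0) for r in rates
]

for r, b, c in zip(rates, binary, continuous):
    print(f"recidivism {r:.0f}%: binary pays {b:,.0f}, continuous pays {c:,.0f}")
```

Under the binary scheme the provider has no financial reason to push recidivism below the target once it is met; under the continuous scheme the marginal reward never disappears.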

2. Distortion Across Dimensions of Performance

Everyone agrees that, in most areas, performance has multiple dimensions. Each dimension, in a performance-based contract, will have its price, and the relative prices of different dimensions will determine how the contractor will allocate his effort among them.

So far, so good, as long as the set of performance measures is complete. But what if some dimensions of performance are unmeasurable? Just as cost–benefit analysis is accused of slighting the soft factors, so might performance measures be biased in favor of the measurable. The result is that the contractor’s work effort will be biased in the direction of increasing the measurable dimensions of performance.

Consider a hypothetical example involving education. Suppose there are two measures of educational quality: “hard” (e.g., knowledge of facts) and “soft” (e.g., citizenship, critical thinking, socialization). Without hard accountability, it might be hard to give teachers serious incentives, so they will slack in their overall work effort, but divide their time between hard and soft education in a balanced way. With hard accountability, teachers can get much higher-powered incentives, but these incentives will tend to be skewed toward the hard measures of education. Thus, the teachers will provide more overall work effort, but their time will be skewed toward hard education.
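The education example can be turned into a toy calculation. The sketch below is my own illustration, not a model from the article: I assume a teacher with some intrinsic motivation on both dimensions, a quadratic cost of effort, and a hypothetical `crowding` parameter that makes time spent on one kind of teaching raise the cost of the other. All the numbers are invented.

```python
def chosen_effort(bonus_hard, bonus_soft, intrinsic=1.0, crowding=0.5):
    """Return the (hard, soft) effort levels that maximize the teacher's payoff

        (intrinsic + bonus_hard) * h + (intrinsic + bonus_soft) * s
            - (h**2 + s**2) / 2 - crowding * h * s

    subject to h >= 0 and s >= 0. Assumes 0 <= crowding < 1 and
    nonnegative intrinsic motivation and bonuses (illustrative values only).
    """
    a = intrinsic + bonus_hard
    b = intrinsic + bonus_soft
    # Interior solution from the two first-order conditions.
    h = (a - crowding * b) / (1 - crowding ** 2)
    s = (b - crowding * a) / (1 - crowding ** 2)
    # Corner solutions: effort cannot be negative.
    if s < 0:
        return a, 0.0
    if h < 0:
        return 0.0, b
    return h, s


# No measurable incentives: low but balanced effort.
h0, s0 = chosen_effort(bonus_hard=0.0, bonus_soft=0.0)

# A bonus on the "hard" (measurable) dimension only.
h1, s1 = chosen_effort(bonus_hard=1.0, bonus_soft=0.0)

print(f"no bonus:   hard={h0:.2f}, soft={s0:.2f}, total={h0 + s0:.2f}")
print(f"hard bonus: hard={h1:.2f}, soft={s1:.2f}, total={h1 + s1:.2f}")
```

With these made-up numbers, total effort rises once the hard dimension is rewarded, but soft effort falls. How far it falls depends entirely on the assumed crowding parameter, which is exactly the empirical question raised in the text.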

How serious is this problem? It depends on how important it is to have a balance between hard and soft factors, how hard the soft factors really are to measure, and how harmful the status quo of low work effort is. It also depends on whether one type of education makes the other type easier or harder for the teacher. An excessively high-powered accountability system focusing, say, on standardized test scores could easily promote a “teaching to the test” strategy that can be antithetical to critical thinking (at the very least by taking up class time that could otherwise be used). This isn’t necessarily so, but it may be likely. Providing high-powered but skewed accountability may be beneficial in severely dysfunctional school systems where neither hard nor soft factors are taught well, but it may be harmful in better school systems.

Analogously, in the prison context, one can imagine two dimensions of quality: humane in-prison conditions and low recidivism after prison. Suppose one of these is harder to measure than the other. In-prison conditions could be harder to measure if effective monitoring is difficult; or perhaps recidivism is harder to measure if there aren’t good databases of offenders, especially if released inmates often commit their crimes in other states. Whichever one turns out to be less measurable, we can expect effort to be skewed toward the more measurable one.

Would it make a difference if prison policies were skewed toward humane conditions or toward reducing recidivism? If the two go together—if humane conditions are, on balance, effective at reducing recidivism—then the inability to monitor both dimensions can be harmless. On the other hand, if bad prison conditions, on balance, reduce recidivism through a general deterrent effect, a focus on recidivism could lead to bad prison conditions—in which case there’s no guarantee that high-powered accountability would improve overall quality in the absence of effective in-prison monitoring. Since the precise determinants of recidivism aren’t well understood, this shows the importance of properly monitoring whatever is considered desirable in the prison.

In the extreme case, where some tasks remain completely unmeasurable and shirking on those tasks is highly detrimental to overall quality, we should junk the idea of high-powered incentives: the traditional input-and-output approach may then be optimal.

If an unmeasurable outcome is represented in the accountability scheme by some inputs or outputs as proxies, the possibilities for undesirable strategic behavior multiply. The previous examples involved ignoring the unmeasurable elements and maximizing the measurable component of performance, rather than maximizing overall performance. Replacing unmeasurable elements with proxies within the provider’s direct control leads to pursuing the proxies for their own sake—which one can uncharitably call “manipulating” the proxy measures.

For example, consider recidivism rates, which I’ve been treating throughout as a true outcome measure. In reality, no one knows true recidivism rates; we don’t know that a released inmate has committed a crime unless we catch him (and, depending on the recidivism measure we’re using, unless we convict him or reincarcerate him). So in reality, rather than using the unmeasurable dimension of recidivism, we’re using the measurable proxy of, say, rearrest rates. If the relationship between rearrest rates and true recidivism is stable, using this proxy can be harmless; but more important still is that the contractor not be able to manipulate the rates in ways that don’t correspond to true social improvements.
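A small sketch may help show why the proxy can mislead. I assume, purely for illustration, that the measured rearrest rate is the true reoffense rate multiplied by the probability that a reoffense is detected. The function name and all the numbers are hypothetical.

```python
def measured_rearrest_rate(true_reoffense_rate, detection_prob):
    """Proxy we can observe: reoffenses that actually lead to a rearrest."""
    return true_reoffense_rate * detection_prob


# Starting point: 40% of released inmates reoffend, half of them are caught.
baseline = measured_rearrest_rate(true_reoffense_rate=0.40, detection_prob=0.50)

# Real improvement: fewer released inmates reoffend, detection is unchanged.
real_improvement = measured_rearrest_rate(
    true_reoffense_rate=0.30, detection_prob=0.50
)

# Gaming: the same share reoffends, but fewer reoffenses are detected.
gaming = measured_rearrest_rate(true_reoffense_rate=0.40, detection_prob=0.375)

print(f"baseline:         {baseline:.3f}")
print(f"real improvement: {real_improvement:.3f}")
print(f"gaming:           {gaming:.3f}")
```

The last two scenarios produce essentially the same measured rate, so a payment tied to the proxy can't tell them apart. That is why it matters whether the party being paid has any influence over detection.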

Thus, if in-prison misconduct is penalized, corrections officers will use their discretion very differently when deciding whether to write up an offense. If urinalysis tests based on suspicion are rewarded, we can expect more inmates to seem suspicious, as if by magic. Perhaps the output (drug tests based on suspicion) seems to have a straightforward correlation with the outcome (inmate drug use, if one chooses to consider that an outcome); but make it a subject of compensation, and you can’t rely on that correlation anymore. Administrators will start pursuing the output for its own sake. (Random drug tests unrelated to suspicion remove that gaming problem, even if they are more expensive for the same level of deterrence.)

Similarly, in the context of community corrections, Joan Petersilia criticizes the use of recidivism rates as an outcome measure: if the number of arrests increases, is that bad because more people are committing offenses? Or is it good because probation officers are better at detecting technical violations and sending released offenders back to prison? If we decided that increased arrest rates were bad and attached penalties to that variable, we might find arrest rates plummeting, but merely because probation officers stopped supervising their charges very closely.

Recidivism may thus be a bad measure for the accountability of probation officers. But it can be a good measure for the accountability of prisons, provided that prisons leave supervision and rearrest to entirely separate actors. This is a reason to insist on the separation of prisons and probation officers, not granting contracts to criminal justice providers that are too integrated, and more generally preventing prisons from giving any incentives at all, even subtle ones, to probation officers. Similarly, the results of drug testing can be an acceptable measure, but random testing is better than testing based on suspicion. In-prison misconduct can be an acceptable measure, but it should be the type of serious misconduct that’s least likely to be overlooked or characterized as something else.

We might even have to guard against other kinds of gaming: if prisons can affect where prisoners are released, for instance by partnering with post-release job placement programs that have good contacts in particular areas, they can try to have prisoners released in areas where policing is weaker. For understandable political economy reasons, a state Department of Corrections might choose to ignore the welfare of people in other states and tie compensation only to an in-state measure of recidivism; then, the prison does better by finding out-of-state jobs for its inmates. A prison might also try to prevent recidivism by “paying offenders to desist,” but this might be controversial.

Of course, even if we only use performance measures to reward providers, providers will inevitably have to translate these incentives into specific input- or output-based incentives to reward their own staff, at least in part—there are limits to the possibilities of stock options. And such incentives can sometimes backfire for the same reasons that input-based incentives can backfire at the prison level. At one CCA prison in Tennessee, the employee compensation policy discouraged “use-of-force incidents.” In general, this can be positive, but sometimes not: for nine straight months, CCA personnel stopped removing mentally ill inmate Frank Horton from his cell for showers, exercise, and mental health evaluations, because any attempt to do so would have been considered a “use of force” and could have affected their bonuses or pay raises. Presumably, though, a provider motivated by good performance measures will have better incentives and better ability to monitor its own staff than the government has to monitor the provider.

3. Distortion Across Types of Inmates

One common complaint about high-powered outcome-based incentives is that they’ll lead to two related phenomena: “creaming” (only taking the easiest inmates) and “parking” (not providing services to the most difficult inmates). There’s an easy way to prevent providers from taking the easiest inmates: insist that providers take all comers, limit opportunities for providers to transfer inmates they don’t like out of the prison, and have assigning agencies not discriminate either in favor of or against particular providers in assignment. And the bias toward treating easier inmates can be alleviated by mandating particular services for everyone. There remains, though, the concern that providers will be, for instance, more enthusiastic about providing rehabilitative services to those who are more likely to benefit from them.

There are two lines of response to this concern. First, paying the same rate, regardless of how hard the offender is to serve, will clearly lead to parking; one can therefore provide payments that are inmate-specific, where a harder-to-serve inmate’s desistance from crime is rewarded more generously than an easier-to-serve inmate’s. These payments can be based on the observable characteristics of the inmate; some characteristics might be illegal to consider while others can be better observed by the provider than by the government, so there will inevitably be some degree of mismatch. But a system of nonuniform rewards can generally alleviate parking.
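Here is a rough sketch of that logic, under assumptions that are mine alone: a provider treats an inmate only if the expected reward (the payment per avoided reoffense times the drop in reoffense probability from treatment) covers the treatment cost. The `Inmate` class, the `is_treated` rule, and every number are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Inmate:
    label: str
    risk_reduction: float  # drop in reoffense probability if treated (made up)
    treatment_cost: float  # provider's cost of treating this inmate (made up)


def is_treated(inmate, reward):
    """Assumed decision rule: treat only if the expected reward covers cost."""
    return reward * inmate.risk_reduction >= inmate.treatment_cost


easy = Inmate("easier to serve", risk_reduction=0.20, treatment_cost=1_000.0)
hard = Inmate("harder to serve", risk_reduction=0.05, treatment_cost=1_000.0)

# Uniform reward: the same payment per avoided reoffense for everyone.
uniform = {i.label: is_treated(i, reward=10_000.0) for i in (easy, hard)}

# Inmate-specific reward: a larger payment for the harder-to-serve inmate.
tailored = {
    easy.label: is_treated(easy, reward=10_000.0),
    hard.label: is_treated(hard, reward=25_000.0),
}

print("uniform: ", uniform)
print("tailored:", tailored)
```

Under the uniform payment the harder-to-serve inmate is parked; raising the payment for that inmate alone is enough to change the provider's decision. In practice the government would have to estimate these differences from observable characteristics, with the mismatch problems noted above.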

The second line of response would question whether parking is even bad. Suppose some inmates are hard to rehabilitate, so prisons—in the presence of uniform rewards—will tend to spend less time trying to rehabilitate them. Is this bad? Some nonuniformity of rewards will be inevitable—presumably a murder by a released inmate will be penalized more heavily than a minor crime. But suppose there’s a group of inmates whose recidivism is equally harmful. Wouldn’t it be socially beneficial for the provider to concentrate its resources on the ones whose crimes can be prevented most cheaply, so that more inmates can be treated at the same cost? At least, so an efficiency framework might counsel. If one subscribes to a certain form of equity where everyone should have some amount of (even ineffective) rehabilitation, one might want to fall back on the solution I mentioned above: offering higher payments for the harder-to-treat inmates or, if that can’t be done reliably, mandating some amount of inputs or outputs.

4. Falsifying Performance Measures

Finally, when high-stakes compensation depends on numbers, there’s an obvious incentive to falsify the numbers themselves. Reports of school cheating scandals are commonplace. Similarly, in the prison context, private providers plausibly prefer to underreport incidents, at least if those incidents wouldn’t inevitably become known. Failure to report is grounds for contract termination, which can cut in the other direction, but contract termination is a strong remedy that’s rarely used. Public prisons, on the other hand, might have an incentive to overreport to get more funds, but they also might have an incentive to underreport to make themselves look better compared to private prisons. Misconduct data are thus somewhat unreliable, especially if one wants to use them to compare different prisons.

Whichever way the incentives cut, the fact that compensation will inevitably be to some extent based on variables reported by the provider means that it’s important to seriously invest in monitoring. Currently, monitoring practices vary quite a lot, “from minimal attention from a centrally located contract administrator to a combination of a contract administrator and one or more on-site monitors.” The monitors themselves may have responsibility for more than one facility, which puts them on site at any particular prison once a quarter, once a week, or daily. Instead, contracts should provide for a full-time, on-site monitor with “unlimited access to the correctional facilities and assigned correctional units,” who isn’t the provider’s employee (even if the contract might mandate that the provider pay his salary as part of the deal). When prisoners are sent out of state, monitoring is more likely to be “on paper” rather than “in person”—which is one reason to keep one’s prisoners in state.

Because the capture of monitors is an enduring concern, other forms of monitoring are possible: a public-interest group could be given inspection rights, the surrounding community might be designated as a third-party beneficiary, or the constitutional tort regime for prisons could be strengthened (rather than weakened, which is the current trend).

A strong disclosure regime is also probably a good idea.

One way of guaranteeing disclosure is to subject private prisons under contract with the federal government to the Freedom of Information Act, perhaps along the lines of the often-proposed Private Prison Information Act. Private prison firms themselves aren’t “agencies” for the purposes of FOIA, and the Bureau of Prisons isn’t covered if it hasn’t “created and retained” or doesn’t actually possess the documents. Even after these hurdles, much qualifying information, like contracts or incident reports, would be exempt under Exemption 4, which protects “trade secrets and commercial or financial information . . . [that is] privileged or confidential.” Exemption 4 could be applied either if “disclosure could impair the reliability of data,” or if “disclosure would cause substantial competitive injury to the provider.” The competitive injury justification could be fairly broad—knowing the terms of a contract, for instance, can reveal the terms of the winning proposal to the winning firm’s competitors. Indeed, FOIA has been criticized as “a lawful tool of industrial espionage.” On the other hand, says Cásarez, FOIA provides for the disclosure of “reasonably segregable portion[s]” of documents, which “should include monitoring and reporting requirements.” Logan counsels against “saddl[ing] private prison operators with expensive monitoring requirements ‘far beyond those that exist for government prisons,’” but FOIA applicability would cut in the direction of establishing parity.

Similar legislative fixes are possible in the states: for instance, in Florida and Georgia, open records acts “already apply to private organizations that act on behalf of state agencies.” All of this (as well as any relevant public-law value) could also be imposed on private contractors by contract; Jody Freeman calls this process “publicization” (pronounced “public-ization”, with a hard “c”).

Another possibility is to ensure access to the prison by the public and the press. Bentham, who had smart things to say about the bidding process two centuries ago, also argued for “essentially unrestricted public access” to (private) facilities. His prison design

enables the whole establishment to be inspected almost at a view, it would be my study to render it a spectacle, as persons of all classes would, in the way of amusement, be curious to partake of: and that not only on Sundays at the time of Divine service, but on ordinary days at meal times or times of work: providing therefore a system of inspection, universal, free, and gratuitous, the most effectual and permanent securities against abuse.

I don’t want to endorse watching prisoners as a source of amusement (and public access raises serious security and access-to-contraband issues), but the idea of at least some public access does seem to have some advantages in terms of accountability.

*     *     *


The failure of the comparative effectiveness studies, therefore, is completely understandable. Aside from the methodological problems, it’s quite plausible that the results of prison privatization have been inconclusive because the changes in prison management that would lead to better performance are often neither permitted nor rewarded.

Using performance measures would change this by helping us do valid comparative studies, enabling the fair public-private competitions that are a hallmark of competitive neutrality, and pushing policymakers to clearly formulate what we want out of prisons. Using performance measures directly to drive compensation has the potential to radically alter prison outcomes by rewarding good performance and penalizing bad performance; this definitely has applicability for private prisons but could possibly be used for public prison wardens as well.

The critiques are serious, but I don’t believe they fatally undermine the case for the experiment.

The information necessary to calculate the True Social Values in an efficiency framework may never be available, but we can approach the exercise with an air of humility, seeking only to improve incentives at the margins, not to achieve optimal social engineering.

The use of market incentives probably won’t alter the public-interestedness of those who work at private prison firms, but it might alter the mix of people who choose to work in the public sector; on the other hand, combined with social impact bonds, performance-based compensation can also spur the growth of nonprofit providers. Because small firms and nonprofits are particularly sensitive to risk, the incentives should only be moderately high-powered, to trade off incentives and risk tolerance.

Performance-based compensation will give rise to certain possibly undesirable strategic behavior. If providers can set their own goals, they’ll be inclined to set them in ways that are easy to meet; this is why providers shouldn’t set the goals at all, and in any event compensation should be based on the level of a continuous variable, not a binary goal. If some dimensions of quality are hard to measure, performance-based compensation will bias providers’ effort toward the more measurable aspects of performance; this means that some reliance on inputs and outputs will still be necessary, having due regard for the need to avoid choosing measures that can be easily and undesirably manipulated by providers. Compensation schemes might lead providers to concentrate on treating certain inmates and neglect others; even if this is bad (which isn’t clear), the problem can be alleviated by inmate-specific rewards. Finally, the levels of the measures themselves can be falsified, which points to the need for serious investments in monitoring and robust disclosure regimes.

These concerns are real, but the lesson to take from them is that more experimentation is required to see how much of a real-world effect they have and to what degree they really vitiate the promise of performance incentives. The status quo, where the level of experimentation is close to zero, is unlikely to be optimal.