On my last night of residency, a friend and I swapped our most memorable stories. We recalled the nights of panic and triumph caring for dozens of critically ill patients, the day (okay, days) we fell asleep standing up, and the weeks subsisting on saltine crackers and ginger ale.
The conversation turned to a legendary co-resident: He made diagnoses others hadn’t even heard of. He rushed patients to lifesaving procedures at the faintest clinical change. He cajoled specialists to see his patients in the middle of the night.
We estimated he’d tallied more years of life saved than anyone else in our program — far beyond what other doctors could have mustered in the same position. He excelled in the medical equivalent of baseball’s “wins above replacement”: the number of victories that a player helps achieve beyond what would be expected from a substitute.
This is, I think, much of what patients want to know when choosing a doctor: Will my diabetes, heart condition or hip replacement be better managed with this doctor or that one? It’s also what insurers and governments have tried to measure and incentivize through legions of quality metrics.
The analogy is attractive but flawed: What constitutes a win in medicine, and who’s responsible for it, is often unclear. Patients want longer lives, but they also want healthier lives, care that is more compassionate and more convenient, and judicious use of tests and treatments. Doctors work in teams and are just one part of an increasingly complex health system, and they are often less in control of outcomes than we think, especially when treating patients with challenging social circumstances and coexisting medical conditions.
Measurement techniques may grow increasingly sophisticated, but doctors are not ballplayers, and health-care statistics are not as simple as a batting average.
That hasn’t stopped public and private payers from using pay-for-performance incentives to try to improve how doctors care for patients. The most notable recent example is Medicare’s Value-Based Payment Modifier, or VM, which measures the quality and cost of care and offers bonuses or imposes penalties accordingly. How often were patients readmitted to the hospital? What percentage had their cholesterol checked? Did it cost a doctor more or less to care for his or her patients than it cost other doctors to care for theirs? While the VM program ends this year, it’s similar to Medicare’s next attempt at pay-for-performance, the Merit-based Incentive Payment System, or MIPS.
Research suggests, however, that the VM program had no benefit in improving care or reducing costs, and the program’s failure adds to a body of evidence that financial incentives generally have not been shown to improve patient outcomes. An extensive effort in Britain to pay doctors to better manage hypertension, for example, found no improvement in blood pressure, strokes or heart attacks, and care for conditions not linked to bonuses may have gotten worse. In the United States, quality measurement places considerable burdens on physician practices, which spend more than 15 hours per doctor per week and $40,000 per doctor per year on such reporting.
Patients pay a price, too.
Financial incentives can encourage doctors to avoid sick or socially disadvantaged patients, who are harder to care for and who may negatively affect their quality ratings. Even within the same health system, doctors who care for more underinsured, minority and non-English-speaking patients have lower rankings. But many payment programs — including VM and MIPS — don’t adjust for illness severity or socioeconomic status. So if I’m trying to pad my stats, it makes sense to gerrymander my patient panel into the richest, healthiest, most-educated panel possible.
On a deeper level, it’s important to ask what these measures are really measuring. Doctors perform thousands of cognitively complex functions every day, but most payment programs evaluate a handful of basic process measures that say more about the systems they work in than the care they personally provide. Primary-care physicians, for example, manage 400 conditions every year requiring broad diagnostic and treatment expertise, but they may be judged primarily on whether patients get mammograms and flu shots.
The question answered, then, is not whether I’m a good doctor but whether I work in a system organized enough to check boxes. Sometimes they’re important boxes, no doubt, but they’re boxes nonetheless — and they capture a tiny fraction of what doctors do.
Some measurement, however imperfect, is important to keep doctors honest and patients informed. One step may be to solicit more input from physicians on which measures they feel most accurately capture the value of their care. Another is to ensure that all measures are carefully adjusted for patients’ medical and social complexity.
While doctors are generally skeptical of online reviews, such feedback will almost certainly be part of the solution. Already, more than three-quarters of patients use online reviews to find a new doctor, though concerns persist about their accuracy and representativeness. (They’re not always written by the patient receiving care, for example.)
A program at the University of Utah offers a possible path forward. In 2012, the university’s health system became the first major academic center to release patient-satisfaction data and unedited comments to the public. Reviews are presented in a consumer-friendly format — similar to Yelp or TripAdvisor. More than two dozen other health systems — including Stanford, the Cleveland Clinic, the University of Pittsburgh and Duke — have followed suit.
Some worry that this approach overemphasizes patient satisfaction and undervalues objective outcomes, but research suggests that good patient experience is often correlated with better quality.
More generally, while hospital- and health-system-level evaluations also have problems, there’s some evidence to suggest that measuring quality at the organizational level is more feasible and reliable than it is for individual physicians. It may be more effective for payers to experiment with measurement and payment reforms for health organizations — bundled payments, per capita payments, funding for behavioral health and social support — and allow organizational leaders to review doctor performance internally, which is how some leading health systems already assess and compensate their clinicians.
Despite prolonged and costly attempts, insurers and governments have not been able to accurately measure a doctor’s value and won’t be able to anytime soon. No currently available measures, for example, would have captured my co-resident’s worth as a physician. And yet, every doctor in our program knew of his worth, and every patient he cared for benefited.
Khullar is a physician at NewYork-Presbyterian Hospital, a researcher at the Weill Cornell Department of Healthcare Policy and Research, and director of policy dissemination at the Physicians Foundation Center for Physician Practice and Leadership. Follow him on Twitter: @DhruvKhullar