My Washington Post colleague Spencer Hsu continues his great reporting on the continuing crisis in the world of forensic science. Over the weekend, Hsu broke the story that Justice Department officials now concede that “an elite FBI forensic unit gave flawed testimony in almost all trials in which they offered evidence against criminal defendants over more than a two-decade period before 2000.”
This is far from the first such story. It isn’t even the first such story from the FBI’s crime lab, long considered one of the most elite labs in the world. In fact, over the last several years there seems to be a new crime lab in crisis about once a month. A big part of the problem is misplaced incentives. A couple of years ago I reported on a study that found that in many states, crime lab analysts are actually paid per conviction. And a few years before that study came out, one of its authors, Roger Koppl of Fairleigh Dickinson University, and I wrote a piece about how we could institute some meaningful reforms to get those incentives pointing in the right direction.
But Hsu’s story seems like a good opportunity to post a bit of a history of forensics that I wrote a few months ago. This was initially part of my four-part series on the use of bite mark evidence in the courts. We cut it because the series was already pretty long. But I think it provides some useful context for these alarming stories we’re seeing today.
A quick history of forensic science
In 1911, prosecutors for the state of Illinois won a murder conviction against Thomas Jennings. They did so by convincing a jury that Jennings’s fingerprint matched the fingerprint left on a freshly painted window sill at the house where the victim was killed. By that time, fingerprint matching had been used in Europe for a few decades. It had been introduced to U.S. law enforcement officials by Scotland Yard officials at the 1904 World’s Fair in St. Louis. The Illinois Supreme Court would later uphold Jennings’s conviction, ruling that “the [fingerprint] evidence in question does not come within the common experience of all men of common education in the ordinary walks of life.” Therefore, the justices found, “the court and jury were properly aided by witnesses of peculiar and specialized experience on this subject.”
It’s somewhat appropriate that modern forensics would have been introduced to America at that World’s Fair in St. Louis. The early 20th century saw a wave of innovation, reform and social upheaval, along with some exciting new technology. Mass electrification was underway. The dawn of what would become the American Century saw landmark advances in science and discovery, and the 1904 exposition was an early and prescient celebration of American achievement, both heralding the advances that had already occurred and anticipating those to come.
But such celebrations of American exceptionalism could quickly bleed into chauvinism about American superiority, ugly demonstrations of alleged racial dominance and the championing of crank theories too easily passed off as science. Among the fair’s many exhibits, for example, were disturbing living dioramas of native “savages” collected from all over the globe — basically a human zoo. Paired with the exhibits celebrating America’s technological advances, the dioramas were intended to show that America was ascending due to a sort of evolutionary superiority. There were also exhibits on pseudo-sciences like phrenology and physiognomy, which posited that trained experts could make broad generalizations about intelligence, criminal proclivity and morality based on physical characteristics like skull size, skull shape and the sizes and relative ratio of body parts.
Perhaps the best example of how quickly good science could go terribly wrong was Sir Francis Galton, a Victorian-era statistician, mathematician and meteorologist who is also considered the father of modern fingerprint identification. Galton’s interest in fingerprints was inspired by his admiration of the work of Alphonse Bertillon, a Paris police officer who pioneered the use of anthropometry, the careful measurement and recording of body parts for the purpose of identification. Bertillon’s methodology was sound, and it vastly improved the identification of suspects and convicts and helped law enforcement officials identify repeat offenders. By the end of the 19th century, it had been adopted by police agencies across the U.S. and Europe. It was one of the first examples of scientific classification in law enforcement.
But Galton’s enthusiasm for anthropometry not only inspired his work on fingerprint analysis; he soon came to believe that certain physical traits were indicative and predictive of criminality, intelligence, virtue, morality and other characteristics. This belief that people of a certain nose size, skull shape, or skin tone were inherently more criminal, immoral, or less intelligent quickly took Galton to the ugly places one would expect it might. In his autobiography, he advocated for the forced sterilization of entire groups of people. “Stern compulsion ought to be exerted to prevent the free propagation of the stock of those who are seriously afflicted by lunacy, feeble-mindedness, habitual criminality, and pauperism,” he wrote. In fact, the father of modern fingerprint analysis not only became a champion of the ugly field we now know as eugenics, he actually coined the term.
This embrace of both science and charlatanism was also present in the Progressives, the ascendant political movement of the early 20th century. Progressive reformers saw themselves as champions of empiricism and intellectualism. They sought to replace the corruption, cronyism and patronage they saw in politics and public service with expertise, altruism and virtue. But the Progressives too could sometimes let their enthusiasm lead them astray. The desire to build a better society often included the advocacy of immigration controls, the sterilization of “undesirables” and policy prescriptions based on broad generalizations about entire racial and ethnic groups.
American police departments were often at the center of these debates. In his book “Popular Justice,” criminologist and historian Samuel Walker notes that in the early 20th century, policing in urban America was largely controlled by political machines. A police officer was an appointed, patronage position. The Progressives sought to professionalize law enforcement by transforming it from a temporary perk into a career. The reforms went a long way toward ridding policing of corruption and patronage, but the reformers also wanted police entrusted with more paternalistic responsibilities: basically, to enforce virtue on immigrant populations whom progressive leaders thought lacked morality, discipline and industriousness.
It’s within this movement toward professionalism that we see the birth of modern forensics. Pioneers of modern policing like Berkeley, Calif., Police Chief August Vollmer emphasized standardization, the adoption of new technology and specialization within law enforcement agencies in areas like homicide investigation, narcotics investigation and vice units. Forensics was another area of specialization. In fact, Vollmer is credited with inventing the crime lab.
The professionalism movement promoted a more analytical approach to fighting crime, but as with other segments of American society at the time, the rush to embrace new theories and new technologies also opened the door to charlatans, hucksters and frauds. U.S. courts were now faced with the challenge of how to distinguish expertise from artifice. “Peculiar and specialized experience” could be useful, as the Illinois Supreme Court put it in People v. Jennings, but that couldn’t be the only standard. Someone could have peculiar and specialized knowledge in Tarot card reading, for example, but it wouldn’t be appropriate to let that person testify in court.
“The standard at the time was that if someone had specialized knowledge, and that knowledge seemed to be helpful to investigators, then the court would allow the testimony,” says Jonathan Koehler, a behavioral scientist and law professor at Northwestern University. “The problem was that there was no attempt to check the validity of what these witnesses were actually claiming.”
And the problem with that is that most forensic disciplines weren’t invented in labs, then subjected to peer review in scientific journals. Instead, most were invented by people in law enforcement, not in the quest for knowledge, but as an aid to help them solve crimes. Scientists within the same field have strong incentives to poke holes in one another’s theories, to find flaws in a peer’s experiments. This isn’t the case in forensics. A fingerprint analyst testifying for the defense might disagree with a fingerprint analyst for the prosecution, but he isn’t going to call into question the premises on which the entire field of fingerprint analysis is based. He’d be undermining his own legitimacy. It was only after the onset of DNA testing, which did come from the world of science, that we began to understand just how profound these divergent incentives really are.
Twelve years after Jennings, a federal appeals court took a first stab at setting some standards for expert testimony. As I wrote in part two of the bite-mark series, “In the 1923 case Frye v. United States, the U.S. Court of Appeals for the D.C. Circuit rejected testimony from a polygraph instructor who claimed that a rise in systolic blood pressure indicated that a suspect was lying. The appeals court ruled that in order to be admissible in federal court, scientific evidence or testimony must have ‘gained general acceptance in the particular field in which it belongs.’” Though it wasn’t a Supreme Court decision, Frye was soon adopted by other federal circuits and eventually by most of the states. “That put some teeth in the law,” Koehler says. “It looked beyond a witness’ qualifications to evaluate the content of the witness’ testimony.”
But the decision had an ancillary effect that ended up being much more important than its holding: It made judges the gatekeepers of scientific evidence.
“Judges have no scientific training,” says Michael Saks, a law professor at Arizona State University. “They’re trained in legal analysis, not scientific analysis. The fundamental problem with forensics and the criminal justice system is that legal thinking and scientific thinking just aren’t compatible.”
Remarkably, the U.S. Supreme Court didn’t weigh in on any of this until 1993, in the first of three rulings collectively known as the Daubert cases. According to Daubert, judges must consider two criteria: whether expert testimony is relevant and whether it is reliable. Under Daubert, an opposing attorney can request a hearing in which the judge will rule on the admissibility of scientific evidence, based on factors such as whether the claims are testable, whether the conclusions on offer have been subject to peer review, whether the methods are governed by standards and protocols, the error rate of those methods and whether the witness’s general approach has been accepted within a particular scientific community. In Kumho Tire v. Carmichael, the court later applied this standard to all expert testimony, not just testimony explicitly claiming to be “scientific.”
Daubert is now the law in federal court and in most states. But neither standard has done much to keep bad science out of criminal trials. In the 1970s, for example, the FBI tinkered with “voice printing”: the idea that every human voice produces distinctive patterns that make it uniquely identifiable and that these patterns can be measured and quantified. After critics began to raise questions about the science behind the methodology, the FBI asked the National Academy of Sciences to create a working group to investigate. The NAS group concluded that the methodology wasn’t grounded in sound science. The FBI dropped voice printing, but not until it had already been admitted as evidence in dozens of courts across the country.
One notorious area of junk forensic science to come under scrutiny in recent years is arson investigation. In a landmark 2009 New Yorker investigation, journalist David Grann delved into the peculiar professional culture of arson investigators. Many, Grann explained, had only a high-school education and learned the trade on the job from “old-timers.” As early as 1977, a study had warned that there was no scientific research to validate the field’s dominant theories about the allegedly foolproof signs of arson. Yet the courts continued to allow those theories to be heard by juries, producing countless convictions. Only in recent years have some of those verdicts been revisited. Many wrongful convictions are just now being discovered by journalists and legal advocacy groups. Many may never be.
One of those cases involved Cameron Todd Willingham, the subject of Grann’s story, who had been executed by the state of Texas in 2004. In his investigation, Grann made a persuasive case that Willingham was innocent. One longtime critic of arson forensics, Gerald Hurst, told Grann, “People investigated fire largely with a flat-earth approach. It looks like arson — therefore, it’s arson. My view is you have to have a scientific basis. Otherwise, it’s no different than witch-hunting.”
And the examples of dubious science keep coming:
- In 2003, an NAS research group debunked a long-held FBI theory that every batch of gun ammunition ever made carries its own unique chemical fingerprint. The FBI had long maintained that this signature meant an analyst could definitively state that a bullet found at a crime scene could only have come from the box of ammunition found in the suspect’s home. Again, the courts had accepted the theory without any scientific research to back it up. It took years for the science to disprove it, and by then it had already been used to help convict countless people all over the country.
- A more recent Washington Post investigation identified more than 2,500 cases in which convictions may have been tainted by exaggerations made by hair and fiber analysts at the FBI, and by the dozens of state and city crime lab employees across the country whom those analysts had trained.
- Over the last several years, the “shaken baby syndrome” diagnosis has come under fire as researchers have cast doubt on its underlying assumption — that the presence of three symptoms in a deceased infant or toddler could only be caused by violent shaking.
Other forensic specialties that haven’t held up to scientific scrutiny include “ear print” matching, footprint matching and blood spatter analysis. Even fingerprinting now has its critics, particularly after the false match of a partial print led to the wrongful arrest of Oregon attorney Brandon Mayfield for the 2004 train bombings in Madrid, Spain.
Only a few critics would argue that none of these methods of analysis have any probative value at all. What most argue is that there has never been any effort to determine in a scientific manner precisely what that value is. Koehler says that’s because while science is a search for truth, many forensic disciplines were developed as a means to an end: helping police solve crimes. “Once a new field has been accepted by the courts, there’s no incentive for the practitioners in that field to subject themselves to scientific testing,” he says. “Why would they? They can only lose.”
Michael Saks says it all goes back to asking judges to be the gatekeepers of science. He says the main reason that unproven but scientific-sounding claims get into court is that we’re still asking judges to referee good science from bad, and judges just aren’t very good at that. They aren’t trained as scientists, and we shouldn’t ask them to play scientists, especially when lives are on the line.
Saks suggests a sort of national forensics panel that would evaluate new and existing forensic specialties and decide which have sufficient scientific support to be allowed in the courtroom. “We need to move outside the courts,” he says. “Look at these forensic areas that even the government now admits have been discredited. Bullet lead composition, voice print analysis and so on. The courts had been letting this stuff in for years. It took declarations from the scientific community to put an end to it. What does that tell us? It tells us that these decisions shouldn’t be made by judges.”
In science, theories are revised, abandoned and corrected all the time. Wrongness is a critical part of scientific inquiry. Scientists can be and often are wrong while maintaining their professional credibility. Being wrong in forensics is another matter. If a method of analyzing blood or carpet fibers is shown to be faulty, everyone who uses that method is tainted as a witness. That means there is a huge incentive for an analyst to be extremely protective of his field.
Moreover, while science is a collaborative endeavor, the justice system is adversarial. Scientists work together toward the pursuit of knowledge. Forensic analysts often give mutually exclusive testimony.
“Disagreements between scientists operate at a higher level,” says Saks. “Everyone has the same goal. In court, the goal of a forensic analyst is to convince the jury that you’re right and the other guy is wrong. So it becomes less about content, and more about discrediting the other expert.”
The most important trait to be an effective expert witness, then, isn’t sound and careful analysis, but the ability to be persuasive. And the less a particular forensic specialty relies on science, the more important it is to be persuasive.
Bite mark matching is an excellent example. As part of its credentialing exam, the American Board of Forensic Odontology asks test takers to match bite marks to the dental mold of the person who created them. But it’s notable that the test doesn’t require actually matching the correct bite marks to the correct biter. Instead, test takers are evaluated on their analysis.
This came out during the 2011 Washington, D.C., murder trial of Roderick Ridley. Michael Bowers, the Ventura County, Calif., deputy medical examiner and critic of bite-mark matching whom I profiled in my series, discovered that bite mark analyst and prosecution witness Dr. Douglas Arendt had improperly used crime scene photos from the case in an ABFO credentialing exam. Four students had taken the exam. The problem: Ridley had yet to be tried.
That in itself was troubling. But more interesting were the answers the test takers gave. Arendt himself had initially concluded from the bite marks that Ridley was the biter. He later changed his opinion to say that Ridley was merely “included” as a possible biter. According to Arendt’s testimony in a hearing for the trial, of the four students who took the exam, three stated that the data were too inconclusive to form an opinion, and a fourth said that Ridley couldn’t be included. All four still passed the exam, despite the fact that none of them reached the same conclusion as the instructor who gave it.
When all of this was revealed, prosecutors in the case (at the request of the ABFO) attempted to block defense attorneys from seeing the exam. In a letter to Ridley’s attorneys arguing to keep the exam confidential, two former ABFO presidents explained that actual results of the test weren’t relevant. “No ‘right’ or ‘wrong’ conclusions existed,” they wrote. “The candidates were not graded on the absolute ‘correctness’ of their opinions but rather on the processes they utilized to reach those conclusions.”
In other words, to get credentialed as a bite mark specialist by the ABFO, it isn’t important that a candidate be accurate, only that he sound accurate. A current member of the ABFO board of directors confirmed in an interview for my recent series that this is still how the certification exam is given.
Once the courts let bad forensics in, it becomes extremely difficult to overturn a conviction when science later calls the forensic analysis into doubt. That’s in part because it can take decades for good science to catch up to bad. But it’s also because once a verdict has been issued, the criminal justice system’s priorities change. A guilty verdict does away with the “presumption of innocence”; the system instead puts a premium on finality.
“Appeals courts look at process, not at the correctness of the verdict,” Koehler says. So long as the defendant had a chance to cross-examine an expert witness, or to put his own expert witness on the stand, appellate courts are content that the jury had the chance to hear “both sides,” even if one side is later shown (or was even known at the time) to have been without scientific merit.
This is why it often takes DNA evidence to convince an appeals court to reopen a case post-conviction. But DNA testing is dispositive of guilt in only a small number of cases. Without DNA testing, the bar for a new trial is high, even in cases that turned on testimony from fields that DNA testing has since discredited in other cases, and indeed even in cases involving the same expert witnesses who have been discredited elsewhere.
Jennifer Mnookin, a law professor at UCLA who specializes in scientific evidence, is currently heading up a research team funded by the National Institute of Justice that is evaluating the scientific merit of fingerprint matching. In a Los Angeles Times op-ed, Mnookin explained that real science “deals in probabilities, not certainty.” Yet fingerprint analysts, bite mark analysts and other forensic specialists routinely testify about their certainty.
Mnookin notes that the one area of forensic science in which you will see experts testifying about probability is DNA testing. Not coincidentally, DNA testing is one area of forensics that was born and developed in the scientific community, rather than in a police lab. DNA probabilities are usually extreme (random-match probabilities are often stated as one in a billion or more), but that’s because we have precise knowledge of how DNA markers are distributed across the human population. We can calculate the numbers precisely because the numbers are calculable. With pattern-matching disciplines like fingerprint or hair fiber analysis, we don’t really know how the distinguishing characteristics are distributed across the population.
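To make that contrast concrete, here is a minimal sketch of the kind of calculation behind those DNA figures, assuming the standard “product rule,” in which the population frequencies of the genotypes observed at independently inherited loci are multiplied together. Every locus name and frequency below is a made-up illustration, not real population data; the point is simply that the calculation depends on frequency data that pattern-matching fields have never collected.

```python
# A minimal sketch of the "product rule" behind DNA random-match probabilities.
# The locus names and frequencies are illustrative assumptions, not real data.

# Fraction of the population sharing the observed genotype at each tested locus
# (hypothetical values for illustration only).
illustrative_genotype_frequencies = {
    "locus_1": 0.08,
    "locus_2": 0.05,
    "locus_3": 0.11,
    "locus_4": 0.02,
    "locus_5": 0.07,
}

def random_match_probability(frequencies):
    """Multiply per-locus genotype frequencies, assuming the loci are independent."""
    probability = 1.0
    for frequency in frequencies.values():
        probability *= frequency
    return probability

rmp = random_match_probability(illustrative_genotype_frequencies)
print(f"Random-match probability: about 1 in {1 / rmp:,.0f}")
# With these made-up numbers, the result is roughly 1 in 1.6 million. Real DNA
# profiles test more loci, which is how figures like one in a billion arise.
# Fingerprints, hair and bite marks have no comparable frequency data to plug in.
```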
With bite marks, we know even less: We don’t even know whether there are any distinguishing markers. There has yet to be any scientific research to support the notion that the marks we make when we bite with our teeth are unique. But even if we could somehow know that they are, we still wouldn’t know how those unique characteristics are distributed across all of humanity. And even if we knew those things, we still wouldn’t know whether human skin is capable of recording and preserving a bite in a way that would allow those markers to be identified. These are the questions at the heart of the bite mark debate and of the use of forensic analysis in general. They’re questions that should have been answered long before this evidence was ever put before a jury for the first time. Instead, the scientific community is just now starting to answer them, decades later. But answering them only matters if the courts take those answers to heart. So far, the courts haven’t shown much interest.
Perhaps it’s time to reconsider whether judges should be the ones making these decisions in the first place.