The Washington Post

A secret algorithm is transforming DNA evidence. This defendant could be the first to scrutinize it.

An Exxon station in Annandale, Va., was the scene of an armed robbery in 2014. (Justin Jouvenal/The Washington Post)

The Exxon clerk never got a good look at the assailants who robbed him at gunpoint in Fairfax County, so investigators hoped to bolster their case with the smallest of clues: the minuscule number of skin cells one perpetrator left behind when he grabbed the victim’s shirt.

Crime labs that have long pulled DNA from blood or semen have been pushing the frontiers of forensics by teasing genetic material from ever tinier and more challenging samples, such as sneaker sweat. But this time, the Virginia crime lab could not make a match because of a ubiquitous problem: DNA from too many people was on the shirt.

So authorities turned to advanced software that its creator promises will sort complex DNA mixtures much like a prism breaks down white light. The software doesn’t provide a direct “match” but assesses the probability that a suspect’s DNA is included in the sample. In the Fairfax case, police had a tip about one suspected robber’s name, and the software backed it up.

TrueAllele is reshaping DNA analysis, providing key evidence in thousands of homicides, rapes and other crimes like the armed robbery in Virginia in which genetic material was too complex to interpret. But that comes with a major caveat: Not a single prosecutor, government crime lab or defendant has had a meaningful look at how it works.

Its maker, Cybergenetics, says the software code is a trade secret that it has spent decades and millions of dollars developing. Defendants nationwide have for years fruitlessly waged legal battles to access it, arguing that they have a constitutional right to examine the evidence against them for potential errors.

But now, this little-noticed Fairfax County case — along with another in Pennsylvania — could become the first in the nation in which defense experts are allowed to peer inside.

Judges in these cases have sided with defense requests to see TrueAllele’s code, raising hopes among advocates that a sea change may be underway in how courts handle complex algorithmic evidence.

If the reviews go forward, they will help determine the future of DNA evidence, how courts handle cutting-edge technology and perhaps the most crucial question of all: Is TrueAllele a quantum leap in forensics or another in a line of flawed tools, such as bite-mark matching and hair analysis, that have resulted in wrongful convictions in recent decades?

A New York appellate judge wrote in one case that the debate about whether TrueAllele’s source code should be disclosed touches on novel issues that are likely to intensify in coming years.

“This argument raises legitimate and substantial questions concerning due process as impacted by cutting-edge science,” Justice Stanley L. Pritzker said. “Given the exponential growth of technologies such as artificial intelligence, to embrace the future we must assess, and perhaps reassess, the constitutional requirements of due process that arise where law and modern science collide.”

An unsolved robbery gets a boost

The robber stepped behind the counter of an Exxon station in Fairfax County on Nov. 30, 2014, and grabbed the clerk by the shirt, according to a police report. “Get on the ground,” the robber told him and then put what the clerk thought was a gun to his head.

A second perpetrator pulled $475 from the cash register before the first ripped the gas station’s phone from the counter and smashed it. The men fled with the clerk’s cellphone.

The crime lasted less than a minute, and it would take years to generate a lead.

Court documents say the robbery was captured on surveillance video, but the suspects’ faces are not seen in full. The victim’s stolen iPhone could not be tracked. And the initial DNA test on the shirt yielded no results.

It wasn’t until 2018 that a federal convict identified a D.C. man named Clark Watson as his co-conspirator in the heist in Annandale, Va., court records show. At the time, the convict was seeking a reduction of his 30-year sentence.

Watson was charged with robbery and gun counts.

He had prior convictions for robbery, drug dealing and other crimes, but he told authorities that he “was never in Fairfax County ever,” according to a police report. Bryan Kennedy, a public defender in the county, said his client maintains his innocence in the robbery.

A fresh DNA test on the victim’s shirt in 2019 revealed that there was not enough genetic material at eight of 24 locations used to make a DNA match and that the piece of clothing contained at least three DNA profiles.

Such re-examinations are common, as investigators know that advances in forensics may turn up new clues.

When genetic profiling began in the 1980s, forensic laboratories needed a blood or semen sample the size of a quarter to test, experts said. These samples usually contained one — or maybe two — DNA profiles, so it was relatively easy to make a match with a suspect’s profile. Such matches remain the gold standard of evidence in criminal trials.

But as tests became more sensitive and the ability to amplify tiny amounts of DNA became possible, investigators broadened their search for trace amounts of DNA. Today, experts said, that includes touch DNA, the genetic material a suspect leaves behind in skin cells on a gun barrel, steering wheel or clerk’s shirt.

This DNA is often degraded, present in low amounts or mixed with other DNA profiles because it comes from surfaces that are frequently handled. Samples are also easily contaminated.

In one famous case, European authorities thought a serial killer might be at work after finding a similar pattern of female DNA at crime scenes in multiple countries. It turned out that the swabs used to take samples contained genetic material from the women who made them at a factory.

Amid this genetic murkiness, DNA matching is often no longer possible.

Jeanna Matthews, a computer science professor at Clarkson University who has studied TrueAllele and other “probabilistic genotyping” programs, compared the challenge of making sense of samples with multiple DNA profiles to radios playing over one another.

“What if I have one song playing loud and another song playing loud and then three songs kind of quiet and asked you to name one of the quiet ones?” Matthews said. “That’s a lot harder.”

Enter TrueAllele.

The software attempts to unmix the mixture. In the Fairfax robbery case as in others, the software looked at the jumble of genetic material from multiple people and proposed tens of thousands of possible individual DNA profiles, said Mark Perlin, TrueAllele’s inventor and co-founder of Cybergenetics.

The proposed profiles that best fit the evidence sample have the highest probability of being accurate, Perlin explained. Those proposed profiles are then compared to the suspect’s profile to assess similarities.

The end result is a “likelihood ratio” that expresses the chance the suspect’s DNA is in the evidence sample, relative to a random person in the population.

The result is not as definitive as a match but can still be compelling. TrueAllele concluded that the DNA on the clerk’s shirt was 180 quadrillion times more likely to come from Watson than another African American, according to court documents.
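To make the idea of a likelihood ratio concrete, here is a minimal sketch in Python. It is not TrueAllele's code, which remains undisclosed, and every name and number in it is hypothetical. It assumes a simplified textbook form of the statistic: at each tested DNA location, divide the probability the software assigns to the suspect's genotype after examining the evidence by how common that genotype is in the population, then multiply the results across locations, treating the locations as independent.

```python
from math import prod


def locus_lr(posterior, population, suspect_genotype):
    """Likelihood ratio at one DNA location (toy model, not TrueAllele's method).

    posterior: probability each candidate genotype gets after looking at the evidence
    population: how common each genotype is among unrelated people
    """
    # A genotype the evidence rules out gets probability 0, so the ratio is 0.
    return posterior.get(suspect_genotype, 0.0) / population[suspect_genotype]


def combined_lr(loci):
    """Multiply per-location ratios, assuming the locations are independent."""
    return prod(locus_lr(post, pop, suspect) for post, pop, suspect in loci)


# Hypothetical numbers for two locations; only a few genotypes are listed.
loci = [
    # (posterior from the evidence, population frequency, suspect's genotype)
    ({"A,B": 0.6, "A,C": 0.3, "B,C": 0.1},
     {"A,B": 0.05, "A,C": 0.10, "B,C": 0.02},
     "A,B"),
    ({"D,D": 0.5, "D,E": 0.5},
     {"D,D": 0.01, "D,E": 0.08},
     "D,E"),
]

print(combined_lr(loci))  # roughly 12 * 6.25 = 75 for these made-up inputs
```

Because the per-location ratios multiply, a couple dozen locations with modest individual ratios can produce a combined figure in the trillions or quadrillions. The same multiplication means that small errors in the per-location probabilities can compound.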

TrueAllele was first used in a criminal trial in 2009, but it has exploded in popularity over the past decade. Perlin said Cybergenetics has directly worked on nearly 1,000 cases and leased its software to 10 crime labs, including some in Virginia and Baltimore. It has also been joined by competitors in the field such as STRmix.

Perlin said that nearly 30 courts have deemed the software admissible and that about 40 crime labs nationwide have studied and approved the software for use without looking at the source code. That includes eight studies that have appeared in peer-reviewed journals, seven of which Perlin co-wrote.

Perlin said the science has been “tested and shown to be reliable” and pointed out that TrueAllele has recently been used to help exonerate some falsely accused defendants. Some other experts in DNA analysis agree with him.

“Probabilistic genotyping software is a game changer for forensic DNA testing,” Michael Coble, an associate professor of genetics at the University of North Texas, wrote in an email. “It has . . . produced results in a wide range of cases involving violent crime, sexual assault, and other criminal cases that just a few years ago would have been considered too complex to interpret.”

A quest for the code

But as the public defender dug into the case against Watson and TrueAllele, a fundamental question began to take shape: How much did Virginia really know about the software it was using to build a case against his client?

Kennedy issued a subpoena to the state’s Department of Forensic Science (DFS) and traveled to its lab in Richmond in 2019 to find out.

DFS is among the crime labs that have studied TrueAllele and found that it accurately identified donors and excluded non-donors by running test samples through the program. Nevertheless, Kennedy did not find the complicated mathematical formulas that determine TrueAllele’s results or computer code among the materials at the lab, according to a court filing.

In fact, a judge at a previous trial in Virginia asked a DFS scientist trained on TrueAllele if she could independently reproduce the results of the program that sits at an advanced intersection of forensics, statistics and computer programming. Lisa Schiermeier-Wood replied: “It would take me years to try, and I don’t know that I could do it.”

Schiermeier-Wood went on to testify that she wouldn’t be able to detect low-level errors in TrueAllele’s analysis either. DFS said it could not comment on Watson’s case and sent its studies of the software in response to questions about TrueAllele.

Kennedy wrote in court filings that it would be impossible to assess whether TrueAllele had correctly identified Watson as the likely perpetrator without the program’s source code and other materials.

He pointed out in filings that errors in STRmix’s code, which has been disclosed in some cases, had thrown off results in cases in Australia and New Zealand. (STRmix did not respond to a request for comment but has called the errors so minor that they did not change the outcomes of analyses.) Likewise, when New York City was forced to disclose the source code of its own probabilistic genotyping program, a defense expert found a bug that tended to overestimate a defendant’s likelihood of guilt.

A draft report by the National Institute of Standards and Technology posted online in June found that “there is not enough publicly available data to enable an external and independent assessment of the degree of reliability of DNA mixture interpretation practices, including the use of probabilistic genotyping software (PGS) systems.”

Kennedy said in an interview that “it’s a new evolving technology.” He added: “We shouldn’t be using the criminal justice system as a proving ground for new technologies, especially when the makers of these technologies are keeping how they work secret.”

Cybergenetics responded to Kennedy’s subpoena for TrueAllele’s source code by telling him that a defense expert could review it, but that it might take 8½ years to plow through its 170,000 lines that contain dense mathematics, according to court documents in the case. It also sent Kennedy its standard nondisclosure agreement, which also was filed in court.

To review the code, it required Watson to pay $15,000, according to defense filings. Kennedy and the defense expert would also have to obtain $1 million in liability insurance, agree to take only handwritten notes and travel to the company’s Pittsburgh headquarters for the review, among other restrictions. All told, Kennedy estimated that it would cost at least $50,000 to comply with the nondisclosure agreement, which also might bar his expert witness from testifying at trial.

In July 2020, Kennedy asked a Fairfax County judge to throw out the TrueAllele evidence, arguing that the nondisclosure agreement was so onerous for his indigent client that it effectively denied him his constitutional right to review the evidence against him. Cybergenetics said in a later filing that it was open to negotiating the terms of the review and waiving fees.

Groups including the American Civil Liberties Union and the Electronic Frontier Foundation have made similar points in other cases where Cybergenetics has fought granting access to its code, saying defendants are being accused by “secret evidence.” They say corporate interests should not trump the rights of defendants potentially facing the loss of their freedom.

Those arguments largely have been ignored in recent years. Nearly 20 courts have rejected defense efforts to gain unfettered access to TrueAllele’s source code. Perlin said that’s as it should be: Trade secrets protect the hard work of companies like Cybergenetics and foster innovation.

But that could be changing.

Fairfax County Circuit Judge Dontae L. Bugg ordered Cybergenetics to comply with Kennedy’s subpoena for TrueAllele’s source code without any conditions in August, but restricted the code from being publicly released.

In January, a federal court in Pennsylvania also ordered Cybergenetics to release TrueAllele’s source code to defense teams under protective order in the case of a Pittsburgh felon charged with illegally possessing a firearm, although a judge is still deciding what form the review will take.

A state court in New Jersey also ruled this year that TrueAllele’s source code had to be disclosed to the defendant in a murder case, but prosecutors have decided to withdraw the TrueAllele evidence rather than go forward with the review.

“Without scrutinizing its software’s source code — a human-made set of instructions that may contain bugs, glitches, and defects — in the context of an adversarial system, no finding that it properly implements the underlying science could realistically be made,” the judge in the New Jersey case wrote.

Cybergenetics did eventually allow Kennedy to review TrueAllele’s source code on an iPad, but he has no expertise in computer programming. A defense expert is slated to review the code, but the coronavirus pandemic has prevented him from traveling. A review could happen in the coming months.

Bugg has yet to rule on the defense motion to exclude the software’s evidence in the case. Watson’s trial is scheduled for February 2022.

The Office of Fairfax County Commonwealth’s Attorney Steve Descano, who ran on a platform of bringing transparency to prosecutions, said in a statement responding to questions about its use of TrueAllele that the office is committed to fair trial rights, in keeping with that stance.

“This office cannot speak to TrueAllele’s internal deliberations regarding how they share their source code and has not taken steps to prevent the defense from obtaining this information,” spokesman Ben Shnider said.

What will the expert find in the vast reams of code? That is subject to debate.

Perlin said he doubts that any errors could be found by simply poring over it. He said issues with probabilistic genotyping software come to light by running test samples and examining the results. Perlin allows defendants to test TrueAllele.

“We don’t have the source code for the brake systems of our cars, but we test them,” he said.

But Matthews, the computer science professor, said a review could reveal issues, provided that defense experts get the code in a format conducive to testing and manipulating it — something that has been subject to wrangling in the Virginia, New Jersey and Pennsylvania cases.

“Software frequently has errors,” Matthews said. “It’s a complex system and requires a process of iterative debugging. You do a certain amount of testing before something is released, but it can encounter things in the real world that can stimulate unexpected problems.”

Regardless of what’s found, Rebecca Wexler, a law professor at the University of California at Berkeley, said the fight reveals a major shift as the courts come to rely on proprietary software for evidence on everything from ballistics to bail.

“The whole justice system is becoming automated,” she said. “Any automated technology that’s coming in the justice system has the potential that someone is going to claim there’s trade secrets in it.”
