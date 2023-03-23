

On March 18, 2020, a guest on “Tucker Carlson Tonight” promoted the “remarkable” results of an unpublished French trial testing the effectiveness of hydroxychloroquine against covid-19: The study, Gregory Rigano claimed to Carlson, showed the anti-malaria drug having a “100 percent cure rate against coronavirus.” He urged President Donald Trump to authorize the use of the drug against the emerging pandemic “immediately.”

A day later, Trump endorsed the drug. The study — an open-label, non-randomized trial with a very small sample size — was published in the International Journal of Antimicrobial Agents on March 20. The International Society of Antimicrobial Chemotherapy, which produces the journal, released an early-April statement saying it no longer believed that the study was up to its standards. Multiple large, randomized trials would later show that hydroxychloroquine has no meaningful use in treating covid-19.

“It is important to help the scientific community by publishing new data fast,” ISAC said in its statement. But, the group added, “this cannot be at the cost of reducing scientific scrutiny and best practices.” This tension between speed and accuracy — and its disastrous effects on public trust in scientific research — is at the heart of Gary Smith’s “Distrust: Big Data, Data-Torturing, and the Assault on Science.”

Smith, an economist whose work often examines the misuse of data and statistics in a variety of disciplines, argues that the current crisis of trust in science falls at the intersection of three forces: disinformation, data torturing and data mining. Disinformation, as Smith writes, is “as old as the human race,” but accelerated in speed and reach alongside social media. Data torturing describes the practice of manipulating data until it yields the desired result — for instance, by simply throwing out results that contradict a study’s argument. And data mining, driven by the abundance of available data and the speed with which computer algorithms can comb through it, involves pulling correlations from data that could be coincidental and imbuing them with meaning. Drawing on examples ranging from bitcoin to weight loss to artificial intelligence, Smith explains how “science’s hard-won reputation is being undermined by tools invented by scientists.”

People are often tempted to trust statistics and algorithms as neutral arbiters. But algorithms are incapable of independently understanding the worth of what they’re generating. They’re also very good at producing the appearance of meaning, which makes it that much easier to trawl through data sets in search of the conclusions you want to see in them. When a scientist uses an algorithm to look for a statistically significant relationship in a huge trove of data, they are going to find something. It might well be random nonsense. As Smith notes, “The explosion in the number of things that are measured and recorded has magnified beyond belief the number of bogus statistical relationships waiting to deceive us.”
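The point about bogus relationships can be made concrete with a small simulation. The sketch below is not from Smith's book; it is a minimal illustration under stated assumptions: 1,000 variables of pure random noise are each compared against a random "outcome" with 30 observations, and a correlation counts as "significant" when it clears the approximate two-tailed 5 percent threshold for that sample size (|r| > 0.361). The function names, sample sizes and seed are all hypothetical choices made for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_hits(n_variables=1000, n_samples=30, seed=0):
    """Count noise variables that look 'significantly' related to a noise outcome.

    Assumption: 0.361 approximates the two-tailed 5% critical value of r
    for n_samples=30 (28 degrees of freedom). It is not valid for other
    sample sizes.
    """
    rng = random.Random(seed)
    critical_r = 0.361
    # The outcome is random noise, so no variable has any real relationship to it.
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_variables):
        predictor = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson_r(predictor, outcome)) > critical_r:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious_hits()
    print(f"{hits} of 1000 pure-noise variables passed the significance test")
```

With a 5 percent threshold, roughly 50 of the 1,000 meaningless variables should pass by chance alone, and the more variables an algorithm is allowed to comb through, the more such false discoveries it will turn up.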

And yet statistical significance can be enough to drive publication in a reputable journal, and if the correlation is catchy enough, it will get a lot of media coverage. The initial conclusions of the French hydroxychloroquine study didn’t stand up as more researchers investigated the drug’s effect on covid-19. Even so, the afterlife of all that initial attention was long and consequential, as anti-science grifters and conspiracy theorists continued to push the drug as a secret cure for covid-19 that scientists were covering up.

“Distrust” is most compelling when it is examining how data manipulation affects lives. Algorithmic data mining has been used to evaluate potential hires, predict criminal behavior and approve loans, even when those relying on this information, Smith writes, have “no way of knowing whether it’s sensible” to draw such conclusions.

“Distrust” makes a strong argument for teaching “quantitative literacy” alongside media literacy, the latter of which has become a popular rallying cry for those trying to fight the spread of mis- and disinformation. “We need to recognize how data, statistics, and graphs can be used and abused — and the learning can begin in elementary school and continue through college and even grad school,” Smith writes.

This call for quantitative literacy in science and in the real-world deployment of algorithmic data analysis feels even more urgent now than it did last summer, when Smith completed “Distrust.” Generative AI tools like ChatGPT and DALL-E are creeping into the daily online lives of more and more people. Smith devotes a chapter to the overpromises of AI, arguing presciently that “the real danger today is not that computers are smarter than us, but that we think computers are smarter than us and consequently trust them to make important decisions they should not be trusted to make.”

Smith’s chapter on disinformation, along with his recommendations for addressing it, feels at times less nuanced than the rest of the book. His analysis of the role of bots in spreading misinformation, for instance, may rely too much on the number of bots rather than on the impact they have on the spread of the disinformation they amplify. And many of his recommendations for addressing disinformation sound like approaches that major companies have already tried. His suggestion that an army of “public-spirited volunteers” could essentially act as Wikipedia editors for social media falsehoods is a good one. It’s also what Twitter’s Birdwatch was doing before Elon Musk took over the platform.

That said, the lessons of “Distrust” are very much needed. Smith’s recommendations for reforming how data is used — including providing more support for reproducibility and replication research, statistical literacy courses, and the prioritization of studies that provide detailed descriptions of their research plans before actually starting — are well taken.

There’s plenty of nonsense out there, and we’re all perfectly able to recognize it if we have the right tools. As Smith writes, “Humans know better.” Or, at any rate, we should.

Distrust

Big Data, Data-Torturing, and the Assault on Science

By Gary Smith

Oxford University Press. 336 pp. $32.95

