A data broker shared billions of “highly sensitive” phone location records with the D.C. government last year that revealed how people moved about the city during the pandemic, public records show.
The company, Veraset, provided the data as part of a free trial, according to internal emails obtained through a Freedom of Information Act request by the Electronic Frontier Foundation. District officials reviewed the data but ultimately declined to renew the partnership after the trial finished. The emails show that the shared data was authorized for coronavirus-tracking purposes only and did not include names or other details. EFF researchers said they found no evidence the data was misused.
But EFF technologist Bennett Cyphers said the emails show how data brokers tried to “covid-wash” their controversial work during the health crisis and forge new relationships with government authorities. He also questioned how anonymous the data truly is.
“A lot of these data brokers’ existence depends on people not knowing too much about them because they’re universally unpopular,” Cyphers said. “Veraset refuses to reveal even how they get their data or which apps they purchase it from, and I think that’s because if anyone realized that the app you’re using” also “opts you into having your location data sold on the open market, people would be angry and creeped out.”
He noted that Veraset’s location data includes sequences of code, known as “advertising identifiers,” that can be used to pinpoint individual phones. Researchers have also shown that such data can be easily “deanonymized” and linked to a specific person. Both Apple and Google announced changes earlier this year that would allow consumers to block their identification numbers from being used for tracking.
“If you look at a map of where a device spends its time, you can learn a lot: where you sleep at night, where you work, where you eat lunch, what bars and parks you go to,” Cyphers said. Because of that, he added, it’s simple “to associate one of these location traces to a real person.”
Veraset and other data brokers have worked to improve their public image and address privacy concerns by sharing their records with public health agencies, science researchers and news outlets, claiming that the raw data could provide an indispensable way to monitor potentially risky crowd movements and public gatherings. The Washington Post, the New York Times and other news outlets also have made use of the location data in reporting on potential health risks during the pandemic.
Veraset and other data brokers pay software developers to include snippets of code in their apps that then share location data back to the firm. Some companies have folded their code into games and weather apps. Veraset does not say which apps it works with, and critics have questioned whether users are truly aware their data is being shared in that way.
The company is a spinoff of SafeGraph, which Google banned earlier this year as part of an effort to restrict covert location tracking. Officials with Veraset and SafeGraph did not respond to requests for comment.
Sam Quinney, director of The Lab @ DC, a science and technology team in the District government, said in a statement that District officials reviewed the data to determine if it could help with the local pandemic response but “did not find suitable insights for our use cases” and declined to renew the access. The data, Quinney said, was never shared with anyone other than authorized officials and is scheduled for deletion at the end of the year.
SafeGraph said last year it had shared data with the Centers for Disease Control and Prevention and state and city officials. Its website says the company strives to “be the source of truth about the physical world.”
Its investors include Peter Thiel, the billionaire founder of data miner Palantir, and Prince Turki Al Faisal, a former Saudi ambassador to Washington who led the Saudi intelligence agency from 1979 to 2001.
The CDC used SafeGraph data as part of a yearlong trial starting in the first weeks of the pandemic and, in April, awarded a contract to the company for another year of “social mobility” data, a spokeswoman told The Post.
The data is used in the public CDC coronavirus tracker to estimate what percentage of the population is staying home. The CDC has also published at least two scientific reports using SafeGraph data, covering how lockdown orders and the timing of public policy changes during the pandemic affected population movement and “community mobility.”
Some public health groups and news outlets have argued that the data can offer important insights and should be handled carefully so as to limit risks to people’s privacy. The Post last year used SafeGraph data to visualize changes in attendance and potential risk at bars, churches, workplaces and restaurants, and the New York Times used SafeGraph and Veraset data to illustrate the differences in safety between specific gyms, coffee shops and restaurants, based on how long people visited and how crowded they got.
A Post spokeswoman said in a statement that the aggregated data did not include any personally identifiable information and offered “an important way to give readers a sense of what was happening around the country in a time of so much uncertainty.” A Times spokeswoman said in a statement that their reporting relied on aggregated location data that was securely stored and erased after publication of news stories.
Veraset had D.C. officials sign a “data access agreement” prohibiting the use of the data for nonresearch purposes and allowing the company to “choose to remain anonymous as the source” of the location data at the “sole discretion” of the company. That agreement, Cyphers said, could help Veraset ensure its work is cast in a positive light. The city’s refusal to pay for the data, he added, suggested that raw location data may be less useful for public health than the company has claimed.
A D.C. government official said in the emails that the records included more than 12 billion data points. One phone can produce many data points because its movements are tracked over time.
The emails have been redacted so as to not disclose how many people had their location data gathered, but a Veraset listing on the data marketplace Datarade said the company’s records cover roughly 10 percent of the U.S. population, indicating that the D.C. location data could have detailed the movements of hundreds of thousands of residents.
The Datarade listing also advertised “billions of daily precise location data observations” taken from thousands of apps. Besides governments, the firm advertises its data to advertising, real estate and investment firms interested in tracking movement at certain locations. “Our core population human movement data set delivers the most granular and frequent GPS signals available in a third-party data set,” it states.
The pandemic has fueled a nationwide debate over whether public health uses are valuable enough to justify an open market in data drawn from tracking people’s movements without their knowledge.
Sens. Ron Wyden (D-Ore.) and Rand Paul (R-Ky.) introduced a bill this spring, the Fourth Amendment Is Not for Sale Act, that would prohibit government and law enforcement agencies from buying location data and other personal information without a warrant. The bill would not prohibit the sale of location data to government agencies for public health purposes, but it would prevent public health agencies from sharing such data with law enforcement or intelligence officials.
Wyden’s office attempted to contact SafeGraph multiple times last year but never received a response, an aide told The Post, adding that Wyden cited the company to Google as a “data broker of concern” shortly before the technology giant banned SafeGraph’s location-tracking code.
“It’s no surprise that shady data brokers want to exploit the pandemic to put a positive spin on their sale of Americans’ private information to the government,” Wyden said. “The unregulated trade in detailed location data creates serious safety risks for American families. The United States needs a comprehensive federal privacy law to stop these shady data sales.”
In April 2020, a Veraset representative emailed a D.C. government official with an offer of “highly sensitive data” that “must be treated with extreme care,” according to the public records. The offer included two data sets: “Movement” for GPS location coordinates tied to a phone’s advertising identification number and other information, and “Visits” for showing when individual phones had visited stores or other “points of interest” over time. (In another listing on Datarade, Veraset said the “Visits” data covers roughly 6 million places visited by 20 million people every day.)
A D.C. official responded that the phone data could help the Department of Health determine whether social distancing and lockdown orders had been effective. Over the next six months, according to the emails, Veraset officials routinely passed along new phone location data that had been recorded within the previous 24 to 72 hours.
Although the data did not include the movement of all residents, a Veraset official wrote, company tests had indicated that the data was representative enough that it could be used to “infer population movement.”
The redacted emails said that the location data covered an unidentified portion of the metropolitan area that included both the District and nearby neighborhoods in Virginia and Maryland. District officials said that they worked to safeguard the location data, marking it for encryption and designating it as “classified” to block it from public view.
When the trial period ended in late September, a D.C. official wrote that the Veraset data had been “an excellent baptism by fire” for data scientists working to expand the city’s centralized information database.
“Having such massive regularly updating tables forced us to make leaps forward” and allowed officials to “learn about the strengths and weaknesses of using mobility data” for the local pandemic response, the official wrote. But the District, he continued, never found a use for the data due to “the limitations of app-based data and competing priorities.”
The emails do not reveal the price that D.C. was expected to pay to extend its data access. In a separate 2019 agreement obtained by EFF, the state of Illinois paid SafeGraph $50,000 for access to two years of raw phone location data totaling roughly 50 million GPS “pings” a day.
Democratic senators last year called for an investigation into U.S. Customs and Border Protection officials’ use of location data sold by the data broker Venntel, which the agency had used to track people without a warrant. The senators said that the agency “should not be able to buy its way around the Fourth Amendment,” which protects against unreasonable searches. CBP officials said they were allowed to “obtain access to commercially available information relevant to its border security mission.”