TAIPEI, Taiwan — Biographies and service records of aircraft carrier captains and up-and-coming officers in the U.S. Navy. Real-time tweets originating from overseas U.S. military installations. Profiles and family maps of foreign leaders, including their relatives and children. Records of social media chatter among China watchers in Washington.
The cache, called the Overseas Key Information Database, or OKIDB, purports to offer insights into foreign political, military and business figures, details about countries’ infrastructure and military deployments, and public opinion analysis. The database contains information on more than 2 million people, including at least 50,000 Americans and tens of thousands of people who hold prominent public positions, according to Zhenhua’s marketing documents and a review of a portion of the database.
Although there is no evidence showing that the OKIDB software is currently being used by the Chinese government, Zhenhua’s marketing and recruiting documents characterize the company as a patriotic firm, with the military as its primary target customer.
U.S. experts who have reviewed the database offer conflicting assessments of its value. Swaths of the database appear to be raw information copied wholesale from U.S. providers such as Factiva, LexisNexis and LinkedIn and contain little human analysis or finished intelligence products. Much of the social media trove appears to be scraped from public accounts accessible to anyone.
“There might be gold in there, but this is not something that’s useful enough for military or intelligence targeting,” said one cybersecurity contractor for the U.S. government who has reviewed the data and spoke on the condition of anonymity to avoid being publicly associated with a sensitive cache. Zhenhua’s claims, the contractor said, are “totally aspirational.”
But the database, combined with Zhenhua’s digital trail — marketing materials, patents and employees’ résumés — provides a small window into the firm’s ambitions, if not actual capabilities, to glean insights by aggregating and analyzing publicly available, or open-source, data. The potential power of big data has been a long-standing concern for privacy advocates and governments alike, and its use is not exclusive to China. Large-scale open-source collection is undertaken by U.S. government agencies and American companies — the source of much of Zhenhua’s data.
Robert Potter, founder of the Australia-based Internet 2.0 cybersecurity company, and Christopher Balding, an independent researcher, provided an incomplete copy of the underlying database that feeds into the OKIDB software to several news organizations, including The Washington Post. Potter and Balding said they downloaded and reconstructed about 10 percent of the full database, which is estimated to be about 1 terabyte of text. (Potter worked for The Post as a cybersecurity consultant in 2019.)
“Open liberal democracies must consider how best to deal with the very real threats presented by Chinese monitoring of foreign individuals and institutions outside established legal limits,” Balding said.
Zhenhua declined requests for comment. An employee at the company said speaking to reporters would reveal trade secrets. China’s Ministry of Defense did not respond to faxed questions seeking comment.
Researchers and current and former U.S. officials say OKIDB appears consistent with a years-long push by the Chinese government to expand the country’s ability to harvest vast amounts of data for strategic purposes, even if that data is not immediately revelatory.
In 2018, Pentagon officials were alarmed when a fitness-tracking app revealed the locations of overseas U.S. bases.
“We know the Chinese Communist Party seeks to promote bulk data collection now, with the intent that the ability to process and use it will follow in the future,” said Samantha Hoffman, a researcher at the Australian Strategic Policy Institute’s Cyber Center. “This data set proves that they’re targeting individuals and that social media is an important tool.”
Little is known about Zhenhua, which operates out of a technology incubator in Shenzhen and an office park in northwest Beijing. Corporate records show the company was founded in 2017 and is majority-owned by a former IBM engineer named Wang Xuefeng, who could not be reached for comment.
The records do not offer any indication that Zhenhua is controlled by the government, but the company positions itself among a constellation of data and security firms in the government’s close orbit.
One of the corporate partners listed on Zhenhua’s website, a big-data firm called TRS, prominently advertises clients such as the Chinese military and the Ministry of Public Security, for which it claims to offer big-data analysis tools that can connect “biographies, vehicles and telecommunications” — and visualize them — with “one click.”
Another partner is Huarong. The big-data and security hardware firm references Palantir, the Silicon Valley-based U.S. military contractor, on its website but advertises itself as a party-linked, “Red-blooded” company spun off from an unnamed People’s Liberation Army enterprise. Huarong co-hosted a “military-civil fusion” trade conference last year in Beijing, where companies seeking business opportunities mingled with military officials.
Another of Zhenhua’s partners, Global Tone Communication Technology, is a subsidiary of a state-owned enterprise owned by the central propaganda department. Global Tone claims to analyze 10 terabytes of social media and Web content a day for government and business clients.
In a 2017 speech, an executive of the company said 90 percent of military-grade intelligence could be derived from open sources, according to a photo retrieved by Hoffman.
Anna Puglisi, a former U.S. national counterintelligence officer for East Asia who is now at Georgetown University’s Center for Security and Emerging Technology, said vast, meticulous open-source collection was a hallmark of Chinese information gathering.
U.S. counterintelligence vis-a-vis China is “traditionally focused on what’s illegal, what’s directly tied to what military or intelligence officer, the spy-on-spy stuff like what we had with the Soviet Union,” Puglisi said. But in reality, massive open-source collection “fits into the much more holistic way that China goes about acquiring information,” she added. “Things like LinkedIn, social media — this seems like an evolution of that methodology.”
In 2015, China’s government issued its first high-level strategy paper on big data and made it a pillar of an industrial development plan called Made in China 2025. Also in 2015, an essay in the Communist Party’s International Liaison Department’s influential world affairs journal suggested that China could conduct automated Web scraping or legally purchase proprietary databases as its governmental and commercial dealings expand.
In 2017, China passed an inaugural national intelligence law that required Chinese organizations and citizens to assist with state intelligence work in accordance with the law.
A U.S. official said it was “not a surprise” that a Chinese company was scraping information for strategic gain. Law enforcement and intelligence officials have been warning various agencies for years about digital hygiene, and Congress has also been reviewing social media best practices to minimize espionage risk from China in particular, the official said.
Rep. Jim Himes (D-Conn.), a member of the House Intelligence Committee, said the present-day ubiquity of individual data is such a significant concern that it is now difficult, for example, to recruit and protect intelligence officers. But open-source data is universally used for spying, he added.
“If there’s a silver lining here, it’s we can do to China what they do to us,” Himes said.
Facebook spokeswoman Liz Bourgeois said the company has banned Zhenhua from its platform and sent it a cease-and-desist letter.
“Scraping public data, as this company appears to have done to a number of services including Facebook, is against our policies,” Bourgeois said.
A Twitter spokesman said the company had no data-sharing agreements with Zhenhua. A LinkedIn spokeswoman said the company does not permit the use of “software that scrapes or copies information” under its user agreement and that the company is constantly working to improve its defenses to prevent such collection.
Although The Post did not have access to the OKIDB software interface, and much of the OKIDB’s underlying data retrieved by Potter and Balding was in raw form, a review of data entries offers clues about the company’s interests.
Navy vessels such as the USS Dwight Eisenhower and Nimitz carriers are tagged with ID numbers, against which relevant social media posts and websites are catalogued. The database assigned hashes and collated information on officers including former chief of naval operations John M. Richardson. There were cursory markups in Chinese about Navy officers’ service history or whether they completed training for prospective commanding officers.
Entries on former acting secretary of the Navy Thomas Modly, for example, named his wife and four children and listed his educational and private-sector background. The entry included a field for a psychological profile, which was filled with a generic placeholder.
Images of the OKIDB software taken by Potter, who accessed it through an open server, show a user interface that displays tweets posted from U.S. military installations laid over a map with time stamps. One Facebook post sucked into the OKIDB was from the USS George Washington urging sailors’ families to refrain from posting publicly about where the aircraft carrier was going.
On LinkedIn, one of Zhenhua’s engineers, Zhou Peng, describes building a “demonstration system for military deployment simulation.”
Aside from military figures, the database seemed to scoop up tweets from influential China watchers in Washington. Tweets from Scott Kennedy, a China trade expert at the Center for Strategic and International Studies, frequently surface in the database, as do missives from Bill Bishop, publisher of the Sinocism newsletter, and Lyle Morris, who studies the PLA at the Rand Corp.
Part of the company’s ambitions appears to be offensive.
Public corporate records show the company filed patents between late 2018 and April related to scraping news and information, managing data and processing video, but also social media manipulation. The company in September 2019 patented a tool that “simulates social media interaction.”
“Social media can manipulate reality and weaken a country’s administrative, social, military or economic forces, and may also lead to internal conflicts, social polarization and radicalism in a country,” Zhenhua said on its recently deactivated page, china-revival.com.
Zhenhua maintains a company blog on WeChat with a possibly tongue-in-cheek name — “Bureau 99” — that is reminiscent of the numbered divisions within the Chinese military.
On the blog, an unnamed author posts takes on intelligence, U.S.-China relations and how social media influences U.S. presidential campaigns. In one post in August, the author said Chinese open-source intelligence was historically “minimally effective” and relegated to institutions such as the Academy of Military Sciences.
That changed with the passage of China’s national intelligence law in 2017, the author wrote: The law “promoted the healthy development of the intelligence industry.”
The company also posts recruitment ads, seemingly aimed at veterans.
“Bureau 99: we specialize in researching and deploying open-source intelligence to serve the great rejuvenation of the Chinese people,” reads a Sept. 10 ad for positions in Beijing. “We only need your passion and expertise!”