Hard Lessons From the National Public Data Hack

The sheer scale of the breach and the way in which the data was first collected should be a national security wake-up call.

National Public Data, a data broker branding itself as a background check company, has been hacked, as a newly proposed class-action lawsuit and media reports recently revealed.

The hacker group calling itself “USDoD” stole and posted online 2.9 billion records—including individuals’ full names, Social Security numbers, dates of birth, current addresses, past addresses, names of parents, names of siblings, and phone numbers. Details are still emerging about the National Public Data hack and its impacts. But it appears that the data implicates, at a minimum, hundreds of millions of people in the U.S., Canada, and the U.K.—making it one of the largest-ever publicly disclosed hacks of personal data.

The sheer scale of the breach and the way in which the data was first collected offer four important lessons for lawmakers and the national security community. First, the market for data incentivizes this kind of ever-expanding data collection to increase profits. Second, data aggregation at scale adds new dimensions to privacy risks, and when the data is poorly secured, the harms to privacy and national security are magnified. Third, data brokerage, despite some recent legislative discussions, remains largely unregulated. And fourth, while regulatory actions from agencies such as the Federal Trade Commission and Department of Justice are essential, playing whack-a-mole against privacy-invasive and national security-threatening data sales is not a viable, long-term strategy.

The first lesson from this breach is that the market incentivizes data brokers to collect ever more data to increase their profits. Data brokerage is a large industry, and practices vary from company to company. Some data brokers, for instance, may be incentivized to collect more data points on the same customer base, such as by expanding the breadth and granularity of their data repositories on people living in the U.S. Other data brokers may be incentivized to collect more data by adding on different types of data, much like how geolocation data brokers such as X-Mode have added lists like “Cancer” and “Special Needs Kids” over time to supplement their data offerings. Others may be incentivized to keep collecting data on more and more people around the world, to the tune of billions of individuals. In National Public Data’s case, the company compiled billions of records, which it then allowed customers to search for “instant results.” (The data broker describes its customers as “private investigators, consumer public record websites, human resources, staffing agencies, and more.”) Although collecting data on billions of people might be seen as an outlier in other areas of the tech sector, it is not uncommon among data brokers. National Public Data took advantage of the availability of data from “various public record databases, court records, state and national databases, and other repositories worldwide” to build its database covering people on multiple continents.

Second, the aggregation of data at scale adds new dimensions to privacy risks. Discussions and scholarship exploring collective privacy rights and harms (including the limits of an individual rights-centered approach in law), the privacy risks of big data sets, and related issues expand in detail on this point. Gathering enormous amounts of data on people and combining it makes it easier to identify or “reidentify” specific individuals within the data. It also makes it easier for ill-intentioned or simply careless actors, whether cybercriminals or U.S. companies, to identify correlations between data points and infer additional, sensitive data about people in the data set. And aggregating data at scale makes those big piles of data more attractive targets for hackers, from independent cybercriminals to nation-states. Take the Equifax hack as a prime example: The data broker and credit reporting agency had poor security practices, and in 2017, Chinese military hackers stole its data on about 145 million Americans.
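The reidentification point can be made concrete with a small sketch. The Python below assumes two entirely hypothetical toy data sets: one with names removed but a sensitive field kept, and one public-records-style list with names attached. The field names, values, and the `reidentify` helper are illustrative assumptions, not anything drawn from National Public Data's systems. It shows how matching on a few shared attributes (ZIP code, birth year, sex) can be enough to attach a name to a supposedly anonymous record.

```python
from collections import defaultdict

# Hypothetical toy records; no real people or real data sources.
deidentified_records = [
    {"zip": "27701", "birth_year": 1980, "sex": "F", "diagnosis": "condition A"},
    {"zip": "27701", "birth_year": 1975, "sex": "M", "diagnosis": "condition B"},
    {"zip": "33060", "birth_year": 1980, "sex": "F", "diagnosis": "condition C"},
]
public_records = [
    {"name": "Person One", "zip": "27701", "birth_year": 1980, "sex": "F"},
    {"name": "Person Two", "zip": "27701", "birth_year": 1975, "sex": "M"},
    {"name": "Person Three", "zip": "27701", "birth_year": 1975, "sex": "M"},
    {"name": "Person Four", "zip": "33060", "birth_year": 1980, "sex": "F"},
]

# Attributes that are not names but still narrow down who a record is about.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def reidentify(deidentified, identified, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs where the shared attributes match exactly one person."""
    # Index the named records by their combination of quasi-identifiers.
    index = defaultdict(list)
    for row in identified:
        index[tuple(row[k] for k in keys)].append(row["name"])

    matches = []
    for row in deidentified:
        candidates = index.get(tuple(row[k] for k in keys), [])
        # A single candidate means the "anonymous" record now has a name.
        if len(candidates) == 1:
            matches.append((candidates[0], row["diagnosis"]))
    return matches


matches = reidentify(deidentified_records, public_records)
for name, diagnosis in matches:
    print(name, "->", diagnosis)
```

In this toy case, two of the three nameless records match exactly one named person, while the third stays ambiguous because two people share the same attributes. Adding more attributes per person, as large-scale aggregation does, tends to shrink that ambiguity.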

Here, the mere fact that National Public Data aggregated billions of individually identified records on people, including sensitive data such as Social Security numbers, underscores each of these points. One can imagine a data buyer, or the company itself, using the names of people’s family members to attempt to infer race or nationality (whether accurately or not) and then combining that with a history of a person’s ZIP codes, which can be used to infer income brackets, to come up with a data set both prone to abuse and more attractive to a hacker than smaller, scattered databases with the same information. This creates additional privacy risks, including giving the data set holder or purchaser the ability to combine identified data points together across time. It poses national security risks too, such as through the ongoing fallout of the theft of, ostensibly, millions of people’s Social Security numbers and other data.

Third, data brokerage remains a largely unregulated industry, despite recent legislative discussions. It did not take a corporation with multiple sprawling offices and thousands of staff to compile this sensitive data on, at the very least, hundreds of millions of people. It appears to have been fewer than 25 people (possibly just one person) at a company in Florida called Jerico Pictures, Inc. (doing business as National Public Data), who decided to aggregate and sell all this information. Public records, meanwhile, are often treated under U.S. law as “publicly available information,” even though digitizing public records, aggregating them, linking them to specific people, and posting them online for search and sale is incredibly invasive and meaningfully changes the privacy, security, and safety risks facing individuals.

Zooming way out beyond National Public Data—and looking at the lack of protections around companies collecting, aggregating, inferring, and selling data on people’s mental health conditions, geolocations, finances, and even children—it is clear that current laws are failing to protect Americans, who have no way of meaningfully objecting or consenting to this kind of data collection and sale. The risk to vulnerable people across the country, from elderly individuals to targets of domestic and gendered violence, and to government employees and military personnel on the national security front, points to glaring legislative gaps.

Lastly, the whack-a-mole approach will not suffice. Class-action lawsuits (and private rights of action) are important options for consumers to seek redress when companies misuse their data and violate their privacy. Regulatory actions are also important, including the Federal Trade Commission’s work against telehealth companies and antivirus companies selling personal data; the Consumer Financial Protection Bureau’s work to protect military service members and older Americans from scams and targeting; the Justice Department’s efforts to prosecute elder fraud and scams fueled by data brokers; and state-level actions such as California’s settlement with a company that illegally collected and shared children’s data. The need for strong federal and state privacy regulatory powers, and substantive investments in regulatory agencies, will not go away, even with a strong, comprehensive federal privacy law.

But absent a comprehensive privacy law, or even a smaller one targeting the most egregious of data practices (e.g., selling contact information for elderly people suffering from Alzheimer’s, the addresses of stalking survivors, the online activity of government contractors), these harms will continue. They will continue to threaten Americans’ privacy, civil liberties and freedoms, and national security. And when breaches like this one occur, consumers will face even greater risks of fraud and scams—and they, along with many other businesses, will bear the financial and economic costs. Foreign actors could leverage the data to scam, phish, impersonate, and otherwise target government personnel, too (e.g., by having access to their Social Security numbers, compiled data on past addresses, and readily searchable information on family members’ names). While National Public Data may be in the headlines today, many other companies have collected data on people at such a scale and will continue to do so. Congress cannot sit back and expect regulatory agencies to do its job.

Instead of Congress just holding yet another hearing about the technical details of a cyber incident (which is sometimes important), or writing yet another bill like the American Privacy Rights Act, which is riddled with industry-written loopholes, it should use these four lessons to understand why the data brokerage industry’s continued large-scale data collection is such a threat to consumers, privacy, and national security.

More information is still coming out about the National Public Data hack. The details themselves, such as the threat actor USDoD attempting to sell the data for $3.5 million, are interesting case studies on breaches and dark web sales of large-scale data sets. In the big picture, though, law- and policymakers should take these four lessons and view this kind of data aggregation and sale—and then compromise—as part of the much bigger problem it is.

– Justin Sherman is a contributing editor at Lawfare. He is also the founder and CEO of Global Cyber Strategies, a Washington, DC-based research and advisory firm; a senior fellow at Duke University’s Sanford School of Public Policy, where he runs its research project on data brokerage; and a nonresident fellow at the Atlantic Council. Published courtesy of Lawfare.
