February 12, 2021
The definition of personal data varies depending on which law you're reading. But it's important to know how to recognize which data is considered "personal" under the law that governs your organization.
It's a thankless job, but determining what kind of personal data your company collects is a business imperative now. EU regulators increasingly fine companies for violating data privacy rules, and U.S. states are passing laws faster than ever that place guardrails on data collection and use. Following the laws means understanding their terms. So what counts as personal data?
Classifying data is a crucial step because it dictates a cascade of operations to follow. While most organizations prefer to deem data "non-personal" whenever possible so they can use it as desired, there are benefits to casting a wide net. Sure, that means taking special care of a lot of data, but the return on investment can be worth it.
The definition of personal data can vary from one legal jurisdiction to another. For example, European law defines it differently than California's privacy law. Classifying data can often be troublesome for organizations because every case is unique. But it's essential that you do it: you can't know how to treat your data until you've determined what you have.
The European Commission, which proposes EU law, defines personal information as "… any information that relates to an identified or identifiable living individual."
The phrases "relates to" and "identifiable" are tricky. The language's purpose is to protect data that might not, on its face, seem like it's "personal." While a person's first name generally isn't considered personal, if your first name links to your company, that's probably enough to identify you. So that data has to be treated as "personal" and protected accordingly.
Here's what the U.K. Information Commissioner's Office instructs in its guidance on personal data under the U.K. General Data Protection Regulation: "You should take into account the information you are processing together with all the means reasonably likely to be used by either you or any other person to identify that individual."
It adds, "When considering whether information 'relates to' an individual, you need to take into account a range of factors, including the content of the information, the purpose or purposes for which you are processing it and the likely impact or effect of that processing on the individual."
California's Consumer Privacy Act (CCPA), currently the most stringent privacy law in the U.S., calls personal data "personal information." It defines personal information as information that "identifies, relates to, describes, is capable of being associated with, or may reasonably be linked, directly or indirectly, with a particular consumer or household." The CCPA does not consider publicly available information from federal, state or local government records to be personal information. Professional licenses and public property records, for example, would not count as personal information, as the California Attorney General's Office states in its CCPA FAQs.
However, California voters approved the California Privacy Rights Act (CPRA) at the ballot in November 2020, and it creates a new category of "personal information" called "sensitive personal information." It broadens the CCPA's definition to include data like Social Security numbers and driver's license numbers, as well as biometric and sexual orientation data. The CPRA becomes effective in 2023.
Gabe Maldoff, a privacy attorney at Covington & Burling, said determining what is and is not personal data is different for every organization.
"It definitely depends, because the factors you take into account are not only the nature of the data itself but also what else might be out there in the universe that could be associated with this database, which would allow it to be linkable to an individual."
The technology an organization uses matters, too; its computing power affects how "linkable" one piece of data might be to another.
"Yes, there is a legal concept of personal data, and yes, that means there is information that falls outside of it. But exactly where you draw that line becomes really, really difficult," Maldoff said.
Every legal framework sets the threshold at a different place, said Maldoff. Still, they all come back to the same struggle for organizations: identifying the threshold for "personal data."
Catherine Dawson is Osano's privacy attorney. She said, "For most compliance or privacy professionals, getting a handle on how data is used throughout the organization is challenging, in part, because it can change quickly."
She added, "It's easy for new or evolving practices to fall through the cracks and not get vetted properly."
Under any framework, anonymized and pseudonymized data have different rules. Anonymized data, where a data subject could not be re-identified, is not considered "personal" under the EU GDPR.
Under the CCPA, businesses are exempt from treating consumer data as personal if it is "de-identified." That means the data can't be linked to, identify or describe the consumer.
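To make the difference between pseudonymized and anonymized data concrete, here is a minimal sketch in Python. It assumes pseudonymization is done with a keyed hash (HMAC), which is only one of several possible techniques; the key, field names and records are hypothetical. The point it illustrates is that whoever holds the key can link a token back to a person, which is why the GDPR still treats pseudonymized data as personal data.

```python
import hashlib
import hmac

# Hypothetical secret key, assumed to be stored separately from the data.
SECRET_KEY = b"example-key-kept-separately"


def pseudonymize(email: str, key: bytes = SECRET_KEY) -> str:
    """Replace an email address with a keyed-hash token (HMAC-SHA256)."""
    return hmac.new(key, email.lower().encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical customer records.
records = [
    {"email": "pat@example.com", "plan": "pro"},
    {"email": "sam@example.com", "plan": "free"},
]

# The pseudonymized copy no longer contains the email address itself.
pseudonymized = [
    {"token": pseudonymize(r["email"]), "plan": r["plan"]} for r in records
]

# Anyone holding the key can recompute the token and re-link the record
# to a known person, so the data is not anonymous.
lookup = pseudonymize("pat@example.com")
relinked = [r for r in pseudonymized if r["token"] == lookup]
print(relinked[0]["plan"])
```

Truly anonymized data would offer no such route back to the individual, with or without a key.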
There's been much debate about anonymized and de-identified data. There are differing opinions on when data can be said to be effectively anonymized or de-identified. As Kelsey Finch, Jules Polonetsky and Omer Tene wrote in a blog post for the IAPP, "Although academics, regulators and other stakeholders have sought for years to establish common standards for de-identification, they have so far failed to adopt even a common terminology."
Here's a good example: In 2006, Netflix launched a public contest to improve its recommendations on what customers would like to watch. It released about 100 million movie ratings from roughly 500,000 customers. The data was anonymized, with personal data replaced by random numbers. But researchers were able to de-anonymize some of the data by comparing it with a second, public database, proving that very little information is needed to re-identify someone.
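The kind of linking described above can be sketched in a few lines of Python. This is a simplified illustration, not the researchers' actual method: it assumes exact matches on movie, rating and date, and all names, IDs and records are made up.

```python
from collections import defaultdict

# Hypothetical "anonymized" release: names replaced with random IDs.
anonymized_ratings = [
    {"user_id": 9041, "movie": "Movie A", "rating": 5, "date": "2005-03-01"},
    {"user_id": 9041, "movie": "Movie B", "rating": 2, "date": "2005-03-04"},
    {"user_id": 7712, "movie": "Movie A", "rating": 3, "date": "2005-06-10"},
    {"user_id": 7712, "movie": "Movie C", "rating": 4, "date": "2005-06-11"},
]

# Hypothetical public reviews posted under real names elsewhere.
public_reviews = [
    {"name": "Alex Example", "movie": "Movie A", "rating": 5, "date": "2005-03-01"},
    {"name": "Alex Example", "movie": "Movie B", "rating": 2, "date": "2005-03-04"},
]


def link_records(anonymized, public, min_matches=2):
    """Map each named reviewer to an anonymous ID when exactly one
    anonymous user shares at least min_matches identical ratings."""
    by_user = defaultdict(set)
    for row in anonymized:
        by_user[row["user_id"]].add((row["movie"], row["rating"], row["date"]))

    by_name = defaultdict(set)
    for row in public:
        by_name[row["name"]].add((row["movie"], row["rating"], row["date"]))

    matches = {}
    for name, reviews in by_name.items():
        candidates = [
            uid for uid, ratings in by_user.items()
            if len(reviews & ratings) >= min_matches
        ]
        # Only claim a match when it is unambiguous.
        if len(candidates) == 1:
            matches[name] = candidates[0]
    return matches


reidentified = link_records(anonymized_ratings, public_reviews)
print(reidentified)  # {'Alex Example': 9041}
```

Two overlapping ratings are enough here to tie a random ID to a name, and every other rating under that ID is then exposed as well.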
The point: If you say certain data is "anonymized" and then treat it as such under the law, it's best to take a cautious approach. You must be sure neither your organization nor a third party could ever link the data you're using in a way that could re-identify an individual.
In the end, Maldoff said, it's better to overprotect your entire data pool.
"Where I usually go with my clients, where we ultimately usually end up is that it doesn't matter that much whether it's personal data or not, although the law only applies to personal data, and there are clear benefits to finding the data falls outside the scope," Maldoff said. "In reality, at least when you're planning a compliance program, being over-inclusive and treating more things as personal data than they are doesn't usually come with huge downsides to the business' ability to use the information."
And remember it's not a one-and-done kind of job, Dawson said.
"It's tempting to think about classifying data solely at the point of collection," Dawson added. "In reality, that's just the beginning."
The Osano staff is a diverse team of free thinkers who enjoy working as part of a distributed team with the common goal of working to make a more transparent internet. Occasionally, the team writes under the pen name of our mascot, “Penny, the Privacy Pro.”