The ultimate guide to data discovery

  • by Osano Staff
  • · posted on March 29, 2021
  • · 8 min read
Data mapping can feel like a daunting task. Here are some tips on how to approach the process. 

Data mapping sounds kind of dreadful, doesn't it? Overwhelming, at least. When you imagine the trails of data stretching for miles at even small companies, trying to map where it all leads can feel like an arduous task. And, to be honest, it is. 

Here's the good news: There are significant payoffs to data mapping beyond legal compliance. 

Let's start at the beginning, though: Data mapping is the roadmap to your compliance program. Upon it, you build everything else. Even if the EU General Data Protection Regulation didn't require you to document your processing activities -- and it does -- data mapping would still be an essential process for you to understand the data you're collecting across multiple systems and databases. It comes down to this: If you don't know where your data is, you don't know where the threats live, nor where your opportunities are.  

Often, the problem results from the fact that the data landscape used to be the Wild Wild West of sorts. Before there were strict rules on what organizations could and couldn't do with personal data, the general approach was a data land grab. If data is money, the more data you have, the more profitable you'll be, right? 

But since 2018 and increasingly so as states and countries pass data privacy laws, there are rules. Many of them. And it's your responsibility to ensure you're following them. The likely decision at the end of taking inventory is to delete much of the data you'd once collected and don't need. If you don't collect it, you don't have to protect it, which can be a significant risk mitigator. 

From a legal compliance standpoint, under the GDPR, companies are required to maintain "records of processing activities." If the GDPR covers your company, you're required to be able to demonstrate the reason you're processing personal data, the categories of recipients you'll disclose the data to, transfers of data to "third countries" and the length of time before you'll delete the data. 
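
As a rough illustration, the four items above can be captured as one record per processing activity. This is a minimal sketch, not an official GDPR schema: the class name `ProcessingRecord`, its field names and the `missing_fields` helper are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingRecord:
    """One entry in a data map. Field names are illustrative, not GDPR terms of art."""
    activity: str                                   # e.g. "newsletter signup"
    purpose: str = ""                               # the reason the data is processed
    recipient_categories: list = field(default_factory=list)
    third_country_transfers: list = field(default_factory=list)
    retention_period: str = ""                      # time before deletion, e.g. "24 months"


def missing_fields(record: ProcessingRecord) -> list:
    """Return the names of required fields that are still empty."""
    # third_country_transfers can legitimately be empty, so it is not checked here.
    required = ("purpose", "recipient_categories", "retention_period")
    return [name for name in required if not getattr(record, name)]


record = ProcessingRecord(activity="newsletter signup", purpose="email marketing")
print(missing_fields(record))  # ['recipient_categories', 'retention_period']
```

Running a check like this across the whole inventory gives a quick list of which entries still need an answer before a regulator asks.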

Those are important and complicated questions. If there's a breach at your company or with one of your vendors, you're going to want to have that documentation ready for disclosures on where the data lived, with whom you shared it, etc. Starting an investigation from scratch while the regulator taps their fingers will not only quickly reveal you are out of compliance with Article 30 of the GDPR, but the process will be a significantly more stressful endeavor.

The best method is to take a bird's-eye view. 

How do you do data mapping?

When you set out to data map, you're looking for every place you interact with customers within the business. You must determine what data you collect from them and with whom you share it. If you're data mapping for GDPR-compliance purposes, it makes sense to use the GDPR's definition of personal data: "any information relating to an identified or identifiable natural person." So that's data like customer credit card numbers, telephone numbers or addresses. 

But be careful because privacy laws are multiplying rapidly now, and the GDPR's definition of personal data will not necessarily be the default. Increasingly, privacy laws incorporate broader understandings of what personal data means. California's privacy law has its own definition, for example. 

Speaking of the GDPR and the California Consumer Privacy Act, this point in the process is a great time to figure out the laws and regulations that cover your organization in general. While this article focuses on privacy laws, there are other compliance requirements to consider. If you're processing payment card information, for example, you're going to want to look at the Payment Card Industry Data Security Standard. If you're processing children's data, you should know your obligations under the Children's Online Privacy Protection Act (COPPA). 

Heads-up: You're going to have way more personal data than you thought. But recognizing the problem is the first step in addressing it.

Which data sources should I involve?

Data mapping tools that will automate the process are increasingly available, though most are at early development and deployment stages. Anyone who's been through the process will tell you: The more you can automate the process, the healthier you'll be in the end. 

The priority is finding the "data stewards" within the organization. That can be difficult because those folks have got jobs to do, too. If "privacy compliance" isn't something their bonus depends on at the end of the year, chasing down every piece of personal data they use in the course of business is a thankless and time-consuming job. 

If you can find a way to incentivize the strategic players here, this will go much easier. 

The process requires many bodies to get involved because different departments within the organization are using customer data for different purposes, and often those departments aren't talking to each other. The sales, marketing and engineering teams all need to process data, but often for different purposes. 

It's essential the data analytics team -- if there is one -- is part of this process. That team knows where the bodies are buried. Get them in the room.

Frequently, data mapping reveals data floating within company databases that nobody seems to own. It was collected and stored, but there's no documentation of its origins, why it's needed or who's responsible for its safekeeping. Often, various functions within your organization may be using the same data set for different purposes, but those functions aren't communicating with each other. That practice is dangerous if a data subject files a data subject access request, as permitted under the GDPR and the California Consumer Privacy Act. 

If a data subject requests the data you've collected and processed about them, it's imperative to have a comprehensive report available. If you give them the data the marketing team collected about them but don't realize the sales team is using some of that data as well, you've got a problem. You have to be able to tell them for what purposes you're using their data and under which legal basis. 

While the idea that data might be floating around the hallways of your systems like teenagers at the mall might give you hives, that's partly why this exercise exists. If you know you have a data set and no one claims it, it's a great time to talk about whether you should retain the data and assign it owners or if you can delete it entirely. 
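
Once the inventory exists in any structured form, spotting those unclaimed data sets can be a simple filter. The sketch below assumes a hypothetical inventory of dictionaries with `dataset`, `owner` and `used_by` keys; none of these names come from a particular tool.

```python
# Hypothetical inventory rows; in practice these would come from your data map.
inventory = [
    {"dataset": "crm_contacts", "owner": "sales", "used_by": ["sales", "marketing"]},
    {"dataset": "legacy_leads_2016", "owner": None, "used_by": []},
    {"dataset": "web_analytics", "owner": None, "used_by": ["marketing"]},
]


def triage_unowned(rows):
    """Split unowned data sets into 'assign an owner' and 'consider deleting'."""
    assign_owner, consider_deleting = [], []
    for row in rows:
        if row["owner"]:
            continue  # someone already claims this data set
        if row["used_by"]:
            assign_owner.append(row["dataset"])       # in use, but nobody is responsible
        else:
            consider_deleting.append(row["dataset"])  # unclaimed and unused
    return assign_owner, consider_deleting


assign_owner, consider_deleting = triage_unowned(inventory)
print("Needs an owner:", assign_owner)
print("Deletion candidates:", consider_deleting)
```

The output is only a starting point for the conversation; whether a data set can actually be deleted is still a decision for the people in the room.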

Data mapping for GDPR: controller or processor?

As mentioned, the GDPR requires data "controllers" to maintain records of their processing activities. A controller is an entity that makes decisions about what happens to the information within its system. If you're receiving information directly from a user, you're more likely the controller. A processor is an entity a controller sends the data to, and different rules apply. 

For our purposes here, we'll focus on data controllers. 

As a controller, within your data map, you need to answer upon which legal basis you're collecting the information. The available legal bases are listed in Article 6 of the GDPR.

Is vendor monitoring important? 

Once you determine the data your organization collects, it's essential to identify which vendors or sister organizations you've contracted with and what data you share with them. In looking at vendors, you should look at their security and privacy practices to be sure they align with your own. 

Identifying what data is going to which vendors then allows you to decide how to treat each data set. 

There are several software vendors offering tools to automate vendor-risk monitoring (including Osano, to be transparent), which is a safe bet for a couple of reasons: First, depending on the business' size, there could be hundreds of vendors processing data, and those vendors could be using vendors of their own. The responsibility to vet how your vendors treat the data you've collected and passed on is yours. Second, vendors frequently make changes to their policies according to shifting imperatives or in response to new laws and regulations. It would take a full-time dedicated worker to manually go to each vendor and sub-vendor associated with your business and check if they've changed anything in their policy since you signed a contract with them. Software solutions that can alert you to those changes can be the difference between high- and low-risk partnerships. 
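
To make the idea of change alerts concrete, here is a minimal sketch of one possible approach: store a fingerprint of each vendor's policy text and compare it on the next check. This is not a description of how Osano or any other product works. The function names and the vendor names are hypothetical, and fetching the policy text is deliberately left out.

```python
import hashlib


def fingerprint(policy_text: str) -> str:
    """Hash the policy text after collapsing whitespace, so reflowed text doesn't trigger alerts."""
    normalized = " ".join(policy_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_vendors(stored: dict, current_texts: dict) -> list:
    """Return vendors whose policy fingerprint differs from the stored one (or is new)."""
    return sorted(
        vendor
        for vendor, text in current_texts.items()
        if stored.get(vendor) != fingerprint(text)
    )


# Fingerprints recorded when the contract was signed (hypothetical vendors).
stored = {"mailer-co": fingerprint("We retain data for 30 days.")}

# Policy text retrieved today; how you retrieve it is outside this sketch.
today = {
    "mailer-co": "We retain data for 90 days.",
    "chat-widget-inc": "We share data with partners.",
}
print(changed_vendors(stored, today))  # ['chat-widget-inc', 'mailer-co']
```

A hash only tells you that something changed, not what changed or whether it matters, so a person still needs to read the flagged policies.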

Privacy Shield is dead: What about data flows?

There's never been a more critical time to know when your data flows across borders. In July 2020, in its Schrems II decision, the European Court of Justice invalidated the Privacy Shield, the data transfer mechanism many companies relied on to transfer data from the EU to the U.S. The court was not satisfied with the U.S.'s ability to adequately protect European data; there's no comprehensive federal privacy law in the U.S., and law enforcement agencies can conduct mass surveillance on a broad swath of communications thanks to allowances in post-9/11 anti-terrorism rules. 

The European Court also said that standard contractual clauses, another mechanism companies used to transfer data, should be examined on a case-by-case basis. 

The European data protection authorities are aware of the precarious situation companies find themselves in. Companies still need to transfer data across borders; that didn't change when Privacy Shield sank. The EU and the U.S. are in negotiations to develop a new deal, but reports indicate talks could take years. 

That's why it's essential to check on your vendors. If data is crossing borders, it's essential to know where it's going and what contracts you're using to get it there. If you realize you're transferring data without a legal agreement that's currently legit, it might be time to decide where to store that data. 

After a data mapping exercise, some companies will decide to move the data to a storage center elsewhere. If you can't legally transfer the data overseas, it's likely possible to store it with a cloud provider located in the same jurisdiction as the data subjects about whom you've collected data.
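
If your data map records where each flow starts, where it ends and which mechanism covers it, a first-pass review can be automated. The sketch below is an assumption-laden illustration: the mechanism labels, the status strings and the `classify_flow` function are invented for this example and are no substitute for legal advice.

```python
from typing import Optional

# Mechanisms treated as no longer valid in this sketch (per the Schrems II decision).
INVALIDATED_MECHANISMS = {"privacy_shield"}


def classify_flow(origin: str, destination: str, mechanism: Optional[str]) -> str:
    """Give a rough review status for one data flow in the data map."""
    if origin == destination:
        return "no cross-border transfer"
    if mechanism is None or mechanism in INVALIDATED_MECHANISMS:
        return "no valid mechanism"
    # Anything else, including standard contractual clauses, still needs a human look.
    return "review case by case"


# Hypothetical flows: (origin, destination, mechanism).
flows = [
    ("EU", "EU", None),
    ("EU", "US", "privacy_shield"),
    ("EU", "US", "standard_contractual_clauses"),
]
for origin, destination, mechanism in flows:
    print(origin, "->", destination, ":", classify_flow(origin, destination, mechanism))
```

Note that the function never returns a plain "ok" for a cross-border flow; that judgment is left to whoever reviews the contracts.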

How are data mapping and data subject access requests related? 

The beauty of doing the data mapping work is that it's there for you when you need it. Specifically, under the GDPR and California privacy law, data subjects have rights to the data you store on them. When an individual files a data subject access request, the data controller must respond with the data stored on that individual and the purpose for processing their data. As the data controller, you must also disclose the parties with whom you share it and any third countries to which the data is sent. Users also have the right to request their data be deleted.

Under California privacy law -- currently the California Consumer Privacy Act, which the California Privacy Rights Act will amend and expand in 2023 -- data subjects can also require an organization to disclose the information it stores about them, the reason it was collected and who else has access to it. 
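
Here is a minimal sketch of how a finished data map can feed an access request response. It assumes a hypothetical data map in which every entry lists the team holding the data, the purpose, the recipients, any third countries and the records keyed by a subject ID; the structure and the `build_dsar_response` name are illustrative only.

```python
def build_dsar_response(subject_id: str, data_map: list) -> list:
    """Collect every data map entry that holds data on one individual."""
    response = []
    for entry in data_map:
        stored = entry["records"].get(subject_id)
        if stored is None:
            continue  # this team holds nothing on the requester
        response.append({
            "held_by": entry["team"],
            "data": stored,
            "purpose": entry["purpose"],
            "shared_with": entry["shared_with"],
            "third_countries": entry["third_countries"],
        })
    return response


# Hypothetical data map covering two teams.
data_map = [
    {"team": "marketing", "purpose": "email campaigns",
     "shared_with": ["email provider"], "third_countries": ["US"],
     "records": {"user-17": {"email": "pat@example.com"}}},
    {"team": "sales", "purpose": "account management",
     "shared_with": [], "third_countries": [],
     "records": {"user-17": {"phone": "555-0100"}, "user-42": {"phone": "555-0101"}}},
]

for item in build_dsar_response("user-17", data_map):
    print(item["held_by"], "-", item["purpose"])
```

Because the lookup walks every entry in the map, data held by the sales team shows up alongside the marketing team's data, which is exactly the gap a partial inventory would miss.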

Privacy compliance is just the beginning

Once you see the big picture, understanding what data you collect and process, with whom you share it and where it flows, it's time to make some decisions.

If there's an entire silo of data that no business unit has claimed and you can't come up with a reason for having it, maybe it's time to delete some data.

But this is also an opportunity to take the data you keep and make strategic decisions about developing and growing your product. And that can be a good selling point to the C-suite if you need to fight for resources to conduct the data map.

Yes, this is going to be a long and tough endeavor, made easier by however much of it you can automate and however many people you can get in the room with you. 

But in the end, you're standing on a virtual hilltop, able to see a clear picture of your data landscape. Now you can make decisions based on risks, threats and opportunities rather than blind guesses. 

About The Author · Osano Staff

The Osano staff is a diverse team of free thinkers who enjoy working as part of a distributed team with the common goal of working to make a more transparent internet. Occasionally, the team writes under the pen name of our mascot, “Penny, the Privacy Pro.”