With modern data stores decreasing the price of storage, it’s now possible to collect more data and keep every scrap. Yet, it has become increasingly difficult to know what type of data you’re collecting and which data is subject to data privacy laws. This lack of visibility is a challenge for privacy teams struggling to manage Subject Rights Requests.
Simply put, if you can’t identify and monitor where an individual’s information resides and how it’s used, you won’t be able to produce it for a Subject Rights Request (SRR). You also won’t be able to modify it or remove it to meet SRR requirements. To locate data related to an individual making a request, you need the capacity to search files across vast repositories and group data together. That calls for data classification.
Security and IT teams may already have data classification schemes in place to rank data according to risk categories and make operations more efficient. Unfortunately, most data classification approaches don’t address the practical needs of privacy teams.
As part of our series on managing Subject Rights Requests, we’ll take a look at how data classification schemes that log and validate repositories of personal data can provide privacy teams with the granular supervision they need to do their job. With a coordinated approach, privacy, IT, and security teams can design and manage data classification processes to match privacy requirements.
A coordinated approach to data classification for privacy management
1. Classify data to flag information subject to privacy laws
What’s in your data stores? Did you know that 21% of files in the cloud contain sensitive data that could be regulated by privacy laws? This includes personal data, protected health information, information about minors, and other types of personally identifiable information (PII).
PII is any information about an individual that can distinguish or trace that individual’s identity. GDPR defines personal data broadly, so the category now includes more than direct identifiers such as name, email address, and Social Security number. Its scope extends to related information that can be linked to an individual. As a result, you need to follow all of the links from personal data to additional information that could be analyzed and connected back to the source, and classify that data as well. This includes information provided by supplemental data sources, observed through automation or analysis (a user went to a certain webpage, purchased a product, etc.), or inferred (such as preferences based on behavior).
Unstructured data, such as customer comments, reviews, blog posts, customer service notes, account management emails, and even internal messaging, also needs to be classified as a potential source of personal information. Especially when combined with personal data, this type of unstructured information yields behavior-based insights that can themselves become personal information.
By labeling, grouping, and classifying data, you’ll be able to identify personal data that represents the highest risk and must be handled with care to meet privacy obligations. You’ll also have more fine-grained control over access rights for personal data because you’ll know what’s inside.
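As a rough illustration of labeling and ranking, the sketch below scans free text for two kinds of identifiers and reports the highest risk level found. The patterns, label names, and risk levels are assumptions made up for this example, not a complete or authoritative PII detector.

```python
import re

# Hypothetical patterns for illustration only. Real detectors need
# validation, locale handling, and context to avoid false matches.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical risk ranking chosen by the privacy team.
RISK_BY_LABEL = {"email": "medium", "us_ssn": "high"}
RISK_ORDER = ["low", "medium", "high"]


def classify_text(text):
    """Return the sorted labels whose pattern appears in the text."""
    return sorted(label for label, pattern in PATTERNS.items() if pattern.search(text))


def highest_risk(labels):
    """Return the highest risk level among the labels, or None if there are none."""
    if not labels:
        return None
    return max((RISK_BY_LABEL.get(label, "low") for label in labels), key=RISK_ORDER.index)


note = "Customer jane@example.com called about SSN 123-45-6789."
labels = classify_text(note)
print(labels, highest_risk(labels))
```

The same approach applies to unstructured sources such as support notes or reviews, although pattern matching alone will not catch observed or inferred data.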
2. Automate processes to classify high volumes of data
Make sure you classify data you store, as well as data you process or compute. When data lives in multiple places, it may need to be classified differently, depending on how it is used.
Automated classification and tagging solutions reduce the time and effort it takes to surface, categorize, and prioritize data.
When databases are managed in the cloud, it’s much easier to classify data automatically. Technology can provide insights into database schemas, which can then be analyzed at the top level to identify personal and sensitive personal information. Data can be tagged into key categories related to data protection requirements. Schema analysis can also run automatically on a recurring basis, which keeps classification information current.
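One simple form of schema analysis is to tag columns by name. The sketch below assumes the schema has already been exported as a mapping of table names to column names; the tables, categories, and name hints are hypothetical and would need to reflect your own data protection categories.

```python
# Hypothetical schema export: table name -> list of column names.
SCHEMA = {
    "customers": ["id", "full_name", "email_address", "created_at"],
    "orders": ["id", "customer_id", "total"],
}

# Hypothetical category hints: a column is tagged when its name
# contains one of the substrings listed for a category.
NAME_HINTS = {
    "contact": ("email", "phone"),
    "identity": ("name", "ssn", "birth"),
}


def tag_schema(schema):
    """Return a mapping of 'table.column' to the first matching category."""
    tags = {}
    for table, columns in schema.items():
        for column in columns:
            lowered = column.lower()
            for category, hints in NAME_HINTS.items():
                if any(hint in lowered for hint in hints):
                    tags[f"{table}.{column}"] = category
                    break
    return tags


print(tag_schema(SCHEMA))
```

Because it only reads column names, a scan like this is cheap enough to rerun whenever a schema changes, but it says nothing about what the values in each column actually mean.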
3. Verify automated classification
As great as automated classification is, it can’t do the job alone. Even with comprehensive data discovery and classification, data labels and tags within a data store can’t tell you all the information you need to know. Data may look innocuous on the surface, but actually be sensitive, personal information. Privacy management requires human judgment to confirm categories, provide context, and review results for false positives and negatives.
For example, it’s clear that names, email addresses, and Social Security numbers are personal data. But what if your database includes a series of numbers for each account? On the surface, these are just numbers. But if each number represents the political party a person belongs to, that is personal information that should be protected. An automated classification scan is likely to miss it, producing a false negative, and you may never know until an audit or data breach uncovers the problem.
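A minimal sketch of how human review might sit on top of automated tags: reviewer decisions override the scan, and any column that neither the scan nor a reviewer has labeled is queued for review instead of being assumed safe. The function name, column names, and category names are hypothetical.

```python
def merge_with_review(auto_tags, reviewer_overrides, all_columns):
    """Combine automated tags with reviewer decisions.

    Reviewer overrides win over automated tags. Columns with no tag
    from either source are returned as a review queue.
    """
    final = dict(auto_tags)
    final.update(reviewer_overrides)
    needs_review = sorted(column for column in all_columns if column not in final)
    return final, needs_review


# The automated scan only recognized the email column.
auto_tags = {"accounts.email": "contact"}
# A reviewer knows that segment_code encodes political party membership.
reviewer_overrides = {"accounts.segment_code": "political_affiliation"}
all_columns = ["accounts.email", "accounts.segment_code", "accounts.flag_7"]

final, needs_review = merge_with_review(auto_tags, reviewer_overrides, all_columns)
print(final)
print(needs_review)
```

Treating unlabeled columns as open questions is a design choice: it creates more review work up front, but it keeps an opaque numeric field from passing as harmless by default.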
To protect personal information and reduce risk, IT and security leaders need to collaborate with privacy professionals and business functions to understand the process and intention behind data usage, and to ascribe meaning to data so that a complete picture emerges. Technologies and processes must support collaboration so that everyone involved in privacy management, from Privacy Officers to legal, compliance, and security teams, shares the same information and can adjust quickly.
4. Adapt classifications as needed to meet changing definitions and data lifecycle stages
Any system that classifies and tracks personal data needs to be flexible enough to adapt to new requirements.
Privacy laws are evolving and definitions are changing. For example, we expect that data categories noted in CCPA will be further refined and may require new types of classifications.
Additionally, data doesn’t stay in a static state. Through its lifecycle, data may be moved, amended, appended, redacted, etc. Classification schemes need to adjust and continuously update as data changes.
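One way to notice that data has changed since it was last classified is to store a fingerprint of the content at classification time and compare it later. The sketch below uses a SHA-256 hash for that purpose; the record identifiers and the shape of the stored fingerprints are assumptions for illustration.

```python
import hashlib


def fingerprint(content):
    """Return a SHA-256 hex digest of the content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def stale_items(current, recorded):
    """Return the ids whose content is new or changed since classification.

    current:  id -> current content
    recorded: id -> fingerprint stored when the item was last classified
    """
    return sorted(
        item_id
        for item_id, content in current.items()
        if recorded.get(item_id) != fingerprint(content)
    )


recorded = {"note-1": fingerprint("Order shipped.")}
current = {
    "note-1": "Order shipped.",
    "note-2": "Customer asked to update her email.",
}
print(stale_items(current, recorded))
```

Items flagged this way can be sent back through classification, so labels follow the data as it is amended, appended, or redacted.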
WireWheel Data Discovery classifies data for privacy management
WireWheel’s data privacy management solution incorporates data classification directly into a central, accessible platform that lets you respond accurately and rapidly to SRRs.
WireWheel connects to Amazon Web Services, Google Cloud Platform, and Microsoft Azure via our API for a rapid scan of your data, including data-related integrations with vendors and partners. We parse structured and unstructured data to find patterns, then label and group information according to the risk and privacy categories you define. Continuous scanning keeps information current as data is added and processed, so you always have the most complete, up-to-date information.