Practitioners’ Corner: What Is Personal Information?
Or, Understanding Anonymization, De-Identification, and Aggregation
Practitioners’ Corner is a monthly focus on topics of interest to in-house counsel in the implementation of their privacy programs.
Pop quiz: Is an IP address personal information?
Answer: It depends, of course!
The DWT Privacy team frequently encounters the concepts of “anonymous,” “de-identified,” and/or “aggregated” information. For example, a privacy policy might state that the company can use anonymized information for any purpose, a contract might exclude de-identified information from the scope of the parties’ obligations, or a product team might describe a data set as anonymous because it does not contain “personally identifiable information.”
De-identification is an important concept because “de-identified information” is not personal information and thus not within the scope of most data privacy laws. However, determining what constitutes de-identification has become significantly more difficult because lawmakers and regulators have expanded the scope of what they consider to be “personal information” (i.e., information that relates to, or is capable of being related to, an individual).
In-house counsel should be vigilant when they hear the terms “de-identified,” “anonymized,” “aggregated,” “not personally identifiable,” or similar terms.
The first issue to consider is what constitutes de-identification. Generally, it is the process of transforming information that is or can be linked to a person into information that cannot be linked to a person. There are degrees of de-identification, however, and privacy laws and regulations can require different levels of de-identification before they treat information as out of scope.
For instance, suppose a sock company’s database of 100 customers contains my name, phone number, randomly assigned customer ID, and the fact that I bought a pair of socks. If the sock company removes everyone’s name and phone number from that database (i.e., the sock company knows only that pairs of socks were sold to certain customers represented only by customer IDs), it will have removed the information that directly identifies the customers and thus achieved one level of de-identification: “pseudonymization.”
The customers’ identities have not been fully removed from the database, however, because the customer ID represents each individual as distinct from other customers. For this reason, pseudonymized data is still considered personal information.
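To make the distinction concrete, here is a minimal sketch of the pseudonymization step in the sock example. The field names, customer IDs, and sample values are hypothetical and chosen only for illustration; a real database would look different.

```python
# Hypothetical records; names, phone numbers, and IDs are made up for illustration.
customers = [
    {"name": "Alice Example", "phone": "555-0100", "customer_id": "C-4821", "item": "socks"},
    {"name": "Bob Example", "phone": "555-0101", "customer_id": "C-1377", "item": "socks"},
]

# Assumption: name and phone are the only direct identifiers in this data set.
DIRECT_IDENTIFIERS = {"name", "phone"}


def pseudonymize(records):
    """Drop direct identifiers from each record but keep every other field."""
    return [
        {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
        for record in records
    ]


pseudonymized = pseudonymize(customers)
for row in pseudonymized:
    print(row)
```

Each remaining row still carries its own customer ID, so every customer remains distinguishable from every other customer even though no name or phone number is present.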
If the sock company goes further by removing the customer IDs, the information in the database will become “aggregated.” At that point, the sock company can use the data only to generate statistics about how many socks were sold; it no longer knows which purchase was attributable to which individual. (If you want to explore the spectrum of de-identification methods, the Future of Privacy Forum’s “A Visual Guide to Practical Data De-Identification” is a useful point of reference.)
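As a rough illustration of the aggregation step, the sketch below starts from hypothetical pseudonymized rows (the IDs and field names are assumptions, not taken from any real system) and reduces them to a count per item.

```python
from collections import Counter

# Hypothetical pseudonymized rows: direct identifiers are gone, customer IDs remain.
pseudonymized = [
    {"customer_id": "C-4821", "item": "socks"},
    {"customer_id": "C-1377", "item": "socks"},
    {"customer_id": "C-4821", "item": "socks"},
]


def aggregate(records):
    """Count purchases per item, discarding the customer IDs entirely."""
    return dict(Counter(record["item"] for record in records))


totals = aggregate(pseudonymized)
print(totals)  # {'socks': 3}
```

The output says how many pairs of socks were sold but nothing about who bought them, or even how many distinct customers there were.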
Consider these terms in the context of the California Consumer Privacy Act’s (CCPA) definition of personal information:
“[I]nformation that identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household….Personal information includes… unique personal identifier…Internet Protocol address…Internet or other electronic network activity information, including, but not limited to, browsing history, search history, and information regarding a consumer’s interaction with an Internet Web site, application, or advertisement…Audio, electronic, visual, thermal, olfactory, or similar information…[and] Inferences drawn from any of the information identified in this subdivision to create a profile about a consumer reflecting the consumer’s preferences, characteristics, psychological trends, predispositions, behavior, attitudes, intelligence, abilities, and aptitudes.”
This is a huge range of information. In our sock company example, the database containing only my customer ID and purchase history would still be personal information, and subject to the CCPA, because the customer ID is designed to be a “unique personal identifier.” Similarly, cookie data capturing only device IDs or IP addresses will be considered personal information, even if the company never knows the names of the individuals behind those identifiers.
Another factor in determining whether data has been effectively “de-identified” is the acceptable risk of re-identification. One element that increases the risk of re-identification is the possibility that a supposedly de-identified data set can be matched with other data maintained by the same organization or a downstream recipient.
If the sock company creates the pseudonymous database but keeps the original records containing my name and telephone number, the sock company could easily re-identify the pseudonymous database by matching customer IDs to the separately stored names.
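The matching risk described above can be sketched in a few lines. This is an illustration under stated assumptions: the retained records, the pseudonymized rows, and the `reidentify` helper are all hypothetical.

```python
# Hypothetical data; names, phone numbers, and IDs are made up for illustration.
original_records = [
    {"name": "Alice Example", "phone": "555-0100", "customer_id": "C-4821"},
    {"name": "Bob Example", "phone": "555-0101", "customer_id": "C-1377"},
]

pseudonymized = [
    {"customer_id": "C-4821", "item": "socks"},
    {"customer_id": "C-1377", "item": "socks"},
]


def reidentify(pseudonymized_rows, retained_rows):
    """Re-attach names and phone numbers by matching rows on customer_id."""
    lookup = {row["customer_id"]: row for row in retained_rows}
    matched = []
    for row in pseudonymized_rows:
        source = lookup.get(row["customer_id"])
        if source is not None:
            matched.append({**row, "name": source["name"], "phone": source["phone"]})
    return matched


reidentified = reidentify(pseudonymized, original_records)
for row in reidentified:
    print(row["customer_id"], "->", row["name"])
```

A single lookup keyed on the customer ID is enough to undo the pseudonymization, which is why it matters who else holds the original records.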
We are not suggesting that de-identification is impossible under modern data privacy law, but if you intend to rely on de-identification to mitigate your compliance obligations, you should:
- understand what degree of de-identification is necessary under applicable law to transform personal information into de-identified information; and
- scrutinize the proposed method of de-identification carefully and consider all of the ways (regardless of their likelihood) that the information might be re-identified—including within your own organization or by third parties.
Finally, beware of claims that information will be de-identified based simply on the notion that direct identifiers like name and address will be removed. The sock example above illustrates that merely removing direct identifiers likely will not suffice under the CCPA and similar statutes.