Data anonymization (nonfiction)
Data anonymization is a type of information sanitization whose intent is privacy protection. It is the process of either encrypting or removing personally identifiable information from data sets, so that the people whom the data describe remain anonymous.
The European Union's General Data Protection Regulation (GDPR) does not apply to data that has been anonymized so that individuals are no longer identifiable, and it names pseudonymization as one safeguard for stored personal data on people in the EU.
Overview
Data anonymization has been defined as a "process by which personal data is irreversibly altered in such a way that a data subject can no longer be identified directly or indirectly, either by the data controller alone or in collaboration with any other party." Anonymization lets information cross a boundary, such as between two departments within an agency or between two agencies, with a reduced risk of unintended disclosure. In certain environments it can be done in a way that still allows evaluation and analytics on the anonymized data.
In the context of medical data, anonymized data refers to data from which the patient cannot be identified by the recipient of the information. The name, address, and full post code must be removed, together with any other information which, in conjunction with other data held by or disclosed to the recipient, could identify the patient.
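As a rough illustration of the removal step described above, the following Python sketch drops direct identifiers from a single record and shortens the postcode. The field names (`name`, `address`, `postcode`, `diagnosis`) and the choice to keep only the outward part of a UK-style postcode are assumptions made for the example, not part of any standard.

```python
from typing import Any

# Hypothetical field names; a real schema will differ.
DIRECT_IDENTIFIERS = {"name", "address"}


def strip_identifiers(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy without direct identifiers and without the full postcode."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    postcode = cleaned.pop("postcode", None)
    if postcode:
        # Assumption: UK-style postcode whose inward part is the last three
        # characters, so "SW1A 1AA" is reduced to the outward part "SW1A".
        compact = str(postcode).replace(" ", "")
        cleaned["postcode_area"] = compact[:-3]
    return cleaned


patient = {
    "name": "Jane Doe",
    "address": "1 Example Street",
    "postcode": "SW1A 1AA",
    "diagnosis": "asthma",
}
released = strip_identifiers(patient)
print(released)
```

Removing these fields is only a first step: whatever remains may still identify the patient when combined with other data the recipient holds, so the surviving fields need the same scrutiny.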
De-anonymization is the reverse process: anonymous data is cross-referenced with other data sources to re-identify the people it describes. Generalization and perturbation are two popular anonymization approaches for relational data. Obscuring data while retaining the ability to re-identify it later is called pseudonymization, and it is one way companies can store data in a HIPAA-compliant manner.
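The three techniques named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the helper names (`generalize_age`, `perturb`, `Pseudonymizer`), the ten-year age bands, the Gaussian noise, and the in-memory lookup table are all choices made for the example.

```python
import random
import secrets


def generalize_age(age: int, width: int = 10) -> str:
    """Generalization: replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def perturb(value: float, scale: float, rng: random.Random) -> float:
    """Perturbation: add zero-mean Gaussian noise to a numeric value."""
    return value + rng.gauss(0.0, scale)


class Pseudonymizer:
    """Pseudonymization: swap identifiers for random tokens, keeping a
    private lookup table so the key holder can re-identify later."""

    def __init__(self) -> None:
        self._token_for: dict[str, str] = {}
        self._id_for: dict[str, str] = {}

    def pseudonymize(self, identifier: str) -> str:
        # Reuse the existing token so the same person maps to the same token.
        if identifier not in self._token_for:
            token = secrets.token_hex(8)
            self._token_for[identifier] = token
            self._id_for[token] = identifier
        return self._token_for[identifier]

    def reidentify(self, token: str) -> str:
        return self._id_for[token]


rng = random.Random(0)  # seeded only to make the example repeatable
vault = Pseudonymizer()
row = {"patient_id": "P-1001", "age": 34, "weight_kg": 71.5}
released_row = {
    "patient_id": vault.pseudonymize(row["patient_id"]),
    "age": generalize_age(row["age"]),
    "weight_kg": round(perturb(row["weight_kg"], 2.0, rng), 1),
}
print(released_row)
```

In this sketch the lookup table inside `Pseudonymizer` is what makes re-identification possible, so it would have to be stored separately from the released data and protected accordingly.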
In the News
Fiction cross-reference
Nonfiction cross-reference
External links:
- Data anonymization @ Wikipedia
- A generalized method for re-identifying people in "anonymized" data-sets @ Boing Boing