De-Identification, Anonymization, Redaction Toolkit Services, published by HL7 International / Cross-Group Projects. This guide is not an authorized publication; it is the continuous build for version 1.0.0-ballot built by the FHIR (HL7® FHIR® Standard) CI Build. This version is based on the current content of https://github.com/HL7/fhir-darts/ and changes regularly. See the Directory of published versions
Page standards status: Trial-use
This section clarifies the basic service definitions specified in the IG and provides the context in which they need to be used.
Pseudonymization is the process by which PII/PHI can no longer be attributed to a specific patient without the use of additional information. Pseudonymization does not remove PII/PHI; rather, it replaces identifiers with tokens. Pseudonymized data is still considered PII/PHI because anyone holding the mapping key can re-identify it.
For example, patient John Doe may be referred to as Patient_12345. The token is produced using a mapping key or an algorithm, such as a hashing algorithm, and can always be linked back to John Doe by whoever holds the mapping key.
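As an illustration only, the sketch below shows one way a pseudonymization service might produce stable tokens such as Patient_12345. It assumes a keyed hash (HMAC-SHA256 from the Python standard library) and an in-memory lookup table standing in for the mapping key. The class name `Pseudonymizer`, the `Patient_` prefix, and the 16-character token length are hypothetical choices, not part of this IG.

```python
import hashlib
import hmac


class Pseudonymizer:
    """Hypothetical pseudonymizer: keyed hash plus a protected lookup table.

    Assumption: HMAC-SHA256 is one possible algorithm; this IG does not
    mandate a specific technique.
    """

    def __init__(self, secret_key: bytes, prefix: str = "Patient_"):
        self._key = secret_key
        self._prefix = prefix
        # Mapping key: token -> original identifier. In practice this must be
        # stored and access-controlled separately from the pseudonymized data.
        self._mapping: dict[str, str] = {}

    def pseudonymize(self, identifier: str) -> str:
        digest = hmac.new(self._key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
        # Truncated for readability; a real service would size tokens to avoid collisions.
        token = self._prefix + digest[:16]
        self._mapping[token] = identifier
        return token

    def reidentify(self, token: str) -> str:
        # Only a holder of the mapping can link the token back to the patient.
        return self._mapping[token]


p = Pseudonymizer(b"example-secret-key")
token = p.pseudonymize("John Doe")
print(token, p.reidentify(token))
```

Because the same identifier and key always yield the same token, records for one patient can be linked across data sets without exposing the name. Anyone holding the key or the mapping can reverse the process, which is why the output remains PII/PHI.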
Under the HIPAA privacy rule in 45 CFR §164.514(a), de-identified information is, “Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual.”
Alternatively, de-identification is the process of removing or transforming identifiers so that an individual cannot be readily identified and the risk of re-identification is very low. Note that some de-identification use cases outside the scope of this IG may require re-identification under controlled circumstances; re-identification is not part of this IG.
For example, patient John Doe’s data will not include any patient identifier or name.
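Under the HHS Safe Harbor method (45 CFR §164.514(b)(2)), 18 types of identifiers of the individual, or of the individual's relatives, employers, or household members, must be removed before the data can be considered de-identified. These identifiers are specified in the HHS Safe Harbor De-identification Standard and are listed here for convenience; see the standard for the full legal requirements.

1. Names
2. All geographic subdivisions smaller than a state, including street address, city, county, precinct, ZIP code, and their equivalent geocodes, except for the initial three digits of a ZIP code if the geographic unit formed by combining all ZIP codes with the same three initial digits contains more than 20,000 people (otherwise those digits are changed to 000)
3. All elements of dates (except year) directly related to an individual, including birth date, admission date, discharge date, and date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages may be aggregated into a single category of age 90 or older
4. Telephone numbers
5. Fax numbers
6. Email addresses
7. Social Security numbers
8. Medical record numbers
9. Health plan beneficiary numbers
10. Account numbers
11. Certificate/license numbers
12. Vehicle identifiers and serial numbers, including license plate numbers
13. Device identifiers and serial numbers
14. Web URLs
15. IP addresses
16. Biometric identifiers, including finger and voice prints
17. Full-face photographs and any comparable images
18. Any other unique identifying number, characteristic, or code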
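The following minimal sketch applies a few Safe Harbor transformations to a single flat record. It assumes a hypothetical record layout (field names such as `ssn`, `zip`, and `birth_date` are illustrative, not FHIR element paths), and the restricted ZIP prefixes are placeholders. It does not handle free text, images, or the "any other unique identifying number" category, and it is not a substitute for the standard.

```python
from datetime import date

# Hypothetical field names for direct identifiers that are dropped outright.
REMOVE_FIELDS = {
    "name", "street_address", "city", "county", "phone", "fax", "email",
    "ssn", "mrn", "health_plan_id", "account_number", "license_number",
    "vehicle_id", "device_id", "url", "ip_address", "biometric_id", "photo",
}

# Placeholder values: the authoritative restricted 3-digit ZIP prefixes
# (areas with 20,000 or fewer people) must come from current Census data.
RESTRICTED_ZIP3 = {"036", "692"}


def safe_harbor(record: dict) -> dict:
    """Return a copy of one record with selected Safe Harbor rules applied."""
    over_89 = isinstance(record.get("age"), int) and record["age"] > 89
    out = {}
    for field, value in record.items():
        if field in REMOVE_FIELDS:
            continue  # drop direct identifiers entirely
        if field == "birth_date" and over_89:
            continue  # the birth year would reveal an age over 89
        if isinstance(value, date):
            out[field] = value.year  # keep only the year of each date
        elif field == "zip":
            zip3 = str(value)[:3]
            out[field] = "000" if zip3 in RESTRICTED_ZIP3 else zip3
        elif field == "age" and over_89:
            out[field] = "90+"  # aggregate ages over 89 into one category
        else:
            out[field] = value
    return out
```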
HHS guidance also describes the Expert Determination method (45 CFR §164.514(b)(1)), in which a qualified expert determines that the risk of re-identification is very small. This method can also be used to build a de-identification service.
Re-identification relies on a unique code assigned to the set of de-identified health information during the de-identification process. The entity submitting the de-identified information is responsible for protecting the re-identification information in accordance with organizational, local, state, and federal policies.
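HIPAA (45 CFR §164.514(c)) permits such a code only if it is not derived from or related to information about the individual, and the re-identification mechanism is not disclosed. The hypothetical registry below sketches this under those assumptions: each code is random (Python's `secrets` module), and only the lookup table held by the submitting entity links it to a source record ID. The class and method names are illustrative.

```python
import secrets


class ReidentificationCodeRegistry:
    """Hypothetical registry that issues re-identification codes.

    Codes are random, so they are not derived from (and cannot be translated
    back into) information about the individual. Only this lookup table,
    kept by the submitting entity, links a code to its source record.
    """

    def __init__(self) -> None:
        self._code_to_record: dict[str, str] = {}

    def assign(self, source_record_id: str) -> str:
        code = secrets.token_hex(16)  # 128 random bits, 32 hex characters
        while code in self._code_to_record:  # guard against an unlikely collision
            code = secrets.token_hex(16)
        self._code_to_record[code] = source_record_id
        return code

    def reidentify(self, code: str) -> str:
        # Must only be performed under controlled, policy-approved circumstances.
        return self._code_to_record[code]
```

Unlike a hash of a patient identifier, a random code reveals nothing about the individual even if the generating algorithm is known; the lookup table is the only link and must be protected accordingly.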
Anonymization is the irreversible process of transforming data so that individuals cannot be identified by any reasonably likely means, now or in the future. In other words, data is anonymous if individuals are not identifiable, taking into account all means reasonably likely to be used to identify them.
For example, John Doe, a 56-year-old male patient with diabetes, would be placed in a 50 to 60 age group containing many other male patients with diabetes. Location information would be generalized from a specific address to a large geographical area, such as one or more states or the entire country. Rare data points would be combined into a group such as "Other" to avoid identification.
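The generalization steps in this example can be sketched as follows. This is an illustration under assumed inputs: records are flat dictionaries with hypothetical `age`, `sex`, `state`, `address`, and `diagnosis` fields; ages are bucketed into 10-year bands; addresses are dropped in favor of the state; and diagnoses seen fewer than a hypothetical `RARE_THRESHOLD` times become "Other".

```python
from collections import Counter

# Hypothetical threshold: diagnoses seen fewer times than this are "rare".
RARE_THRESHOLD = 5


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band, e.g. 56 -> '50-59'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymize(records: list[dict]) -> list[dict]:
    """Return generalized copies of the records; the address field is dropped."""
    counts = Counter(r["diagnosis"] for r in records)
    out = []
    for r in records:
        diagnosis = r["diagnosis"] if counts[r["diagnosis"]] >= RARE_THRESHOLD else "Other"
        out.append({
            "age_group": age_band(r["age"]),
            "sex": r["sex"],
            "region": r["state"],  # specific address generalized to state level
            "diagnosis": diagnosis,  # rare values merged into "Other"
        })
    return out
```

Generalization alone does not prove that the output is anonymous; the resulting data set still needs a re-identification risk assessment.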
The above definitions are used to create the services needed to transform identifiable data into pseudonymized, de-identified, or anonymized data.
The table below summarizes the differences between the three services:
| Characteristic | Pseudonymization | De-identification | Anonymization |
| Identifiability | Moderate | Low | Extremely low |
| Ability to link to original record | Always | Possible if needed | No |
| Reversible process | Yes | Sometimes, based on need | No |
| Legal status | Considered PHI | Not considered PHI | Not considered PHI |
| Goal | Mask identity | Reduce identification risk | Eliminate identification risk |
This section identifies the business needs and specific user stories for the DARTS IG.
De-identification Use Cases
Use Case 1: Federal Agency Reporting
Many reporting programs across the US currently require health care organizations to submit data to state and federal agencies in aggregate form. While these aggregate reports meet current mandates, many agencies want more detailed line-level data instead. Because federal and state agencies may not have the authority to receive PII/PHI, the data must be de-identified before this more granular data can be submitted. The following are examples of such reporting:
Use Case 2: Clinical Research Reporting
Many research programs require researchers to track specific diseases and treatments. For example, a researcher studying diabetes outcomes needs lab results, medications, and approximate timelines, but not the identity of the individual. This requires PII/PHI to be de-identified before the data is submitted to the researcher.
Use Case 3: AI/ML Model to Predict Hospital Readmissions
As the use of AI/ML models increases, new models are being created that need data to predict outcomes such as the number of hospital readmissions expected in a month. These models do not require identifiable information, but they do require clinical encounter information and its context, which can be provided by de-identifying the information before submitting it to the AI model.
Use Case 4: Quality Improvement Initiatives
Quality improvement initiatives that focus on patient outcomes require line-level, granular data without identifiable information, so the data must be de-identified before it is submitted to these programs.
Anonymization Use Cases
Use Case 1: Federal Agencies Publish Disease Prevalence Statistics
Federal agencies collect data to perform analyses and publish data sets, such as disease prevalence statistics, as open data for public use. This requires the data to be anonymized so that there is no re-identification risk.
Use Case 2: Datasets Released to Global Researchers
Hospitals, aggregators, and agencies may wish to release certain data sets to global researchers to enable innovation and new studies. These data sets cannot carry any re-identification risk, so the data must be anonymized before the data set is compiled and released to the global research community.
Use Case 3: Publications in Journals
Journal publications require only aggregated insights, not granular PII/PHI. The data sets used for studies and publications can therefore be anonymized so that there is no risk of re-identification.
Use Case 4: Data Sharing with Third-Party Analytics Vendors
Many entities share data with third-party analytics vendors to examine population-level trends and design interventions based on the results. These situations do not require PII/PHI, and the shared data sets should avoid re-identification risk, so anonymized data is the appropriate form to submit to these vendors.
The table below summarizes which services to consider for different kinds of use cases:
| Use Case | Applicable Services | Additional Information |
| Clinical Care | None | Identity is needed for clinical care |
| Analytics within Enterprise | Pseudonymization | Linkage to records maintained for follow-up studies |
| Multi-site analytics | Pseudonymization + De-identification | Linkage to avoid duplicates while reducing identification risk |
| Federal Reporting when receiver cannot receive PHI | De-identification | Line-level data needed for analysis |
| AI Model Training within Enterprise | De-identification | Line-level data needed for training |
| AI Model Training outside Enterprise | Anonymization | Eliminate identification risk |
| Public Dataset Release | Anonymization | Eliminate identification risk |
This section contains a list of actors based on the above use cases.
A DARTS Service Provider is an actor that implements the pseudonymization, de-identification, and anonymization services defined in this implementation guide. Examples include cloud service providers such as Amazon and Microsoft.
A DARTS Consumer is an actor that uses the services provided by a DARTS Service Provider. Examples include data submitters such as health centers, hospitals, and their EHR systems. Data receivers such as federal agencies are also DARTS Consumers, as they use the DAPL IG and profiles to validate the information received from data submitters.
The health care organization releasing or providing the data must assess the risk of identification when using these services, based on the use case. The following links provide relevant industry regulations and guidance for risk assessments:
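One common way to quantify this risk, shown here as a sketch rather than a method required by this IG, is to group records into equivalence classes that share the same quasi-identifier values (for example, age group and sex). The smallest class size is the k in k-anonymity; 1/k is the worst-case re-identification probability for a single record under a prosecutor-style attack model; and the number of classes divided by the number of records gives the average risk. The function name and input format are hypothetical, and the acceptable threshold remains a policy decision for the releasing organization.

```python
from collections import Counter


def reidentification_risk(records: list[dict], quasi_identifiers: list[str]) -> dict:
    """Hypothetical equivalence-class risk estimates for a candidate release."""
    if not records:
        raise ValueError("records must not be empty")
    # Each equivalence class is one combination of quasi-identifier values.
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    k = min(classes.values())
    return {
        "k": k,                                   # size of the smallest class
        "max_risk": 1 / k,                        # most exposed record
        "avg_risk": len(classes) / len(records),  # mean of 1/class_size over records
    }


sample = [{"age_group": "50-59", "sex": "M"}] * 3 + [{"age_group": "40-49", "sex": "F"}]
print(reidentification_risk(sample, ["age_group", "sex"]))
```

In this sample the single 40-49 female record forms a class of one, so the data set is only 1-anonymous and that record would need further generalization or suppression before release.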