De-Identification, Anonymization, Redaction Toolkit Services, published by HL7 International / Cross-Group Projects. This guide is not an authorized publication; it is the continuous build for version 1.0.0-ballot built by the FHIR (HL7® FHIR® Standard) CI Build. This version is based on the current content of https://github.com/HL7/fhir-darts/ and changes regularly. See the Directory of published versions
| Page standards status: Trial-use |
This section clarifies the basic service definitions specified in the IG and provides the context in which they need to be used.
Pseudonymization is the process of transforming PII/PHI so that it can no longer be attributed to a specific patient without the use of additional information. Pseudonymization does not remove PII/PHI; rather, it translates the information into a token. Pseudonymized data is still considered PII/PHI because it can be re-identified with a mapping key.
For example, patient John Doe may be represented as Patient_12345, a token that is produced by a mapping key or algorithm and can always be linked back to John Doe.
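The token example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not prescribe a tokenization algorithm, so the keyed hash (HMAC-SHA256), the `Patient_` prefix, the 16-character truncation, and the names `pseudonymize` and `PseudonymMap` are all hypothetical choices made for this sketch.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes, prefix: str = "Patient_") -> str:
    """Derive a stable token from an identifier using a keyed hash (assumed technique)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncation keeps the token short; a real system would weigh collision risk.
    return prefix + digest[:16]


class PseudonymMap:
    """Hypothetical mapping key: remembers which token belongs to which identifier."""

    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key
        self._token_to_identifier = {}

    def tokenize(self, identifier: str) -> str:
        token = pseudonymize(identifier, self._key)
        self._token_to_identifier[token] = identifier
        return token

    def reidentify(self, token: str) -> str:
        # Re-identification is only possible for whoever holds this mapping.
        return self._token_to_identifier[token]


if __name__ == "__main__":
    mapping = PseudonymMap(b"demo-secret-key")
    token = mapping.tokenize("John Doe")
    print(token)
    print(mapping.reidentify(token))
```

Because the same identifier and key always yield the same token, records for one patient remain linkable, which is why pseudonymized data is still treated as PII/PHI.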
Under the HIPAA privacy rule in 45 CFR §164.514(a), de-identified information is, “Health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual.”
Alternatively, de-identification is the process of removing or transforming identifiers so that an individual cannot be readily identified and the risk of re-identification is very low. Note that some de-identification use cases may require re-identification, performed under controlled circumstances; re-identification is not part of this IG.
For example, patient John Doe’s data will not have any patient identifier or name.
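A minimal sketch of this idea, assuming the patient is held as a FHIR-Patient-like Python dictionary. The field list below is an illustrative subset chosen for this sketch, not the full Safe Harbor list, and the function name `deidentify_patient` is hypothetical. Reducing `birthDate` to the year is likewise an assumption of the sketch; special handling of very advanced ages is not covered.

```python
from copy import deepcopy

# Illustrative subset of direct-identifier elements on a Patient-like record.
# This is NOT the complete Safe Harbor list.
DIRECT_IDENTIFIER_FIELDS = ("identifier", "name", "telecom", "address", "photo")


def deidentify_patient(patient: dict) -> dict:
    """Return a copy of the record with direct identifiers removed."""
    result = deepcopy(patient)  # never mutate the caller's record
    for field in DIRECT_IDENTIFIER_FIELDS:
        result.pop(field, None)
    birth_date = result.get("birthDate")
    if birth_date:
        # Keep only the year (first four characters of YYYY-MM-DD).
        result["birthDate"] = birth_date[:4]
    return result


if __name__ == "__main__":
    patient = {
        "resourceType": "Patient",
        "identifier": [{"system": "urn:example:mrn", "value": "12345"}],
        "name": [{"family": "Doe", "given": ["John"]}],
        "gender": "male",
        "birthDate": "1969-04-12",
    }
    print(deidentify_patient(patient))
```

Clinical content such as gender is retained, so the record stays useful for line-level analysis while the name and patient identifier are gone.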
According to HHS Safe Harbor Guidance, 18 different patient-related attributes must be removed for data to be considered de-identified. These attributes are specified in the HHS Safe Harbor De-identification Standard and are listed here for convenience.
During the de-identification process, a unique code may be assigned to the set of de-identified health information to permit re-identification; this is the prescribed method of re-identifying information. The entity submitting the de-identified information is responsible for protecting the re-identified information in accordance with organizational, local, state, and federal policies.
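The code assignment described above might look like the following sketch. It assumes a randomly generated code that is not derived from any information about the individual, with the lookup table kept only by the submitting entity. The class name `ReidentificationRegistry` and its methods are hypothetical, and an in-memory dictionary stands in for whatever protected store an organization would actually use.

```python
import secrets


class ReidentificationRegistry:
    """Hypothetical registry held by the submitting entity only."""

    def __init__(self) -> None:
        self._code_to_patient = {}
        self._patient_to_code = {}

    def assign_code(self, patient_id: str) -> str:
        """Return the code for a patient, creating a random one on first use."""
        if patient_id in self._patient_to_code:
            return self._patient_to_code[patient_id]
        code = secrets.token_hex(16)  # random, not derived from patient data
        while code in self._code_to_patient:  # guard against an unlikely collision
            code = secrets.token_hex(16)
        self._code_to_patient[code] = patient_id
        self._patient_to_code[patient_id] = code
        return code

    def lookup(self, code: str) -> str:
        """Resolve a code back to the patient (controlled circumstances only)."""
        return self._code_to_patient[code]


if __name__ == "__main__":
    registry = ReidentificationRegistry()
    code = registry.assign_code("MRN-12345")
    print(code, "->", registry.lookup(code))
```

Only the code travels with the de-identified data; the registry itself stays with the submitter and is protected under the applicable policies.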
Anonymization is the irreversible process of transforming data so that individuals cannot be identified by any reasonably likely means, now or in the future. In other words, data is anonymous if individuals are not identifiable, taking into account all means “reasonably likely” to be used to identify the person.
For example, John Doe, a 56-year-old male patient with diabetes, needs to be included in a data set. His record would be included without any identifying information such as name or date of birth; instead, it would fall into a group of many male individuals with diabetes between the ages of 50 and 60. Location information would be generalized from a specific address to a large geographical area such as one or more states or the entire country. The greater the number of individuals with similar characteristics, the lower the risk of identification. Rare data points would be combined into a group such as "Other" to avoid identification.
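The generalization described in this example can be sketched as follows. The technique shown is generalization plus small-group suppression, in the spirit of k-anonymity; the source does not name a specific algorithm, so the threshold `k`, the ten-year band width, the record field names, and the function names are all assumptions made for illustration.

```python
from collections import Counter


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band, e.g. 56 -> '50-60'."""
    low = (age // width) * width
    return f"{low}-{low + width}"


def generalize(record: dict) -> tuple:
    """Keep only coarse attributes; names, dates of birth, and addresses are dropped."""
    return (age_band(record["age"]), record["sex"], record["condition"])


def anonymize(records: list, k: int = 3) -> dict:
    """Count records per generalized group, folding groups smaller than k into 'Other'."""
    groups = Counter(generalize(r) for r in records)
    released = Counter()
    for group, count in groups.items():
        key = group if count >= k else ("Other", "Other", "Other")
        released[key] += count
    return dict(released)


if __name__ == "__main__":
    records = [
        {"name": "John Doe", "age": 56, "sex": "male", "condition": "diabetes"},
        {"name": "A", "age": 51, "sex": "male", "condition": "diabetes"},
        {"name": "B", "age": 59, "sex": "male", "condition": "diabetes"},
        {"name": "C", "age": 23, "sex": "female", "condition": "rare-condition"},
    ]
    print(anonymize(records, k=3))
```

Note that the combined "Other" group can itself end up smaller than `k`; a real release process would need to check that group as well before publishing.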
The table below summarizes the differences between the three services described above.
| Characteristic | Pseudonymization | De-identification | Anonymization |
| --- | --- | --- | --- |
| Identifiability | Moderate | Low | Extremely Low |
| Linkability to original record | Always | Possible if needed | No |
| Reversible process | Yes | Sometimes based on need | No |
| Legal status | Considered PHI | Not considered PHI | Not considered PHI |
| Goal | Mask Identity | Reduce Identification risk | Eliminate Identification risk |
This section identifies the business needs and specific user stories for the DARTS IG.
Use Case 1: Federal Agency Reporting
Currently, many reporting programs across the US have health care organizations submit data to state and federal agencies in aggregate form. While these aggregate reports meet current mandates, there is a desire to obtain more detailed line-level information instead of aggregate data. However, in many instances the federal agency may not have the authority necessary to receive PII/PHI and hence requires the data to be de-identified in order to receive more granular data. The following are the examples of such reporting
Use Case 2: Clinical Research Reporting
Many research programs require researchers to track a specific disease and its treatment. For example, a researcher studying diabetes outcomes requires labs, medications, and approximate timelines but does not require the identity of the individual. This requires PII/PHI to be de-identified before the data is submitted to the researcher.
Use Case 3: AI/ML Model to predict hospital readmissions
As the use of AI/ML models increases, new models are being created that need data to predict outcomes such as the number of hospital readmissions expected for the month. These AI models do not require identifiable information, but they do require the clinical encounter information and the context of these encounters. This can be achieved by de-identifying the information before submitting it to the AI model.
Use Case 4: Quality Improvement Initiatives
Quality improvement initiatives, as distinct from clinical research programs, measure treatment effectiveness and patient outcomes. They require granular line-level data without identifiable information, so de-identification is necessary before data is submitted to these quality improvement programs.
Use Case 1: Federal Agency Published Disease Prevalence Statistics
Federal agencies collect data from the states, analyze it, and publish common data sets, such as disease prevalence statistics, as open data sets that can be used by the public. This requires anonymization of the data without re-identification risk.
Use Case 2: Datasets Released to Global Researchers
Hospitals, aggregators, and agencies may wish to release certain data sets to global researchers to enable innovations and studies. These data sets cannot carry any re-identification risk, so the data must be anonymized before the data set is created and released to the global research community.
Use Case 3: Journal Publications
Publishing of data in journals requires only aggregated insights and does not require granular PHI/PII details. This can be achieved by anonymizing the datasets without risk of reidentification for studies and publications.
Use Case 4: Data Sharing with Third Party Analytics Vendors
Many entities share data with third-party analytics vendors to examine population analytics and trends and to create interventions based on those analytics. These situations do not require PHI/PII, and the data set should avoid re-identification; hence anonymized data must be submitted to these vendors for analytics.
The table below summarizes which services to consider for the different kinds of use cases.
| Use Case | Applicable Services | Additional Information |
| --- | --- | --- |
| Clinical Care | None | Identity is needed for clinical care |
| Analytics within Enterprise | Pseudonymization | Linkage to records maintained for follow-up studies |
| Multi-site analytics | Pseudonymization + De-identification | Linkage to avoid duplicates, but reduce identification risk |
| Federal Reporting when receiver cannot receive PHI | De-identification | Line-level data needed for analysis |
| AI Model Training within Enterprise | De-identification | Line-level data needed for training |
| AI Model Training outside Enterprise | Anonymization | Eliminate identification risk |
| Public Data Set Release | Anonymization | Eliminate identification risk |
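An implementer could encode the table above as a simple lookup so that a pipeline selects the services for a given use case. The sketch below is one hypothetical encoding; the `Service` enum, the `APPLICABLE_SERVICES` dictionary, and the `services_for` function are names invented for this illustration and are not defined by the IG.

```python
from enum import Enum


class Service(Enum):
    PSEUDONYMIZATION = "Pseudonymization"
    DEIDENTIFICATION = "De-identification"
    ANONYMIZATION = "Anonymization"


# Hypothetical encoding of the use-case table; keys mirror the table rows.
APPLICABLE_SERVICES = {
    "Clinical Care": (),
    "Analytics within Enterprise": (Service.PSEUDONYMIZATION,),
    "Multi-site analytics": (Service.PSEUDONYMIZATION, Service.DEIDENTIFICATION),
    "Federal Reporting when receiver cannot receive PHI": (Service.DEIDENTIFICATION,),
    "AI Model Training within Enterprise": (Service.DEIDENTIFICATION,),
    "AI Model Training outside Enterprise": (Service.ANONYMIZATION,),
    "Public Data Set Release": (Service.ANONYMIZATION,),
}


def services_for(use_case: str) -> tuple:
    """Return the services to apply, or raise KeyError for an unknown use case."""
    return APPLICABLE_SERVICES[use_case]


if __name__ == "__main__":
    for service in services_for("Multi-site analytics"):
        print(service.value)
```

An empty tuple for clinical care makes the "no service applies" case explicit rather than leaving it as a missing key.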
This section contains a list of actors based on the above use cases.
A DARTS Service Provider is an actor that implements the pseudonymization, de-identification, and anonymization services defined in this implementation guide.
A DARTS
The risk of identification when using the services defined in this IG has to be ascertained by the health care organization releasing or providing the data based on the use case. The following are links that provide valuable industry regulations and guidance for risk assessments