Use Cases and Workflows

De-Identification, Anonymization, Redaction Toolkit Services
1.0.0-ballot - STU 1 Ballot

De-Identification, Anonymization, Redaction Toolkit Services, published by HL7 International / Cross-Group Projects. This guide is not an authorized publication; it is the continuous build for version 1.0.0-ballot built by the FHIR (HL7® FHIR® Standard) CI Build. This version is based on the current content of https://github.com/HL7/fhir-darts/ and changes regularly. See the Directory of published versions

Page standards status: Trial-use

Service Definitions

This section clarifies the basic service definitions specified in the IG and provides the context in which they need to be used.

Pseudonymization

Pseudonymization is the process of transforming PII/PHI so that it can no longer be attributed to a specific patient without the use of additional information. Pseudonymization does not remove PII/PHI; rather, it replaces the identifying information with a token. Pseudonymized data is still considered PII/PHI because it can be re-identified with a mapping key.

For example, patient John Doe may be referred to as Patient_12345, a token that is produced using a mapping key or an algorithm (such as a hashing algorithm) and that can always be linked back to John Doe.
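The John Doe example above can be sketched in Python as follows. This is a minimal illustration, not a method prescribed by this IG: the choice of HMAC-SHA256, the `SECRET_KEY` value, the 12-character truncation, and the function and variable names are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical secret; a real deployment would load this from a key store.
SECRET_KEY = b"example-key-not-for-production"

# Mapping table kept by the data holder so tokens can be linked back.
mapping = {}


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable token for an identifier using a keyed hash."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; shorter tokens are more likely to collide.
    return f"Patient_{digest[:12]}"


def pseudonymize_and_record(identifier: str) -> str:
    """Pseudonymize an identifier and remember how to link the token back."""
    token = pseudonymize(identifier)
    mapping[token] = identifier
    return token


token = pseudonymize_and_record("John Doe")
print(token)           # the same input and key always yield the same token
print(mapping[token])  # the data holder can still recover "John Doe"
```

Because the token is stable and the mapping is retained, records for the same patient stay linkable, which is why the output remains PII/PHI.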

De-identified data

Under the HIPAA Privacy Rule in 45 CFR §164.514(a), de-identified information is “health information that does not identify an individual and with respect to which there is no reasonable basis to believe that the information can be used to identify an individual.”

Alternatively, de-identification is the process of removing or transforming identifiers so that an individual cannot be readily identified, with a very low risk of re-identification. Note that some de-identification use cases may require re-identification, performed in controlled circumstances. Those use cases are outside the scope of this IG, and re-identification is not part of this IG.

For example, the de-identified data for patient John Doe will not include his name or any patient identifier.

HHS Safe Harbor Guidance for De-identification

According to HHS Safe Harbor Guidance, 18 types of patient-related attributes must be removed for data to be considered de-identified. These attributes are specified in the HHS Safe Harbor De-identification Standard and are listed here for convenience. See the standard for the full legal requirements.

Names
Addresses and geographic locations
Dates
Telephone numbers
Fax numbers
Email addresses
Social Security numbers
Medical Record numbers
Health plan beneficiary numbers
Account numbers
Certificate/license numbers
Vehicle identifiers and license plate numbers
Device identifiers and serial numbers
Web Universal Resource Locators (URLs)
Internet Protocol (IP) addresses
Biometric identifiers including finger and voice prints
Full-face photographs and comparable images
Any other unique identifying number, characteristic, or code of the individual
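As a rough illustration of the list above, the sketch below drops one hypothetical field per Safe Harbor category from a flat record. The field names and the flat dictionary shape are assumptions made for the example; they are not FHIR element names and are not defined by this IG. The sketch also ignores the finer points of the standard (for example, how dates and geographic units may be partially retained) and does not inspect free-text fields.

```python
# Hypothetical field names, one per Safe Harbor category listed above.
SAFE_HARBOR_FIELDS = {
    "name", "address", "birth_date", "phone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_id", "account_number",
    "license_number", "vehicle_id", "device_serial", "url",
    "ip_address", "biometric_id", "photo", "other_unique_id",
}


def remove_safe_harbor_fields(record: dict) -> dict:
    """Return a copy of the record without the fields in SAFE_HARBOR_FIELDS."""
    return {k: v for k, v in record.items() if k not in SAFE_HARBOR_FIELDS}


record = {
    "name": "John Doe",
    "ssn": "000-00-0000",
    "email": "john.doe@example.org",
    "condition": "diabetes",
    "hba1c": 7.2,
}
cleaned = remove_safe_harbor_fields(record)
print(cleaned)  # only the clinical fields remain
```

The original record is left unchanged, so the caller decides whether and where the identifiable copy is kept.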

HHS Expert Determination Guidance for De-identification

HHS guidance for de-identification also includes the Expert Determination method, which can likewise be used to create a de-identification service.

Re-identification of de-identified information

Re-identification relies on a unique code that is assigned to the set of de-identified health information during the de-identification process. The entity submitting the de-identified information is responsible for protecting the re-identification information in accordance with organizational, local, state, and federal policies.
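A minimal sketch of such a code assignment is shown below, for illustration only, since re-identification itself is outside this IG. The `ReidentificationRegistry` class, its method names, and the `reid_code` field are hypothetical. The code is generated randomly rather than derived from patient data, which is one possible design choice and not something this IG specifies.

```python
import secrets


class ReidentificationRegistry:
    """Hypothetical in-memory lookup held only by the submitting entity."""

    def __init__(self):
        self._code_to_id = {}

    def assign_code(self, patient_id: str) -> str:
        """Create a random code and remember which patient it refers to."""
        code = secrets.token_hex(8)  # random; carries no patient information
        self._code_to_id[code] = patient_id
        return code

    def reidentify(self, code: str) -> str:
        """Look up the original identifier; raises KeyError if unknown."""
        return self._code_to_id[code]


registry = ReidentificationRegistry()
code = registry.assign_code("MRN-0001")

# Only the code travels with the de-identified data set.
deidentified = {"reid_code": code, "condition": "diabetes"}
```

The registry never leaves the submitting entity, so a receiver holding only `deidentified` has no way to resolve the code.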

Anonymization

Anonymization is the irreversible process of transforming data so that individuals cannot be identified by any reasonably likely means, now or in the future. In other words, data is anonymous if individuals are not identifiable, taking into account all means “reasonably likely” to be used to identify the person.

For example, John Doe, a male patient aged 56 with diabetes, would be included in an age group between 50 and 60 that contains many male individuals with diabetes. Location information would be generalized from a specific address to a large geographical area such as one or more states or the entire country. Rare data points would be combined into a group such as “Other” to avoid identification.
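The three generalizations in this example (age banding, coarser location, and grouping of rare values) can be sketched as follows. The record layout, the ten-year band width, the `min_count` threshold, and the helper names are assumptions made for illustration. Generalization of this kind does not by itself guarantee that a data set is anonymous.

```python
from collections import Counter


def age_band(age: int, width: int = 10) -> str:
    """Map an exact age to a band label such as '50-59'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def group_rare(values, min_count=2, other="Other"):
    """Replace values that occur fewer than min_count times with `other`."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Hypothetical input records.
records = [
    {"age": 56, "sex": "male", "condition": "diabetes",
     "address": "1 Main St", "state": "OH"},
    {"age": 52, "sex": "male", "condition": "diabetes",
     "address": "9 Oak Ave", "state": "OH"},
    {"age": 58, "sex": "male", "condition": "rare disorder X",
     "address": "4 Elm Rd", "state": "OH"},
]

conditions = group_rare([r["condition"] for r in records])
generalized = [
    {
        "age_group": age_band(r["age"]),
        "sex": r["sex"],
        "condition": c,
        "region": r["state"],  # street address is dropped, state is kept
    }
    for r, c in zip(records, conditions)
]
```

In this tiny sample the "Other" group still contains a single person, which shows why group sizes need to be checked before release.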

The above definitions are used to create the necessary services to transform identifiable data into de-identified or anonymized data.

Summary of Differences between the Various Services

The table below summarizes the differences between the three services described above.

Characteristic	Pseudonymization	De-identification	Anonymization
Identifiability	Moderate	Low	Extremely low
Ability to link to original record	Always	Possible if needed	No
Reversible process	Yes	Sometimes, based on need	No
Legal status	Considered PHI	Not considered PHI	Not considered PHI
Goal	Mask identity	Reduce identification risk	Eliminate identification risk

Business Need and Use Cases

This section identifies the business needs and specific user stories for the DARTS IG.

Use Cases for De-identification

Use Case 1: Federal Agency Reporting

Many reporting programs across the US currently have health care organizations submit data to state and federal agencies in aggregate form. While these aggregate reports meet current mandates, many agencies want more detailed line-level information instead of aggregate data. Federal and state agencies may not have the authority to receive PII/PHI, and therefore require the data to be de-identified before they can receive it at this level of granularity. The following is an example of such reporting:

Reporting from Federally Qualified Health Centers (FQHCs) to Federal Agencies

Use Case 2: Clinical Research Reporting

Many research programs require researchers to track specific diseases and treatments. For example, a researcher studying diabetes outcomes will require labs, medications, and approximate timelines, but will not require the identity of the individual. The PII/PHI therefore needs to be de-identified before it is submitted to the researcher.

Use Case 3: AI/ML Model to Predict Hospital Readmissions

As the use of AI/ML models increases, new models are being created that need data to predict outcomes such as the number of hospital readmissions expected for the month. These models do not require identifiable information, but they do require clinical encounter information and the context of those encounters. This can be achieved by de-identifying the information before submitting it to the AI model.

Use Case 4: Quality Improvement Initiatives

Quality improvement initiatives that focus on patient outcomes require granular, line-level data without identifiable information. De-identification is necessary before data is submitted to these quality improvement programs.

Use Cases for Anonymization

Use Case 1: Federal Agencies Publish Disease Prevalence Statistics

Federal agencies collect data, perform analysis, and publish common data sets, such as disease prevalence statistics, as open data sets that can be used by the public. This requires anonymizing the data so that it carries no re-identification risk.

Use Case 2: Datasets Released to Global Researchers

Hospitals, aggregators, and agencies may wish to release certain data sets to global researchers to enable innovations and studies. These data sets cannot carry any re-identification risk, so the data must be anonymized before the data set is compiled and released to the global research community.

Use Case 3: Publications in Journals

Publishing data in journals requires only aggregated insights and does not require granular PII/PHI details. This can be achieved by anonymizing the data sets used for studies and publications so that they carry no risk of re-identification.

Use Case 4: Data Sharing with Third-Party Analytics Vendors

Many entities share data with third-party analytics vendors to examine population analytics and trends and to create interventions based on those analytics. These situations do not require PII/PHI, and the shared data sets should not allow re-identification, which makes anonymized data the appropriate form to submit to these vendors.

Use Cases Mapped to Services

The table below summarizes which services to consider for the different kinds of use cases.

Use Case	Applicable Services	Additional Information
Clinical Care	None	Identity is needed for clinical care
Analytics within Enterprise	Pseudonymization	Linkage to records maintained for follow-up studies
Multi-site analytics	Pseudonymization + De-identification	Linkage to avoid duplicates, but reduce identification risk
Federal Reporting when receiver cannot receive PHI	De-identification	Line-level data needed for analysis
AI Model Training within Enterprise	De-identification	Line-level data needed for training
AI Model Training outside enterprise	Anonymization	Eliminate identification risk
Public Dataset Release	Anonymization	Eliminate identification risk
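An implementer who wants to apply this mapping programmatically could encode the table as a simple lookup. The sketch below is one possible encoding. The dictionary name, the `services_for` helper, and the lowercase service labels are hypothetical and not defined by this IG.

```python
# Encodes the use-case table above; keys follow the table's use-case labels.
SERVICES_BY_USE_CASE = {
    "Clinical Care": [],
    "Analytics within Enterprise": ["pseudonymization"],
    "Multi-site analytics": ["pseudonymization", "de-identification"],
    "Federal Reporting when receiver cannot receive PHI": ["de-identification"],
    "AI Model Training within Enterprise": ["de-identification"],
    "AI Model Training outside enterprise": ["anonymization"],
    "Public Dataset Release": ["anonymization"],
}


def services_for(use_case: str) -> list:
    """Return the services the table suggests for a given use case."""
    try:
        return SERVICES_BY_USE_CASE[use_case]
    except KeyError:
        raise ValueError(f"Unknown use case: {use_case!r}") from None


print(services_for("Multi-site analytics"))
```

An empty list for "Clinical Care" reflects the table's "None" entry: identity is needed, so no service is applied.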

Actors and Definitions

This section contains a list of actors based on the above use cases.

DARTS Service Provider

A DARTS Service Provider is an actor that implements the pseudonymization, de-identification, and anonymization services defined in this implementation guide. Examples of these actors include cloud service providers such as Amazon and Microsoft.

DARTS Consumer

A DARTS Consumer is an actor that uses the services provided by the DARTS Service Provider. Examples of these actors include data submitters such as health centers, hospitals, and their EHR systems. Data receivers such as federal agencies are also DARTS Consumers, as they use the DAPL IG and profiles to validate the information being received from data submitters.

Identification Risk

The risk of identification when using the services must be assessed by the health care organization releasing or providing the data, based on the use case. The following are links that provide valuable industry regulations and guidance for risk assessments

IG © 2026+ HL7 International / Cross-Group Projects. Package hl7.fhir.us.darts#1.0.0-ballot based on FHIR 4.0.1. Generated 2026-05-07