Bulk Data Access IG, published by HL7 International / FHIR Infrastructure. This guide is not an authorized publication; it is the continuous build for version 4.0.0 built by the FHIR (HL7® FHIR® Standard) CI Build. This version is based on the current content of https://github.com/HL7/bulk-data/ and changes regularly. See the Directory of published versions
Page standards status: Trial-use
The Bulk Publish operation is intended to be used by developers at organizations that aim to interoperate by sharing large FHIR datasets. It defines the application programming interfaces (APIs) through which a Data Consumer may retrieve FHIR bulk data files from a Data Provider. These files may be provided at an open endpoint, or may require the Data Consumer to authenticate and authorize access to retrieve the data.
The Bulk Publish API does not require a FHIR server implementation, and Data Providers may implement it using a simple HTTP server that returns a Bulk Publish manifest in response to GET requests at a path that ends in /$bulk-publish, and a set of HTTP endpoints that serve the bulk data files referenced from that manifest.
For a high-level comparison of Bulk Export, Bulk Submit, and Bulk Publish, see Choosing a Bulk Operation.
In contrast to the Bulk Export operation, the Bulk Publish operation returns static manifests and bulk data files, and does not provide a mechanism for a Data Consumer to retrieve a filtered subset of the available data. Systems that return infrequently updated reference information may wish to use the Bulk Publish operation instead of the Bulk Export operation to reduce the complexity and cost involved in hosting and providing this information.
Expected use cases include the publication of provider directory information, formulary information and open scheduling slots.
All exchanges described herein between a Data Consumer and a Data Provider SHOULD be secured using Transport Layer Security (TLS) Protocol Version 1.2 (RFC5246) or a more recent version of TLS. Use of mutual TLS is OPTIONAL. With each of the requests described herein, implementers MAY implement OAuth 2.0 access management in accordance with the SMART Backend Services Authorization Profile.
There are two primary roles involved in a Bulk Publish transaction:
Data Provider: Server that hosts the Bulk Publish manifest and files listed in the manifest.
Data Consumer: Client that retrieves the Bulk Publish manifest and bulk data files and attachments.
Request for a fully static or periodically updated dataset in FHIR format. For a visual overview of how a Data Consumer processes a Bulk Publish manifest, see the Data Consumer Workflow diagram below.
GET [base]/$bulk-publish
- The request URL ends in the $bulk-publish segment.
- A Data Consumer can include an If-None-Match header, carrying the ETag value from a previous manifest response, with each request to avoid retrieving data when nothing has changed since the last request. Data Providers SHOULD support the use of this header.
- If the If-None-Match value matches the current ETag, a Data Provider MAY return 304 Not Modified.
When the request fails, the Data Provider SHALL return an error response with HTTP status 4XX or 5XX.
The body of the response SHOULD be a FHIR OperationOutcome resource in JSON format. If this is not possible (for example, the infrastructure layer returning the error is not FHIR aware), the Data Provider MAY return an error message in another format and include a corresponding value for the Content-Type header.
When the body is a FHIR OperationOutcome resource, the response SHALL include a Content-Type header of application/fhir+json.
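From the Data Consumer side, the conditional request described above might look like the following minimal Python sketch, which uses only the standard library's urllib. The function names build_manifest_request and fetch_manifest are hypothetical, and authorization (such as a SMART Backend Services access token) is left out.

```python
import json
import urllib.error
import urllib.request
from typing import Optional, Tuple


def build_manifest_request(url: str, etag: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for a Bulk Publish manifest, conditional on a cached ETag."""
    req = urllib.request.Request(url, method="GET")
    if etag is not None:
        # Lets the Data Provider answer 304 Not Modified if the manifest is unchanged.
        req.add_header("If-None-Match", etag)
    return req


def fetch_manifest(url: str, etag: Optional[str] = None) -> Tuple[Optional[dict], Optional[str]]:
    """Return (manifest, etag); manifest is None when the provider answers 304."""
    try:
        with urllib.request.urlopen(build_manifest_request(url, etag)) as resp:
            manifest = json.loads(resp.read().decode("utf-8"))
            return manifest, resp.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Nothing changed since the last request: keep using the cached manifest.
            return None, etag
        # Any other 4XX/5XX is an error; its body is usually a FHIR OperationOutcome.
        raise
```

urllib raises HTTPError for a 304 response, which is why the unchanged case is handled in the except branch.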
The Data Provider SHALL return a manifest response with:
- HTTP status 200 OK
- a Content-Type header of application/json
The response SHALL include an ETag header. The ETag value SHALL change when the manifest body changes.
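A minimal, framework-free Python sketch of the Data Provider side follows. It assumes the manifest is held in memory as a dict and derives a strong ETag from a SHA-256 hash of the serialized body, which is one way (not a required one) to make the ETag change exactly when the body changes. The function names and the (status, headers, body) return shape are hypothetical; the Cache-Control value is the example suggested in this guide, and the error builder applies the OperationOutcome rules above.

```python
import hashlib
import json
from typing import Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]


def manifest_response(manifest: dict, if_none_match: Optional[str] = None) -> Response:
    """Serve a manifest with an ETag derived from its bytes, honoring If-None-Match."""
    # Deterministic serialization: the ETag changes exactly when the body changes.
    body = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'
    # Short-lived manifest caching, using the example value from this guide.
    headers = {"ETag": etag, "Cache-Control": "public, max-age=10"}
    if if_none_match is not None:
        tags = {tag.strip() for tag in if_none_match.split(",")}
        # Weak comparison, as used for If-None-Match: W/"x" also matches "x".
        tags |= {tag[2:] for tag in tags if tag.startswith("W/")}
        if "*" in tags or etag in tags:
            return 304, headers, b""
    headers["Content-Type"] = "application/json"
    return 200, headers, body


def error_response(status: int, message: str) -> Response:
    """Build a 4XX/5XX response whose body is a FHIR OperationOutcome."""
    if not 400 <= status <= 599:
        raise ValueError("error responses use a 4XX or 5XX status")
    outcome = {
        "resourceType": "OperationOutcome",
        # "processing" is one FHIR IssueType code; a more specific code may fit better.
        "issue": [{"severity": "error", "code": "processing", "diagnostics": message}],
    }
    body = json.dumps(outcome).encode("utf-8")
    return status, {"Content-Type": "application/fhir+json"}, body
```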
The output manifest is a JSON object providing metadata and links to the generated FHIR Bulk Data files. These files SHALL be accessible to the Data Consumer at the URLs advertised. The manifest and these URLs MAY be served by file servers other than the Data Provider's FHIR-specific server.
The Data Provider MAY update the manifest at any time and SHALL use the transactionTime element to indicate when the files were generated. The published files SHOULD NOT include any FHIR resources modified after this instant, and SHALL include any matching resources modified up to and including this instant. File URLs SHALL NOT be reused between updates unless their contents have remained the same, and files that no longer appear in a manifest SHOULD remain available for a grace period following an update to avoid interrupting downloads that are in progress.
The Data Provider SHOULD populate the updateCadence element to indicate the frequency with which the Data Provider expects to update the manifest.
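Since updateCadence is an ISO 8601 duration that Data Consumers use to pick a polling interval, a consumer needs to turn it into seconds. The Python sketch below handles only week, day, hour, minute, and second components (years and months have no fixed length) and falls back to a locally chosen default when the element is absent; the function names and the one-hour default are assumptions.

```python
import re
from typing import Optional

# Week/day/hour/minute/second components only; years and months are rejected.
_DURATION = re.compile(
    r"^P(?:(?P<weeks>\d+)W)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def duration_seconds(text: str) -> float:
    """Convert an ISO 8601 duration such as "PT1H" or "P1DT30M" to seconds."""
    match = _DURATION.match(text)
    # "P" alone or a dangling "T" matches the regex but is not a valid duration.
    if match is None or text == "P" or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {k: float(v) if v else 0.0 for k, v in match.groupdict().items()}
    return (
        parts["weeks"] * 604800
        + parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )


def polling_interval(manifest: dict, default_seconds: float = 3600.0) -> float:
    """Pick a polling interval from updateCadence, else a locally chosen default."""
    cadence: Optional[str] = manifest.get("updateCadence")
    return duration_seconds(cadence) if cadence else default_seconds
```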
Data Providers SHOULD set a reasonable Cache-Control header on the manifest (e.g., public, max-age=10) and SHOULD serve immutable files with long-lived caching headers (e.g., public, max-age=31536000, immutable).
| Field | Cardinality | Type | Description |
|---|---|---|---|
| manifestType | 0..1 | canonical | Canonical URL of the OperationDefinition for the operation associated with the provision of this manifest, e.g., http://hl7.org/fhir/uv/bulkdata/OperationDefinition/bulk-publish |
| transactionTime | 1..1 | instant | Indicates the Data Provider's time when the files in this published manifest were generated. The published files referenced in this manifest SHOULD NOT include any resources modified after this instant, and SHALL include any matching resources modified up to and including this instant. |
| epochStartTime | 0..1 | instant | The timestamp when the current epoch began, used to support incremental manifest updates. When the epoch changes, epochStartTime and transactionTime SHALL be identical. Within an epoch, file lists in output, deleted, and error are append-only and file contents are immutable; an epoch reset establishes a new baseline by regenerating a complete snapshot. Data Providers that incrementally update a manifest and periodically reset to a snapshot SHALL populate this element. Data Providers that always return a complete snapshot MAY populate or omit this element. |
| updateCadence | 0..1 | string | ISO 8601 duration indicating the typical rate at which new files will be added to the manifest (e.g., "PT1H"). When provided, Data Consumers SHOULD use this value to choose a polling interval for subsequent requests. |
| requiresAccessToken | 1..1 | boolean | Indicates whether downloading the files referenced in this manifest requires the same authorization mechanism as access to the manifest itself. Value SHALL be true when both the manifest endpoint and published file endpoints control access using OAuth 2.0 bearer tokens. Value MAY be false when files are exposed through other access-control schemes such as capability URLs or verifiable file servers within an organization's firewall. |
| outputFormat | 0..1 | string | MIME type of the published files referenced in this manifest, describing their expected format. Defaults to application/fhir+ndjson when omitted. |
| outputOrganizedBy | 0..1 | string | When resources in the output files are organized by instances of a resource type, that resource type is specified here. When each output file contains a single resource type, this element SHALL be omitted and an individual type element SHALL be included for each file in the output array. |
| outputOrganizedByDetail | 0..1 | string | Narrative text providing detail on the organizing resource listed in outputOrganizedBy. SHALL NOT be populated in the absence of the outputOrganizedBy element. |
| output | 0..* | BackboneElement | An array of file items with one entry for each generated file. |
| ↳ url | 1..1 | url | The absolute path to the file. The format of the file SHOULD match the outputFormat element in this manifest when that element is populated. |
| ↳ type | 0..1 | string | The FHIR resource type contained in the file. When the manifest does not include an outputOrganizedBy value, this element SHALL be populated. When the manifest includes the outputOrganizedBy element, this element SHALL NOT be populated. |
| ↳ continuesInFile | 0..1 | url | URL of the next output file when resources for an organizing resource span multiple files. |
| ↳ count | 0..1 | integer | The number of resources in the file. |
| ↳ fileSize | 0..1 | integer | The size of the file in bytes. This provides Data Consumers with information about the storage and processing requirements for downloading and parsing the file. |
| deleted | 0..* | BackboneElement | References to files containing pointers to deleted resources in the form of FHIR transaction Bundles. Each line in these files SHALL contain a FHIR Bundle with a type of transaction, which SHALL contain one or more entry items that reflect a deleted resource. In each entry, the request.url and request.method elements SHALL be populated and request.method SHALL be set to DELETE. |
| ↳ url | 1..1 | url | The absolute path to the file. |
| ↳ count | 0..1 | integer | The number of resources in the file. |
| ↳ fileSize | 0..1 | integer | The size of the file in bytes. This provides Data Consumers with information about the storage and processing requirements for downloading and parsing the file. |
| error | 0..* | BackboneElement | Files containing OperationOutcome resources. Error, success, warning, information, and other messages related to the operation SHOULD be included here (not in output). This element will be renamed to status in a future release of this IG. |
| ↳ url | 1..1 | url | The absolute path to the file. |
| ↳ count | 0..1 | integer | The number of resources in the file. |
| ↳ fileSize | 0..1 | integer | The size of the file in bytes. This provides Data Consumers with information about the storage and processing requirements for downloading and parsing the file. |
| ↳ countSeverity | 0..* | BackboneElement | Count of OperationOutcome resources grouped by severity level. |
| ↳ ↳ code | 1..1 | code | Severity level from OperationOutcome.issue.severity (fatal, error, warning, information, success). |
| ↳ ↳ count | 1..1 | integer | The number of OperationOutcome resources in the file with this severity level. |
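The cardinality and co-occurrence rules in this table can be checked mechanically. The following Python sketch (check_manifest is a hypothetical helper) reports problems for the required elements, the rules linking outputOrganizedBy to each file's type, and countSeverity codes; it is not a complete validator for the logical model.

```python
from typing import List

SEVERITIES = {"fatal", "error", "warning", "information", "success"}


def check_manifest(manifest: dict) -> List[str]:
    """Return problems found against the manifest field table (empty if none)."""
    problems: List[str] = []
    if "transactionTime" not in manifest:
        problems.append("transactionTime is required")
    if not isinstance(manifest.get("requiresAccessToken"), bool):
        problems.append("requiresAccessToken is required and must be a boolean")
    organized_by = manifest.get("outputOrganizedBy")
    if "outputOrganizedByDetail" in manifest and organized_by is None:
        problems.append("outputOrganizedByDetail requires outputOrganizedBy")
    for i, item in enumerate(manifest.get("output", [])):
        if "url" not in item:
            problems.append(f"output[{i}].url is required")
        # type is mandatory without outputOrganizedBy and forbidden with it.
        if organized_by is None and "type" not in item:
            problems.append(f"output[{i}].type is required without outputOrganizedBy")
        if organized_by is not None and "type" in item:
            problems.append(f"output[{i}].type must be omitted with outputOrganizedBy")
    for array in ("deleted", "error"):
        for i, item in enumerate(manifest.get(array, [])):
            if "url" not in item:
                problems.append(f"{array}[{i}].url is required")
    for i, item in enumerate(manifest.get("error", [])):
        for sev in item.get("countSeverity", []):
            if sev.get("code") not in SEVERITIES or "count" not in sev:
                problems.append(f"error[{i}].countSeverity entry is invalid")
    return problems
```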
Implementation notes:
- When setting transactionTime, a Data Provider might need to wait for pending updates in its publishing pipeline or source systems to resolve before publishing a new manifest, in order to meet the inclusion constraints above.
- Error, success, warning, information, and other messages related to the operation SHOULD be included in error and not in output. If there are no relevant messages, the Data Provider SHOULD return an empty array.
- The Data Provider MAY incrementally update a manifest by adding data files to the output array that contain new resources and/or newer versions of resources (with the same resource id) that appear in earlier files in the output array. Additionally, the Data Provider MAY add files containing Bundle resources that indicate deleted resources to the deleted array (see details below), and MAY add files to the error array.
- When generating a manifest that will later be updated with these incremental changes, the Data Provider SHALL populate the epochStartTime element. When the manifest is initially published, epochStartTime SHALL have the same value as transactionTime. Subsequently, adding files to the output, deleted, or error arrays of a manifest SHALL cause its transactionTime to advance, while epochStartTime SHALL remain the same.
- If a Data Provider is refreshing the manifest and no resources have been added, deleted, or updated since the transactionTime in the current manifest, the Data Provider SHOULD advance the transactionTime to the current time to indicate that it is still regularly publishing updates.
- Periodically, the Data Provider MAY generate a manifest that is a complete snapshot of the data (a new epoch): it replaces the output and error arrays, empties the deleted array, and sets new epochStartTime and transactionTime values.
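A minimal Python sketch of the provider-side bookkeeping described in these notes: new_epoch builds a complete snapshot whose epochStartTime equals its transactionTime, and append_update appends files and advances transactionTime without touching epochStartTime. Both names are hypothetical, and the sketch assumes timestamps are UTC strings in one fixed format so that string order matches time order.

```python
import copy
from typing import Iterable, Optional


def new_epoch(now: str, output: Iterable[dict], error: Iterable[dict] = (),
              requires_access_token: bool = False,
              update_cadence: Optional[str] = None) -> dict:
    """Start a new epoch: a complete snapshot with epochStartTime == transactionTime."""
    manifest = {
        "epochStartTime": now,
        "transactionTime": now,
        "requiresAccessToken": requires_access_token,
        "output": list(output),
        "deleted": [],  # a new epoch starts with an empty deleted array
        "error": list(error),
    }
    if update_cadence is not None:
        manifest["updateCadence"] = update_cadence
    return manifest


def append_update(manifest: dict, now: str, output: Iterable[dict] = (),
                  deleted: Iterable[dict] = (), error: Iterable[dict] = ()) -> dict:
    """Return an incrementally updated copy: files are appended, never reordered."""
    if "epochStartTime" not in manifest:
        raise ValueError("incrementally updated manifests need epochStartTime")
    # Assumes same-format UTC timestamps, so string comparison follows time order.
    if now < manifest["transactionTime"]:
        raise ValueError("transactionTime must not move backwards")
    updated = copy.deepcopy(manifest)
    for key, files in (("output", output), ("deleted", deleted), ("error", error)):
        updated.setdefault(key, []).extend(files)
    # epochStartTime is untouched; transactionTime advances even if nothing was added.
    updated["transactionTime"] = now
    return updated
```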
When a manifest is incrementally updated (other than when it is reset to a new epoch), the order of files in the output, deleted, and error arrays SHALL NOT change, the file contents SHALL NOT change, and the files SHALL remain retrievable.
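On the Data Consumer side, these invariants can be verified when a new manifest is fetched. This Python sketch (check_update is hypothetical) treats a manifest without epochStartTime as a complete snapshot, accepts a new epoch only if epochStartTime equals transactionTime, and otherwise requires each array to be an append-only extension of the previous one, compared by file URL.

```python
from datetime import datetime


def _instant(value: str) -> datetime:
    # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_update(previous: dict, current: dict) -> str:
    """Classify a fetched manifest as "snapshot", "new-epoch", or "incremental".

    Raises ValueError if an update within the same epoch breaks the append-only rules.
    """
    if current.get("epochStartTime") is None:
        return "snapshot"  # no epoch: always a complete data set
    if current["epochStartTime"] != previous.get("epochStartTime"):
        if current["epochStartTime"] != current["transactionTime"]:
            raise ValueError("a new epoch must start with epochStartTime == transactionTime")
        return "new-epoch"
    if _instant(current["transactionTime"]) < _instant(previous["transactionTime"]):
        raise ValueError("transactionTime moved backwards within an epoch")
    for key in ("output", "deleted", "error"):
        old_urls = [f["url"] for f in previous.get(key, [])]
        new_urls = [f["url"] for f in current.get(key, [])]
        # Earlier entries must still be present, in the same order, at the front.
        if new_urls[: len(old_urls)] != old_urls:
            raise ValueError(f"{key} array is not an append-only extension")
    return "incremental"
```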
Data Providers SHALL structure the manifests such that a Data Consumer can obtain a complete data set when processing a manifest by (1) inserting or updating all FHIR resources in files in the output array that have not been previously processed, followed by (2) deleting all resources listed in files in the deleted array that have not been previously processed.
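The two-step processing order can be sketched in Python as below. The read_lines callable that fetches a file's NDJSON lines is a hypothetical stand-in for an HTTP download, resources are keyed by (resourceType, id), and the sketch assumes outputOrganizedBy is not used. Discarding local state on a new epoch is an assumption based on an epoch being a complete snapshot.

```python
import json
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

Key = Tuple[str, str]  # (resourceType, id)


class ManifestProcessor:
    """Applies Bulk Publish manifests to a local store, remembering processed files.

    read_lines(url) is a hypothetical callable returning a file's NDJSON lines.
    """

    def __init__(self, read_lines: Callable[[str], Iterable[str]]):
        self.read_lines = read_lines
        self.store: Dict[Key, dict] = {}
        self.processed: Set[str] = set()
        self.epoch: Optional[str] = None

    def apply(self, manifest: dict) -> None:
        epoch = manifest.get("epochStartTime")
        if epoch is None or epoch != self.epoch:
            # New epoch or plain snapshot: rebuild from scratch.
            self.store.clear()
            self.processed.clear()
            self.epoch = epoch
        # (1) Insert or update resources from unprocessed output files, in manifest order.
        for item in manifest.get("output", []):
            if item["url"] in self.processed:
                continue
            for line in self.read_lines(item["url"]):
                if line.strip():
                    resource = json.loads(line)
                    self.store[(resource["resourceType"], resource["id"])] = resource
            self.processed.add(item["url"])
        # (2) Then delete resources listed in unprocessed deleted files.
        for item in manifest.get("deleted", []):
            if item["url"] in self.processed:
                continue
            for line in self.read_lines(item["url"]):
                if not line.strip():
                    continue
                for entry in json.loads(line).get("entry", []):
                    request = entry["request"]
                    if request["method"] == "DELETE":
                        rtype, rid = request["url"].split("/")[-2:]
                        self.store.pop((rtype, rid), None)
            self.processed.add(item["url"])
```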
Minimal, non-incremental manifest:
This content is an example of the Bulk Publish Manifest Logical Model and is not a FHIR Resource
{
  "resourceType": "http://hl7.org/fhir/uv/bulkdata/StructureDefinition/BulkPublishManifest",
  "id": "BulkPublishManifestMinimalExample",
  "manifestType": "http://hl7.org/fhir/uv/bulkdata/OperationDefinition/bulk-publish",
  "transactionTime": "2021-01-01T00:00:00Z",
  "requiresAccessToken": false,
  "output": [
    {
      "type": "Organization",
      "url": "https://example.com/output/organization_1.ndjson"
    },
    {
      "type": "Organization",
      "url": "https://example.com/output/organization_2.ndjson"
    }
  ]
}
Example manifest at the epoch start:
This content is an example of the Bulk Publish Manifest Logical Model and is not a FHIR Resource
{
  "resourceType": "http://hl7.org/fhir/uv/bulkdata/StructureDefinition/BulkPublishManifest",
  "id": "BulkPublishManifestEpochStartExample",
  "manifestType": "http://hl7.org/fhir/uv/bulkdata/OperationDefinition/bulk-publish",
  "epochStartTime": "2021-01-01T00:00:00Z",
  "updateCadence": "PT1H",
  "transactionTime": "2021-01-01T00:00:00Z",
  "requiresAccessToken": false,
  "output": [
    {
      "type": "Organization",
      "url": "https://example.com/output/organization_1.ndjson"
    },
    {
      "type": "Organization",
      "url": "https://example.com/output/organization_2.ndjson"
    }
  ]
}
Manifest after first incremental update:
This content is an example of the Bulk Publish Manifest Logical Model and is not a FHIR Resource
{
  "resourceType": "http://hl7.org/fhir/uv/bulkdata/StructureDefinition/BulkPublishManifest",
  "id": "BulkPublishManifestIncrementalUpdateExample",
  "manifestType": "http://hl7.org/fhir/uv/bulkdata/OperationDefinition/bulk-publish",
  "epochStartTime": "2021-01-01T00:00:00Z",
  "updateCadence": "PT1H",
  "transactionTime": "2021-01-01T01:00:00Z",
  "requiresAccessToken": false,
  "output": [
    {
      "type": "Organization",
      "url": "https://example.com/output/organization_1.ndjson"
    },
    {
      "type": "Organization",
      "url": "https://example.com/output/organization_2.ndjson"
    },
    {
      "type": "Organization",
      "url": "https://example.com/output/organization_3.ndjson"
    }
  ],
  "deleted": [
    {
      "url": "https://example.com/output/deleted_org_1.ndjson"
    }
  ]
}
Deleted resource bundle (represents one line in a file listed in the deleted array):
{
  "resourceType" : "Bundle",
  "id" : "deleted-resource-transaction-bundle-example",
  "meta" : {
    "lastUpdated" : "2020-04-27T02:56:00Z"
  },
  "type" : "transaction",
  "entry" : [
    {
      "request" : {
        "method" : "DELETE",
        "url" : "Patient/123"
      }
    }
  ]
}
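A Data Provider might serialize such a deleted-resource Bundle as one NDJSON line with a helper like the following Python sketch; deleted_bundle_line and its parameters are hypothetical, and the values in the usage check come from the example above.

```python
import json
from typing import Iterable


def deleted_bundle_line(references: Iterable[str], bundle_id: str, last_updated: str) -> str:
    """Serialize one NDJSON line for a file listed in the manifest's deleted array."""
    entries = [{"request": {"method": "DELETE", "url": ref}} for ref in references]
    if not entries:
        raise ValueError("a deleted-resource Bundle needs at least one entry")
    bundle = {
        "resourceType": "Bundle",
        "id": bundle_id,
        "meta": {"lastUpdated": last_updated},
        "type": "transaction",
        "entry": entries,
    }
    # NDJSON: one compact JSON object per line, with no embedded newlines.
    return json.dumps(bundle, separators=(",", ":"))
```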
Using the URLs supplied by the Data Provider in the manifest, a Data Consumer MAY download the referenced output, deleted, and error files.
If the requiresAccessToken element in the manifest is set to true, the request SHALL include a valid access token. See Security Considerations above.
If the requiresAccessToken element is set to false and no additional authorization-related extensions are present in the relevant manifest entry, then the referenced URLs SHALL be dereferenceable directly (a "capability URL"). A Data Consumer SHALL NOT provide a SMART Backend Services access token when dereferencing a URL from a manifest entry where requiresAccessToken is false.
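The token rule above reduces to a small decision, sketched here in Python. file_auth_headers is a hypothetical helper, and this sketch ignores any authorization-related extensions on manifest entries, which a real client would need to consider.

```python
from typing import Dict, Optional


def file_auth_headers(manifest: dict, access_token: Optional[str]) -> Dict[str, str]:
    """Choose authorization headers for downloading a file listed in a manifest."""
    if manifest["requiresAccessToken"]:
        if not access_token:
            raise ValueError("this manifest's files require a valid access token")
        return {"Authorization": f"Bearer {access_token}"}
    # Capability URL: dereference it directly and never attach the SMART token.
    return {}
```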
A single data file SHALL include only the most recent version of any resource, though manifests that are updated incrementally MAY include an updated version of the resource in a subsequent file. Inclusion of the Resource.meta information in the resources is at the discretion of the Data Provider (as it is for all FHIR interactions).
A Data Consumer SHOULD provide an Accept-Encoding header when requesting output files and SHOULD include gzip compression as one of the encoding options in the header. A Data Provider SHALL provide output files as uncompressed, with gzip compression, or with another compression format from the Accept-Encoding header. When compression is used, a Data Provider SHALL communicate this to the Data Consumer by including a Content-Encoding header in the response. A Data Consumer SHALL accept files that are uncompressed or encoded with gzip compression, and MAY accept files encoded with other compression formats.
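Python's urllib does not decompress response bodies automatically, so a Data Consumer using it would need to undo the Content-Encoding itself. A minimal sketch, assuming the consumer only advertises gzip (decode_file_body and REQUEST_HEADERS are hypothetical names):

```python
import gzip
from typing import Optional

# Headers a Data Consumer might send on file requests, listing gzip as recommended.
REQUEST_HEADERS = {"Accept-Encoding": "gzip"}


def decode_file_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo the Content-Encoding reported by the Data Provider."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "identity":
        return body  # uncompressed
    if encoding == "gzip":
        return gzip.decompress(body)
    # Other formats are optional for Data Consumers; this sketch does not accept them.
    raise ValueError(f"unsupported Content-Encoding: {content_encoding}")
```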
GET [url from manifest output, deleted, or error element]
Accept header (optional, defaults to application/fhir+ndjson): specifies the format of the file being requested.
The Data Provider SHALL return a successful file response with:
- HTTP status 200 OK
- a Content-Type header that matches the file format being delivered. For files in NDJSON format, the Content-Type header SHALL be application/fhir+ndjson.
The Data Provider SHALL return an error response with HTTP status 4XX or 5XX.
Output files may be organized by resource type, or by instances of a resource type specified in the outputOrganizedBy element.
When the outputOrganizedBy element in the manifest is not populated, each output file SHALL contain resources of only one type, and a Data Provider MAY create more than one file for each resource type returned. The number of resources contained in a file MAY vary between Data Providers and files.
When the outputOrganizedBy element is populated with a resource type, the output files SHALL be populated with blocks consisting of a header Parameters resource containing a parameter named header with a reference to a resource of the type specified by outputOrganizedBy, followed by the resource referenced in this header and resources that reference the resource referenced in the header (together a "resource block"). Each output file MAY contain multiple resource blocks and, when possible, a single resource's block SHOULD NOT be split across files. If a resource block does span more than one file, the header SHALL be repeated at the start of each file where the block continues, and the association between these files SHALL be documented in the manifest using the continuesInFile element in the relevant output array items.
Resources that would otherwise be included in the data set, but do not have references to the resource type specified in the outputOrganizedBy element, MAY be included in resource blocks that contain resources they reference, MAY be repeated in every resource block, or MAY be omitted from the data set.
When the outputOrganizedBy element is set to Patient, Data Providers SHOULD use the Patient Compartment Definition to determine a base set of related resources to include in a resource block, though other resources may also be included.
For other resource types, we are soliciting feedback on the best approach for documenting the set of resources in a resource block. Implementation Guides MAY reference a Compartment Definition, populate a GraphDefinition Resource, include narrative text, or use another approach.
Example NDJSON file when the manifest does not include outputOrganizedBy:
{"id":"p-1","resourceType":"Patient", "name":[{"given":["Brenda"],"family":"Jackson"}],"gender":"female", ...}
{"id":"p-2","resourceType":"Patient", "name":[{"given":["Bram"],"family":"Sandeep"}],"gender":"male", ...}
{"id":"p-3","resourceType":"Patient", "name":[{"given":["Sandy"],"family":"Hamlin"}],"gender":"female", ...}
{...}
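When outputOrganizedBy is absent, a Data Consumer can check each file against the type given in its output entry while parsing. A minimal Python sketch (read_typed_file is hypothetical) that skips blank lines and rejects any other resource type:

```python
import json
from typing import Iterable, List


def read_typed_file(lines: Iterable[str], expected_type: str) -> List[dict]:
    """Parse an NDJSON output file that should hold only one resource type."""
    resources = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines such as a trailing newline
        resource = json.loads(line)
        actual = resource.get("resourceType")
        if actual != expected_type:
            raise ValueError(f"line {number}: expected {expected_type}, got {actual}")
        resources.append(resource)
    return resources
```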
Example NDJSON file when outputOrganizedBy is set to Patient:
{"resourceType": "Parameters", "parameter": [{"name": "header", "valueReference": {"reference": "Patient/p-1"}}]}
{"id": "p-1", "resourceType": "Patient", ...}
{"id": "c-1", "resourceType": "Condition", "subject":{"reference": "Patient/p-1"}, ...}
{"id": "o-1", "resourceType": "Observation", "subject":{"reference": "Patient/p-1"}, ...}
{...}
{"resourceType": "Parameters", "parameter": [{"name": "header", "valueReference": {"reference": "Patient/p-2"}}]}
{"id": "p-2", "resourceType": "Patient", ...}
{"id": "c-101", "resourceType": "Condition", "subject":{"reference": "Patient/p-2"}, ...}
{"id": "o-102", "resourceType": "Observation", "subject":{"reference": "Patient/p-2"}, ...}
{...}
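A Data Consumer could regroup these lines into resource blocks as in the following Python sketch. read_blocks and header_reference are hypothetical helpers; the sketch assumes files are passed in the order they are chained by continuesInFile, so a header repeated at the start of a continuation file merges into the existing block.

```python
import json
from typing import Dict, Iterable, List, Optional


def header_reference(resource: dict) -> Optional[str]:
    """Return the reference in a block-header Parameters resource, else None."""
    if resource.get("resourceType") != "Parameters":
        return None
    for param in resource.get("parameter", []):
        if param.get("name") == "header":
            return param["valueReference"]["reference"]
    return None


def read_blocks(files: Iterable[Iterable[str]]) -> Dict[str, List[dict]]:
    """Group NDJSON lines from one or more files into blocks keyed by header reference."""
    blocks: Dict[str, List[dict]] = {}
    for lines in files:
        current: Optional[str] = None  # every file must begin with a header
        for line in lines:
            if not line.strip():
                continue
            resource = json.loads(line)
            ref = header_reference(resource)
            if ref is not None:
                current = ref
                blocks.setdefault(ref, [])  # repeated header continues the block
            elif current is None:
                raise ValueError("resource found before any block header")
            else:
                blocks[current].append(resource)
    return blocks
```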
If resources in an output file contain elements of the type Attachment, the Data Provider SHOULD populate the Attachment.contentType code as well as either the data element or the url element. If the data element is not populated and the url element is populated, the url element SHALL be an absolute URL that can be dereferenced to the attachment's content.
When the url element is populated with an absolute URL and the requiresAccessToken element in the manifest is set to true, the URL location SHALL be accessible by a Data Consumer with a valid access token, and SHALL NOT require the use of additional authentication credentials. When the url element is populated and the requiresAccessToken element in the manifest is set to false, the URL location SHALL be accessible by a Data Consumer without an access token.
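A Python sketch of how a Data Consumer might resolve an Attachment found in an output file (resolve_attachment is hypothetical): inline data is base64-decoded, and otherwise the url is checked to be absolute before it is fetched under the manifest's access rules.

```python
import base64
from typing import Tuple, Union
from urllib.parse import urlparse


def resolve_attachment(attachment: dict) -> Tuple[str, Union[bytes, str]]:
    """Return ("inline", bytes) for embedded data or ("url", absolute_url) to download."""
    if "data" in attachment:
        # Attachment.data carries base64-encoded content.
        return "inline", base64.b64decode(attachment["data"])
    url = attachment.get("url")
    if url is None:
        raise ValueError("attachment has neither data nor url")
    parsed = urlparse(url)
    if not (parsed.scheme and parsed.netloc):
        raise ValueError(f"attachment url must be absolute: {url}")
    # Download with the same token rules as the manifest's data files.
    return "url", url
```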
Note that if a Data Provider copies files to the Bulk Data output endpoint or proxies requests to facilitate access from this endpoint, it may need to modify the Attachment.url element when generating the Bulk Data output files.
While epochStartTime has not changed, referenced files SHALL remain retrievable and SHALL NOT return 404 or 410. If this invariant is violated, Data Consumers MAY retry and/or alert.
… updateCadence.