Bulk Data Access IG, published by HL7 International / FHIR Infrastructure. This guide is not an authorized publication; it is the continuous build for version 4.0.0 built by the FHIR (HL7® FHIR® Standard) CI Build. This version is based on the current content of https://github.com/HL7/bulk-data/ and changes regularly. See the Directory of published versions
Page standards status: Trial-use
The FHIR Asynchronous Bulk Interaction Pattern, described below, is a FHIR request and response flow that servers can implement for any Operation or Defined Interaction that needs to return a large dataset. This pattern is described in the FHIR R4 and FHIR R5 versions of the FHIR specification, and has been moved into this Implementation Guide going forward.
The Bulk Export Operation and the Bulk Submit Status Operation in this IG build on this pattern.
Use cases that return small amounts of data but may take a lot of time to process may prefer to use the related Asynchronous Interaction Request Pattern.
Servers SHALL support the HTTP methods, URLs, headers, and other parameters that normally apply to the interaction being invoked. Servers SHALL also support the Prefer header described below, and SHOULD support the Accept header and _outputFormat parameter described below.
Accept (string)
Specifies the format of the optional FHIR OperationOutcome resource response to the kick-off request. A client SHOULD provide this header. A server may support any subset of the Serialization Format Representations. If omitted, the server MAY return an error or MAY process the request and return a format selected by the server.
Prefer (string)
Specifies whether the response is immediate or asynchronous. Setting this to respond-async triggers this asynchronous bulk pattern, though operations that can only be invoked asynchronously MAY default to this behavior or MAY return an error when this header is not provided.
A client MAY also provide a second Prefer header value of separate-export-status, so the combined Prefer header for the kick-off request is Prefer: respond-async,separate-export-status. If this header value is included by a client and is supported by a server, the server SHALL return the header Preference-Applied with values of respond-async and separate-export-status in its response. These may be provided as comma-delimited values or the header may be repeated for each value.
When a Prefer header value of separate-export-status is provided in the kick-off request and supported by the server, the HTTP status code in the response to a Bulk Data Status Request SHALL reflect the status request itself, and not the asynchronous job. In this case, when the HTTP status code of the Bulk Data Status Request is 200 OK, the response SHALL also include an X-Export-Status header with an HTTP status code that reflects the status of the asynchronous job.
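The header negotiation above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the `Accept` value of `application/fhir+json` is an assumed choice of serialization format, and lowercasing the returned tokens before comparison is an assumption rather than something this guide states.

```python
def kickoff_headers(separate_status=False):
    """Build headers for a kick-off request (hypothetical helper)."""
    prefer = ["respond-async"]
    if separate_status:
        prefer.append("separate-export-status")
    return {
        "Accept": "application/fhir+json",  # assumed format for any OperationOutcome
        "Prefer": ",".join(prefer),
    }


def applied_preferences(header_values):
    """Collect Preference-Applied tokens.

    header_values holds one string per Preference-Applied header received, so
    both the comma-delimited form and the repeated-header form are covered.
    """
    applied = set()
    for value in header_values:
        for token in value.split(","):
            token = token.strip().lower()
            if token:
                applied.add(token)
    return applied


def server_separates_status(header_values):
    """True when the server confirmed both preferences."""
    wanted = {"respond-async", "separate-export-status"}
    return wanted <= applied_preferences(header_values)
```

A client would remember the result of `server_separates_status` for the lifetime of the job, since it determines how later status responses are read.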
| Parameter | Cardinality | Type | Description |
|---|---|---|---|
| _outputFormat | 0..1 | string | The format for the generated bulk data files. Defaults to For request types where the server supports either the FHIR Asynchronous Bulk Interaction Pattern or the Asynchronous Interaction Request Pattern, requests that include the |
| _minimumFileSize | 0..1 | positiveInt | Specifies the minimum size in bytes for generated NDJSON files. The value SHALL be a positive integer. If a server supports this parameter, it SHOULD construct files that meet or exceed this size unless doing so would violate the |
| _maximumFileSize | 0..1 | positiveInt | Specifies the maximum size in bytes for generated NDJSON files. The value SHALL be a positive integer and SHALL be greater than |
View OperationDefinition for FHIR Asynchronous Bulk Interaction Pattern
Implementation notes:
- If neither _minimumFileSize nor _maximumFileSize is specified, servers use their default file size behavior.
- […] _minimumFileSize and _maximumFileSize when necessary, for example for the last file in a sequence or when a resource is larger than _maximumFileSize.
The server SHALL return a successful kick-off response with:
- HTTP status 202 Accepted
- Content-Location header with the absolute URL of an endpoint for subsequent status requests
When a Prefer header value of separate-export-status is provided in the kick-off request and supported by the server, the response SHALL include the header Preference-Applied with values of respond-async and separate-export-status. These may be provided as comma-delimited values or the header may be repeated for each value.
The server MAY include a FHIR OperationOutcome resource in the body.
The server SHALL return an error response with:
- HTTP status 4XX or 5XX
- FHIR OperationOutcome resource in the body
After an asynchronous bulk request has been started, the client MAY poll the status URL provided in the Content-Location header.
When polling for status, clients SHOULD follow an exponential backoff approach. A server SHOULD supply a Retry-After header with a delay time in seconds (for example, 120 to represent two minutes) or an HTTP-date (for example, Fri, 31 Dec 1999 23:59:59 GMT). When provided, clients SHOULD use this information to inform the timing of future polling requests. The server SHOULD keep an accounting of status queries received from a given client, and if a client is polling too frequently, the server SHOULD respond with a 429 Too Many Requests status code in addition to a Retry-After header, and optionally a FHIR OperationOutcome resource with further explanation. If excessively frequent status queries persist, the server MAY return a 429 Too Many Requests status code and terminate the session. Other standard HTTP 4XX and 5XX status codes MAY be used to identify errors as mentioned below.
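As a rough illustration of the polling guidance above, the following Python sketch computes the wait before the next status request. It assumes a client policy that uses the server's `Retry-After` hint when one is present and parseable, and otherwise falls back to capped exponential backoff; the function names and the `base` and `cap` defaults are hypothetical choices, not values from this guide.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After value (delay in seconds or HTTP-date) to seconds."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable hint: let the caller fall back to backoff
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def next_delay(attempt, retry_after=None, base=1.0, cap=300.0, now=None):
    """Seconds to wait before status request number attempt + 1."""
    hinted = retry_after_seconds(retry_after, now)
    if hinted is not None:
        return hinted
    return min(cap, base * 2 ** attempt)
```

A client that receives 429 Too Many Requests could additionally raise `attempt` more aggressively; that policy is left out of this sketch.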
When requesting status, the client SHOULD use an Accept header indicating a content type of application/json. In the case that errors prevent the asynchronous operation from completing, the server SHOULD respond with a FHIR OperationOutcome resource in JSON format.
When a Prefer header value of separate-export-status was provided in the kick-off request and is supported by the server, the HTTP status code in the response to this request SHALL reflect the status request itself, and not the asynchronous job. In this case, when the HTTP status code of this request is 200 OK, the response SHALL also include an X-Export-Status header with an HTTP status code that reflects the status of the asynchronous job.
GET [polling content location]
The server SHALL indicate an in-progress asynchronous job with the following response status and headers:
| Kick-off request | HTTP status | X-Export-Status |
|---|---|---|
| No separate-export-status | 202 Accepted | Not present |
| separate-export-status | 200 OK | 202 Accepted |
The server MAY also return an X-Progress header with a text description of the status of the request that is less than 100 characters. The format of this description is at the server's discretion and MAY be a percentage complete value, or MAY be a more general status such as "in progress". The client MAY parse the description, display it to the user, or log it.
The server SHALL indicate an asynchronous job failure with the following response status and headers:
| Kick-off request | HTTP status | X-Export-Status |
|---|---|---|
| No separate-export-status | 4XX or 5XX | Not present |
| separate-export-status | 200 OK | 4XX or 5XX |
The body of the response SHOULD be a FHIR OperationOutcome resource in JSON format. If this is not possible, such as when the infrastructure layer returning the error is not FHIR aware, the server MAY return an error message in another format and include a corresponding value for the Content-Type header.
When the body is a FHIR OperationOutcome resource, the response SHALL include a Content-Type header of application/fhir+json.
In the case of a polling failure that does not indicate failure of the asynchronous job, a server SHOULD use a transient code from the IssueType valueset when populating the FHIR OperationOutcome resource's issue.code element to indicate to the client that it should retry the request at a later time.
Note: Even if some of the requested or generated resources cannot successfully be returned, the overall asynchronous operation MAY still succeed. In this case, the response error array of the completion manifest SHALL be populated with one or more files containing FHIR OperationOutcome resources to indicate what went wrong. In the case of a partial success, the server SHALL use a 200 status code instead of 4XX or 5XX. The choice of when to determine that a job has failed in its entirety, as opposed to returning a partial success, is left to the server.
The server SHALL indicate a completed asynchronous job with the following response status and headers:
| Kick-off request | HTTP status | X-Export-Status |
|---|---|---|
| No separate-export-status | 200 OK | Not present |
| separate-export-status | 200 OK | 200 OK |
The response SHALL include a Content-Type header of application/json and a body containing the operation-specific manifest described below.
The server SHOULD return an Expires header indicating when the files listed will no longer be available for access.
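The three status tables above can be folded into one client-side decision. The Python sketch below is a hedged illustration: the state names (`in-progress`, `complete`, `failed`, `poll-error`, `unknown`) and function names are invented for this example, and it assumes the `X-Export-Status` value begins with the numeric status code, as in `202 Accepted`.

```python
def _classify(code):
    """Map an HTTP status code that describes the job to a state name."""
    if code == 202:
        return "in-progress"
    if code == 200:
        return "complete"
    if 400 <= code < 600:
        return "failed"
    return "unknown"


def job_state(http_status, headers, separate_status):
    """Interpret a status response.

    separate_status is True only when the server confirmed
    separate-export-status through Preference-Applied at kick-off.
    """
    if not separate_status:
        # The HTTP status describes the job itself.
        return _classify(http_status)
    if http_status != 200:
        # The status request failed; this says nothing about the job.
        return "poll-error"
    lowered = {name.lower(): value for name, value in headers.items()}
    first = lowered.get("x-export-status", "").strip().split(" ", 1)[0]
    if not first.isdigit():
        return "unknown"
    return _classify(int(first))
```

Without `separate-export-status`, a 4XX or 5XX response may still be a transient polling failure, so a client would also need to inspect the OperationOutcome issue code before treating the job as failed; the sketch does not do this.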
The output manifest is a JSON object providing metadata and links to the generated Bulk Data files. The files SHALL be accessible to the client at the URLs advertised. These URLs MAY be served by file servers other than the server that accepted the asynchronous request.
| Field | Cardinality | Type | Description |
|---|---|---|---|
| transactionTime | 1..1 | instant | Indicates the Data Provider's time when the query is run or files were generated. The bulk data files referenced in this manifest SHOULD NOT include any resources modified after this instant, and SHALL include any matching resources modified up to and including this instant. |
| requiresAccessToken | 1..1 | boolean | Indicates whether downloading the files referenced in this manifest requires the same authorization mechanism as the operation that resulted in the manifest. Value SHALL be true if both the Data Provider's file server and the Data Provider's FHIR API server control access using OAuth 2.0 bearer tokens. Value MAY be false for file servers that use access-control schemes other than OAuth 2.0, such as downloads from Amazon S3 bucket URLs or verifiable file servers within an organization's firewall. |
| manifestType | 0..1 | canonical | Canonical URL of the OperationDefinition for the operation associated with the provision of this manifest. E.g., |
| request | 0..1 | string | Deprecated - this element SHOULD NOT be used and will be removed in a future release of this IG. When populated for backward compatibility, it contains the full URL of the original Bulk Data kick-off request. In the case of a POST request, this URL does not include the request parameters. |
| outputFormat | 0..1 | string | MIME type of the referenced bulk data output files. Defaults to application/fhir+ndjson when omitted. Corresponds to the _outputFormat parameter in a Bulk Export operation. |
| outputOrganizedBy | 0..1 | string | When resources in the output files are organized by instances of a resource type, that resource type is specified here. When each output file contains a single resource type, this element SHALL be omitted and an individual type element SHALL be included for each file in the output array. |
| outputOrganizedByDetail | 0..1 | string | Narrative text providing detail on the organizing resource listed in outputOrganizedBy. SHALL NOT be populated in the absence of the outputOrganizedBy element. |
| output | 0..* | BackboneElement | An array of file items with one entry for each generated file. |
| ↳ url | 1..1 | url | The absolute path to the file. The format of the file SHOULD reflect that requested in the _outputFormat parameter of the initial kick-off request and the outputFormat element in this manifest. |
| ↳ type | 0..1 | string | The FHIR resource type contained in the file. When the manifest does not include an outputOrganizedBy value, this element SHALL be populated. When the manifest includes the outputOrganizedBy element, this element SHALL NOT be populated. |
| ↳ continuesInFile | 0..1 | url | URL of the next output file when resources for an organizing resource span multiple files. |
| ↳ count | 0..1 | integer | The number of resources in the file. |
| ↳ fileSize | 0..1 | integer | The size of the file in bytes. This provides Data Consumers with information about the storage and processing requirements for downloading and parsing the file. |
| deleted | 0..* | BackboneElement | References to files containing pointers to deleted resources in the form of FHIR Transaction Bundles. Each line in the output files SHALL contain a FHIR Bundle with a type of transaction which SHALL contain one or more entry items that reflect a deleted resource. In each entry, the request.url and request.method elements SHALL be populated and request.method SHALL be set to DELETE. |
| ↳ url | 1..1 | url | The absolute path to the file. |
| ↳ count | 0..1 | integer | The number of resources in the file. |
| ↳ fileSize | 0..1 | integer | The size of the file in bytes. This provides Data Consumers with information about the storage and processing requirements for downloading and parsing the file. |
| error | 0..* | BackboneElement | Files containing OperationOutcome resources. Error, success, warning, information and other messages related to the operation SHOULD be included here (not in output). This element will be renamed to status in a future release of this IG. |
| ↳ url | 1..1 | url | The absolute path to the file. |
| ↳ count | 0..1 | integer | The number of resources in the file. |
| ↳ fileSize | 0..1 | integer | The size of the file in bytes. This provides Data Consumers with information about the storage and processing requirements for downloading and parsing the file. |
| ↳ countSeverity | 0..* | BackboneElement | Count of OperationOutcome resources grouped by severity level. |
| ↳↳ code | 1..1 | code | Severity level from OperationOutcome.issue.severity (fatal, error, warning, information, success) |
| ↳↳ count | 1..1 | integer | The number of OperationOutcome resources in the file with this severity level. |
| link | 0..* | BackboneElement | Link to related manifest. |
| ↳ relation | 1..1 | string | The relationship type. A value of 'next' indicates the URL points to the location of another manifest containing additional output files. |
| ↳ url | 1..1 | url | URL pointing to the location of another manifest. All fields in the linked manifest SHALL be populated with the same values as this manifest, apart from the contents of output, deleted, and link. |
Implementation notes:
- For transactionTime, to properly meet the inclusion constraints above, the server might need to wait for any pending transactions to resolve in its database before starting the asynchronous operation process.
- Error, success, warning, information, and other messages related to the operation SHOULD be included in error and not in output. If there are no relevant messages, the server SHOULD return an empty array. If the request contained invalid or unsupported parameters along with a Prefer: handling=lenient header and the server processed the request, the server SHOULD include a FHIR OperationOutcome resource for each of these parameters.
- If _since is supported by the request type and supplied in the kick-off request, the deleted array SHOULD be populated with files containing FHIR transaction Bundles for resources that match the kick-off request criteria but were deleted after the supplied time. If no resources have been deleted, if no such timestamp was supplied, or if the server has other reasons to avoid exposing these data, the server MAY omit this key or return an empty array. Resources that appear in deleted SHALL NOT also appear in output.
Example manifest:
This content is an example of the Bulk Data Manifest Logical Model and is not a FHIR Resource
{
  "resourceType": "http://hl7.org/fhir/uv/bulkdata/StructureDefinition/BulkDataManifest",
  "id": "BulkDataManifestMinimalExample",
  "transactionTime": "2021-01-01T00:00:00Z",
  "requiresAccessToken": true,
  "output": [
    {
      "type": "Patient",
      "url": "https://example.org/output/patient_file_1.ndjson"
    }
  ]
}
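A client may want to sanity-check a manifest before downloading anything. The Python sketch below checks only a few of the constraints from the field table (required fields, and the interplay between `outputOrganizedBy`, `outputOrganizedByDetail`, and `output.type`); the function name and the message wording are hypothetical, and it is not a complete validator.

```python
def manifest_problems(manifest):
    """Return a list of human-readable problems found in a manifest dict."""
    problems = []
    for field in ("transactionTime", "requiresAccessToken"):
        if field not in manifest:
            problems.append(f"missing required field: {field}")

    organized = "outputOrganizedBy" in manifest
    if "outputOrganizedByDetail" in manifest and not organized:
        problems.append("outputOrganizedByDetail without outputOrganizedBy")

    for index, item in enumerate(manifest.get("output", [])):
        if "url" not in item:
            problems.append(f"output[{index}]: missing url")
        if organized and "type" in item:
            problems.append(f"output[{index}]: type not allowed with outputOrganizedBy")
        if not organized and "type" not in item:
            problems.append(f"output[{index}]: type required without outputOrganizedBy")
    return problems
```

Applied to the minimal example manifest above, this sketch reports no problems.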
Using the URLs supplied by the server in the manifest, a client MAY download the referenced files within the time period specified in the Expires header, if present. A client MAY re-fetch the manifest if file links have expired, and a server MAY provide updated links or an updated Expires timestamp in response.
As long as a server is following relevant security guidance, it MAY generate manifests where the requiresAccessToken field is true or false, including for servers available on the public internet.
If the requiresAccessToken field in the manifest is set to true, the request SHALL include a valid access token.
If the requiresAccessToken field is set to false and no additional authorization-related extensions are present in the relevant manifest entry, then the URLs SHALL be dereferenceable directly as capability URLs. A client SHALL NOT provide a SMART Backend Services access token when dereferencing a URL from a manifest entry where requiresAccessToken is false.
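The access-token rules above can be illustrated with a small Python helper. It is a sketch under assumptions: the function name is hypothetical, the token is sent as an OAuth 2.0 bearer token in the `Authorization` header, and manifests carrying additional authorization-related extensions are not handled.

```python
def download_auth_headers(manifest, access_token=None):
    """Return the authorization headers for a file request (hypothetical helper)."""
    if manifest["requiresAccessToken"]:
        if not access_token:
            raise ValueError("manifest requires an access token")
        return {"Authorization": f"Bearer {access_token}"}
    # requiresAccessToken is false: the URL is used as a capability URL,
    # so any token the client holds is deliberately not sent.
    return {}
```

Dropping the token in the second branch matters because the file URL may point to a different host than the FHIR server that issued the token.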
Returned content SHALL include only the most recent version of any returned resources unless the client explicitly requests different behavior in a fashion supported by the server. Inclusion of the Resource.meta information in the resources is at the discretion of the server, as it is for all FHIR interactions.
A client SHOULD provide an Accept-Encoding header when requesting files and SHOULD include gzip compression as one of the encoding options in the header. A server SHALL provide files as uncompressed, with gzip compression, or with another compression format from the Accept-Encoding header. When compression is used, a server SHALL communicate this to the client by including a Content-Encoding header in the response. A client SHALL accept files that are uncompressed or encoded with gzip compression, and MAY accept files encoded with other compression formats.
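On the client side, the compression rules above amount to advertising `gzip` and then branching on the response's `Content-Encoding`. The Python sketch below shows that branch for a body that has already been read into memory; the names are hypothetical, and many HTTP client libraries perform this decoding transparently, in which case no such step is needed.

```python
import gzip

# Header a client could send with each file request.
ACCEPT_ENCODING = {"Accept-Encoding": "gzip"}


def decode_body(body, content_encoding=None):
    """Return the uncompressed bytes of a downloaded file."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "identity":
        return body  # served uncompressed
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    # Other formats are optional for clients; this sketch does not support them.
    raise ValueError(f"unsupported Content-Encoding: {content_encoding}")
```

For large files, a streaming decompressor would be preferable to `gzip.decompress`, which holds the whole file in memory.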
GET [url from manifest output, deleted, or error field]
Accept (optional, defaults to application/fhir+ndjson)
Specifies the format of the file being requested.
The server SHALL return a successful file response with:
- HTTP status 200 OK
- Content-Type header that matches the file format being delivered
For files in NDJSON format, the Content-Type header SHALL be application/fhir+ndjson.
The server SHALL return an error response with HTTP status 4XX or 5XX.
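Files retrieved from the manifest's deleted field carry one transaction Bundle per line, as described in the manifest field table. The following Python sketch extracts the `request.url` values from such a file; the function name is hypothetical, and the strict error handling is an assumed client policy, not a requirement of this guide.

```python
import json


def deleted_urls(ndjson_text):
    """Return the request.url of every DELETE entry in a deleted-resources file."""
    urls = []
    for line_no, line in enumerate(ndjson_text.splitlines(), 1):
        if not line.strip():
            continue  # tolerate blank lines
        bundle = json.loads(line)
        if bundle.get("resourceType") != "Bundle" or bundle.get("type") != "transaction":
            raise ValueError(f"line {line_no}: expected a transaction Bundle")
        for entry in bundle.get("entry", []):
            request = entry.get("request", {})
            if request.get("method") != "DELETE" or "url" not in request:
                raise ValueError(f"line {line_no}: entry is not a DELETE request")
            urls.append(request["url"])
    return urls
```

The returned values (for example `Patient/123`) can then be used to remove the corresponding resources from a local copy of the data.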
Output files may be organized by resource type, or by instances of a resource type specified in the outputOrganizedBy element of the output manifest.
When the outputOrganizedBy element in the manifest is not populated, each output file SHALL contain resources of only one type, and a server MAY create more than one file for each resource type returned. The number of resources contained in a file MAY vary between servers and files.
When the outputOrganizedBy element is populated with a resource type, the output files SHALL be populated with blocks consisting of a header Parameters resource containing a parameter named header with a reference to a resource of the type specified by outputOrganizedBy, followed by the resource referenced in this header and resources that reference the resource referenced in the header (together a "resource block"). Each output file MAY contain multiple resource blocks and, when possible, a single resource's block SHOULD NOT be split across files. If a resource block does span more than one file, the header SHALL be repeated at the start of each file where the block continues, and the association between these files SHALL be documented in the manifest using the continuesInFile element in the relevant output array items.
Resources that would otherwise be included in the returned data set, but do not have references to the resource type specified in the outputOrganizedBy element, MAY be included in resource blocks that contain resources they reference, MAY be repeated in every resource block, or MAY be omitted from the data set.
When the outputOrganizedBy element is set to Patient, servers SHOULD use the Patient Compartment Definition to determine a base set of related resources to include in a resource block, though other resources may also be included.
For other resource types, we are soliciting feedback on the best approach for documenting the set of resources in a resource block. Implementation Guides MAY reference a Compartment Definition, populate a GraphDefinition Resource, include narrative text, or use another approach.
Example NDJSON file when the manifest does not include outputOrganizedBy:
{"id":"p-1","resourceType":"Patient", "name":[{"given":["Brenda"],"family":"Jackson"}],"gender":"female", ...}
{"id":"p-2","resourceType":"Patient", "name":[{"given":["Bram"],"family":"Sandeep"}],"gender":"male", ...}
{"id":"p-3","resourceType":"Patient", "name":[{"given":["Sandy"],"family":"Hamlin"}],"gender":"female", ...}
{...}
Example NDJSON file when outputOrganizedBy is set to Patient:
{"resourceType": "Parameters", "parameter": [{"name": "header", "valueReference": {"reference": "Patient/p-1"}}]}
{"id": "p-1", "resourceType": "Patient", ...}
{"id": "c-1", "resourceType": "Condition", "subject":{"reference": "Patient/p-1"}, ...}
{"id": "o-1", "resourceType": "Observation", "subject":{"reference": "Patient/p-1"}, ...}
{...}
{"resourceType": "Parameters", "parameter": [{"name": "header", "valueReference": {"reference": "Patient/p-2"}}]}
{"id": "p-2", "resourceType": "Patient", ...}
{"id": "c-101", "resourceType": "Condition", "subject":{"reference": "Patient/p-2"}, ...}
{"id": "o-102", "resourceType": "Observation", "subject":{"reference": "Patient/p-2"}, ...}
{...}
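To show how a client might consume the organized layout in the second example, the Python sketch below groups NDJSON lines into resource blocks keyed by the reference in each header Parameters resource. The function names are hypothetical. Repeated headers for the same reference are merged, which is one possible way to handle a block that continues in another file.

```python
import json


def header_reference(resource):
    """Return the header reference of a Parameters resource, else None."""
    if resource.get("resourceType") != "Parameters":
        return None
    for parameter in resource.get("parameter", []):
        if parameter.get("name") == "header":
            return parameter.get("valueReference", {}).get("reference")
    return None


def resource_blocks(lines):
    """Group NDJSON lines into {header reference: [resources]}."""
    blocks = {}
    current = None
    for line in lines:
        if not line.strip():
            continue
        resource = json.loads(line)
        reference = header_reference(resource)
        if reference is not None:
            # A new header starts a block or resumes a continued one.
            current = blocks.setdefault(reference, [])
            continue
        if current is None:
            raise ValueError("resource found before any header")
        current.append(resource)
    return blocks
```

Passing the lines of several files in `continuesInFile` order to a single call would reassemble a block that spans files, since each continuation file repeats the header.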
If resources in a returned file contain elements of the type Attachment, the server SHOULD populate the Attachment.contentType code as well as either the data element or the url element. If the data element is not populated and the url element is populated, the url element SHALL be an absolute URL that can be dereferenced to the attachment's content.
When the url element is populated with an absolute URL and the requiresAccessToken field in the manifest is set to true, the URL location SHALL be accessible by a client with a valid access token and SHALL NOT require the use of additional authentication credentials. When the url element is populated and the requiresAccessToken field in the manifest is set to false, the URL location SHALL be accessible by a client without an access token.
Note that if a server copies files to the Bulk Data output endpoint or proxies requests to facilitate access from this endpoint, it may need to modify the Attachment.url element when generating the Bulk Data output files.
After an asynchronous bulk request has been started, a client MAY send a DELETE request to the URL provided in the Content-Location header to cancel the request. If the request has been completed, a server MAY use the request as a signal that the client is done retrieving files and that it is safe for the server to remove those from storage. Following the delete request, when subsequent requests are made to the polling location, the server SHALL return a 404 Not Found error and an associated FHIR OperationOutcome resource in JSON format.
DELETE [polling content location]
The server SHALL return a successful delete response with HTTP status 202 Accepted.
The server MAY include a FHIR OperationOutcome resource in the body in JSON format.
The server SHALL return an error response with:
- HTTP status 4XX or 5XX
- FHIR OperationOutcome resource in the body in JSON format