Bulk Data Access IG
4.0.0 - STU 4 International

Bulk Data Access IG, published by HL7 International / FHIR Infrastructure. This guide is not an authorized publication; it is the continuous build for version 4.0.0 built by the FHIR (HL7® FHIR® Standard) CI Build. This version is based on the current content of https://github.com/HL7/bulk-data/ and changes regularly. See the Directory of published versions.

Export

Page standards status: Trial-use

Audience and Scope

The Bulk Export operation is intended to be used by developers at organizations that aim to interoperate by sharing large FHIR datasets. It defines the application programming interfaces (APIs) through which an authenticated and authorized system (Data Consumer) may request a FHIR Data Export from another system (Data Provider), receive status information regarding progress in the generation of the requested files, and retrieve those files. The Data Consumer can control the data being returned by optionally selecting the cohort, resource types, filters, data elements, and time window.

Many Bulk Export workflows are cohort-driven. A Data Provider may expose one or more FHIR Group resources representing payer rosters, research cohorts, quality-measure populations, care management panels, or other recurring populations. As described on the Group page, implementations may expose read-only groups managed by the Data Provider, member-based groups managed by the Data Consumer, or criteria-based groups whose membership is computed from characteristics. Some Data Providers may also support the Bulk Cohort API described in this guide for asynchronous creation of characteristic-based cohorts by a Data Consumer. A group-level export, described below, provides a standard way to request data on patients in any of these types of groups.

For a high-level comparison of Bulk Export, Bulk Submit, and Bulk Publish, see Choosing a Bulk Operation.

Privacy and Security Considerations

All exchanges described herein between a Data Consumer and a Data Provider SHALL be secured using Transport Layer Security (TLS) Protocol Version 1.2 (RFC5246) or a more recent version of TLS. Use of mutual TLS is OPTIONAL.

The Data Provider SHOULD implement OAuth 2.0 access management in accordance with the SMART Backend Services Authorization Profile. When SMART Backend Services Authorization is used, Bulk Data Status Requests and Bulk Data Output File Requests with requiresAccessToken=true SHALL be protected the same way as the Bulk Data Kick-off Request, including an access token with scopes that cover all resources being exported. A Data Provider MAY additionally restrict Bulk Data Status Requests and Bulk Data Output File Requests by limiting them to the Data Consumer that originated the export. Implementations MAY include endpoints that use authorization schemes other than OAuth 2.0, such as mutual TLS or signed URLs.

For Group level exports, in addition to requiring authorization to access the resources included in the export, a Data Provider SHOULD restrict Data Consumers from exporting data for Group resources they are not authorized to read (e.g., via system/Group.rs in SMART on FHIR v2). A Data Provider SHALL also restrict access to specific groups based on underlying business rules.

This implementation guide does not address protection of a Data Provider from potential compromise. An adversary who successfully captures administrative rights to the Data Provider will have full control over that system and can use those rights to undermine its security protections. In the Bulk Data Export workflow, the Data Provider's file server will be a particularly attractive target, as it holds highly sensitive and valued PHI. An adversary who successfully takes control of a file server may choose to continue to deliver files in response to Data Consumer requests, so that neither the Data Consumer nor the Data Provider's FHIR server is aware of the take-over. Meanwhile, the adversary is able to put the PHI to use for its own malicious purposes.

Healthcare organizations have an imperative to protect PHI persisted in file servers in both cloud and data-center environments. A range of existing and emerging approaches can be used to accomplish this, not all of which would be visible at the API level. This specification does not dictate a particular approach at this time, though it does support the use of an Expires header to limit the time period a file will be available for Data Consumer download. Removal of the file from the Data Provider is left to the implementer. A Data Provider SHOULD NOT delete files from a Bulk Data response that a Data Consumer is actively in the process of downloading, regardless of the pre-specified expiration time.

Data access control obligations can be met with a combination of in-band restrictions (e.g., OAuth scopes) and out-of-band restrictions, where the Data Provider limits the data returned to a specific Data Consumer in accordance with local considerations such as policies or regulations. The Data Provider's FHIR server SHALL limit the data returned to only those FHIR resources for which the Data Consumer is authorized. Implementers SHOULD incorporate technology that preserves and respects an individual's wishes to share their data with desired privacy protections. For example, some Data Consumers are authorized to access sensitive mental health information and some are not; this authorization is defined out of band, but when a Data Consumer requests a full data set, filtering is automatically applied by the Data Provider, restricting the data that the Data Consumer receives.

Bulk Data Export can be a resource-intensive operation. Data Providers SHOULD consider and mitigate the risk of intentional or inadvertent denial-of-service attacks, though the details are beyond the scope of this specification. For example, transactional systems may wish to provide Bulk Data access to a read-only mirror of the database or may distribute processing over time to avoid loads that could impact clinical operations.

Roles

There are two primary roles involved in a Bulk Data transaction:

  1. Data Provider:

    a. Authorization Server: Issues access tokens in response to valid token requests from the Data Consumer.

    b. FHIR Resource Server: Accepts kick-off requests and provides job status and completion manifests.

    c. Output File Server: Returns FHIR bulk data files and attachments in response to URLs in the completion manifest. This may be built into the Data Provider's FHIR Resource Server, or the files may be independently hosted.

  2. Data Consumer:

    a. Export Client: Requests the export and polls job status.

    b. File Retrieval Client: Retrieves bulk data files and attachments from the Data Provider.

Sequence Overview

[Figure: sequence diagram. Participants: Data Consumer; Data Provider (Authorization Server, FHIR Resource Server, Output File Server). Steps: optional precondition of SMART Backend Services registration, returning a client_id; optional precondition of SMART Backend Services authorization, in which a signed token request returns a short-lived token; kick-off request to the FHIR Resource Server, returning a status polling location; status requests repeated 1..n times, returning an in-progress status until the files are generated and then a complete status with a JSON manifest; Bulk Data output file requests for files and errors (0..n) and Bulk Data attachment file requests (0..n) to the Output File Server.]
Overview of the Bulk Data Export request flow.

Kick-off Request

The Bulk Data Export Operation initiates the asynchronous generation of a requested export data set, whether that be data for all patients, data for a subset (defined group) of patients, or all FHIR data available from the Data Provider.

As discussed in Privacy and Security Considerations above, a Data Provider SHALL limit the data returned to only those FHIR resources for which the Data Consumer is authorized.

The Data Provider's FHIR Resource Server SHALL support invocation of this operation using the FHIR Asynchronous Bulk Interaction Pattern. A Data Provider SHALL support GET requests and MAY support POST requests that supply parameters using the FHIR Parameters Resource.

If a parameter has a cardinality of greater than one, a Data Consumer MAY repeat the kick-off parameter multiple times or MAY include a single instance of the parameter with multiple values delimited by commas. The Data Provider SHALL treat comma-delimited values within a single instance of the parameter as if the parameter was repeated. The use of comma-delimited values within a parameter is deprecated in favor of repeating parameters and will be removed in a future version of this IG.
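The equivalence rule above can be illustrated with a minimal Python sketch of how a Data Provider might normalize kick-off query parameters, so that _type=Patient,Observation and _type=Patient&_type=Observation produce the same value list. The function name and the set of parameters split on commas (LIST_PARAMS) are hypothetical. _typeFilter is left out of that set because a FHIR search query inside it may itself contain commas, an ambiguity this sketch does not attempt to resolve.

```python
from urllib.parse import parse_qs

# Hypothetical set of kick-off parameters that this sketch splits on commas.
# _typeFilter is excluded because its search queries may themselves contain commas.
LIST_PARAMS = {"_type", "_elements", "includeAssociatedData"}


def normalize_kickoff_params(query: str) -> dict:
    """Return each parameter's values, treating comma-delimited values
    within one instance as if the parameter had been repeated."""
    normalized = {}
    for name, values in parse_qs(query).items():
        if name in LIST_PARAMS:
            # Split every occurrence on commas and drop empty fragments.
            values = [part for value in values for part in value.split(",") if part]
        normalized[name] = values
    return normalized


# Both forms yield {'_type': ['Patient', 'Observation']}.
print(normalize_kickoff_params("_type=Patient,Observation"))
print(normalize_kickoff_params("_type=Patient&_type=Observation"))
```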

For Patient-level requests and Group-level requests associated with groups of patients, the Patient Compartment SHOULD be used as a point of reference for recommended resources to be returned and, where applicable, Patient resources SHOULD be returned. Other resources outside of the patient compartment that are helpful in interpreting the patient data (such as Organization and Practitioner) MAY also be returned.

Binary Resources whose content is associated with an individual patient SHALL be serialized as DocumentReference Resources with the content.attachment element populated as described in the Attachments section below. Binary Resources not associated with an individual patient MAY be included in a System Level export.

References in the resources returned MAY be relative URLs with the format <resource type>/<id>, or MAY be absolute URLs with the same structure rooted in the base URL for the Data Provider's FHIR server from which the export was performed.
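Because either form may appear in an export, a Data Consumer may want to normalize references before matching resources across output files. This minimal Python sketch assumes references follow exactly the <resource type>/<id> structure described above (versioned or otherwise differently shaped references are rejected); the function name and base URL are hypothetical.

```python
def split_reference(reference: str, base_url: str) -> tuple:
    """Return (resource type, id) for a relative or absolute reference."""
    base = base_url.rstrip("/") + "/"
    if reference.startswith(base):
        # Absolute form: strip the FHIR server base URL to get the relative part.
        reference = reference[len(base):]
    elif "://" in reference:
        raise ValueError(f"reference is not rooted in {base}: {reference}")
    resource_type, _, resource_id = reference.partition("/")
    if not resource_type or not resource_id or "/" in resource_id:
        raise ValueError(f"unexpected reference format: {reference}")
    return resource_type, resource_id


base = "https://example.org/fhir"
print(split_reference("Patient/123", base))                           # ('Patient', '123')
print(split_reference("https://example.org/fhir/Patient/123", base))  # ('Patient', '123')
```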

Endpoint - All Patients

[fhir base]/Patient/$export

View table of parameters for Patient Export

FHIR Operation to obtain a detailed set of FHIR resources of diverse resource types pertaining to all patients.

Endpoint - Group of Patients

[fhir base]/Group/[id]/$export

View table of parameters for Group Export

FHIR Operation to obtain a detailed set of FHIR resources of diverse resource types pertaining to all members of a specified Group.

If a Data Provider's FHIR server supports Group-level data export, it SHOULD support reading and searching for the Group resource. This enables Data Consumers to discover available groups based on stable characteristics such as Group.identifier.

As described on the Group page, implementations may expose read-only groups managed by the Data Provider, member-based groups managed by the Data Consumer, or criteria-based groups whose membership is computed from characteristics. Some Data Providers may also support the Bulk Cohort API described in this guide for asynchronous creation of characteristic-based cohorts by a Data Consumer.

Endpoint - System Level Export

[fhir base]/$export

View table of parameters for Export

Export data from a Data Provider's FHIR server, whether or not it is associated with a patient. This supports use cases like backing up a Data Provider's FHIR server, or exporting terminology data by restricting the resources returned using the _type parameter.

Headers
  • Accept (string)

    Specifies the format of the optional FHIR OperationOutcome resource response to the kick-off request. Currently, only application/fhir+json is supported. A Data Consumer SHOULD provide this header. If omitted, the Data Provider MAY return an error or MAY process the request as if application/fhir+json was supplied.

  • Prefer (string)

    A Data Consumer SHOULD include this header with a value of respond-async to indicate that the export will be processed asynchronously. If omitted, the Data Provider MAY return an error or MAY process the request as if respond-async was supplied.

    A Data Consumer MAY also provide a second Prefer header value of separate-export-status, so the combined Prefer header for the kickoff request is Prefer: respond-async,separate-export-status. If this header value is included by a Data Consumer and is supported by a Data Provider, the Data Provider SHALL return the header Preference-Applied with values of respond-async and separate-export-status in its response. These may be provided as comma-delimited values or the header may be repeated for each value.

    When a Prefer header value of separate-export-status is provided in the kickoff request and supported by the Data Provider, the HTTP status code in the response to a Bulk Data Status request SHALL reflect the status request itself, and not the export job. In this case, when the HTTP status code of the Bulk Data Status request is 200 OK, the response SHALL also include an X-Export-Status header with an HTTP status code that reflects the status of the export job.
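A Data Consumer might assemble a GET kick-off request for any of the three endpoints with the headers described above, as in the following Python sketch built on the standard library's urllib.request.Request. The function name, the example base URL, and the group id are hypothetical, and the request is only constructed, not sent. A real request would also carry an access token (for example, one obtained through SMART Backend Services), which is omitted here.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode
from urllib.request import Request


def build_kickoff_request(fhir_base: str,
                          group_id: Optional[str] = None,
                          system_level: bool = False,
                          params: Optional[List[Tuple[str, str]]] = None,
                          separate_export_status: bool = False) -> Request:
    """Construct (but do not send) a GET kick-off request."""
    base = fhir_base.rstrip("/")
    if system_level:
        url = f"{base}/$export"
    elif group_id is not None:
        url = f"{base}/Group/{group_id}/$export"
    else:
        url = f"{base}/Patient/$export"
    if params:
        # Repeated parameters are sent as separate name=value pairs.
        url += "?" + urlencode(params)
    prefer = "respond-async"
    if separate_export_status:
        prefer += ",separate-export-status"
    headers = {"Accept": "application/fhir+json", "Prefer": prefer}
    return Request(url, headers=headers, method="GET")


req = build_kickoff_request("https://example.org/fhir", group_id="example-group",
                            params=[("_type", "Patient"), ("_type", "Observation")],
                            separate_export_status=True)
print(req.full_url)
print(req.get_header("Prefer"))
```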

Query Parameters
Each query parameter below is listed with its optionality for the Data Provider and for the Data Consumer, its cardinality, and its type, followed by its description.
_outputFormat (Data Provider: required; Data Consumer: optional; cardinality: 0..1; type: string)

The format of the bulk data files generated through the FHIR Asynchronous Bulk Interaction Pattern. Defaults to application/fhir+ndjson. The Data Provider SHALL support Newline Delimited JSON, but MAY choose to support additional output formats. The Data Provider SHALL accept the full content type of application/fhir+ndjson as well as the abbreviated representations application/ndjson and ndjson.

_since (Data Provider: required; Data Consumer: optional; cardinality: 0..1; type: instant)

Resources will be included in the response if their state has changed after the supplied time (e.g., if Resource.meta.lastUpdated is later than the supplied _since time). In the case of a Group level export, the Data Provider MAY return additional resources modified prior to the supplied time if the resources belong to the patient compartment of a patient added to the Group after the supplied time (this behavior SHOULD be clearly documented by the Data Provider). The Data Provider MAY return resources that are referenced by the resources being returned regardless of when the referenced resources were last updated. For resources where the Data Provider does not maintain a last updated time, the Data Provider MAY include these resources in a response irrespective of the _since value supplied by a Data Consumer.

_until (Data Provider: optional; Data Consumer: optional; cardinality: 0..1; type: instant)

Resources will be included in the response if their state has changed before the supplied time (e.g., if Resource.meta.lastUpdated is earlier than the supplied _until time). The Data Provider MAY return resources that are referenced by the resources being returned regardless of when the referenced resources were last updated. For resources where the Data Provider does not maintain a last updated time, the Data Provider MAY include these resources in a response irrespective of the _until value supplied by a Data Consumer.

_type (Data Provider: optional; Data Consumer: optional; cardinality: 0..*; type: string)

The response SHALL be filtered to only include resources of the specified resource type(s).

If this parameter is omitted, the Data Provider SHALL return all supported resources within the scope of the Data Consumer's authorization, though implementations MAY limit the resources returned to specific subsets of FHIR, such as those defined in the US Core Implementation Guide. For Patient- and Group-level requests, the Patient Compartment SHOULD be used as a point of reference for recommended resources to be returned. However, other resources outside of the Patient Compartment that are referenced by the resources being returned and would be helpful in interpreting the patient data MAY also be returned (such as Organization and Practitioner). When this behavior is supported, a Data Provider SHOULD document this support (for example, as narrative text, or by including a GraphDefinition Resource).

A Data Provider that is unable to support _type SHOULD return an error and FHIR OperationOutcome resource so the Data Consumer can re-submit a request omitting the _type parameter. If the Data Consumer explicitly asks for export of resources that the Data Provider does not support, or asks for only resource types that are outside the Patient Compartment, the Data Provider SHOULD return details via a FHIR OperationOutcome resource in an error response to the request. When a Prefer: handling=lenient header is included in the request, the Data Provider MAY process the request instead of returning an error.

For example _type=Observation could be used to filter a given export response to return only FHIR Observation resources.

_elements (Data Provider: optional, experimental; Data Consumer: optional; cardinality: 0..*; type: string)

When provided, the Data Provider SHOULD omit unlisted, non-mandatory elements from the resources returned. Elements SHOULD be of the form [resource type].[element name] (e.g., Patient.id) or [element name] (e.g., id) and only root elements in a resource are permitted. If the resource type is omitted, the element SHOULD be returned for all resources in the response where it is applicable.

A Data Provider is not obliged to return just the requested elements. A Data Provider SHOULD always return mandatory elements whether they are requested or not. A Data Provider SHOULD mark the resources with the tag SUBSETTED to ensure that the incomplete resource is not actually used to overwrite a complete resource.

A Data Provider that is unable to support _elements SHOULD return an error and a FHIR OperationOutcome resource so the Data Consumer can re-submit a request omitting the _elements parameter. When a Prefer: handling=lenient header is included in the request, the Data Provider MAY process the request instead of returning an error.

patient, POST requests only (Data Provider: optional; Data Consumer: optional; cardinality: 0..*; type: Reference)

Not applicable to system level export requests. This parameter is only valid in kickoff requests initiated through an HTTP POST request. When provided, the Data Provider SHALL NOT return resources in the patient compartments belonging to patients outside of this list. If a Data Consumer requests patients who are not present on the Data Provider (or in the case of a group level export, who are not members of the group), the Data Provider SHOULD return details via a FHIR OperationOutcome resource in an error response to the request.

A Data Provider that is unable to support the patient parameter SHOULD return an error and FHIR OperationOutcome resource so the Data Consumer can re-submit a request omitting the patient parameter. When a Prefer: handling=lenient header is included in the request, the Data Provider MAY process the request instead of returning an error.

includeAssociatedData (Data Provider: optional, experimental; Data Consumer: optional; cardinality: 0..*; type: code)

When provided, a Data Provider with support for the parameter and requested values SHALL return or omit a pre-defined set of FHIR resources associated with the request.

A Data Provider that is unable to support the requested includeAssociatedData values SHOULD return an error and a FHIR OperationOutcome resource so the Data Consumer can re-submit a request that omits those values (for example, if a Data Provider does not retain provenance data). When a Prefer: handling=lenient header is included in the request, the Data Provider MAY process the request instead of returning an error.

A Data Consumer MAY include one or more of the following values. If multiple conflicting values are included, the Data Provider SHALL apply the least restrictive value (the value that will return the largest dataset).

  • LatestProvenanceResources: Export will include the most recent Provenance resources associated with each of the non-provenance resources being returned. Other Provenance resources will not be returned.
  • RelevantProvenanceResources: Export will include all Provenance resources associated with each of the non-provenance resources being returned.
  • _[custom value]: A Data Provider MAY define and support custom values that are prefixed with an underscore (e.g., _myCustomPreset).

_typeFilter (Data Provider: optional; Data Consumer: optional; cardinality: 0..*; type: string)

String of a FHIR REST search query.

When provided, a Data Provider with support for the parameter and requested search queries SHALL filter the data in the response for resource types referenced in the _typeFilter expression to only include resources that meet the specified criteria. FHIR search result parameters such as _include and _sort SHALL NOT be used, and a query in the _typeFilter parameter SHALL have the search context of a single FHIR Resource Type. See details.

A Data Provider unable to support the requested _typeFilter queries SHOULD return an error and FHIR OperationOutcome resource so the Data Consumer can re-submit a request that omits those queries. When a Prefer: handling=lenient header is included in the request, the Data Provider MAY process the request instead of returning an error.

organizeOutputBy (Data Provider: optional; Data Consumer: optional; cardinality: 0..1; type: string)

String of a FHIR resource type.

When provided, a Data Provider with support for the parameter SHALL organize the resources in output files by instances of the specified resource type, including a header for each resource of the type specified in the parameter, followed by the resource and resources in the output that contain references to that resource. When omitted, Data Providers SHALL organize each output file with resources of only a single type. See details, example manifest, and example output file.

A Data Provider unable to structure output by the requested organizeOutputBy resource SHOULD return an error and FHIR OperationOutcome resource. When a Prefer: handling=lenient header is included in the request, the Data Provider MAY process the request instead of returning an error.

allowPartialManifests (Data Provider: optional; Data Consumer: optional; cardinality: 0..1; type: boolean)

When provided, a Data Provider with support for the parameter MAY distribute the bulk data output files among multiple manifests, providing links for Data Consumers to page through the manifests (see details). Prior to all of the files in the export being available, the Data Provider MAY return a manifest with files that are available along with a 202 Accepted HTTP response status, and subsequently update the manifest with a paging link to a new manifest when additional files are ready for download (see details).

Note: Implementations MAY limit the resources returned to specific subsets of FHIR, such as those defined in the US Core Implementation Guide. If the Data Consumer explicitly asks for export of resources that the Data Provider does not support, the Data Provider SHOULD return details via a FHIR OperationOutcome resource in an error response to the request.

If an includeAssociatedData value relevant to provenance is not specified, or if this parameter is not supported by the Data Provider, the Data Provider SHALL include all available Provenance resources whose Provenance.target is a resource in the Patient compartment in a patient level export request, and all available Provenance resources in a system level export request unless a specific resource set is specified using the _type parameter and this set does not include Provenance.
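For a POST kick-off request, the query parameters listed above are carried in a FHIR Parameters resource. The following Python sketch (hypothetical function name, covering only _since, _type, _typeFilter, and patient) emits each repeated value as its own parameter entry rather than as a comma-delimited string, and picks the value[x] element from each parameter's listed type: valueInstant for instant, valueString for string, and valueReference for Reference.

```python
import json
from typing import Iterable, List


def build_kickoff_parameters(resource_types: Iterable[str] = (),
                             type_filters: Iterable[str] = (),
                             patient_refs: Iterable[str] = (),
                             since: str = "") -> dict:
    """Return a FHIR Parameters resource for a POST kick-off request."""
    parameter: List[dict] = []
    if since:
        parameter.append({"name": "_since", "valueInstant": since})
    for resource_type in resource_types:
        # One entry per value: comma-delimited values are deprecated.
        parameter.append({"name": "_type", "valueString": resource_type})
    for query in type_filters:
        parameter.append({"name": "_typeFilter", "valueString": query})
    for ref in patient_refs:
        # The patient parameter is only valid in POST kick-off requests.
        parameter.append({"name": "patient", "valueReference": {"reference": ref}})
    return {"resourceType": "Parameters", "parameter": parameter}


body = build_kickoff_parameters(
    resource_types=["Observation", "Condition"],
    patient_refs=["Patient/123", "Patient/456"],
)
print(json.dumps(body, indent=2))
```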

Group Membership Request Pattern

To obtain new and updated resources for patients in a group, as well as all data for patients who have joined the group since a prior query, a Data Consumer can use the following pattern:

  • Initial Query (e.g., on January 1, 2020):

    • Data Consumer submits a group export request:

      [baseurl]/Group/[id]/$export

    • Data Consumer retrieves response data
    • Data Consumer retains a list of the patient ids returned
    • Data Consumer retains the transactionTime value from the response
  • Subsequent Queries (e.g., on February 1, 2020):

    • Data Consumer submits a group export request to obtain a patient list:

      [baseurl]/Group/[id]/$export?_type=Patient&_elements=id

    • Data Consumer retains a list of patient ids returned
    • Data Consumer compares the response to the patient ids from the first query and identifies new patient ids
    • Data Consumer submits a group export request via POST for patients who are new members of the group:

      POST [baseurl]/Group/[id]/$export
      
      {"resourceType" : "Parameters",
        "parameter" : [{
          "name" : "patient",
          "valueReference" : {"reference" : "Patient/123"}
        },{
          "name" : "patient",
          "valueReference" : {"reference" : "Patient/456"}
        },
        ...
        ]
      }
      
    • Data Consumer submits a group export request for updated group data:

      [baseurl]/Group/[id]/$export?_since=[initial transaction time]

      Note that data returned from this request may overlap with that returned from the prior step.

    • Data Consumer retains the transactionTime value from the response.
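The comparison steps of this pattern can be sketched in Python. The helper names and sample ids are hypothetical, and the sketch assumes the Patient output from the ?_type=Patient&_elements=id request has already been downloaded as NDJSON lines. It returns None when there are no new members, because a POST without any patient parameter would request data for the entire group.

```python
import json
from typing import Iterable, Optional, Set


def patient_ids_from_ndjson(lines: Iterable[str]) -> Set[str]:
    """Collect Patient ids from NDJSON output, e.g. from ?_type=Patient&_elements=id."""
    ids: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines such as a trailing newline
        resource = json.loads(line)
        if resource.get("resourceType") == "Patient":
            ids.add(resource["id"])
    return ids


def new_member_parameters(previous_ids: Set[str], current_ids: Set[str]) -> Optional[dict]:
    """Build the POST body for patients who joined the group, or None if there are none."""
    new_ids = sorted(current_ids - previous_ids)
    if not new_ids:
        # Omitting the patient parameter would export data for the whole group.
        return None
    return {
        "resourceType": "Parameters",
        "parameter": [
            {"name": "patient", "valueReference": {"reference": f"Patient/{pid}"}}
            for pid in new_ids
        ],
    }


previous = {"123", "789"}  # ids retained from the initial query
current = patient_ids_from_ndjson([
    '{"resourceType": "Patient", "id": "123"}',
    '{"resourceType": "Patient", "id": "456"}',
    '{"resourceType": "Patient", "id": "789"}',
])
print(json.dumps(new_member_parameters(previous, current)))
```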

_typeFilter Query Parameter

The _typeFilter parameter enables finer-grained filtering out of resources in the bulk data export response that would have otherwise been returned. For example, a Data Consumer may want to retrieve only active prescriptions rather than all prescriptions and only laboratory observations rather than all observations. When using _typeFilter, each resource type is filtered independently. For example, filtering Patient resources to people born after the year 2000 will not filter Encounter resources for patients born before the year 2000 from the export.

Filtering resources based on the dates associated with a clinical or administrative event, such as exporting encounters that occurred within a certain time period, SHOULD be done using the _typeFilter parameter and not the _since and _until parameters, since the resource modification date used in those filters might not correspond to the date of the clinical or administrative event.

The value of the _typeFilter parameter is a FHIR REST API query. Resources with a resource type specified in this query that do not meet the criteria in the search expression in the query SHALL NOT be returned, with the exception of related resources being included by the Data Provider to provide context about the resources being exported (see processing model). A Data Consumer MAY repeat the _typeFilter parameter multiple times in a kick-off request. When more than one _typeFilter parameter is provided with a query for the same resource type, the Data Provider SHALL include resources of that resource type that meet the criteria in any of the parameters (a logical "or").

FHIR search result parameters (such as _sort, _include, and _elements) SHALL NOT be used as _typeFilter criteria. Additionally, a query in the _typeFilter parameter SHALL have the search context of a single FHIR Resource Type. The contexts "all resource types" and "a specified compartment" are not allowed. Data Consumers SHOULD consult the Data Provider's CapabilityStatement to identify supported search parameters (see Data Provider capability documentation). Since support for _typeFilter is OPTIONAL for a Data Provider, Data Consumers SHOULD be robust to Data Providers that ignore _typeFilter.

Chained parameters used in a _typeFilter query are an experimental feature. When they are supported by a Data Provider, the set of exported resources resulting from the interactions between the _typeFilter parameter and other kickoff parameters may be surprising. We are soliciting feedback on the use of chained parameters and, depending on the response, may consider deprecating this capability in a future version of this IG.

Example Request

The following is an export request for MedicationRequest resources, where the Data Consumer would further like to restrict the MedicationRequests to those that are active, or else completed after July 1, 2018. This can be accomplished with two _typeFilter query parameters and an _type query parameter:

  • MedicationRequest?status=active
  • MedicationRequest?status=completed&date=gt2018-07-01T00:00:00Z
$export?
  _type=
    MedicationRequest
  &_typeFilter=
    MedicationRequest%3Fstatus%3Dactive
  &_typeFilter=
    MedicationRequest%3Fstatus%3Dcompleted%26date%3Dgt2018-07-01T00%3A00%3A00Z

Note that newlines and spaces have been added above for clarity, and would not be included in a real request.
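The percent-encoded request above can be generated rather than written by hand. This Python sketch uses urllib.parse.urlencode with quote_via=quote, which percent-encodes the ?, =, &, and : characters inside each _typeFilter value while leaving the & and = that separate the top-level parameters intact. The variable names are illustrative.

```python
from urllib.parse import quote, urlencode

params = [
    ("_type", "MedicationRequest"),
    ("_typeFilter", "MedicationRequest?status=active"),
    ("_typeFilter", "MedicationRequest?status=completed&date=gt2018-07-01T00:00:00Z"),
]

# Each value is percent-encoded (quote_via=quote writes any spaces as %20, not "+");
# repeating the tuple key repeats the _typeFilter parameter.
query = urlencode(params, quote_via=quote)
print("$export?" + query)
```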

Processing Model

The following steps outline a model of how a Data Provider might process a bulk export request. The actual operations a Data Provider performs and the order in which they are performed might differ. Additionally, as documented elsewhere in this implementation guide, depending on the values and headers provided, some requests might cause a Data Provider to return an error rather than continuing to process the request.

[Figure: flowchart. Starting from all resources available for the Data Provider to export, the model (1) excludes resources the Data Consumer is not authorized to receive (based on OAuth scopes and business logic); (2) for a group export, excludes resources for patients outside the group; (3) if a _type parameter is present, excludes resources of types not listed in it; (4) if a _since parameter is present, excludes resources updated prior to the _since timestamp*; (5) if a _typeFilter parameter is present, for each resource type in the export, either excludes resources of that type that don't meet the criteria in at least one of the _typeFilter parameters or, when the type has no _typeFilter criteria, retains all resources of that type; (6) if an includeAssociatedData parameter is present, adds associated resources for resources in the export; (7) adds other related resources to provide context to those in the export; and (8) outputs resources for the Data Consumer.]
Model for processing a Bulk Data Export request.


* In the case of a Group level export, the Data Provider may retain resources modified prior to the _since timestamp if the resources belong to the patient compartment of a patient added to the Group after the supplied time and this behavior is documented by the Data Provider.
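As an illustration only, the filtering steps of this model might look like the following Python sketch over in-memory resources. Several assumptions apply: _typeFilter queries are taken to be already compiled into Python predicates (parsing FHIR search syntax is out of scope), the authorized and in_group callables are hypothetical stand-ins for scope checks, business rules, and group membership, resources without meta.lastUpdated are kept under the _since filter (which the specification permits), and the steps that add associated and contextual resources are omitted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional, Set

Resource = dict
Predicate = Callable[[Resource], bool]


@dataclass
class ExportRequest:
    types: Set[str] = field(default_factory=set)   # _type (empty means no _type filter)
    since: Optional[datetime] = None               # _since (timezone-aware)
    # _typeFilter, assumed pre-compiled: resource type -> predicates combined with OR
    type_filters: Dict[str, List[Predicate]] = field(default_factory=dict)


def last_updated(resource: Resource) -> Optional[datetime]:
    value = resource.get("meta", {}).get("lastUpdated")
    if not value:
        return None
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def changed_after(resource: Resource, since: datetime) -> bool:
    ts = last_updated(resource)
    # No last updated time: the Data Provider MAY include the resource anyway.
    return ts is None or ts > since


def process_export(resources: List[Resource], request: ExportRequest,
                   authorized: Predicate,
                   in_group: Optional[Predicate] = None) -> List[Resource]:
    selected = [r for r in resources if authorized(r)]
    if in_group is not None:  # group export
        selected = [r for r in selected if in_group(r)]
    if request.types:
        selected = [r for r in selected if r["resourceType"] in request.types]
    if request.since is not None:
        selected = [r for r in selected if changed_after(r, request.since)]
    if request.type_filters:
        # Types without criteria are retained; otherwise at least one predicate must match.
        selected = [r for r in selected
                    if r["resourceType"] not in request.type_filters
                    or any(p(r) for p in request.type_filters[r["resourceType"]])]
    return selected


resources = [
    {"resourceType": "Patient", "id": "1", "meta": {"lastUpdated": "2021-03-01T00:00:00Z"}},
    {"resourceType": "MedicationRequest", "id": "a", "status": "active",
     "meta": {"lastUpdated": "2021-03-01T00:00:00Z"}},
    {"resourceType": "MedicationRequest", "id": "b", "status": "stopped",
     "meta": {"lastUpdated": "2021-03-01T00:00:00Z"}},
    {"resourceType": "Observation", "id": "o", "meta": {"lastUpdated": "2019-01-01T00:00:00Z"}},
]
request = ExportRequest(
    since=datetime(2020, 1, 1, tzinfo=timezone.utc),
    type_filters={"MedicationRequest": [lambda r: r.get("status") == "active"]},
)
exported = process_export(resources, request, authorized=lambda r: True)
print([r["id"] for r in exported])  # ['1', 'a']
```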

Response - Success

The Data Provider SHALL return a successful kick-off response with:

  • HTTP status 202 Accepted
  • Content-Location header with the absolute URL of an endpoint for subsequent status requests (polling location)

When a Prefer header value of separate-export-status is provided in the kickoff request and supported by the Data Provider, the response SHALL include the header Preference-Applied with values of respond-async and separate-export-status. These may be provided as comma-delimited values or the header may be repeated for each value.

The Data Provider MAY include a FHIR OperationOutcome resource in the body in JSON format.
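A Data Consumer might validate the kick-off response as in the following Python sketch. Headers are assumed to arrive as (name, value) tuples, such as those returned by http.client.HTTPResponse.getheaders(), so that repeated Preference-Applied headers are preserved; the function name and example URL are hypothetical.

```python
from typing import Iterable, Set, Tuple


def parse_kickoff_response(status: int,
                           headers: Iterable[Tuple[str, str]]) -> Tuple[str, Set[str]]:
    """Return (polling location, applied preferences) from a kick-off response."""
    if status != 202:
        raise RuntimeError(f"kick-off failed with HTTP {status}")
    location = None
    applied: Set[str] = set()
    for name, value in headers:
        lowered = name.lower()  # HTTP header names are case-insensitive
        if lowered == "content-location":
            location = value.strip()
        elif lowered == "preference-applied":
            # Values may be comma-delimited, or the header may be repeated.
            applied.update(v.strip() for v in value.split(",") if v.strip())
    if not location:
        raise RuntimeError("202 response is missing the Content-Location header")
    return location, applied


location, applied = parse_kickoff_response(202, [
    ("Content-Location", "https://example.org/fhir/bulkstatus/abc"),
    ("Preference-Applied", "respond-async"),
    ("Preference-Applied", "separate-export-status"),
])
separate_status = "separate-export-status" in applied
print(location, separate_status)
```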

Response - Error (e.g., unsupported search parameter)

The Data Provider SHALL return an error response with:

  • HTTP status 4XX or 5XX
  • FHIR OperationOutcome resource in the body in JSON format

If a Data Provider wants to prevent a Data Consumer from beginning a new export before an in-progress export is completed, it SHOULD respond with a 429 Too Many Requests status and a Retry-After header, following the rate-limiting advice for "Bulk Data Status Request" below.


Bulk Data Status Request

After a Bulk Data request has been started, the Data Consumer MAY poll the status URL provided in the Content-Location header according to the FHIR Asynchronous Bulk Interaction Pattern.

Data Consumers SHOULD follow an exponential backoff approach when polling for status. A Data Provider SHOULD supply a Retry-After header with a delay time in seconds (e.g., 120 to represent two minutes) or an HTTP-date (e.g., Fri, 31 Dec 1999 23:59:59 GMT). When provided, Data Consumers SHOULD use this information to inform the timing of future polling requests. The Data Provider SHOULD keep an accounting of status queries received from a given Data Consumer, and if a Data Consumer is polling too frequently, the Data Provider SHOULD respond with a 429 Too Many Requests status code in addition to a Retry-After header, and optionally a FHIR OperationOutcome resource with further explanation. If excessively frequent status queries persist, the Data Provider MAY return a 429 Too Many Requests status code and terminate the session. Other standard HTTP 4XX and 5XX status codes may be used to identify errors as mentioned below.
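The polling guidance can be sketched as a Python helper that honors Retry-After in either form and otherwise falls back to capped exponential backoff. The function name, the base delay of one second, and the one-hour cap are illustrative assumptions, not values from this specification.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def next_poll_delay(retry_after: Optional[str], attempt: int,
                    base: float = 1.0, cap: float = 3600.0,
                    now: Optional[datetime] = None) -> float:
    """Seconds to wait before the next status request.

    Uses Retry-After when present (delay-seconds or HTTP-date); otherwise
    falls back to capped exponential backoff, base * 2**attempt.
    """
    if retry_after:
        value = retry_after.strip()
        if value.isdigit():
            return float(value)
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):  # unparseable date; fall back to backoff
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)  # HTTP-dates are in GMT
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())
    return min(cap, base * (2 ** attempt))


print(next_poll_delay("120", attempt=0))  # 120.0
print(next_poll_delay(None, attempt=3))   # 8.0
```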

When requesting status, the Data Consumer SHOULD use an Accept header indicating a content type of application/json. In the case that errors prevent the export from completing, the Data Provider SHOULD respond with a FHIR OperationOutcome resource in JSON format.

Endpoint

GET [polling content location]

Responses

In-Progress: Returned by the Data Provider while it is processing the $export request.

Example response headers, kick-off request without Prefer: separate-export-status:
Status: 202 Accepted
X-Progress: "50% complete"
Retry-After: 120

Example response headers, kick-off request with Prefer: separate-export-status:
Status: 200 OK
X-Export-Status: 202 Accepted
X-Progress: "50% complete"
Retry-After: 120

Error: Returned by the Data Provider if the export operation fails.

Example response headers, kick-off request without Prefer: separate-export-status:
Status: 500 Internal Server Error
Content-Type: application/fhir+json

Example response headers, kick-off request with Prefer: separate-export-status:
Status: 200 OK
X-Export-Status: 500 Internal Server Error
Content-Type: application/fhir+json

Example response body:
{
  "resourceType" : "OperationOutcome",
  "id" : "export-error-operationoutcome-example",
  ...
  "issue" : [
    {
      "severity" : "error",
      "code" : "processing",
      "details" : {
        "text" : "An internal timeout has occurred"
      }
    }
  ]
}

Complete: Returned by the Data Provider when the export operation has completed.

Example response headers, kick-off request without Prefer: separate-export-status:
Status: 200 OK
Expires: Mon, 22 Jul 2019 23:59:59 GMT
Content-Type: application/json

Example response headers, kick-off request with Prefer: separate-export-status:
Status: 200 OK
X-Export-Status: 200 OK
Expires: Mon, 22 Jul 2019 23:59:59 GMT
Content-Type: application/json

Example response body:

This content is an example of the Bulk Data Manifest Logical Model and is not a FHIR Resource

    
{
  "resourceType": "http://hl7.org/fhir/uv/bulkdata/StructureDefinition/BulkDataManifest",
  "id": "BulkDataManifestMinimalExample",
  "transactionTime": "2021-01-01T00:00:00Z",
  "requiresAccessToken": true,
  "output": [
    {
      "type": "Patient",
      "url": "https://example.org/output/patient_file_1.ndjson"
    }
  ]
}

  
Response - In-Progress Status

The Data Provider SHALL indicate an in-progress export job as follows:

  • Kick-off request without separate-export-status: HTTP status 202 Accepted; X-Export-Status header not present
  • Kick-off request with separate-export-status: HTTP status 200 OK; X-Export-Status: 202 Accepted

The Data Provider MAY also return an X-Progress header containing a text description, fewer than 100 characters long, of the status of the request. The format of this description is at the Data Provider's discretion and MAY be a percentage complete value, or MAY be a more general status such as "in progress". The Data Consumer MAY parse the description, display it to the user, or log it.

When the allowPartialManifests kickoff parameter is true, the Data Provider MAY return a Content-Type header of application/json and a body containing an output manifest in the format described below, populated with a partial set of output files for the export. When provided, a manifest SHALL only contain files that are available for retrieval by the Data Consumer. Once a manifest has been returned, the Data Provider SHALL NOT alter it in responses to subsequent requests, except to optionally add a link field pointing to a manifest with additional output files or to update output file URLs that have expired. The output files referenced in a manifest SHALL NOT be altered once that manifest has been returned to a Data Consumer.

Response - Error Status

The Data Provider SHALL indicate an export job failure as follows:

  • Kick-off request without separate-export-status: HTTP status 4XX or 5XX; X-Export-Status header not present
  • Kick-off request with separate-export-status: HTTP status 200 OK; X-Export-Status: 4XX or 5XX

The body of the response SHOULD be a FHIR OperationOutcome resource in JSON format. If this is not possible (for example, the infrastructure layer returning the error is not FHIR aware), the Data Provider MAY return an error message in another format and include a corresponding value for the Content-Type header.

When the body is a FHIR OperationOutcome resource, the response SHALL include a Content-Type header of application/fhir+json.

In the case of a polling failure that does not indicate failure of the export job, a Data Provider SHOULD use a transient code from the IssueType value set when populating the FHIR OperationOutcome resource's issue.code element, to indicate to the Data Consumer that it will need to retry the request at a later time.
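When a Data Consumer receives an error OperationOutcome, it has to decide whether to retry the poll or treat the job as failed. The following Python sketch shows one way to decide. It assumes the FHIR R4 IssueType hierarchy, where `lock-error`, `no-store`, `exception`, `timeout`, `incomplete`, and `throttled` sit under `transient`. Confirm these codes against the FHIR version in use. The function name and the rule that only fatal and error issues are considered are illustrative choices.

```python
# Codes under "transient" in the FHIR R4 IssueType hierarchy (verify per FHIR version).
TRANSIENT_ISSUE_CODES = frozenset({
    "transient", "lock-error", "no-store", "exception",
    "timeout", "incomplete", "throttled",
})


def is_transient_outcome(outcome: dict) -> bool:
    """True if every fatal/error issue in the OperationOutcome is transient.

    Warnings and informational issues are ignored. An outcome with no
    fatal/error issues is not treated as a retry signal.
    """
    blocking = [issue for issue in outcome.get("issue", [])
                if issue.get("severity") in ("fatal", "error")]
    return bool(blocking) and all(
        issue.get("code") in TRANSIENT_ISSUE_CODES for issue in blocking
    )
```

A `True` result suggests polling again after the backoff delay. A `False` result, such as the `processing` code in the error example above, suggests the export job itself failed.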

Note: Even if some of the requested resources cannot successfully be exported, the overall export operation MAY still succeed. In this case, the Response.error array of the completion response body SHALL be populated with one or more files in NDJSON format containing FHIR OperationOutcome resources to indicate what went wrong (see below). In the case of a partial success, the Data Provider SHALL use a 200 status code instead of 4XX or 5XX. The choice of when to determine that an export job has failed in its entirety (error status) vs. returning a partial success (complete status) is left to the Data Provider.

Response - Complete Status

The Data Provider SHALL indicate a completed export job as follows:

  • Kick-off request without separate-export-status: HTTP status 200 OK; X-Export-Status header not present
  • Kick-off request with separate-export-status: HTTP status 200 OK; X-Export-Status: 200 OK

The response SHALL include a Content-Type header of application/json and a body containing the output manifest described below.

The Data Provider SHOULD return an Expires header indicating when the files listed will no longer be available for access.
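Together, the in-progress, error, and complete status rules above determine where a Data Consumer looks for the job state: in the HTTP status line, or in X-Export-Status when separate-export-status was applied. The following Python sketch shows this mapping. It assumes headers are supplied as a plain mapping keyed exactly as `X-Export-Status`. A real client would use a case-insensitive header container. `PollResult` and the `"poll-error"` state are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PollResult:
    state: str   # "in-progress", "complete", "error", or "poll-error"
    status: int  # status code describing the job (or the failed poll)


def classify_status_response(http_status: int, headers: Mapping[str, str],
                             separate_export_status: bool) -> PollResult:
    """Map a status-poll response onto the export job states."""
    if separate_export_status:
        if http_status != 200:
            # The poll itself failed (e.g. 429 Too Many Requests); job state unknown.
            return PollResult("poll-error", http_status)
        raw = headers.get("X-Export-Status")
        if raw is None:
            raise ValueError("X-Export-Status header missing")
        status = int(raw.split()[0])  # "202 Accepted" -> 202
    else:
        # Without separate-export-status, a 4XX/5XX here may also be a polling
        # failure such as 429; the OperationOutcome must be inspected to tell.
        status = http_status
    if status == 202:
        return PollResult("in-progress", status)
    if status == 200:
        return PollResult("complete", status)
    if 400 <= status <= 599:
        return PollResult("error", status)
    raise ValueError(f"unexpected export status {status}")
```

The separate-export-status mode removes the ambiguity described in the comment. In that mode, a non-200 HTTP status always means the poll failed, and only X-Export-Status describes the job.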

Response - Output Manifest

The output manifest is a JSON object providing metadata and links to the generated Bulk Data files. The files SHALL be accessible to the Data Consumer at the URLs advertised. These URLs MAY be served by file servers other than the Data Provider's FHIR Resource Server.

Field Cardinality Type Description
transactionTime 1..1 instant

Indicates the Data Provider's time when the query was run or the files were generated. The bulk data files referenced in this manifest SHOULD NOT include any resources modified after this instant, and SHALL include any matching resources modified up to and including this instant.

requiresAccessToken 1..1 boolean

Indicates whether downloading the files referenced in this manifest requires the same authorization mechanism as the operation that resulted in the manifest. Value SHALL be true if both the Data Provider's file server and the Data Provider's FHIR API server control access using OAuth 2.0 bearer tokens. Value MAY be false for file servers that use access-control schemes other than OAuth 2.0, such as downloads from Amazon S3 bucket URLs or verifiable file servers within an organization's firewall.

manifestType 0..1 canonical

Canonical URL of the OperationDefinition for the operation associated with the provision of this manifest. E.g., http://hl7.org/fhir/uv/bulkdata/OperationDefinition/bulk-publish|1.0.0. This element will be mandatory in a future release of this IG.

request 0..1 string

Deprecated - this element SHOULD NOT be used and will be removed in a future release of this IG. When populated for backward compatibility, it contains the full URL of the original Bulk Data kick-off request. In the case of a POST request, this URL does not include the request parameters.

outputFormat 0..1 string

MIME type of the referenced bulk data output files. Defaults to application/fhir+ndjson when omitted. Corresponds to the _outputFormat parameter in a Bulk Export operation.

outputOrganizedBy 0..1 string

When resources in the output files are organized by instances of a resource type, that resource type is specified here. When each output file contains a single resource type, this element SHALL be omitted and an individual type element SHALL be included for each file in the output array.

outputOrganizedByDetail 0..1 string

Narrative text providing detail on the organizing resource listed in outputOrganizedBy. SHALL NOT be populated in the absence of the outputOrganizedBy element.

output 0..* BackboneElement

An array of file items with one entry for each generated file.

output.url 1..1 url

The absolute path to the file. The format of the file SHOULD reflect that requested in the _outputFormat parameter of the initial kick-off request and the outputFormat element in this manifest.

output.type 0..1 string

The FHIR resource type contained in the file. When the manifest does not include an outputOrganizedBy value, this element SHALL be populated. When the manifest includes the outputOrganizedBy element, this element SHALL NOT be populated.

output.continuesInFile 0..1 url

URL of the next output file when resources for an organizing resource span multiple files.

output.count 0..1 integer

The number of resources in the file.

output.fileSize 0..1 integer

The size of the file in bytes. This provides Data Consumers with information about the storage and processing requirements for downloading and parsing the file.

deleted 0..* BackboneElement

References to files containing pointers to deleted resources in the form of FHIR Transaction Bundles. Each line in the output files SHALL contain a FHIR Bundle with a type of transaction which SHALL contain one or more entry items that reflect a deleted resource. In each entry, the request.url and request.method elements SHALL be populated and request.method SHALL be set to DELETE.

deleted.url 1..1 url

The absolute path to the file.

deleted.count 0..1 integer

The number of resources in the file.

deleted.fileSize 0..1 integer

The size of the file in bytes. This provides Data Consumers with information about the storage and processing requirements for downloading and parsing the file.

error 0..* BackboneElement

Files containing OperationOutcome resources. Error, success, warning, information and other messages related to the operation SHOULD be included here (not in output). This element will be renamed to status in a future release of this IG.

error.url 1..1 url

The absolute path to the file.

error.count 0..1 integer

The number of resources in the file.

error.fileSize 0..1 integer

The size of the file in bytes. This provides Data Consumers with information about the storage and processing requirements for downloading and parsing the file.

error.countSeverity 0..* BackboneElement

Count of OperationOutcome resources grouped by severity level.

error.countSeverity.code 1..1 code

Severity level from OperationOutcome.issue.severity (fatal, error, warning, information, success)

error.countSeverity.count 1..1 integer

The number of OperationOutcome resources in the file with this severity level.

link 0..* BackboneElement

Link to related manifest.

link.relation 1..1 string

The relationship type. A value of 'next' indicates the URL points to the location of another manifest containing additional output files.

link.url 1..1 url

URL pointing to the location of another manifest. All fields in the linked manifest SHALL be populated with the same values as this manifest, apart from the contents of output, deleted, error, and link.
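Several rules in the table above can be checked mechanically: the required fields, the requirement that output.type and outputOrganizedBy are mutually exclusive, outputOrganizedByDetail depending on outputOrganizedBy, and the required url and relation fields. The following Python sketch is a partial checker for a manifest that has already been parsed into a dict. It does not validate value formats such as the instant or URL syntax. `manifest_problems` is a hypothetical name.

```python
def manifest_problems(manifest: dict) -> list[str]:
    """Return violations of the manifest field rules (an empty list if none found)."""
    problems: list[str] = []
    for key in ("transactionTime", "requiresAccessToken"):
        if key not in manifest:
            problems.append(f"missing required field {key}")

    organized_by = manifest.get("outputOrganizedBy")
    if "outputOrganizedByDetail" in manifest and organized_by is None:
        problems.append("outputOrganizedByDetail present without outputOrganizedBy")

    for index, item in enumerate(manifest.get("output", [])):
        if "url" not in item:
            problems.append(f"output[{index}] missing url")
        # type is required when output is organized by type, forbidden otherwise.
        if organized_by is None and "type" not in item:
            problems.append(f"output[{index}] missing type")
        if organized_by is not None and "type" in item:
            problems.append(f"output[{index}] has type although outputOrganizedBy is set")

    for section in ("deleted", "error"):
        for index, item in enumerate(manifest.get(section, [])):
            if "url" not in item:
                problems.append(f"{section}[{index}] missing url")

    for index, link in enumerate(manifest.get("link", [])):
        if "relation" not in link or "url" not in link:
            problems.append(f"link[{index}] needs both relation and url")
    return problems
```

For the minimal manifest example earlier on this page, the function returns an empty list.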

Implementation notes:

  • For transactionTime, to properly meet the inclusion constraints above, the Data Provider's FHIR server might need to wait for any pending transactions to resolve in its database before starting the export process.
  • Error, warning, and information messages related to the export SHOULD be included in error and not in output. If there are no relevant messages, the Data Provider SHOULD return an empty array. If the request contained invalid or unsupported parameters along with a Prefer: handling=lenient header and the Data Provider processed the request, the Data Provider SHOULD include a FHIR OperationOutcome resource for each of these parameters.
  • When the _since timestamp is supplied in the export request, the deleted array SHOULD be populated with files containing FHIR transaction Bundles for resources that match the kick-off request criteria but were deleted after the _since date. If no resources have been deleted, if _since was not supplied, or if the Data Provider has other reasons to avoid exposing these data, the Data Provider MAY omit this key or return an empty array. Resources that appear in deleted SHALL NOT also appear in output.

  • When the allowPartialManifests kickoff parameter is true, the manifest MAY include a link array with a single object containing a relation field with a value of next, and a url field pointing to the location of another manifest. All fields in the linked manifest SHALL be populated with the same values as the manifest with the link, apart from the output, deleted, error, and link arrays.
  • If the export has failed or a transient error has occurred, a Data Provider MAY return an error in response to a request for the next link, as described in the Error Status section above. For non-transient errors, a Data Consumer MAY process resources that have already been retrieved before re-running the export job or MAY discard them.
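A Data Consumer that enables allowPartialManifests may need to walk a chain of manifests through their `next` links. The following Python sketch does this with a caller-supplied `fetch_json(url)` function, which is a hypothetical stand-in for an authenticated HTTP GET that returns parsed JSON and raises on error responses. The page limit and the cycle check are defensive assumptions and are not required by this guide.

```python
from typing import Callable, Iterator


def iter_manifests(first: dict, fetch_json: Callable[[str], dict],
                   max_pages: int = 10_000) -> Iterator[dict]:
    """Yield a manifest followed by every manifest reachable via 'next' links."""
    manifest = first
    seen: set[str] = set()
    for _ in range(max_pages):
        yield manifest
        next_urls = [link["url"] for link in manifest.get("link", [])
                     if link.get("relation") == "next"]
        if not next_urls:
            return
        url = next_urls[0]  # the manifest carries at most one next link
        if url in seen:
            raise RuntimeError(f"manifest link cycle at {url}")
        seen.add(url)
        manifest = fetch_json(url)  # may raise if the export failed
    raise RuntimeError("too many linked manifests")


def all_output_urls(first: dict, fetch_json: Callable[[str], dict]) -> list[str]:
    """Collect output file URLs across the whole manifest chain."""
    return [item["url"]
            for manifest in iter_manifests(first, fetch_json)
            for item in manifest.get("output", [])]
```

If the last manifest in the chain has no `next` link while the export is still in progress, the Data Consumer can poll again later. This is because a Data Provider may add a `link` to a manifest it has already returned.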

Example manifest when the organizeOutputBy kickoff parameter is not populated:

This content is an example of the Bulk Data Manifest Logical Model and is not a FHIR Resource

    
{
  "resourceType": "http://hl7.org/fhir/uv/bulkdata/StructureDefinition/BulkDataManifest",
  "id": "BulkDataManifestByTypeExample",
  "transactionTime": "2021-01-01T00:00:00Z",
  "requiresAccessToken": true,
  "output": [
    {
      "type": "Patient",
      "url": "https://example.org/output/patient_file_1.ndjson"
    },
    {
      "type": "Observation",
      "url": "https://example.org/output/observation_file_1.ndjson"
    },
    {
      "type": "Observation",
      "url": "https://example.org/output/observation_file_2.ndjson"
    }
  ],
  "deleted": [
    {
      "url": "https://example.org/output/del_file_1.ndjson"
    }
  ],
  "error": [
    {
      "url": "https://example.org/output/err_file_1.ndjson"
    }
  ],
  "extension": [
    {
      "url": "http://example.org/fhir/StructureDefinition/includes-telehealth-patients",
      "valueBoolean": true
    }
  ]
}

  


Example manifest when the organizeOutputBy kickoff parameter is Patient and the allowPartialManifests kickoff parameter is true:

This content is an example of the Bulk Data Manifest Logical Model and is not a FHIR Resource

    
{
  "resourceType": "http://hl7.org/fhir/uv/bulkdata/StructureDefinition/BulkDataManifest",
  "id": "BulkDataManifestOrganizedByPatientExample",
  "transactionTime": "2021-01-01T00:00:00Z",
  "requiresAccessToken": true,
  "outputOrganizedBy": "Patient",
  "output": [
    {
      "url": "https://example.org/output/file_1.ndjson"
    },
    {
      "url": "https://example.org/output/file_2.ndjson",
      "continuesInFile": "https://example.org/output/file_3.ndjson"
    },
    {
      "url": "https://example.org/output/file_3.ndjson"
    }
  ],
  "deleted": [
    {
      "url": "https://example.org/output/del_file_1.ndjson"
    }
  ],
  "error": [
    {
      "url": "https://example.org/output/err_file_1.ndjson"
    }
  ],
  "extension": [
    {
      "url": "http://example.org/fhir/StructureDefinition/includes-telehealth-patients",
      "valueBoolean": true
    }
  ],
  "link": [
    {
      "relation": "next",
      "url": "https://example.org/output/manifest-2.json"
    }
  ]
}

  


Example deleted resource bundle (represents one line in an output file):

{
  "resourceType" : "Bundle",
  "id" : "deleted-resource-transaction-bundle-example",
  "meta" : {
    "lastUpdated" : "2020-04-27T02:56:00Z"
  },
  "type" : "transaction",
  "entry" : [
    {
      "request" : {
        "method" : "DELETE",
        "url" : "Patient/123"
      }
    }
  ]
}
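A Data Consumer can read a deleted file line by line and extract the reference for each removed resource from request.url. The following Python sketch applies the constraints from the deleted field description: each line is a transaction Bundle, and each entry uses DELETE. `deleted_references` is a hypothetical name, and the lines are assumed to be already-decoded text.

```python
import json
from typing import Iterable, Iterator


def deleted_references(ndjson_lines: Iterable[str]) -> Iterator[str]:
    """Yield 'Type/id' references from a deleted-resources NDJSON file."""
    for line in ndjson_lines:
        if not line.strip():
            continue  # tolerate a trailing newline at end of file
        bundle = json.loads(line)
        if bundle.get("resourceType") != "Bundle" or bundle.get("type") != "transaction":
            raise ValueError("each line must be a transaction Bundle")
        for entry in bundle.get("entry", []):
            request = entry.get("request", {})
            if request.get("method") != "DELETE":
                raise ValueError("deleted entries must use request.method DELETE")
            yield request["url"]
```

For the example bundle above, the function yields the single reference `Patient/123`.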



Bulk Data Delete Request

After an asynchronous bulk request has been started, a Data Consumer MAY send a DELETE request to the URL provided in the Content-Location header to cancel the request. If the request has been completed, a Data Provider MAY use the request as a signal that the Data Consumer is done retrieving files and that it is safe for the Data Provider to remove those from storage. Following the delete request, when subsequent requests are made to the polling location, the Data Provider SHALL return a 404 Not Found error and an associated FHIR OperationOutcome resource in JSON format.

Endpoint

DELETE [polling content location]

Response - Success

The Data Provider SHALL return a successful delete response with HTTP status 202 Accepted.

The Data Provider MAY include a FHIR OperationOutcome resource in the body in JSON format.

Response - Error

The Data Provider SHALL return an error response with:

  • HTTP status 4XX or 5XX
  • FHIR OperationOutcome resource in the body in JSON format
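The delete interaction is a plain HTTP DELETE sent to the polling location. The following Python sketch uses the standard library's `urllib.request.Request` to build the request without sending it. It assumes a SMART Backend Services bearer token is available. The Accept header is an illustrative choice, meant to receive any OperationOutcome as JSON. The function names are hypothetical.

```python
import urllib.request
from typing import Optional


def build_cancel_request(polling_url: str,
                         access_token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a DELETE request against the polling location."""
    headers = {"Accept": "application/json"}
    if access_token is not None:
        headers["Authorization"] = f"Bearer {access_token}"
    return urllib.request.Request(polling_url, method="DELETE", headers=headers)


def cancel_succeeded(status: int) -> bool:
    """A successful delete is signalled by 202 Accepted."""
    return status == 202
```

The request would be sent with `urllib.request.urlopen(request)`. That call raises `urllib.error.HTTPError` for 4XX and 5XX responses, and the error's body holds the OperationOutcome. After a successful delete, further polls are expected to return 404 Not Found.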

Bulk Data Output File Request

Using the URLs supplied by the Data Provider in the manifest, a Data Consumer MAY download the referenced files within the time period specified in the Expires header, if present. A Data Consumer MAY re-fetch the manifest if file links have expired, and a Data Provider MAY provide updated links or an updated Expires timestamp in response.

As long as a Data Provider is following relevant security guidance, it MAY generate manifests where the requiresAccessToken field is true or false, including for Data Providers available on the public internet.

If the requiresAccessToken field in the manifest is set to true, the request SHALL include a valid access token.

If the requiresAccessToken field is set to false and no additional authorization-related extensions are present in the relevant manifest entry, then the URLs SHALL be dereferenceable directly as capability URLs. A Data Consumer SHALL NOT provide a SMART Backend Services access token when dereferencing a URL from a manifest entry where requiresAccessToken is false.

Returned content SHALL include only the most recent version of any returned resources unless the Data Consumer explicitly requests different behavior in a fashion supported by the Data Provider. Inclusion of the Resource.meta information in the resources is at the discretion of the Data Provider, as it is for all FHIR interactions.

A Data Consumer SHOULD provide an Accept-Encoding header when requesting files and SHOULD include gzip compression as one of the encoding options in the header. A Data Provider SHALL provide files as uncompressed, with gzip compression, or with another compression format from the Accept-Encoding header. When compression is used, a Data Provider SHALL communicate this to the Data Consumer by including a Content-Encoding header in the response. A Data Consumer SHALL accept files that are uncompressed or encoded with gzip compression, and MAY accept files encoded with other compression formats.
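The token and compression rules above can be combined into a small pair of helpers: one builds the download headers from the manifest's requiresAccessToken value, and the other undoes the Content-Encoding of the response. The following Python sketch ignores any authorization-related extensions a manifest entry might carry. Some HTTP libraries decompress gzip automatically, while `urllib` does not. The function names are hypothetical.

```python
import gzip
from typing import Optional


def file_request_headers(requires_access_token: bool,
                         access_token: Optional[str]) -> dict[str, str]:
    """Headers for downloading a file listed in the manifest."""
    headers = {"Accept": "application/fhir+ndjson", "Accept-Encoding": "gzip"}
    if requires_access_token:
        if not access_token:
            raise ValueError("manifest requires an access token")
        headers["Authorization"] = f"Bearer {access_token}"
    # When requiresAccessToken is false the URL acts as a capability URL,
    # and the SMART Backend Services token must NOT be sent.
    return headers


def decode_file_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo Content-Encoding; uncompressed and gzip bodies must be accepted."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "identity":
        return body
    if encoding == "gzip":
        return gzip.decompress(body)
    raise ValueError(f"unsupported Content-Encoding: {content_encoding}")
```

Omitting the Authorization header when requiresAccessToken is false avoids leaking a token to file servers, such as object storage, that were never meant to receive it.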

Endpoint

GET [url from manifest output, deleted, or error field]

Headers
  • Accept (optional, defaults to application/fhir+ndjson)

Specifies the format of the file being requested.

Response - Success

The Data Provider SHALL return a successful file response with:

  • HTTP status 200 OK
  • Content-Type header that matches the file format being delivered
  • Body of FHIR resources in newline-delimited JSON (NDJSON) or another requested format

For files in NDJSON format, the Content-Type header SHALL be application/fhir+ndjson.

Response - Error

The Data Provider SHALL return an error response with HTTP status 4XX or 5XX.

Bulk Data Output File Organization

Output files may be organized by resource type, or by instances of a resource type specified in the organizeOutputBy kickoff parameter.

When the organizeOutputBy kickoff parameter is not populated, each output file SHALL contain resources of only one type, and a Data Provider MAY create more than one file for each resource type returned. The number of resources contained in a file MAY vary between Data Providers and files.

When the organizeOutputBy kickoff parameter is populated with a resource type, the output files SHALL be populated with blocks consisting of a header Parameters resource containing a parameter named header with a reference to a resource of the type in the kickoff parameter, followed by the resource referenced in this header and resources that reference the resource referenced in the header (together a "resource block"). Each output file MAY contain multiple resource blocks and, when possible, a single resource's block SHOULD NOT be split across files. If a resource block does span more than one file, the header SHALL be repeated at the start of each file where the block continues, and the association between these files SHALL be documented in the manifest using the continuesInFile field in the relevant output array items.

Resources that would otherwise be included in the export, but do not have references to the resource type specified in the organizeOutputBy parameter, MAY be included in resource blocks that contain resources they reference, MAY be repeated in every resource block, or MAY be omitted from the export.

When the organizeOutputBy parameter is set to Patient, Data Providers SHOULD use the Patient Compartment Definition to determine a base set of related resources to include in a resource block, though other resources may also be included. For other resource types, we are soliciting feedback on the best approach for documenting the set of resources in a resource block. Implementation Guides MAY reference a Compartment Definition, populate a GraphDefinition Resource, include narrative text, or use another approach.

Example NDJSON file when the organizeOutputBy parameter in the kickoff request is not populated:

{"id":"p-1","resourceType":"Patient", "name":[{"given":["Brenda"],"family":"Jackson"}],"gender":"female", ...}
{"id":"p-2","resourceType":"Patient", "name":[{"given":["Bram"],"family":"Sandeep"}],"gender":"male", ...}
{"id":"p-3","resourceType":"Patient", "name":[{"given":["Sandy"],"family":"Hamlin"}],"gender":"female", ...}
{...}

Example NDJSON file when the organizeOutputBy parameter in the kickoff request is set to Patient:

{"resourceType": "Parameters", "parameter": [{"name": "header", "valueReference": {"reference": "Patient/p-1"}}]}
{"id": "p-1", "resourceType": "Patient", ...}
{"id": "c-1", "resourceType": "Condition", "subject":{"reference": "Patient/p-1"}, ...}
{"id": "o-1", "resourceType": "Observation", "subject":{"reference": "Patient/p-1"}, ...}
{...}
{"resourceType": "Parameters", "parameter": [{"name": "header", "valueReference": {"reference": "Patient/p-2"}}]}
{"id": "p-2", "resourceType": "Patient", ...}
{"id": "c-101", "resourceType": "Condition", "subject":{"reference": "Patient/p-2"}, ...}
{"id": "o-102", "resourceType": "Observation", "subject":{"reference": "Patient/p-2"}, ...}
{...}
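When output is organized by a resource type, a Data Consumer can rebuild resource blocks by grouping lines under the most recent header Parameters resource. The following Python sketch keys blocks by the header reference. Because of this, a header repeated at the start of a continuation file (as listed through continuesInFile) extends the existing block instead of starting a new one. Files are assumed to be supplied in continuesInFile order, so that resources within a block keep their sequence. The function names are hypothetical.

```python
import json
from typing import Iterable, Optional


def header_reference(resource: dict) -> Optional[str]:
    """Return the header reference if this resource is a block header, else None."""
    if resource.get("resourceType") != "Parameters":
        return None
    for param in resource.get("parameter", []):
        if param.get("name") == "header":
            return param.get("valueReference", {}).get("reference")
    return None


def resource_blocks(files: Iterable[Iterable[str]]) -> dict[str, list[dict]]:
    """Group NDJSON lines from one or more files into blocks keyed by header reference."""
    blocks: dict[str, list[dict]] = {}
    for lines in files:
        current: Optional[list[dict]] = None  # each file must start with a header
        for line in lines:
            if not line.strip():
                continue
            resource = json.loads(line)
            ref = header_reference(resource)
            if ref is not None:
                current = blocks.setdefault(ref, [])
            elif current is None:
                raise ValueError("resource appears before any header in file")
            else:
                current.append(resource)
    return blocks
```

For the example above, `resource_blocks([lines])` would contain keys `Patient/p-1` and `Patient/p-2`, each holding the Patient followed by its related resources.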
Attachments

If resources in a returned file contain elements of the type Attachment, the Data Provider SHOULD populate the Attachment.contentType code as well as either the data element or the url element. If the data element is not populated and the url element is populated, the url element SHALL be an absolute URL that can be dereferenced to the attachment's content.

When the url element is populated with an absolute URL and the requiresAccessToken field in the manifest is set to true, the URL location SHALL be accessible by a Data Consumer with a valid access token and SHALL NOT require the use of additional authentication credentials. When the url element is populated and the requiresAccessToken field in the manifest is set to false, the URL location SHALL be accessible by a Data Consumer without an access token.
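The Attachment rules above reduce to one decision per attachment: use the inline data, or dereference an absolute URL. The following Python sketch makes that decision. It checks absoluteness by requiring both a scheme and a network location, which is an assumption and stricter than some URL forms allow. It returns a tagged tuple. `attachment_source` is a hypothetical name.

```python
import base64
from urllib.parse import urlparse


def attachment_source(attachment: dict):
    """Classify an Attachment as inline data, an absolute URL, or neither.

    Returns ("data", bytes), ("url", str), or ("none", None).
    """
    if "data" in attachment:
        return "data", base64.b64decode(attachment["data"])
    url = attachment.get("url")
    if url is None:
        return "none", None
    parsed = urlparse(url)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"Attachment.url must be absolute when data is absent: {url}")
    return "url", url
```

A returned URL would then be requested with or without an access token according to the manifest's requiresAccessToken value, in the same way as the output files.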

Note that if a Data Provider copies files to the Bulk Data output endpoint or proxies requests to facilitate access from this endpoint, it may need to modify the Attachment.url element when generating the Bulk Data output files.

Data Provider Capability Documentation

This implementation guide is structured to support a wide variety of Bulk Data Export use cases and Data Provider architectures. To provide clarity to developers on which capabilities are implemented by a particular Data Provider, Data Providers SHALL ensure that their CapabilityStatement accurately reflects the implemented Bulk Data Operations. Additionally, the Data Provider's CapabilityStatement SHOULD list the resource types available for export in the rest.resource element, and SHOULD list the search parameters that can be used in the _typeFilter parameter in rest.resource.searchParam elements.

Data Providers SHOULD indicate resource types and search parameters that are accessible through the REST API, but not available using the Bulk Export operation, with one or more extensions that have a URL of http://hl7.org/fhir/uv/bulkdata/Extension/operation-not-supported and a valueCanonical with the canonical URL for the OperationDefinition of the bulk operation that is not supported. Alternatively, the extension may be populated with the canonical URL for the FHIR Bulk Data Access Implementation Guide CapabilityStatement when none of the bulk operations are supported.
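A Data Consumer can scan a CapabilityStatement for the operation-not-supported extension before building a kickoff request. The following Python sketch checks both rest.resource entries and their searchParam entries, and ignores any `|version` suffix on the canonical. The OperationDefinition canonical, and the optional canonical for the IG CapabilityStatement, are passed in by the caller. The URLs in the test values are placeholders and are not the real canonicals. The function names are hypothetical.

```python
from typing import Optional

NOT_SUPPORTED_EXT = "http://hl7.org/fhir/uv/bulkdata/Extension/operation-not-supported"


def _unversioned(canonical: str) -> str:
    return canonical.split("|", 1)[0]


def _flags(element: dict, targets: set[str]) -> bool:
    return any(ext.get("url") == NOT_SUPPORTED_EXT
               and _unversioned(ext.get("valueCanonical", "")) in targets
               for ext in element.get("extension", []))


def unsupported_entries(capability: dict, operation_canonical: str,
                        all_bulk_canonical: Optional[str] = None) -> set[tuple[str, Optional[str]]]:
    """Return (resourceType, searchParamName or None) pairs marked as unsupported."""
    targets = {_unversioned(operation_canonical)}
    if all_bulk_canonical:
        targets.add(_unversioned(all_bulk_canonical))
    found: set[tuple[str, Optional[str]]] = set()
    for rest in capability.get("rest", []):
        for resource in rest.get("resource", []):
            if _flags(resource, targets):
                found.add((resource["type"], None))
            for param in resource.get("searchParam", []):
                if _flags(param, targets):
                    found.add((resource["type"], param["name"]))
    return found
```

A `(type, None)` entry means the resource type should be left out of `_type`. A `(type, name)` entry means the named search parameter should not be used in `_typeFilter` for that type.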

Data Providers SHOULD also ensure that their documentation addresses the topics below. Future versions of this IG may define a computable format for this information as well.

  • Does the Data Provider restrict responses to a specific profile like the US Core Implementation Guide or the Blue Button Implementation Guide?
  • What approach does the Data Provider take to divide data sets into multiple files (e.g., single file per single resource type, limit file size to 100MB, limit number of resources per file to 100,000)?
  • Are additional supporting resources such as Practitioner or Organization included in the export and under what circumstances?
  • Does the Data Provider support system-wide (or all-patients, or Group-level) export? What parameters are supported for each request type? Note that this SHOULD also be captured in the Data Provider's CapabilityStatement.
  • What outputFormat values does this Data Provider support?
  • In the case of a Group-level export, does the _since parameter return additional resources modified prior to the supplied time if the resources belong to the patient compartment of a patient added to the Group after the supplied time?
  • What includeAssociatedData values does this Data Provider support?