SQL on FHIR
0.0.1-pre - ci-build International flag

SQL on FHIR, published by HL7. This guide is not an authorized publication; it is the continuous build for version 0.0.1-pre built by the FHIR (HL7® FHIR® Standard) CI Build. This version is based on the current content of https://github.com/FHIR/sql-on-fhir-v2/ and changes regularly. See the Directory of published versions

Home

Official URL: http://hl7.org/fhir/uv/sql-on-fhir/ImplementationGuide/hl7.fhir.uv.sql-on-fhir Version: 0.0.1-pre
Draft as of 2024-04-30 Computable Name: SQLonFHIR

This is an evolution of the original “SQL on FHIR” draft, which can still be found here.

Intro

The FHIR® standard is a great fit for RESTful and JSON-based systems, helping make healthcare data liquidity real. This spec aims to take FHIR usage a step futher, making FHIR work well with familiar and efficient SQL engines and surrounding ecosystems.

We do this by creating simple, tabular views of the underlying FHIR data that are tailored to specific needs. Views are defined with FHIRPath expressions in a logical structure to specify things like column names and unnested items.

Let’s start with a simple example, defining a “patient_demographics” view with the following ViewDefinition structure:

{
  "name": "patient_demographics",
  "resource": "Patient",
  "select": [
    {
      "column": [
        {
          "path": "getResourceKey()",
          "name": "id"
        },
        {
          "path": "gender",
          "name": "gender"
        }
      ]
    },
    {
      // Create columns from the official name selected here.
      "forEach": "name.where(use = 'official').first()",
      "column": [
        {
          "path": "given.join(' ')",
          "name": "given_name",
          "description": "A single given name field with all names joined together."
        },
        {
          "path": "family",
          "name": "family_name"
        }
      ]
    }
  ]
}

This will result in a “patient_demographics” table that looks like this. The table can be persisted and queried in your database of choice, using the view name as the table name:

id gender given_name family_name
1 female Malvina Gerda Vicario
2 male Yolotzin Adel Bristow
3 other Jin Gomer Aarens

Such tabular views can be created for any FHIR resource, with more examples here. See the View Definition page for the full definition of the above structure.

System Layers

The View Definition is the central element of this spec, but in practice it is really one layer of an overall system. The layers are:

  • the Data Layer
  • the View Layer
  • and the Analytics Layer.

High-level diagram of layers

Figure 1: High-level diagram of layers

The Data Layer

The Data Layer is a set of lossless representations that collectively enable FHIR to be used with a wide variety of different query technologies. It may optionally be persisted and annotated to make it or implementations of the view layer more efficient, but no specific Data Layer structure will be required by this specification.

Implementations are encouraged but not required to further annotate the FHIR resources to help View layer implementations run efficient queries. This primarily applies when the underlying FHIR resources are stored in databases that the View layer will query.

The View Layer

The View Layer defines portable, tabular views of FHIR data that can more easily be consumed by a wide variety of analytic tools. The use of these tools is described in Analytics Layer section. Our goal here is simply to get the needed FHIR data in a form that matches user needs and common analytic patterns.

The View Layer itself has two key components:

  • View Definitions, allowing users to define flattened views of FHIR data that are portable between systems.
  • View Runners are system-specific tools or libraries that apply view definitions to the underlying data layer, optionally making use of annotations to optimize performance.

See the View Definition documentation for details and examples; these are the central piece of this specification.

View Runners will be specific to the data layer they use. Each data layer may have one or more corresponding view runners, but a given View Definition can be run by many runners over many data layers.

Example view runners may include:

  • A runner that creates a virtual, tabular view in an analytic database
  • A runner that queries FHIR JSON directly and creates a table in a web application
  • A runner that loads data directly into a notebook or other data analysis tool

Generating Schemas

The output of many runners will have technology-specific schemas, such as database table definitions or schema for structured files like Parquet. This will be runner- and technology- specific, but runner implementaitons SHOULD offer a way to compute that schema from a ViewDefinition when applicable.

For example, a runner that produces a table in a database system could return a “CREATE TABLE” or “CREATE VIEW” statement based on the ViewDefinition, allowing the system to define tables prior to populating them by evaluating the views over data.

This would not apply to outputs that do not have common a schema specification, like CSV files.

The Analytics Layer

Finally, users must be able to easily leverage the above views with the analytic tools of their choice. This spec purposefully does not define what these are, but common use cases may be SQL queries by consuming applications, dataframe-based data science tools in Python or R, or integration with business intelligence tools.

License

FHIR® is the registered trademark of HL7 and is used with the permission of HL7. Use of the FHIR trademark does not constitute endorsement of the contents of this repository by HL7, nor affirmation that this data is conformant to the various applicable standards


Next: Purpose