Guidance for FHIR IG Creation
0.1.0 - CI Build

Guidance for FHIR IG Creation, published by HL7 International - FHIR Management Group. This guide is not an authorized publication; it is the continuous build for version 0.1.0 built by the FHIR (HL7® FHIR® Standard) CI Build. This version is based on the current content of https://github.com/FHIR/ig-guidance/ and changes regularly. See the Directory of published versions

Working with Test Data Factories

The IG publisher can generate a set of resources in a test data directory from a table of data. Each factory is controlled by a JSON file that sets up the parameters for the factory.

There are two kinds of factories:

  • Liquid Template
  • Profile Generation

Note that you can have multiple factories that reuse the same source data tables.

Liquid Templates

A liquid template is conceptually simple: the template constructs an instance of a resource from a set of data. The author provides the liquid script, and the data generation is predictable - it is based only on what the template and the data source provide. E.g. the data source has a set of columns and the liquid template refers to the columns, laying them out in the resource. One instance is created for each row in the data source.

Liquid templates use the FHIR variant of the basic liquid syntax, which uses FHIRPath for expressions in the liquid template.

The liquid template must fully populate the resource, though if it leaves the Resource.id out, an autogenerated id will be added. The liquid template can produce either XML or JSON. In the case of JSON, the output is treated as JSON5 and converted to normal JSON after the template has run - this means that you don't have to get the commas correct in the generated JSON.

Profile Based Generation

Profile based generation works differently - there is no script laying out the content. Instead, the instances are generated based on the defined profile, including fixed values, pattern values and bindings. The data used in these generated test instances comes from one of three sources (in order of preference):

  • Locally provided data in the form of a spreadsheet with mapping details (see below)
  • All the data used in published examples in the FHIR ecosystem
  • Randomly generated garbage data (if there's nothing from the ecosystem)

The details of how the locally provided data works are described below.

Defining a Test Factory

Test Data Factories are defined using the parameter test-data-factories:

  <parameter>
    <code>
      <system value="http://hl7.org/fhir/tools/CodeSystem/ig-parameters"/>
      <code value="test-data-factories"/>
    </code>       
    <value value="factories/factories.json"/>
  </parameter>

Multiple test-data-factories parameters are allowed, but since each JSON file can define multiple factories, there's usually only one entry. By convention, factories are defined in the folder 'factories' but this is not required. The value points to a JSON file with this format:

Test Factory Control File

{
  "factories-version" : 1,
  "factories" : [{
    // one entry for each factory
  }]
}

Each entry in the factory control file has the following format:

{
  "name" : "{factory-name}",
  "type" : "liquid|profile",
  "liquid" : "{template-file}",
  "profile" : "{url}",
  "data" : "{data-source}",
  "filename" : "{filename}",
  "format" : "json|xml",
  "bundle" : true|false,
  "tables" : {
    "name" : "{data-source}"
  },
  "filter" : "{fhirpath expression}",
  "mappings" : [{
    // mapping details - see below
  }]
}

where:

  • name (mandatory): the name of the factory
  • type (mandatory): whether to use a liquid template or the profile driven factory
  • data (mandatory): a path to a source data table containing the data to drive generation (see below)
  • liquid (if liquid): a relative path to a liquid template that builds a resource
  • profile (if profile): the URL of a profile to use as the template for generating the instance
  • filename (mandatory): A script that controls the name of the output file (see immediately below)
  • format (optional): the format of the generated file (doesn't have to match the format that a liquid template produces)
  • bundle (optional): if true, the generated resources will be wrapped into a bundle and only a single file created
  • tables (optional): additional named data tables that can be used for lookups (see Data Lookup below)
  • filter (optional): if present, a FHIRPath expression that must evaluate to true or the row is ignored when processing the source data
  • mappings (if profile): describes how the data table maps into the generated instances (described below)

As specified by the type, you nominate either a liquid template, or a profile and a set of mappings.

Source Data Tables

Source Data can be provided in multiple different forms:

  • As a text file containing comma separated values (.csv)
  • As an Excel spreadsheet file (.xlsx). Note that you nominate the sheet name by appending ;{name} to the filename. In the absence of a sheet name, the first sheet will be used.
  • As an SQLite database file (.db). Note that you nominate the table or view name by appending ;{name} to the filename (required)
  • As a value set. Nominate the canonical URL of the ValueSet.

For both .csv and .xlsx, the first row contains the names of the columns.

For all data sources, an additional column named counter is created, which is the index of the current row, a serially incrementing number starting at 1. None of the data sources can provide a column named 'counter' of their own.

Output Filename

The output filename controls where the generated data goes. It is a relative path (relative to the repository root folder). When bundle=true, it's a static filename for the single bundle produced by the generation. In the case where individual resources are produced, the filename is a script that looks like this: test/Patient-{$counter$}.json, where any $xxx$ will be interpreted as a reference to a named column in the primary data source.
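As an illustration of how these pieces fit together, here is a minimal sketch of a complete control file with a single liquid factory. It uses only the properties described above. The factory name and the data file factories/patients.csv are hypothetical, the template path is borrowed from the examples in this IG, and the filename pattern is the one quoted in the text. How relative paths for the template and data file are resolved is an assumption here, so check the generation log if files are not found.

```json
{
  "factories-version" : 1,
  "factories" : [{
    "name" : "PatientFromTemplate",
    "type" : "liquid",
    "liquid" : "factories/patient.liquid",
    "data" : "factories/patients.csv",
    "filename" : "test/Patient-{$counter$}.json",
    "format" : "json",
    "bundle" : false
  }]
}
```

Because bundle is false, one file would be written per row of the data table, with the counter column supplying the varying part of the filename.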

Generation Log

A log of the process of running the test data factory will be generated in output/qa-factory-$log-name$.txt. One reason it's provided is to help users see the paths in the profile generation (used below)

Liquid Processing Rules

The liquid template must produce resources in the specified format. If the template produces JSON, the commas do not need to be correct - the output only has to be valid JSON5, and it is reprocessed once the liquid script is complete to fix up the commas.

Each row of the data table is passed to the liquid template as a 'row' object whose properties are the named columns in the data table. E.g. if the data table has a column called name, then the liquid statement {{ row.name }} inserts the value of the name column in the row. Column names in the data table should not contain spaces or '-'.

The liquid template can produce a resource of any type (doesn't have to produce the same type). If bundle=true, the Liquid template should not produce a Bundle resource unless the desire is to have a Bundle of Bundles - the liquid script will run once for each row of data.
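A minimal sketch of what a JSON-producing liquid template might look like under these rules. The column names family, given, gender and dob are hypothetical and would need to match the header row of the actual data table. Resource.id is deliberately left out, so an id would be autogenerated.

```json
{
  "resourceType" : "Patient",
  "name" : [{
    "family" : "{{ row.family }}",
    "given" : ["{{ row.given }}"]
  }],
  "gender" : "{{ row.gender }}",
  "birthDate" : "{{ row.dob }}"
}
```

The template runs once per row, so each {{ row.xxx }} statement is replaced with that row's value for the named column. This sketch happens to use strict JSON commas, although the JSON5 handling described above means that is not required.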

Data Lookup

The tables section of the configuration contains a list of named files. The data in the files will be available in the liquid template using [name].cell(row, col) where:

  • [name] is the name given to the table in the tables section of the factory control file
  • cell(row,col) gives access to the data. Row is an integer (1 based), and column is either an integer (1 based) or a name
  • lookup(lookupCol, value, outputCol) looks up a value in lookupCol, and returns the value in outputCol (or null)
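As a sketch of the configuration side of a lookup, the factory entry below declares one additional table. It reads the tables property as a map from a table name to a data source, which is an interpretation of the format shown earlier. The name wards, the CSV files and the factory name are hypothetical.

```json
{
  "name" : "EncounterFromTemplate",
  "type" : "liquid",
  "liquid" : "factories/encounter.liquid",
  "data" : "factories/encounters.csv",
  "filename" : "test/Encounter-{$counter$}.json",
  "format" : "json",
  "tables" : {
    "wards" : "factories/wards.csv"
  }
}
```

With this entry, and following the form described above, the template could presumably refer to wards.cell(1, 'name') to read the name column of the first row, or wards.lookup('code', row.wardCode, 'name') to find the row whose code column matches the current row's wardCode. The column names are illustrative, and the exact invocation syntax inside the template is an assumption.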

Globals

In addition, a global object is available as Globals, which has the following properties:

  • dateTime: the date and time in FHIR format of the instant that processing started
  • path: the path to the base FHIR specification (correct version path)

Profile Based Generation

In this mode, the instances are generated based on the information in the profile. The tighter the profile, the more coherent the generated instance will be.

If a primary data source is provided, an instance will be generated for each row in it. The intention with regard to the primary data source is to support user provided information. For this reason, there is a mapping table that maps between the source data and proper FHIR data. The intention here is to support non-technical (e.g. clinical) users to provide the sample data.

Because the data providers aren't expected or required to be technical, here's a list of things the mapping script can do to massage the data into shape ready to go in a resource:

  • map from a named column to a path in the resource
  • build a complex data type from multiple columns
  • extract a value from a column text
  • look up a value based on a number/code

Each mapping entry looks like this:

{
  "path" : "{path}",
  "fhirType" : "{type}",
  "if" : "{fhirpath expression}",
  "expression" : "{fhirpath expression}",
  "parts" : [{
    "name" : "{prop-name}",
    "expression" : "{fhirpath expression}"
  }]
}

Documentation:

  • path: the path in the generated instance where the data will go
    • The path must match a correct path from the generation log
    • The first entry that matches the set of paths will be used
    • the path value can be one of the following (but just see the log)
      • The stated path in the profile
      • The underlying StructureDefinition.snapshot.element.id
      • The underlying StructureDefinition.snapshot.element.path
      • A hybrid id - either StructureDefinition.snapshot.element.id or ([extension.url]).value
  • fhirType - use when the type is polymorphic and not fixed in the profile. Can be either the name of a type, or a FHIRPath expression that returns the name of a type
  • if - if this is present, evaluate the expression, and only use the entry if the result is true
  • expression: An expression which evaluates to the value. See below for details
  • parts: a series of named expressions where the name of each part corresponds to a property name of a type
    • Each part contains either an expression, or a set of parts

There are three ways to refer to a column from the source data in the expression:

  • just by name, when the name of the column is a valid FHIRPath token e.g. "expression" : "patientId" where patientId is the name of the column in the source data
  • using the function column(name): e.g. "expression" : "column('Patient ID')" where Patient ID is the name of the column in the source data
  • using the function dateColumn(name, format) to wrangle with date formats e.g. "expression" : "dateColumn('Date of Birth', 'M/d/yyyy')" where Date of Birth is the name of the column in the source data, and M/d/yyyy is the format of the column. For format advice, see https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html.

Notes:

  • all columns and cells have surrounding whitespace trimmed from the value
  • the date time formatter runs in English mode
  • date handling in Excel is complicated, so pay attention to the date formats in the log
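To make the mapping format more concrete, here is a hedged sketch of a profile-driven factory entry. The profile URL is the one used by the PatientGenerator example in this IG. The factory name, the spreadsheet name, the sheet name Demographics, the column names and the three paths are all hypothetical. In practice the paths have to be taken from the generation log.

```json
{
  "name" : "PatientFromSpreadsheet",
  "type" : "profile",
  "profile" : "http://hl7.org/fhir/uv/howto/StructureDefinition/test-patient-profile",
  "data" : "factories/patients.xlsx;Demographics",
  "filename" : "test/Patient-{$counter$}.json",
  "format" : "json",
  "mappings" : [{
    "path" : "Patient.gender",
    "expression" : "gender"
  }, {
    "path" : "Patient.identifier.value",
    "expression" : "column('Patient ID')"
  }, {
    "path" : "Patient.name",
    "parts" : [{
      "name" : "family",
      "expression" : "column('Family Name')"
    }, {
      "name" : "given",
      "expression" : "column('Given Name')"
    }]
  }]
}
```

The first mapping refers to a column by bare name, which only works when the name is a valid FHIRPath token. The second uses column() because the column name contains a space. The third builds a complex value from two columns using parts.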

Examples

This IG includes some examples. You can find the output from the examples in the package, or you can look in the package source to see how they work.

Name | Mode | Script | Flags | Description
LiquidDemo | liquid | factories/patient.liquid | json, Bundle | A simple liquid script showing how to look up a random value in a table
PatientGenerator | profile | http://hl7.org/fhir/uv/howto/StructureDefinition/test-patient-profile | json | Generate instances based on a profile in the IG, and fill out values from an Excel spreadsheet
EncounterGenerator | liquid | factories/encounter.liquid | json | Another liquid script showing how to do conditional content
BloodPressureGenerator | profile | http://hl7.org/fhir/StructureDefinition/bp | json | A more complex example. Since this is a wide open profile, a lot of what the mappings do is suppress columns
WeightGenerator | profile | http://hl7.org/fhir/StructureDefinition/bodyweight | json | Shows how to do conditional content depending on the content of the spreadsheet
WarfarinGenerator | profile | http://hl7.org/fhir/StructureDefinition/MedicationStatement | json | Shows how to filter the rows in the first place