Working with Test Data Factories

Guidance for FHIR IG Creation
0.1.0 - CI Build

Guidance for FHIR IG Creation, published by HL7 International - FHIR Management Group. This guide is not an authorized publication; it is the continuous build for version 0.1.0 built by the FHIR (HL7® FHIR® Standard) CI Build. This version is based on the current content of https://github.com/FHIR/ig-guidance/ and changes regularly. See the Directory of published versions

Liquid Templates
Profile Based Generation
Test Factory Control File
Output Filename
Generation Log
Liquid Processing Rules
Profile Based Generation

The IG publisher can generate a set of resources in a test data directory from a spreadsheet. The factory is controlled by an ini file that sets up the parameters for the factory.

There are two kinds of factories:

Liquid Template
Profile Generation

Note that multiple factories can reuse the same data files.

Liquid Templates

A liquid template is conceptually simple: it constructs an instance of a resource from a set of data. The author provides the template, and the data generation is predictable, based only on what the template and the data source provide. E.g. the data source has a set of columns, and the liquid template refers to the columns, laying them out in the resource. One instance is created for each row in the data source.
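The "one instance per row" behaviour can be illustrated with a minimal sketch. This is not the IG publisher's implementation: Python's string.Template stands in for a real liquid template (which would use FHIRPath expressions), and the column names and data are hypothetical.

```python
import csv
import io
import json
from string import Template

# Hypothetical data source: the first row names the columns.
DATA = """id,family,given
p1,Smith,Anna
p2,Jones,Ben
"""

# Stand-in for a liquid template; each ${...} placeholder names a column.
PATIENT = Template(
    '{"resourceType": "Patient", "id": "${id}", '
    '"name": [{"family": "${family}", "given": ["${given}"]}]}'
)


def build_instances(data_text):
    """Return one resource (as a dict) for each data row."""
    rows = csv.DictReader(io.StringIO(data_text))
    return [json.loads(PATIENT.substitute(row)) for row in rows]


instances = build_instances(DATA)
```

Two data rows yield two Patient instances, each laid out exactly as the template dictates.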

Liquid templates use the FHIR variant of the basic liquid syntax, which uses FHIRPath for expressions in the liquid template.

The liquid template must fully populate the resource, though if it leaves the Resource.id out, an autogenerated id will be added. The liquid template can produce either XML or JSON. In the case of JSON, the output is treated as JSON5 and converted to normal JSON after the template is run. This means that you don't have to get the commas correct in the generated JSON.
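To see why this tolerance is convenient, consider a template that emits a comma after every item in a loop. The sketch below covers only the trailing-comma part of JSON5 with a simple regular expression; it is an illustration under that assumption, not how the publisher performs the conversion, and it would mishandle string values that happen to contain ",}" or ",]".

```python
import json
import re


def relaxed_json_to_json(text):
    """Drop commas that directly precede a closing brace or bracket, then parse."""
    cleaned = re.sub(r",(\s*[}\]])", r"\1", text)
    return json.loads(cleaned)


# Typical output of a template that emits a comma after every item.
generated = """
{
  "resourceType": "Patient",
  "name": [
    {"family": "Smith", "given": ["Anna",],},
  ],
}
"""

resource = relaxed_json_to_json(generated)
```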

Profile Based Generation

Profile based generation works differently - there is no script laying out the content. Instead, the instances are generated based on the defined profile, including fixed values, pattern values and bindings. The data used in these generated test instances comes from one of three sources (in order of preference):

Locally provided data in the form of a spreadsheet with a mapping script (see below)
All the data used in published examples in the FHIR ecosystem
Randomly generated garbage data (if there's nothing from the ecosystem)

The details of how the locally provided data works are described below.

Test Factory Control File

[factory]
type=liquid|template
data={data-file}
liquid={template-file}
profile={url}
mappings={mapping-file}
filename={filename}
format=json | xml
bundle=true|false
log={log-name}

[table]
name=file

where:

type: whether to use a liquid template or the profile driven factory
data: a relative path to a CSV or Excel file containing the data, where the first row contains the names of the columns
liquid: a relative path to a liquid template that builds a resource
profile: the URL of a profile to use as the template for generating the instance
mappings: a JSON file describing how the data file maps into the generated instances (described below)
filename: a script that controls the name of the output file (see immediately below)
format: the format of the generated file (doesn't have to match the format that a liquid template produces)
bundle: if true, the generated resources will be wrapped into a bundle and only a single file created
log: the name by which the factory should be logged (see below)

Depending on the type, you nominate either a liquid template, or a profile and a mapping script.

You can also nominate other tables, where each entry gives a name and a relative path to a CSV or Excel file containing a table of data.
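As a concrete illustration, here is a hypothetical control file for a liquid factory, read with Python's standard configparser. The file names and values are made up, and the check that a liquid factory needs a liquid entry while the other kind needs profile and mappings is an assumption drawn from the description above, not a statement of what the publisher enforces.

```python
import configparser

# Hypothetical control file; every path and name here is illustrative.
CONTROL_FILE = """
[factory]
type=liquid
data=data/patients.csv
liquid=templates/patient.liquid
filename=test/$type$-$id$.json
format=json
bundle=false
log=patients
"""


def read_factory(text):
    """Parse the [factory] section and check the entries its type relies on."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    factory = dict(parser["factory"])
    if factory.get("type") == "liquid":
        required = ["liquid"]
    else:
        required = ["profile", "mappings"]
    missing = [key for key in required if key not in factory]
    if missing:
        raise ValueError("missing entries: " + ", ".join(missing))
    return factory


factory = read_factory(CONTROL_FILE)
```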

Output Filename

The output filename controls where the generated data goes. It is a path relative to the repository root folder. When bundle=true, it's a static filename for the single bundle produced by the generation. When individual resources are produced, the filename is a script that looks like this: test/$type$-$id$.json

The following variables can be used in the filename:

$type$ - the resource type
$id$ - the id of the resource
$counter$ - a factory scoped serially incrementing counter starting at 1
$format$ - either json or xml depending on the format for the factory
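The substitution can be pictured with a short sketch. The function names are hypothetical and the publisher's own expansion logic may differ in detail; the sketch only follows the variable descriptions above.

```python
import itertools


def make_namer(pattern, fmt):
    """Return a function that expands the filename variables for one factory."""
    counter = itertools.count(1)  # factory scoped, starts at 1

    def name_for(resource_type, resource_id):
        values = {
            "$type$": resource_type,
            "$id$": resource_id,
            "$counter$": str(next(counter)),
            "$format$": fmt,
        }
        name = pattern
        for variable, value in values.items():
            name = name.replace(variable, value)
        return name

    return name_for


name_for = make_namer("test/$type$-$id$-$counter$.$format$", "json")
first = name_for("Patient", "p1")
second = name_for("Patient", "p2")
```

Here first is "test/Patient-p1-1.json" and second is "test/Patient-p2-2.json": the counter advances once per generated resource.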

Generation Log

A log of the process of running the test data factory will be generated in output/qa-factory-$log-name$.txt. One reason it's provided is to help users see the paths used in the profile generation (these paths are needed in the mapping script described below).

Liquid Processing Rules

Column names in the spreadsheet should not contain spaces or '-', and the sheet cannot contain a column named 'counter'. Otherwise, a data mapping file must be used (see below).

For a liquid template that produces JSON, the template does not need to get the commas correct. The output must be valid JSON5, and it is reprocessed once the liquid script is complete to fix up the commas.

Data Lookup

The [tables] section in the ini file contains a list of named files. The data in the files will be available in the liquid template using [name].cell(row, col) where:

[name] is the name in the ini file
cell(row,col) gives access to the data. Row is an integer (1 based), and column is either an integer (1 based) or a name
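The lookup semantics can be modelled with a small sketch. The Table class is hypothetical, and it assumes that the first line of the file holds the column names and that row 1 is the first data row after it; the text above does not say how the publisher counts rows.

```python
import csv
import io


class Table:
    """Minimal model of a named lookup table with 1-based cell access."""

    def __init__(self, text):
        rows = list(csv.reader(io.StringIO(text)))
        self.columns = rows[0]  # assumption: header row names the columns
        self.rows = rows[1:]

    def cell(self, row, col):
        """Return the value at a 1-based row and a 1-based or named column."""
        if row < 1 or row > len(self.rows):
            raise IndexError("row out of range: %d" % row)
        if isinstance(col, int):
            if col < 1 or col > len(self.columns):
                raise IndexError("column out of range: %d" % col)
            index = col - 1
        else:
            index = self.columns.index(col)
        return self.rows[row - 1][index]


codes = Table("code,display\nM,Male\nF,Female\n")
by_name = codes.cell(2, "display")
by_number = codes.cell(1, 1)
```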

Profile Based Generation

In this mode, the instances are generated based on the information in the profile. The tighter the profile, the more coherent the generated instance will be.

The intention of the spreadsheet approach is to support a user provided database. For this reason, the source has two parts: the source data, and a mappings script that describes how data in the spreadsheet is converted to FHIR data. The intention here is to support non-technical (e.g. clinical) users to provide the sample data. One instance is created per table row.

Because the data providers aren't expected or required to be technical, here's a list of things the mapping script can do to massage the data into shape ready to go in a resource:

map from a named column to a path in the resource
build a complex data type from multiple columns
extract a value from a column text
look up a value based on a number/code

The mapping script looks like this:

{
  "format-version" : 1,
  "values" : [{
    "path" : "{path}",
    "source" : [{
      "property" : "{prop-name}",
      "column" : "{name}",
      "regex" : "{regex}",
      "constant" : "value"
    }]
  }]
}

Documentation:

path: the path in the generated instance where the data will go. The path must match the correct path from the generation log
source: one or more source columns in the spreadsheet that contribute to this value
- If there's only one column, which matches a FHIR primitive type, there should be no property name provided
- If more than one column is named, the value is a complex that must match a FHIR Data type, and property names must be provided
property: When provided, the property names must match the names of the FHIR properties e.g. code, or period.start
column: the name in the provided main data spreadsheet
regex: a regex that extracts the data from the column
constant: Sometimes a fixed value is needed - e.g. providing a code system URL. In this case, provide a constant rather than a column (and no regex)

Note that a single column can be referenced more than once in a mapping, usually with different regexes.
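The rules above can be sketched as a small function that applies a mapping to one spreadsheet row. This is an illustration, not the publisher's algorithm: the paths, column names and identifier system URL are hypothetical, and the choice to take the first capture group of a regex (or the whole match when there is no group) is an assumption.

```python
import re

# Hypothetical mapping; real paths must be taken from the generation log.
MAPPING = {
    "format-version": 1,
    "values": [
        {"path": "Patient.birthDate", "source": [{"column": "dob"}]},
        {"path": "Patient.identifier", "source": [
            {"property": "system", "constant": "http://example.org/mrn"},
            {"property": "value", "column": "mrn", "regex": r"MRN:\s*(\d+)"},
        ]},
    ],
}


def extract(source, row):
    """Resolve one source entry to a constant, a column value or a regex extract."""
    if "constant" in source:
        return source["constant"]
    text = row[source["column"]]
    if "regex" not in source:
        return text
    match = re.search(source["regex"], text)
    if match is None:
        return None
    # Assumption: prefer the first capture group when the regex defines one.
    return match.group(1) if match.groups() else match.group(0)


def apply_mapping(mapping, row):
    """Return a dict from path to primitive value or to a dict of properties."""
    result = {}
    for value in mapping["values"]:
        sources = value["source"]
        if len(sources) == 1 and "property" not in sources[0]:
            result[value["path"]] = extract(sources[0], row)
        else:
            result[value["path"]] = {s["property"]: extract(s, row) for s in sources}
    return result


mapped = apply_mapping(MAPPING, {"dob": "1980-04-12", "mrn": "MRN: 12345"})
```

A single unnamed source produces a primitive value, while several named sources are combined into one complex value keyed by property name.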

Examples:

IG © 2019+ HL7 International - FHIR Management Group. Package hl7.fhir.uv.howto#0.1.0 based on FHIR 5.0.0. Generated 2024-12-19