Structured Data Capture
2.8.0 - CI Build

Structured Data Capture, published by HL7 International - FHIR Infrastructure Work Group. This is not an authorized publication; it is the continuous build for version 2.8.0. This version is based on the current content of https://github.com/HL7/sdc/ and changes regularly. See the Directory of published versions.

Form data extraction

Questionnaires are excellent tools for data capture. They allow tight control over what data is gathered and ensure information is gathered consistently across multiple users. However, data gathered using different questionnaires - or even different versions of the same questionnaire - is often not comparable. It is also not very searchable or easily integrated with discrete data sources. Because of this, the general recommendation in FHIR is to use questionnaires for raw data capture but then to convert the resulting QuestionnaireResponse instances into other FHIR resources - Observations, MedicationStatements, FamilyMemberHistories, etc. This allows the data gathered to then be easily combined with other data into FHIR documents and messages and exposed over FHIR REST interfaces.

Such conversion can be done with custom code written on a Questionnaire-by-Questionnaire basis. However, the process is much easier if generic software can convert any arbitrary QuestionnaireResponse into appropriate FHIR resources by leveraging metadata embedded in the Questionnaire. This portion of the SDC guide defines mechanisms for doing so.

Caveats, considerations and rules with form data extraction

  • Extraction only makes sense once a QuestionnaireResponse is complete. Before that, the information needed to create valid resource instances may not be available and conversion logic is likely to fail. The types of data produced will vary by Questionnaire.
  • Some questionnaires might result in a single resource. Others will produce a Bundle of resources. In some cases, the result might be a transaction intended to create some resources and update others.
  • If the Questionnaire is being designed with conversion to a resource in mind, conversion will be a straightforward process because the questions will align well with how FHIR stores data - and the questions can even be tuned to align with particular profiles around the use of terminology, which elements are mandatory, etc. However, if converting data from questionnaires designed without FHIR in mind, mapping may be more challenging. Required elements may need to be inferred, codes may need to be transformed and other transformations may be necessary to ensure that converted data meets expectations for use, such as aligning with country-specific implementation guides. In some cases, alignment may not be possible.
  • Once a QuestionnaireResponse has been converted, it might not be necessary to retain the QuestionnaireResponse any longer, as all data access and subsequent maintenance will occur through the 'traditional' resources. However, in many environments the QuestionnaireResponse will be retained anyhow to keep a record of the original source-of-truth and for traceability reasons.
  • When resources have been generated from a QuestionnaireResponse, the Provenance instance associated with the creation of the resource instance(s) can and SHOULD include an entity reference of type 'source' that points back to the original QuestionnaireResponse. If Observations are generated, they can also have an explicit derivedFrom link pointing back to the QuestionnaireResponse.
  • In theory, it's possible to use Questionnaire as a user-facing interface to allow maintenance of one or more resources. The source data can be used to populate the QuestionnaireResponse and once the response is 'submitted', the data can then be extracted and used to update the existing resources. For this to work, the id of the resource must be retained through the round-trip process to allow for the update. This can be supported by storing the id as a hidden question not shown to the user. Note that this is only relevant for the definition-based and StructureMap-based approaches. It is not relevant for the Observation-based approach.
  • Sometimes the author interested in making a Questionnaire "extractable" does not have authority to make changes to the "official" Questionnaire. In other cases, there might be one official Questionnaire, but a need to create extracted resources that comply with different sets of profiles - and thus a need for different metadata in the Questionnaire to support the extraction process. In such cases, rather than basing the extraction on the original Questionnaire, it can be based on a derived Questionnaire - one that has a Questionnaire.derivedFrom relationship to the same canonical URL the QuestionnaireResponse refers to. The derived Questionnaire would contain the same content as the base Questionnaire, but would have additional extensions inserted to support data extraction.
  • When capturing quantities, it's common for questionnaires to prompt for the numeric value and to note the 'fixed' unit as part of the question. For example, "Please specify the patient's weight in kilograms". When extracting the value for representation in a resource, a unit will be added. The questionnaire-unit extension SHOULD be included on the question to support the extraction process.
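The Provenance recommendation above can be sketched as follows. This is a minimal Python illustration, not part of the specification: it assumes resources are handled as FHIR R4 JSON dictionaries, and the function name build_extraction_provenance and its parameters are hypothetical. The agent reference (the system that performed the extraction) is an assumed input, since a Provenance requires at least one agent.

```python
from datetime import datetime, timezone


def build_extraction_provenance(qr_reference: str, extracted_refs: list[str], agent_ref: str) -> dict:
    """Build a Provenance linking extracted resources back to their source QuestionnaireResponse."""
    return {
        "resourceType": "Provenance",
        # every resource produced by this extraction run
        "target": [{"reference": ref} for ref in extracted_refs],
        # when the extraction happened (FHIR instant with time zone)
        "recorded": datetime.now(timezone.utc).isoformat(),
        # the system or person responsible for the extraction (hypothetical input)
        "agent": [{"who": {"reference": agent_ref}}],
        # entity with role 'source' pointing back to the original QuestionnaireResponse
        "entity": [{"role": "source", "what": {"reference": qr_reference}}],
    }
```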

Extraction service

Like Questionnaire population, extracting data from a QuestionnaireResponse is a complex process involving querying existing FHIR data and using more advanced technologies such as FHIRPath and StructureMap. It's therefore a function that systems may also wish to offload to a separate system. The QuestionnaireResponse $extract operation has been created for this purpose. It takes in a completed QuestionnaireResponse and returns either an individual FHIR resource or a Bundle of resources, depending on the type of Questionnaire. The operation does not post the created resources to a server. It's up to the client system to determine what action(s) to take with the created content.

NOTE: It's the responsibility of the client system to ensure that any generated resources are valid against necessary profiles, etc. before using content produced by this operation.
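Because the operation may return either a single resource or a Bundle, a client typically normalizes the result before deciding what to do with it. The following minimal Python sketch assumes the result has already been parsed from FHIR JSON into a dictionary; extracted_resources and accept_extracted are hypothetical helper names, and is_valid stands in for whatever profile validator the client uses.

```python
from typing import Callable


def extracted_resources(result: dict) -> list[dict]:
    """Normalise an extraction result (one resource or a Bundle) to a list of resources."""
    if result.get("resourceType") == "Bundle":
        return [entry["resource"] for entry in result.get("entry", []) if "resource" in entry]
    return [result]


def accept_extracted(result: dict, is_valid: Callable[[dict], bool]) -> list[dict]:
    """Return the extracted resources only if every one passes the client's validation."""
    resources = extracted_resources(result)
    failures = [r for r in resources if not is_valid(r)]
    if failures:
        raise ValueError(f"{len(failures)} extracted resource(s) failed validation")
    return resources
```

Whether to reject the whole result or only the failing resources is a client policy decision; this sketch rejects the whole result.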

Designing Questionnaires to support data extraction

This specification defines three different mechanisms to embed information in Questionnaires to support subsequent resource extraction:

  • Observation-based extraction
  • Definition-based extraction
  • StructureMap-based extraction

Systems are free to experiment with other extraction mechanisms but cannot expect support for those from other SDC-conformant systems.

Each mechanism has its own profile that includes the additional resource elements or extensions relevant to supporting that mechanism: SDC Questionnaire Extract - Observation, SDC Questionnaire Extract - Definition, and SDC Questionnaire Extract - Structure Map profiles. Each profile identifies the specific 'must support' elements and extensions from which systems claiming to support that SDC extraction mechanism SHALL be capable of extracting data, as befits the CapabilityStatement(s) they claim conformance to. Each system should choose which approach(es) it wishes to use and support based on the elements specified in those profiles.

Some of these mechanisms make use of FHIR-based queries, FHIRPath and/or CQL as well as extensions that include expressions in one of these languages. Implementers should read the Using Expressions page for background and guidance on these technologies and extensions.

Observation-based extraction

This is the simplest of the extraction mechanisms. It leverages the same data elements as are used for the Observation-based population mechanism. It takes advantage of the fact that most questions in the healthcare space correspond to the value element of an Observation. It also takes advantage of the Questionnaire.item.code element, which identifies the concept each question or group corresponds to. The SDC Questionnaire Extract - Observation profile has been created to support this mechanism.

To use this method:

  1. Include the item.code element on each question to be extracted. Typically, this will be a LOINC code, but in some jurisdictions/environments, SNOMED CT or other codes may be relevant.
  2. Groups can also have an item.code present - this might represent the code of a panel or the Observation.code of an Observation with no value but with multiple Observation.component elements. Child question items can then assert the item.code of the "member-of" Observations or the Observation.component.code values.
  3. To signal that the item.code is intended for use in extraction (as opposed to just providing metadata about the Questionnaire item), the questionnaire-observationExtract extension must also be included (and set to true). This extension can be specified either on the root Questionnaire or on an individual question or group item (not a display item). On an item, it indicates that the Observation-based approach should be used to extract that particular item (based on the code present); on the root, it indicates that all items in the questionnaire that have a code present should be extracted.
  4. Multiple item.code elements might be present. If so, each is treated as one of the Observation.code Codings in the resulting extracted Observation.

For example:

    
	  <item>
		<extension url="http://hl7.org/fhir/uv/sdc/StructureDefinition/sdc-questionnaire-observationExtract">
		  <valueBoolean value="true" />
		</extension>
		<!-- units the user may choose when entering the quantity answer -->
		<extension url="http://hl7.org/fhir/StructureDefinition/questionnaire-unitOption">
		  <valueCoding>
			<system value="http://unitsofmeasure.org"/>
			<code value="kg"/>
		  </valueCoding>
		</extension>
		<extension url="http://hl7.org/fhir/StructureDefinition/questionnaire-unitOption">
		  <valueCoding>
			<system value="http://unitsofmeasure.org"/>
			<code value="[lb_av]"/>
		  </valueCoding>
		</extension>
		<linkId value="code-pop-demo"/>
		<code>
		  <system value="http://loinc.org"/>
		  <code value="29463-7"/>
		  <display value="Body weight"/>
		</code>
		<code>
		  <system value="http://loinc.org"/>
		  <code value="3141-9"/>
		  <display value="Body weight Measured"/>
		</code>
		<code>
		  <system value="http://loinc.org"/>
		  <code value="8341-0"/>
		  <display value="Dry body weight Measured"/>
		</code>
		<text value="What is your current weight?"/>
		<type value="quantity"/>
	  </item>
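The item-selection rules in steps 1-4 can be sketched in Python as below. This is an illustration under stated assumptions: the Questionnaire is a FHIR JSON dictionary, items_to_extract is a hypothetical helper, and an extension on a group item is treated as applying to that group item only (not its children), following the wording of step 3.

```python
OBS_EXTRACT_URL = "http://hl7.org/fhir/uv/sdc/StructureDefinition/sdc-questionnaire-observationExtract"


def _flagged(element: dict) -> bool:
    """True if the element carries observationExtract set to true."""
    return any(
        ext.get("url") == OBS_EXTRACT_URL and ext.get("valueBoolean") is True
        for ext in element.get("extension", [])
    )


def items_to_extract(questionnaire: dict) -> dict[str, list[dict]]:
    """Map linkId -> item.code codings for every item marked for Observation-based extraction."""
    root_flag = _flagged(questionnaire)
    selected: dict[str, list[dict]] = {}

    def walk(items: list[dict]) -> None:
        for item in items:
            # display items never produce Observations; an item needs a code to be extracted
            if item.get("type") != "display" and item.get("code") and (root_flag or _flagged(item)):
                # every item.code becomes one Observation.code coding (step 4)
                selected[item["linkId"]] = item["code"]
            walk(item.get("item", []))

    walk(questionnaire.get("item", []))
    return selected
```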
	
  

When performing the extraction process, the system will create a batch that will contain creates or updates of Observation instances. It will go through the QuestionnaireResponse and identify all answers marked for extraction (if the corresponding Questionnaire item or the root Questionnaire has a questionnaire-observationExtract extension). For each of those it will then determine whether to create a new observation, update an existing observation or do nothing. Guidelines for making this decision are as follows:

Take no action if all of the following apply:
  • the answer was populated from an existing Observation;
  • the system rendering the QuestionnaireResponse can retain context and knows the 'id' of the Observation to update;
  • the author of the original Observation is the same as the current author of the QuestionnaireResponse;
  • the context of the questionnaire and its use is one where updates are appropriate, as opposed to asserting a new Observation with a new performer and date; and
  • the answer has not changed from the populated value.
Update if:
  • all of the conditions above apply, except that the value has changed.
Create a new Observation if: neither of the preceding cases applies.
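The decision rules above can be expressed as a small function. In this minimal Python sketch, AnswerContext is a hypothetical summary of what the extracting system knows about one answer; how a real system gathers those facts (for example, retaining Observation ids through rendering) is outside the sketch. Skipped or cleared questions are assumed never to reach this function, because no Observation is produced for them.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AnswerContext:
    """Hypothetical summary of what the extracting system knows about one answer."""
    populated_from_observation: bool
    observation_id: Optional[str]      # id retained through rendering, if any
    original_author: Optional[str]     # author of the populating Observation
    current_author: Optional[str]      # QuestionnaireResponse.author
    updates_appropriate: bool          # does the form's context permit updates?
    populated_value: Any
    current_value: Any


def decide(ctx: AnswerContext) -> str:
    """Return 'none', 'update' or 'create' following the guidelines above."""
    can_update = (
        ctx.populated_from_observation
        and ctx.observation_id is not None
        and ctx.original_author is not None
        and ctx.original_author == ctx.current_author
        and ctx.updates_appropriate
    )
    if not can_update:
        return "create"
    return "none" if ctx.current_value == ctx.populated_value else "update"
```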

If updating, the original Observation SHALL be adjusted to have the new value or component.value and the status changed to "amended", then PUT to the source system. If creating, data elements SHOULD be populated as follows:

  • Observation.basedOn and Observation.partOf - copy from QuestionnaireResponse elements of the same name
  • Observation.status - set to 'final'
  • Observation.category - if this can be inferred from any of the Questionnaire.item.code values or from known context of the Questionnaire itself, then fill it in, otherwise omit.
  • Observation.code - add all the Questionnaire.item.code values as Observation.code.coding instances
  • Observation.subject - set to QuestionnaireResponse.subject
  • Observation.encounter - set to QuestionnaireResponse.encounter (if an Encounter)
  • Observation.effectiveDateTime - set to QuestionnaireResponse.authored. Note that this is an inference: it is only safe if the question text implies that the value is 'current' rather than 'historical'. Otherwise, do not include the questionnaire-observationExtract extension that marks the question as appropriate for extraction.
  • Observation.issued - set to QuestionnaireResponse.authored
  • Observation.performer - set to QuestionnaireResponse.author
  • Observation.value[x] - set to QuestionnaireResponse.item.answer.value[x]
  • Observation.derivedFrom - set to a reference to the QuestionnaireResponse
  • Observation.interpretation and Observation.referenceRange - if these can be inferred from the Questionnaire.item.code (and, for interpretation, the answer value too), they can be populated; otherwise omit.
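The element mappings above can be sketched as follows, again using FHIR JSON dictionaries. build_observation is a hypothetical helper; Observation.category, interpretation and referenceRange are omitted because they depend on inference the sketch does not attempt, and treating the encounter as an Encounter only when its reference contains an "Encounter/" segment is an assumption. Note that Observation.issued is an instant, so an authored value without a time zone would need further handling.

```python
def build_observation(qr: dict, qr_ref: str, codes: list[dict], answer: dict) -> dict:
    """Create a new Observation from one answer, following the mappings listed above."""
    obs = {
        "resourceType": "Observation",
        "status": "final",
        # every Questionnaire.item.code becomes an Observation.code.coding
        "code": {"coding": list(codes)},
    }
    for name in ("basedOn", "partOf", "subject"):
        if name in qr:
            obs[name] = qr[name]
    encounter = qr.get("encounter")
    ref = encounter.get("reference", "") if encounter else ""
    if ref.startswith("Encounter/") or "/Encounter/" in ref:
        obs["encounter"] = encounter
    if "authored" in qr:
        # inference: only safe when the question asks for a 'current' value
        obs["effectiveDateTime"] = qr["authored"]
        obs["issued"] = qr["authored"]
    if "author" in qr:
        obs["performer"] = [qr["author"]]
    # copy the answer's single value[x] element (valueQuantity, valueCoding, ...)
    for key, value in answer.items():
        if key.startswith("value"):
            obs[key] = value
    obs["derivedFrom"] = [{"reference": qr_ref}]
    return obs
```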

If the Questionnaire.item that is linked to an Observation contains child items that are also linked to Observations, things get more complex, as the system must determine whether to link the parent to its children as Observation.component or as Observation.hasMember. Ideally, the system will recognize the Questionnaire.item.code and know which approach is correct for that type of Observation. If not, the system could query for other records of the child type and see whether they appear as components anywhere. If unsure, systems should use "hasMember".
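One way to structure this decision is sketched below in Python. The known_component and known_member lookup sets, the optional seen_as_component callback (standing in for the query against existing records) and the child_refs list (references, such as urn:uuid fullUrls, identifying children in the output batch) are all hypothetical inputs; codes are assumed to be keyed as "system|code" strings. When nothing is known, the sketch falls back to hasMember, as recommended above.

```python
from typing import Callable, Optional


def choose_linkage(child_key: str,
                   known_component: set[str],
                   known_member: set[str],
                   seen_as_component: Optional[Callable[[str], bool]] = None) -> str:
    """Return 'component' or 'hasMember' for a child item's code ('system|code')."""
    if child_key in known_component:
        return "component"
    if child_key in known_member:
        return "hasMember"
    # fall back to looking at how existing records of this type are stored
    if seen_as_component is not None and seen_as_component(child_key):
        return "component"
    return "hasMember"  # when unsure, prefer hasMember


def attach_children(parent: dict, children: list[dict], child_refs: list[str], mode: str) -> list[dict]:
    """Link child Observations to the parent; return the resources to emit."""
    if mode == "component":
        # children are folded into the parent and not emitted separately
        parent["component"] = [
            {"code": child["code"], **{k: v for k, v in child.items() if k.startswith("value")}}
            for child in children
        ]
        return [parent]
    parent["hasMember"] = [{"reference": ref} for ref in child_refs]
    return [parent, *children]
```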

Considerations and rules when using this approach:

  • If a questionnaire item has the questionnaire-unit extension, the Observation.value SHOULD be a valueQuantity rather than a valueInteger or valueDecimal, and the units should be taken from the extension value.
  • If a question is skipped (no answer) or cleared, no Observation should be created. Existing Observations SHALL NOT be deleted.
  • If a question has multiple answers, each answer SHALL be extracted as a separate Observation instance.
  • This approach has no mechanism to support items that are mapped to Observation codes but have nested items without codes (e.g. to capture the text description for an "other - please specify" code). For such items, one of the other extraction mechanisms will need to be used.
  • Implementers are free to try combining this mechanism with the Definition-based approach. If they do, they should take care that a given item (and its children) is only handled by one approach or the other, not both.
  • This approach does not allow for observations where Observation.focus is relevant or for capturing Observation.dataAbsentReason.
  • Where an Observation is known to directly correlate to another resource element value (e.g. LOINC 21112-8 corresponds to Patient.birthDate), systems MAY take advantage of this knowledge to update the value of resources other than Observations; however, such use is discouraged, as using one of the other extraction techniques is likely to be better and safer.
  • This mechanism only works for questionnaire items that correspond to Observation values.
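The first three rules above (units, skipped questions and multiple answers) can be combined when turning answers into Observation values. This Python sketch uses the core questionnaire-unit extension URL; observation_values is a hypothetical helper, and using the unit's display (falling back to its code) for Quantity.unit is an assumption.

```python
UNIT_URL = "http://hl7.org/fhir/StructureDefinition/questionnaire-unit"


def observation_values(item: dict, qr_item: dict) -> list[dict]:
    """Return one Observation value[x] per answer; an empty list means no Observation."""
    unit = next(
        (ext["valueCoding"] for ext in item.get("extension", [])
         if ext.get("url") == UNIT_URL and "valueCoding" in ext),
        None,
    )
    values = []
    # a skipped or cleared question has no 'answer' element, so nothing is produced
    for answer in qr_item.get("answer", []):
        number = answer.get("valueDecimal", answer.get("valueInteger"))
        if unit is not None and number is not None:
            quantity = {"value": number, "unit": unit.get("display", unit.get("code"))}
            if "system" in unit:
                quantity["system"] = unit["system"]
            if "code" in unit:
                quantity["code"] = unit["code"]
            values.append({"valueQuantity": quantity})
        else:
            # each answer carries exactly one value[x]; copy it unchanged
            values.extend({key: val} for key, val in answer.items() if key.startswith("value"))
    return values
```

Each returned value would then feed a separate Observation, so multiple answers never share an instance.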

Definition-based extraction

This approach to extraction is more generic. It supports extracting data into any type of FHIR resource rather than being limited to Observation. It also supports more Observation data elements than can be gathered using the Observation-based approach - for example, explicit effective time ranges, interpretations, comments, etc. The SDC Questionnaire Extract - Definition profile has been created to support this mechanism.

To use this method:

  1. Include the questionnaire-itemExtractionContext extension either on the Questionnaire root or on 'group' items within the Questionnaire to identify the resource that will serve as the context for any extraction. The itemExtractionContext is used to set the context for the item.definition paths. If the itemExtractionContext is empty, then the Questionnaire is being used to create a resource. If the itemExtractionContext has a resource (or set of resources), then the Questionnaire is being used to update the resource(s).
  2. On descendant items of that element, fill in Questionnaire.item.definition to point to the resource or profile element that the Questionnaire item corresponds to. (Profiles may be relevant for data that is sliced or has fixed values for some properties.) The definition SHALL have the full canonical URL of the resource (or profile), followed by '#', followed by the snapshot.path of the element the Questionnaire item corresponds to.
  3. If necessary, define questionnaire-hidden items whose content is set by Questionnaire.item.initial.value[x] or by the questionnaire-initialExpression extension, and use them to populate resource elements that the user will not be filling in. (The initialExpressions might in turn depend on variable and questionnaire-launchContext extensions, used as described in the Expression-based population section.)
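Splitting an item.definition value into its parts is the first step in applying it. As a minimal Python sketch, assuming definitions follow the 'canonical#path' form described in step 2, the hypothetical helper parse_definition below returns the canonical URL, the root type and the remaining element steps; it does not attempt to handle choice-type ([x]) elements.

```python
def parse_definition(definition: str) -> tuple[str, str, list[str]]:
    """Split 'canonical#path' into (canonical URL, root type, element steps)."""
    url, sep, path = definition.partition("#")
    if not sep or not url or not path:
        raise ValueError(f"expected '<canonical>#<path>', got {definition!r}")
    root, *steps = path.split(".")
    return url, root, steps
```

Because snapshot paths start with the base resource type, the root identifies the resource type even when the URL is a profile's canonical.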

To perform the extraction process, first determine whether the context resource should be updated or created:

  • If the context resource existed and was used in the population of the resource and data has changed, the resource will typically be updated. (Note that this means the system must capture the resource ids as hidden items in the Questionnaire so they're available for update.)
  • In other cases, the record will be created.

If updating, the answer values from the questionnaire (including hidden, calculated answers) will be propagated to the context resource. If creating, then for each occurrence of each item asserting a questionnaire-itemExtractionContext, a resource of the context type will be created. It will be populated with answers to questions (hidden or not) that have a definition matching the resource type of the context. In addition, if the definitions refer to a profile, any child elements of the profile that have fixed values or patterns declared will also be included in the instance, with values matching the fixed or pattern values.
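The create case can be sketched in Python as below. This is deliberately simplified and rests on several assumptions: answers arrive as (definition, value) pairs, values are already of the target element's datatype, repeating elements and choice types are not handled, and the profile's fixed or pattern values are supplied as a flat dictionary rather than read from the profile snapshot. create_from_definitions is a hypothetical name.

```python
from typing import Any, Optional


def create_from_definitions(context_type: str,
                            answers: list[tuple[str, Any]],
                            fixed_values: Optional[dict] = None) -> dict:
    """Build a new resource of context_type from (item.definition, answer value) pairs."""
    resource: dict = {"resourceType": context_type}
    for definition, value in answers:
        _, _, path = definition.partition("#")
        root, *steps = path.split(".")
        if root != context_type or not steps:
            continue  # definition belongs to a different context
        target = resource
        for step in steps[:-1]:
            target = target.setdefault(step, {})  # create intermediate elements as needed
        target[steps[-1]] = value
    # elements with fixed/pattern values in the profile are included as declared
    for name, value in (fixed_values or {}).items():
        resource.setdefault(name, value)
    return resource
```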

Considerations and rules when using this approach:

  • FHIR queries found in the questionnaire-initialExpression, questionnaire-calculatedExpression and variable extensions may contain embedded FHIRPath expressions (surrounded by double curly-braces). Systems SHALL evaluate these embedded expressions and substitute their results into the query before executing it.
  • If the result of evaluating the FHIRPath expressions is an invalid query, that is an error. Systems SHOULD log it and continue with extraction as if the query had returned no data.
  • When crafting queries, be sure to filter on all relevant elements. For example, ensure that status filters exclude entered-in-error records, that practitioners are active, etc.
  • In some cases, the context of an element will switch. For example, an item might have a definition of "http://hl7.org/fhir/MedicationStatement#MedicationStatement.medicationReference" and a questionnaire-itemExtractionContext extension that shifts the context to "Medication". Subsequent children would then have definitions based on the "http://hl7.org/fhir/Medication" resource.
  • Note that only one context can be in play at the same time. When a new context is declared, it takes the place of the old context.
  • If the same context is used for both questionnaire-itemPopulationContext and questionnaire-itemExtractionContext, then the value will be repeated for both extensions.
  • The items in the Questionnaire might not have the same order as those in the resource. When serializing into XML, the official order must still be respected.
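The substitution and error-handling rules for embedded expressions can be sketched in Python as follows. evaluate is a hypothetical stand-in for a FHIRPath engine that returns a list of results; joining multiple results with commas (FHIR search's OR syntax) and treating an empty result as an error are assumptions of this sketch, and substituted values are not URL-encoded.

```python
import logging
import re
from typing import Callable, Optional

# {{ ... }} marks an embedded FHIRPath expression inside a FHIR query
EMBEDDED = re.compile(r"\{\{(.+?)\}\}")


def resolve_query(query: str, evaluate: Callable[[str], list]) -> Optional[str]:
    """Substitute embedded expressions; return None to mean 'treat as returning no data'."""
    def substitute(match: re.Match) -> str:
        expression = match.group(1).strip()
        results = evaluate(expression)
        if not results:
            raise ValueError(f"{expression!r} produced no value")
        # several results are joined as a FHIR search OR list
        return ",".join(str(r) for r in results)

    try:
        return EMBEDDED.sub(substitute, query)
    except Exception as exc:  # invalid query: log it and let extraction continue
        logging.warning("skipping query %r: %s", query, exc)
        return None
```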

StructureMap-based extraction

The StructureMap approach is the most sophisticated approach of the three - and the most powerful. It allows significant transformation of data, including code translations when generating output resources. It also allows the conversion process between data and Questionnaire to be maintained independently and to draw on shared sources across Questionnaires. This can be an advantage in certain environments where the content of the questionnaire may need tight control, but the data environment can be more dynamic. This comes at the cost of requiring expertise in the FHIR mapping language, which is not (yet?) a common skill. The SDC Questionnaire Extract - StructureMap profile has been created to support this mechanism.

To use this method:

  1. Include the questionnaire-targetStructureMap extension. This SHALL define a transform between the QuestionnaireResponse and either a single resource or a transaction Bundle containing the set of resources extracted from the QuestionnaireResponse.

To extract data from the completed QuestionnaireResponse, simply invoke the StructureMap on it.

Considerations when using this approach:

  • This mode has the drawback that if the StructureMap execution fails, there will generally not be any data extracted from the QuestionnaireResponse. With the other approaches, if one Observation or context fails, the others might still work. As a result, the StructureMap must be designed to be very robust in the face of missing or potentially 'bad' data.
  • The ability of StructureMaps to reference other StructureMaps allows for the possibility of re-use if certain sections of multiple questionnaires are consistent.