Guidance for FHIR IG Creation
0.1.0 - CI Build

Guidance for FHIR IG Creation, published by HL7 International - FHIR Management Group. This is not an authorized publication; it is the continuous build for version 0.1.0. This version is based on the current content of https://github.com/FHIR/ig-guidance/ and changes regularly. See the Directory of published versions.

Working with Terminology

This page includes frequently asked questions related to terminology design in implementation guides (IGs). The intention is to help specification designers make useful design choices around the use of terminology in their specs. The guidance here is generic and represents a consolidation of best practices that should apply to most implementation guides, regardless of jurisdiction or topic.

This guidance applies to terminology usage regardless of product family; i.e., it applies to FHIR, CDA, V2, etc. However, the content is written in FHIR-centric language because much of HL7's standards development tooling is moving to a FHIR-based platform - meaning that FHIR resources are used to represent terminology and data model artifacts, regardless of product family.

The guidance here is general-purpose and should apply regardless of the purpose of your IG or the jurisdictions in which it is being used. However, there may be additional rules and guidance beyond what is covered here that apply in your region or organization, so be sure to ask.

IG authors looking to publish implementation guides within HL7 International, whether for international or U.S.-specific use, should follow these guidelines but are also bound by a set of additional policies set by various HL7 governance bodies. For these policies, authors should consult the HL7 International Terminology Playbook on HL7 Confluence, which summarizes FAQs relating to these HL7 International-specific policies.

NOTE: Pull requests against https://github.com/FHIR/ig-guidance to identify additional discovery tools (international or country-specific) are welcome.

General terminology design questions

This section contains additional general questions about terminology use not specific to code system and value set use.

Should an element be coded?

Many elements in HL7 resources and data types have a data type of CodeableConcept or allow a choice of coded data types and string. In these cases, the decision must be made whether to require (or allow) coded data, as well as whether to require (or allow) only string data.

Considerations around whether data should be coded are driven by three things:

  1. Is there a clear benefit to making the data computable?
  2. Is it a realistic imposition on those performing the data capture to gather coded data of the granularity necessary to support the desired computation?
  3. Is it realistic for the systems capturing the data to capture it in a coded way? (Typically, but not always, heavily influenced by whether existing systems capture it in a coded way.)

Obviously if data is not captured exclusively as coded, then string data should be supported. However, if coded data is supported, string data might still be appropriate if:

  • Not all data will be able to be represented by codes; or
  • There will be a need to capture ad-hoc additional detail not represented by the codes.

Requiring support for codes can be performed by establishing 'required' or 'extensible' bindings (see binding strength below), by constraining the allowed data types, and/or by requiring the CodeableConcept.coding element be present via cardinality or a constraint. Requiring string types can also be done via constraining data types and/or by cardinality settings or constraints.
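
As a rough illustration of the mechanisms just described, the sketch below represents two profile elements as plain Python dicts shaped like FHIR ElementDefinition JSON. The value set URL, the choice of Observation.code, and the helper name requires_code_support are assumptions made for this example only.

```python
# Hypothetical differential elements, written as plain dicts.
# The value set URL is a placeholder, not a real value set.
code_element = {
    "id": "Observation.code",
    "path": "Observation.code",
    "binding": {
        "strength": "required",
        "valueSet": "http://example.org/fhir/ValueSet/lab-codes",
    },
}
coding_element = {
    "id": "Observation.code.coding",
    "path": "Observation.code.coding",
    "min": 1,  # cardinality constraint: at least one coding must be present
}


def requires_code_support(elements):
    """Return True if any element demands support for codes, either through
    a 'required'/'extensible' binding or through a minimum cardinality on
    the .coding child."""
    for el in elements:
        strength = el.get("binding", {}).get("strength")
        if strength in ("required", "extensible"):
            return True
        if el["path"].endswith(".coding") and el.get("min", 0) >= 1:
            return True
    return False
```

Either element on its own is enough for the helper to report that code support is required; a profile would normally pick the mechanism that matches how strict it intends to be.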

How do I decide what binding strength to use?

The FHIR core specification provides good detail on the meaning of the different binding strengths. As a summary, the rules around the strengths are:

  • Required - Must have at least one coding from the value set. Cannot have CodeableConcept.text by itself.
  • Extensible - Must have at least one coding from the value set if the concept can be represented at some granularity by one of the codes. Can have text by itself and/or a non-valueset code if the concept doesn't fall inside the value set.
  • Preferred or Example - Send whatever you like, from the value set or not, including just text.
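
The rules summarized here can be sketched as a small check. This is a simplification, not a validator: the value set is modelled as a set of (system, code) pairs, and whether a concept "can be represented" by the value set is a human judgement that the sketch takes as an input flag (representable). All names are hypothetical.

```python
def satisfies_binding(concept, value_set, strength, representable=False):
    """Simplified check of a CodeableConcept (as a dict) against a binding.

    value_set     -- set of (system, code) tuples, standing in for an expansion
    representable -- caller's judgement: could the concept be expressed, at
                     some granularity, by a code in the value set?
    """
    codings = concept.get("coding", [])
    in_set = any((c.get("system"), c.get("code")) in value_set for c in codings)
    if strength == "required":
        # Text alone is never enough.
        return in_set
    if strength == "extensible":
        # Text or other codes are fine only when the value set cannot cover the concept.
        return in_set or not representable
    # 'preferred' and 'example' impose no conformance requirement.
    return True
```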

Decisions on strength are driven by the tension between "How much interoperability is needed?" and "What can systems reasonably achieve?".

A 'required' binding provides robust interoperability provided that the code systems are well defined with disjoint concepts, and the systems involved actually support the range of codes needed. However, it requires that the set of codes cover all possibilities for which the element might need to be present, including legacy data, exceptional data, and future evolution (though if the value set is intensional and/or the value-set reference is not version-specific, then future evolution might still be possible). For a system to comply with a 'required' binding, it must map all codes, both now and into the future, to the bound value set, and if the binding is not version-specific, it must continue to update those mappings on a regular and ongoing basis. As such, using a 'required' binding is most reasonable when the set of codes to be mapped is both small and relatively static.

An 'extensible' binding ensures interoperability for 'most' concepts but allows for the possibility that the bound value set might not cover every eventuality. It still imposes a requirement for the implementer to perform mappings for their codes and thus means that the codes used either need to already be used by the system or be small and static. If a binding is 'extensible', it must not contain any concept that conveys the concept of 'other' or 'not elsewhere specified', nor may it contain a high-level concept that encompasses all permitted concepts within the element. In any of these cases, it is not possible to have a concept that falls outside the scope of the value set, and it is therefore, de facto, a 'required' binding.

'preferred' and 'example' bindings are used when there is no existing set of codes that reasonably covers most of the space, there is no consensus within the community around what codes should be used for the element, or when it is simply not practical to demand that systems translate from the codes they use to a single set of codes for interoperability. 'Example' bindings are used in the first two situations. 'Preferred' bindings are used in the last - as, while it cannot be demanded, it is certainly encouraged/recommended.

It is always best to strive for the tightest binding achievable. 'required' is more interoperable than 'extensible'. 'extensible' is better than 'preferred', and 'preferred' provides more guidance and hope for interoperability than 'example'.

See also the additional binding types below.

What are the different 'types' of bindings, and how do I use them?

In releases of FHIR prior to FHIR R5, there was a single 'core' binding element and several extensions that allowed for additional types of bindings, such as the min and max valueset extensions.

If there was a need to convey additional expectations (e.g. "There must be one coding from ICD10, plus one coding from SNOMED CT"), then the only mechanism was to use slicing.

In R5, the notion of additionalBindings was introduced. Through inter-version extensions, additional bindings can be used in profiles for versions earlier than R5, and they are now preferred over the min and max valueset extensions or multiple slices on coding. Additional bindings introduce a number of binding capabilities. Specifically:

  • maximum: A required binding, for use when the binding strength is 'extensible' or 'preferred'.
  • minimum: The minimum allowable value set - any conformant system SHALL support all these codes.
  • candidate: This value set is a candidate to substitute for the overall conformance value set in some situations; usually these are defined in the documentation.
  • current: New records are required to use this value set, but legacy records may use other codes. The definition of 'new record' is difficult, since systems often create new records based on pre-existing data. Usually 'current' bindings are mandated by an external authority that makes clear rules around this.
  • ui: This value set is provided for user look up in a given context. Typically, these valuesets only include a subset of codes relevant for input in a context.
  • starter: This value set is a good set of codes to start with when designing your system.
  • component: This value set is a component of the base value set. Usually this is called out so that documentation can be written about a portion of the value set.

These bindings can have different use contexts or other constraints that limit when they apply. So, for example, it would be possible to have different minimum bindings for a single element depending on jurisdiction or type of facility creating the instance.
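
A minimal sketch of how additional bindings might sit alongside the main binding, using a Python dict shaped like R5 binding JSON. The field names (additional, purpose, valueSet) reflect the R5 ElementDefinition.binding structure as understood here and should be checked against the specification; the value set URLs and the helper name are placeholders.

```python
# Hypothetical binding for a single element; all URLs are placeholders.
binding = {
    "strength": "extensible",
    "valueSet": "http://example.org/fhir/ValueSet/conditions",
    "additional": [
        {"purpose": "minimum",
         "valueSet": "http://example.org/fhir/ValueSet/conditions-core"},
        {"purpose": "maximum",
         "valueSet": "http://example.org/fhir/ValueSet/conditions-all"},
        {"purpose": "starter",
         "valueSet": "http://example.org/fhir/ValueSet/conditions-starter"},
    ],
}


def value_sets_for(binding, purpose):
    """Return the value set URLs of all additional bindings with the given purpose."""
    return [
        extra["valueSet"]
        for extra in binding.get("additional", [])
        if extra.get("purpose") == purpose
    ]
```

The helper returns a list because more than one additional binding can share a purpose, for example different minimum bindings for different jurisdictions.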

In most cases a single 'standard' binding is all that is necessary. For more sophisticated conformance expectations or implementer guidance, the others can sometimes be helpful.

When should slicing be used to introduce multiple bindings?

When an element has a type of CodeableConcept, it is possible for multiple codes to be present simultaneously. The binding for the overall element sets an expectation that at least one of the codings within the collection must meet the requirements of the binding, but it does not matter which. Other repetitions might meet expectations of other profiles or simply convey other codings the system is also aware of for the same concept. However, in some cases, a profile might wish to make statements about additional codings that must be present and/or must be supported.

In the past, this was typically done by slicing CodeableConcept.coding by value and specifying required bindings on each slice. (The bindings must be 'required', and the value sets must be non-overlapping to ensure that the slices remain disjoint, which is required for slicing.) However, with the introduction of additional bindings, the need for slicing codings has largely disappeared. Instead, multiple bindings can now be asserted at the concept level without drilling down to the 'coding' level. This renders more concisely and is easier for authors. It is a preferred approach where tooling permits declaring value sets in this way.

Should the order of codings matter?

The 'coding' data type is not ordered. There is no difference in the meaning of the first coding vs. the last coding in the collection. Asserting rules around which coding should be first is strongly discouraged (and is also non-conformant). It significantly increases the likelihood of conflicts between profiles, forcing implementers to write separate interfaces for different systems (something that increases costs for everyone). In most cases, a receiving system should be able to find the coding repetition it needs by checking the coding.system. In cases where different codings have different purposes beyond their code system, extensions should be used to designate these purposes, as the order of appearance has no semantic significance.
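
A receiver that selects codings by system rather than by position might look like the following sketch. The function name and the codes "A" and "B" are placeholders for illustration; only the two system URIs are real.

```python
def codings_for_system(concept, system):
    """Return every coding in a CodeableConcept (as a dict) that uses the
    given code system, regardless of where it appears in the list."""
    return [c for c in concept.get("coding", []) if c.get("system") == system]


# Two instances carrying the same codings in a different order.
# "A" and "B" are placeholder codes, not real SNOMED CT or ICD-10 codes.
snomed = {"system": "http://snomed.info/sct", "code": "A"}
icd10 = {"system": "http://hl7.org/fhir/sid/icd-10", "code": "B"}
first = {"coding": [snomed, icd10]}
second = {"coding": [icd10, snomed]}
```

Both instances yield the same result for a given system, which is why ordering rules add no value for a receiver written this way.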

Should bindings be to specific value set versions?

Bindings refer to value sets using the canonical data type. This type allows the reference to the value set to specify a particular business version. Locking the value set version means that updates to the value set will not be reflected unless the specification containing the binding is also changed. Considerations around whether to do this are as follows:

  • Specifying a version is not relevant if the value set is defined in the same implementation guide package or by a referenced package. References within the package hierarchy are automatically considered specific to the version within the package or the declared dependency version of referenced packages.
  • Constraining the value set version will not necessarily prevent changes to the set of codes present in the value set. Intensional value sets will evolve as the referenced code systems and/or value sets evolve. Some terminology servers may filter out codes that are flagged as deprecated or retired even for intensional value sets.
  • Limiting changes of codes to changes to the profile version may minimize challenges with mapping when working with 'required' or 'extensible' bindings.
  • If a bound value set is extensional, making the binding version-specific means that any change to the allowed set of codes will require updating the IG containing the profile (and having implementers migrate to the new IG version). For intensional bindings, an IG update will be necessary to tweak filters to account for code system evolution. Updating the value set definition often requires less process/overhead and can be done more quickly, making non-versioned references appealing if adaptation is expected to be necessary between IG releases.
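
A version-specific reference appends the business version to the canonical URL after a '|' character. The sketch below separates the two parts; the function name and the example URL are illustrative only.

```python
def split_canonical(canonical):
    """Split a canonical reference into (url, version).

    version is None when the reference is not version-specific.
    """
    url, _, version = canonical.partition("|")
    return url, (version or None)


# Placeholder value set URL; the second form locks the binding to version 1.2.0.
unversioned = split_canonical("http://example.org/fhir/ValueSet/diagnoses")
versioned = split_canonical("http://example.org/fhir/ValueSet/diagnoses|1.2.0")
```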

Should codings require CodeSystem version?

The Coding data type allows declaring the version of the code system that is represented, but this is not typically declared in instances. There are two specific circumstances when declaring version is necessary (or at least useful) in an instance:

  • If the code system allows the meaning of codes to change between releases without changing the code system URI (against good terminology practice, but might not be prohibited in some code systems), the Coding.version is necessary to know the meaning of the code.
  • For code systems such as SNOMED CT, the version can identify the specific edition, which may be helpful for validators that only have access to some editions, but not all. This is primarily useful where the context of the exchange does not make the edition implicit. (For example, declaring the U.S. SNOMED edition in interfaces that are intended only for use in the U.S. is generally unnecessary, but it might be useful when populating an International Patient Summary document that might conceivably be shared across borders.)
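
For SNOMED CT, the version is conventionally a URI of the form http://snomed.info/sct/[edition module id]/version/[date]. The sketch below pulls the edition identifier out of such a URI. It assumes that URI shape, uses an illustrative release date and a placeholder code, and uses 731000124108 as the U.S. edition module id; verify these against the SNOMED CT guidance in the FHIR specification before relying on them.

```python
SNOMED_PREFIX = "http://snomed.info/sct/"


def snomed_edition(version_uri):
    """Return the edition (module) id from a SNOMED CT version URI, or None
    if the string does not follow the expected pattern."""
    if not version_uri.startswith(SNOMED_PREFIX):
        return None
    edition = version_uri[len(SNOMED_PREFIX):].split("/")[0]
    return edition or None


# A Coding that declares its edition; the code and date are placeholders.
coding = {
    "system": "http://snomed.info/sct",
    "version": "http://snomed.info/sct/731000124108/version/20240301",
    "code": "PLACEHOLDER",
}
```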

Should codes have 'fixed' or 'pattern' values?

If there is a need to constrain a code to a specific value, it is better to use 'pattern' rather than 'fixed':

  • For CodeableConcept, a fixed value would prevent any additional translations from being present and would 'lock' or prohibit the .text element (which reflects what a user saw or typed).
  • For CodeableConcept and Coding, a fixed value would prevent variations in display name, setting primary, valueset, or other Coding properties.
  • For any of the data types, a fixed value will prevent the inclusion of extensions or the 'id' element. Even if the profile in question does not anticipate the use of 'id' or extensions, it is always possible that other profiles will have need for these elements, for example, in linking discrete data to narrative.

When defining a pattern for CodeableConcept, what should be declared?

When defining a pattern, usually only the Coding.system and Coding.code should be present for CodeableConcept and Coding elements. For elements of type 'code', a pattern that just sets the code value is sufficient. In rare cases where meaning is dependent on the CodeSystem version, Coding.version might also be present where the data type supports it. Coding.display should never be present as it prohibits translations or other legitimate substitutions of the display value. If the code is not meaningful to maintainers, a comment may be provided in the instance indicating what the chosen code means.
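
The difference between 'pattern' and 'fixed' can be illustrated with a simplified matcher over dict-shaped instances. This is a sketch of the general idea (a pattern must be contained in the instance, a fixed value must equal it), not a reproduction of validator behaviour; the function names and the code "X" are placeholders.

```python
def matches_pattern(value, pattern):
    """True if everything in the pattern is present in the value.
    Extra properties and extra list items in the value are allowed."""
    if isinstance(pattern, dict):
        return isinstance(value, dict) and all(
            key in value and matches_pattern(value[key], sub)
            for key, sub in pattern.items()
        )
    if isinstance(pattern, list):
        return isinstance(value, list) and all(
            any(matches_pattern(item, sub) for item in value) for sub in pattern
        )
    return value == pattern


def matches_fixed(value, fixed):
    """True only if the value is exactly the fixed value, with nothing added."""
    return value == fixed


# Pattern declaring only system and code ("X" is a placeholder code).
pattern = {"coding": [{"system": "http://loinc.org", "code": "X"}]}
instance = {
    "coding": [
        {"system": "http://loinc.org", "code": "X", "display": "Some display"},
        {"system": "http://example.org/local", "code": "L1"},
    ],
    "text": "what the user typed",
}
```

The instance carries a display name, a local translation, and text. It satisfies the pattern but would be rejected if the same content were declared as a fixed value.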

What constitutes a 'breaking change' when maintaining codes and value sets?

A breaking change is any change where a previously valid code is no longer valid, or where a previously transmitted code can no longer be safely interpreted according to the definition it previously had. This includes changing subsumption relationships such that a code is no longer a specialization of a code it was previously a specialization of. Note that deprecating codes is not considered to be breaking.
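
One part of this definition, a previously valid code no longer being valid, can be detected mechanically by comparing two expansions of a value set. The sketch below covers only that case; changes to meaning or to subsumption relationships need human review. The function name and sample codes are hypothetical.

```python
def removed_codes(old_codes, new_codes):
    """Return the (system, code) pairs that were valid before but are no
    longer present. Any result indicates a breaking change."""
    return sorted(set(old_codes) - set(new_codes))


# Placeholder expansions of the same value set at two points in time.
system = "http://example.org/fhir/CodeSystem/demo"
old = [(system, "a"), (system, "b"), (system, "c")]
new = [(system, "a"), (system, "c"), (system, "d")]
```

Adding "d" is not breaking; dropping "b" is. A code that is merely flagged as deprecated but still present would not appear in the result.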

In what situations are 'breaking changes' to terminology artifacts acceptable?

In the initial stages of profile development, the appropriate set of codes as well as the meaning of and relationships between those codes might not be well-understood. In low-maturity profiles (FMM 2 or lower), breaking changes may be made to code systems or value sets defined within an implementation guide. Typically, breaking changes are not permitted for codes or value sets maintained in external code systems or repositories, including those hosted by terminology.hl7.org, though rules are looser for terminologies explicitly marked as 'draft' or 'experimental'.

Should code system post-coordination be used?

Post-coordination is a mechanism where codes can be composed using smaller codes together with a grammar. It is used in a number of code systems. For example, in UCUM, the code 'mL' is composed of the codes 'm' (milli) and 'L' (Liter). The language code 'en-CA' in BCP-47 is made up of the code "en" (English) and "CA" (Canada), the latter of which comes from a different code system. SNOMED CT has a complex post-coordination syntax supporting multiple layers of post-coordination across many axes.
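
To make the BCP-47 example concrete, the sketch below splits a simple language-region tag into its two component codes. It deliberately handles only the two-part form used above; real BCP-47 tags can also carry script, variant, and extension subtags, which need a proper parser. The function name is hypothetical.

```python
def split_language_tag(tag):
    """Split a simple 'language-REGION' tag into (language, region).

    region is None when the tag has no region part. Tags with more than
    two subtags are out of scope for this sketch.
    """
    language, _, region = tag.partition("-")
    return language, (region or None)
```

For example, split_language_tag("en-CA") separates the language code from the region code, which comes from a different code system.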

Post-coordination has a benefit in that rules around what types of qualification are legitimate can be embedded in the terminology. For example, while a person might have right and left eyes and feet, they do not have right and left hearts or mouths. Similarly, there can be millimeters and milliliters, but not milli-feet or milli-gallons.

Many implementers handle post-coordination by simply enumerating pre-coordinated concepts. For example, a value set might list "mL" and "uL" as permitted UCUM measurement codes, ignoring the fact that these are post-coordinated expressions. This avoids the need to worry about parsing the code, or understanding the varying post-coordination grammars for different code systems.

However, there are circumstances where enumeration fails. If an implementation guide wishes to capture a diagnosis, the affected body site (including laterality), and the severity, all using a single code, then enumerating all the allowed concepts becomes prohibitively difficult. In such cases, implementers will need to allow their users to select the different concepts and then produce the relevant post-coordinated code, following the code system grammar. Depending on its user interface, a consuming system might need to parse the post-coordinated concept and split the relevant concepts up into different pieces of its UI.

An additional consideration with post-coordination is that each code system that supports post-coordination has its own syntax and grammar.

Because of the complexity, support for post-coordination parsing and serializing is uncommon in healthcare systems. Before leveraging bindings that rely on support for post-coordination, consider what level of support for parsing and serializing it is reasonable to expect implementers to have.

Code System Questions

Code systems define the codes that are used to share computable concepts in HL7 specifications. Common specification design questions related to code systems include:

How do I decide what terminology to use?

In general, it is always best to leverage an existing terminology rather than creating a new one. Existing terminologies, by definition, are likely to have existing users, and are thus a better foundation for interoperability. As well, terminologies created by organizations dedicated to terminology management are likely to be more robust, better defined, and more useable than anything built by non-experts. HL7 policy is to leverage existing terminologies whenever possible.

Licensing

That said, some terminologies may be governed by licensing schemes that will limit use outside of specific communities and/or impose financial costs that may prove barriers to adoption. Typically, any terminology choices that impose incremental costs on implementers will pose a barrier to standards adoption and should be avoided. For example, the FHIR Management Group has specific policies on the adoption of terminologies that are not free for use for the target implementer community when included in HL7 International or HL7 U.S. Realm specifications.

Selecting an 'external' terminology

There are a wide range of terminologies available for use. Some are general purpose; others are domain specific. Considerations beyond licensing include:

  • What terminologies are in common use by the implementers at whom the specification is targeted?
  • What terminologies contain the types and granularity of concepts necessary to achieve the type(s) of computational interoperability desired?
  • Does the terminology have the relationships and properties necessary to allow the filtering or other logic needed? (See the section on code system supplements for alternatives when additional concepts or relationships are necessary.)
  • If selecting a terminology not commonly used by all expected implementers, are there standardized maps available (ideally as ConceptMaps) that can help translate between the selected terminology and whatever terminology is in use?
  • Is there any issue with publishing value sets or examples that enumerate codes from the code system?
  • Does the code system require the use of post-coordination that might be too complex for implementers to manage?
  • Are the display names associated with the codes in the system appropriate for the expected end users of the system?
  • If multiple languages are likely to be needed for display names, are those languages available? (CodeSystem supplements may be relevant in this case as well.)

If you are having trouble finding or choosing which code system(s) are most appropriate to use, it is a good idea to ask for guidance on the terminology stream on chat.fhir.org, as well as any streams associated with the domain area and/or country or region of use.

How do I find existing codes?

The answer to this question depends on the code system. Some code systems provide a web-based search capability, while others must be downloaded before searching is possible. The following bullets provide guidance for searching certain commonly used code systems.

  • HL7: HL7 does not currently provide a straightforward way to search for codes in a code system, though this is a known issue that will hopefully be addressed. In the interim, there are a couple of workable approaches for technical users who do not mind reading XML or JSON: Unfortunately, these mechanisms only search content hosted at terminology.hl7.org, not content hosted in implementation guides published by HL7, let alone guides published elsewhere.
    Another potentially useful mechanism for searching is the Simplifier registry.
  • LOINC: search.loinc.org can be used, though it does require registering for a (free) membership to the National Library of Medicine (NLM) site.
  • SNOMED CT: browser.ihtsdotools.org can be used. For an international IG you will want to select the International Edition. For national IGs, select the appropriate national edition.

There are also jurisdiction-specific tools for terminology discovery.

Australia
Canada
U.S.
  • Code systems in VSAC: vsac.nlm.nih.gov/context/cs allows for searching by code or term either in all code systems or in a specific code system. VSAC also allows including inactive codes in the search and choosing between a specific version of a code system or all versions.
  • Code systems in PHIN VADS: PHIN VADS allows for searching for any specific code system hosted there as well as for any concept(s) in any of the hosted code systems.

How do I request new codes be added to a code system or changes to an existing code?

The exact process varies depending on the external code system, but the general process is to reach out to the entity responsible for maintaining the code system. The following bullets provide guidance for requesting new codes in commonly used external code systems:

  • LOINC: New term requests can be submitted via the LOINC term request form. LOINC also provides guidance on what information it needs to process a request.
  • SNOMED CT: New term requests go through SNOMED's Content Request Submission process where authorized users submit requests for new terms. For HL7 members: New term requests for the US realm or HL7 Affiliates with a SNOMED National Release Center (NRC) should be made directly to that affiliate's NRC. New term requests for the Universal realm or an HL7 affiliate without a SNOMED NRC should be submitted to the HL7 Terminology Authority (HTA) using the Context Request Form found here.
  • terminology.hl7.org: Changes to most of these code systems can be initiated by submitting a terminology change request using HL7's UTG project on Jira. Doing this requires registering for an HL7 Jira user id, but those are freely given after manual review to filter out bots and spammers. HL7 maintains a Confluence page that provides detailed guidance on submitting UTG change requests.

Note that most code systems will have rules for what types of new codes they will accept, as well as what types of changes are permitted to existing codes.

What can I do if I want to add a code to a code system, but cannot?

If the issue is merely one of permissions, it is often possible to find someone who has the necessary authority who might be able to make the request on your behalf. However, organizations will often have policies around the types of content they will accept and may also choose to 'freeze' certain terminologies, meaning that additions are not allowed.

It is NOT ok to simply make up your own codes and assert that they are part of an existing code system. For example, something like this:
"code": {"coding": [{"system": "http://hl7.org/fhir/sid/icd-10", "code": "MyNewDiagnosisCode"}]}
is never legal - because 'MyNewDiagnosisCode' is not a code defined as an official part of the ICD-10 code system.

In situations like these, if you cannot add the codes you need into the 'desired' code system, simply place them into a different code system or, if necessary, create a new code system to contain them. Your value set can then point to the sets of codes from both systems.
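
As a sketch of that approach, the value set below (a Python dict shaped like ValueSet JSON) draws on ICD-10 and on a separate local code system that holds the new code. The local code system URL and the value set URL are placeholders; only the ICD-10 URI comes from the example above.

```python
# Hypothetical value set combining an existing code system with a local one.
value_set = {
    "resourceType": "ValueSet",
    "url": "http://example.org/fhir/ValueSet/diagnoses",
    "status": "draft",
    "compose": {
        "include": [
            # All codes from the existing code system.
            {"system": "http://hl7.org/fhir/sid/icd-10"},
            # The new code lives in its own code system, not in ICD-10.
            {
                "system": "http://example.org/fhir/CodeSystem/local-diagnoses",
                "concept": [{"code": "MyNewDiagnosisCode"}],
            },
        ]
    },
}

# The code systems this value set draws from, in declaration order.
included_systems = [inc["system"] for inc in value_set["compose"]["include"]]
```

An instance would then carry "MyNewDiagnosisCode" with the local system URI, never with the ICD-10 URI.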

NOTE: If new codes are being defined, they must be defined in a distinct code system. They cannot be defined in a code system supplement of the original code system.

What should I do when I need new codes, but the requirements are not final?

Typically, when adding codes to existing code systems, those systems will place requirements on what types of changes are possible once the code has been accepted. This can be problematic if the specification that needs those codes is still in the early development stages and the specific requirements for the needed codes have not yet been settled. Codes may be needed for early testing and proof-of-concept work, but there is a good chance that the specific code requirements (name, definition, relationships, granularity, etc.) may change before the specification becomes stable. Making such changes is often discouraged or prohibited, which means that additional new codes will be needed, and the original added codes will end up abandoned - which is undesirable.

One approach to addressing this issue is to define codes in an IG-specific code system as a short-term measure during early development, while the requirements are unstable, even though the best practice is to place codes in existing code systems. Once the specification matures, codes can be requested in more 'official' external code systems. The downside of this approach is that early adopters will face a change to the code system. This expectation should therefore be telegraphed to early adopters so they can design for it. Also, the migration to official codes should happen as soon as the requirements become more solid.

Where should custom code systems be maintained?

The choice depends on how broadly the code system will be used, as well as what organizations or bodies are best set up to maintain the codes in a manner that will meet the needs of implementers (and maintain interoperability) in the long term. Organizations and regions may have policies around where certain types of terminologies should be hosted.

A summary of HL7 International policies that apply to code systems used in international and U.S. Realm implementation guides can be found in the HL7 International Terminology Playbook on Confluence.

What URI or OID should I use for a code system someone else has defined?

Interoperability depends on all computer systems using the same 'id' when referring to a code system. If one V2 system called LOINC "LN" and a different one called it "LOINC" or "1558", those systems would not interoperate. For that reason, HL7 sets an expectation that all implementers need to use the same id when referring to the same code system. However, there is still a need to determine what that id should be.

If the content is being exposed by the maintainer as a FHIR resource, then the relevant FHIR URI and hopefully an OID for CDA/v3 (and perhaps v2) use will already be there. Those are the ids to use.

If the content is not available as a FHIR resource (or you do not know how to find such a resource if one exists), then the next logical option would be to ask the maintainer of the code system. However, most maintainers have no idea who HL7 is or even what a URI or OID is. For this reason, HL7 has defined a process to handle engagement with terminology managers to determine an official id to identify that system for use in HL7 instances. (In some cases, additional guidance might also be captured, such as how to identify versions, post-coordination syntax, implicit value sets, etc.) This process applies to all terminologies that are international, multi-national, or U.S. national in scope.

HL7 maintains lists of terminologies that have official identifiers within HL7. There are four lists, depending on the terminology source and what level of content is maintained in HL7's terminology repository:

If a code system is in any of those lists, the associated URI or OID SHALL be used in conformant HL7 instances. If there is an international or U.S. national code system not in the list that is needed in your IG, the code system needs to be added to one of those lists.

In some cases, HL7 might have started the process of assigning a URI or OID but not yet transitioned the content to the official terminology site, so IG authors should also check the external terminology work in progress page. If the code system is not listed there either, HL7 has a process for seeking the registration of a new external terminology. This process can provide a 'temporary' identifier for use during initial development work while HL7 negotiates with the official terminology source over what the official identifier should be.

If dealing with national terminologies not covered under HL7 International's process, responsibility is left to HL7 affiliates or other national organizations. For example, Canada Health Infoway maintains a list of Canadian national code systems. The governance process for maintaining such lists will vary from country to country based on what works best in that space. Such processes should take into account the guidance for defining and changing URIs.

For local terminologies, the official URI will need to be defined by (and received from) the organization issuing the codes. See the guidance on Defining URIs.

What URI should I use for a code system I am defining?

When assigning a URI to a code system, any legal URI can work, so long as it is unique to that specific code system, though there are best practices.

The first question to determine is what the boundaries of the code system are. Many organizations define a large number of codes. When defining URIs, it is necessary to understand whether that full collection of codes constitutes a single code system, a few large code systems, or a large number of separate code systems. Considerations are as follows:

  • All codes within a code system must be unique and have consistent meaning. If you have a code 123 that means 'appendicitis' and another code 123 that means 'purple', those codes MUST be maintained in separate code systems.
  • Codes within a single code system should be managed under a single maintenance process and are typically published together.

While any URI can be used for a code system, HL7 strongly recommends the use of meaningful, and ideally resolvable, URLs. Such identifiers are easier for developers to work with. The ability to resolve (even if only to an HTML page rather than a FHIR resource) allows a developer or analyst encountering a new code system in an instance to learn more about the system - and possibly the codes in it.

It is often tempting for systems converting from CDA or other v3 systems to adopt OID-based URIs for FHIR code systems (and identifier systems) when migrating to FHIR. While legal, doing so is discouraged:

  • OIDs are not human readable which makes errors easier and debugging harder.
  • OIDs cannot be resolved, which removes a helpful tool for implementers trying to discover what a code means or an identifier refers to.
  • Most of the world understands the concept of an identifier. Few understand OIDs. Using OIDs means that you incur a permanent learning curve for your developers, testers, etc. - including all those that will be brought into the project in the years and decades to come.

All implementers performing v3 to FHIR conversions will already need a conversion layer to convert from OIDs to meaningful URIs for all the code systems where HL7 International has defined standard URIs (SNOMED CT, LOINC, UCUM, etc.) as well as new ones defined in the future. The NamingSystem provides a convenient mechanism for storing and tracking the maps between OIDs and URIs. Given that this infrastructure must already exist, it makes sense to take advantage of it even for local code systems and thereby avoid the long-term implementer burden that OIDs will impose on the FHIR community long after v3 to FHIR conversion ceases to be relevant.
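The OID-to-URI conversion layer described above can be sketched as a lookup over NamingSystem-style records. This is a minimal illustration, not a reference implementation: the records carry only the fields the lookup needs, the function name preferred_uri_for_oid is made up for this example, and the "ward-type" entry (both its OID and its URL) is hypothetical.

```python
# NamingSystem-like records reduced to the fields used below.
# The SNOMED CT and LOINC pairs are the identifiers HL7 publishes;
# the "ward-type" entry is a hypothetical local code system.
NAMING_SYSTEMS = [
    {
        "resourceType": "NamingSystem",
        "name": "SNOMED_CT",
        "kind": "codesystem",
        "uniqueId": [
            {"type": "uri", "value": "http://snomed.info/sct", "preferred": True},
            {"type": "oid", "value": "2.16.840.1.113883.6.96"},
        ],
    },
    {
        "resourceType": "NamingSystem",
        "name": "LOINC",
        "kind": "codesystem",
        "uniqueId": [
            {"type": "uri", "value": "http://loinc.org", "preferred": True},
            {"type": "oid", "value": "2.16.840.1.113883.6.1"},
        ],
    },
    {
        "resourceType": "NamingSystem",
        "name": "WardType",  # hypothetical local code system
        "kind": "codesystem",
        "uniqueId": [
            {"type": "uri", "value": "http://example.org/fhir/CodeSystem/ward-type"},
            {"type": "oid", "value": "1.2.3.4.5.6"},
        ],
    },
]


def preferred_uri_for_oid(oid, naming_systems=NAMING_SYSTEMS):
    """Return the URI registered alongside an OID, or None if the OID is unknown."""
    prefix = "urn:oid:"
    if oid.startswith(prefix):
        oid = oid[len(prefix):]
    for naming_system in naming_systems:
        ids = naming_system.get("uniqueId", [])
        if not any(i["type"] == "oid" and i["value"] == oid for i in ids):
            continue
        uris = [i for i in ids if i["type"] == "uri"]
        preferred = [i for i in uris if i.get("preferred")]
        candidates = preferred or uris
        return candidates[0]["value"] if candidates else None
    return None
```

Because local code systems go through the same table as SNOMED CT or LOINC, a converter built this way never needs to emit an OID-based URI in FHIR instances.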

How do I get a new OID for a code system?

OIDs are needed when identifying code systems and value sets for v3 models (specifically CDA) and can be issued by a variety of bodies. Once an organization has an OID for their organization, they can assign descendant OIDs to whatever business objects they like. If registering an international or U.S. external code system through the HL7 International process or defining new code systems through the HL7 International Uniform Terminology Governance (UTG) process, an OID will be assigned automatically. HL7 offers a paid OID registration process for organizations looking to obtain an OID root or to have someone else manage the OIDs for their local code systems or value sets.

How do I change the URI or OID for a code system?

In general, the URIs and OIDs for a code system should never change once established. This holds even if the organization responsible for a code system changes, the marketing name of the code system changes, or the website the URL points to is re-organized and the URI no longer resolves. URIs and OIDs get hard coded in software and are stored in a wide range of databases and other stores. It is extremely difficult for industry to shift to a new code system identifier once one has been established. It costs money both for the change, and for the interoperability discontinuity (and sometimes patient risk) associated with the transition. For these reasons, HL7 only allows changes to the code system identifiers of the code systems it maintains in limited circumstances - typically when there was an error that resulted in duplicate identifiers being issued, or something being issued an identifier that should not have been. Occasionally corrections might be made if the correction will not impact existing implementations (i.e. no one is yet using the code system identifier).

Is it ok to copy codes from one code system into another?

This is usually a bad idea - for two reasons:

  • Terminologies are intellectual property, like anything else. While the concepts represented in terminologies can be coded a variety of ways, the specific codes used, the display names assigned, the definitions and properties provided, and the relationships asserted are all the property of the individual or organization that defined them.
  • Even if licensing permissions allow outright copying of content, doing so inevitably impedes interoperability, because while a human might recognize that the copied code means the same as the original code, computer systems will not. Making computer systems recognize the equivalence requires creating maps, writing code, and spending time and money.

In most cases, the driver for copying codes is either avoiding IP issues or because there is a need to tweak the content in some way - for example adding properties, translations, clarifying definitions, etc.

As has been discussed earlier, copying codes to avoid using a code system with restricted IP does not actually avoid IP issues - and may even make things worse. If you cannot (or are unwilling to pay to) get a license to the desired code system, then you will have to use a different code system or invent something completely different from the desired code system (including taking care to ensure that the development is not in any way informed by other existing code systems). Typically, this second approach will cost more than simply buying a license.

If the need is to tweak the content of existing codes, this can typically be done using a Code System Supplement. (Also see the guidance on adding new codes and what to do if you cannot add new codes.)

If I define my own code system, what should the metadata say?

The purpose of the metadata in a code system is to help downstream consumers (those reading the IG, those finding the content through registries, those interacting with a terminology service, etc.) to understand the purpose of the artifact. Not all the consumers will necessarily have the scope of the IG to go on, so including at least some useful level of information, relevant keywords, etc. is important. Titles should be descriptive and reflect context - not presuming that users will only see the artifact in the context of the IG. Descriptions should typically be several sentences explaining the purpose. Some metadata such as date, status, and contact information might be propagated from the IG to the code system and other IG artifacts to minimize maintenance effort.

If I define new codes, what should the codes and display names look like?

HL7 has done a poor job of maintaining consistency within its code systems around display names and codes, so the bar is low. However, consistency does make things easier for implementers who often make (reasonable) assumptions that conventions around case, use of dashes, use of whitespace, etc. will be consistent across codes from the same code system. If nothing else, be consistent. Additional guidelines to consider:

  • First and foremost, if adding codes to an existing code system, follow the conventions (if any) established by the existing codes.
  • Good vocabulary practice often dictates that codes are meaningless identifiers. However, practice has found that this works best when dealing with large code systems like SNOMED, LOINC, or ICD10 and is less desirable for codes like gender or status. In FHIR, any codes that will be used with an element of data type code SHOULD be human-readable, and the same rule goes for any small to moderate size code system (< 100 codes or so).
  • In terms of meaningful code representation, lowerCamelCase or UpperCamelCase are more readable than alllowercase or ALLUPPERCASE. Use of hyphens or underscores to separate words can also work but are unnecessary when using mixed case representations and may create extra work when translating codes into constant names. (On the other hand, where constant names are all upper-case, having defined separators makes for more readable code.) There is no strong industry consensus here beyond 'be consistent'.
  • Single spaces are permitted in codes used in FHIR code systems, but may not be supported by all software, so are best avoided.
  • When using meaningful codes, it is helpful if the meaning is evident to the average software developer (who may not be familiar with acronyms).
  • While at one time, having extremely short codes was useful to save bandwidth, this is less critical now. Codes that are 10 or 15 characters are usually not an issue. However, codes that are significantly longer (50+ characters) are likely excessive as they will mean undesired typing for developers, test data creators, etc. Save the details for the display name and the definition.
  • Display names will often be surfaced to end users, so use names that will be meaningful. Ideally, the names should consider the context in which they will be used to avoid redundancy. E.g. if the element name is "status", then "held" is a better display value than "held status".
  • Display names should be either 'Title Case' or 'all lower case'. Again, there is no consensus on the preferred approach, beyond a desire to be consistent.
  • Display names should also be consistent in tense, pluralization, and part of speech. For example, "hold" and "stopped" would not be ideal display names because the first is imperative and the second is past tense. The names should be either "hold" and "stop" or "held" and "stopped".
  • In some cases, it may be useful to have more than one display name (designation) for a code. This allows value set authors and implementers to choose the name that most appropriately fits their use-case. For example, fully qualified names, short names, patient-friendly names, etc. Translations may also be relevant.
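The 'be consistent' advice for codes can be checked mechanically. The sketch below is one possible lint, not an HL7 rule set: the function name style_issues and the specific checks (one separator style, consistent leading case, no mixing of camel case with separators, no spaces) are assumptions chosen to mirror the bullets above.

```python
import re


def style_issues(codes):
    """Report simple style inconsistencies across the codes of one code system."""
    issues = []

    # Which word separators appear anywhere in the codes?
    separators = {ch for code in codes for ch in code if ch in "-_ "}
    if len(separators) > 1:
        issues.append("mixed separators")
    if " " in separators:
        issues.append("codes contain spaces")

    # Do some codes start upper-case and others lower-case?
    leading = {
        "upper" if code[0].isupper() else "lower"
        for code in codes
        if code and code[0].isalpha()
    }
    if len(leading) > 1:
        issues.append("mixed leading case")

    # A lower-case letter followed by an upper-case one signals camel case.
    uses_camel = any(re.search(r"[a-z][A-Z]", code) for code in codes)
    if uses_camel and separators:
        issues.append("camel case mixed with separators")

    return issues
```

For example, style_issues(["active", "onHold", "enteredInError"]) reports nothing, while a list that mixes "on-hold" with "entered_in_error" is flagged for mixed separators.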

What constitutes a 'good' definition for a concept?

A 'good' definition is one that clearly expresses the meaning of the concept while also clearly distinguishing it from other similar concepts. Additionally, the definition should not include the exact terms being defined in most cases. E.g. do not define "stopped" by saying "the order is stopped". Something like "the order is no longer valid, and action being taken under its authorization should cease" would be better. Always presume that the reader is confused about the meaning of the code string and display name and is counting on the definition to clarify/disambiguate.

Consider whether the code is a candidate for re-use (codes used in multiple specs are more likely to be recognized) and try to express codes in generic terms that are still meaningful in the context in which you are first intending to apply them.

Examples may be helpful as part of a definition. Usage notes MAY be put in a definition, but might be better conveyed using a comment property.

Should I have properties or relationships in my code system?

Properties and relationships do several things:

  • They make it easier to create expression-based (intensional) ValueSets by providing a basis to filter codes.
  • They allow software to reason based on the code or apply rules based on the code (e.g. checking if one code is a specialization of another, whether a code has a property that allows it to be user-selected, etc.)
  • They help to make the meaning of the code clearer by expressing information computably rather than merely through textual definitions.

The choice to use properties or relationships really comes down to whether any of those benefits are useful in your situation.

Look at the hierarchy meanings allowed for codes and consider whether one or more of them apply to your concepts. (If more than one, you can use hierarchy to represent one and properties to represent the others.) As a rule, if there are specialization or component relationships between codes, those should be computably expressed.

Also look at the standard concept properties and consider whether any of them would be useful. Then consider whether there are other properties held in common by certain sets of your codes that would be useful to surface computably. Feel free to take inspiration from other similar code systems. In most cases, having at least 'status' will be relevant. Even if it is not needed immediately, it is helpful to give implementers a heads up that codes might eventually be deprecated or retired.
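To make the benefits concrete, here is a minimal sketch of a code system that expresses a specialization hierarchy and a 'status' property computably, together with the kind of reasoning software can then perform. The code system URL, its codes, and the helper names subsumes and property_value are hypothetical; only the CodeSystem element names (hierarchyMeaning, property, concept) and the standard 'status' property URI come from FHIR.

```python
CODE_SYSTEM = {
    "resourceType": "CodeSystem",
    "url": "http://example.org/fhir/CodeSystem/order-state",  # hypothetical
    "status": "active",
    "content": "complete",
    "hierarchyMeaning": "is-a",  # nested concepts are specializations
    "property": [
        {
            "code": "status",
            "uri": "http://hl7.org/fhir/concept-properties#status",
            "type": "code",
        }
    ],
    "concept": [
        {"code": "active", "display": "active"},
        {
            "code": "inactive",
            "display": "inactive",
            "concept": [
                {"code": "held", "display": "held"},
                {
                    "code": "stopped",
                    "display": "stopped",
                    "property": [{"code": "status", "valueCode": "deprecated"}],
                },
            ],
        },
    ],
}


def _find(concepts, code):
    """Depth-first search for a concept by code."""
    for concept in concepts:
        if concept["code"] == code:
            return concept
        found = _find(concept.get("concept", []), code)
        if found is not None:
            return found
    return None


def subsumes(code_system, ancestor, descendant):
    """True if descendant equals ancestor or is nested somewhere beneath it."""
    node = _find(code_system["concept"], ancestor)
    if node is None:
        return False
    return ancestor == descendant or _find(node.get("concept", []), descendant) is not None


def property_value(code_system, code, prop):
    """Return the value[x] of a concept property, or None if it is absent."""
    node = _find(code_system["concept"], code)
    if node is None:
        return None
    for entry in node.get("property", []):
        if entry["code"] == prop:
            return next((v for k, v in entry.items() if k.startswith("value")), None)
    return None
```

With this in place, a rule such as "treat every kind of inactive order the same way" can test subsumes(CODE_SYSTEM, "inactive", code) instead of hard-coding a list of codes.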

Are there any other guidelines I should follow when defining new codes?

In addition to the guidance here on code and display values, definitions, properties, and copying codes, designers should consider the following:

  • Use existing codes rather than creating new ones if possible. There is no need for all codes to come from one system. Feel free to draw codes from an existing system and then just define the few additional ones that do not already exist. This means re-using codes for concepts like 'other', 'not applicable' and 'unknown' as well, if relevant.
  • Codes that are siblings should be orthogonal - in most cases it should not be possible for a single concept to match more than one code in the system unless the codes are defining different aspects (ideally distinguished by the codes being part of different hierarchies or having different properties), or one code is a specialization of the other.

When should I use a Code System Supplement?

Code System Supplements are a special type of code system that does not define any codes of its own. Instead, it supplements an existing code system, defining additional designations, relationships, and/or properties for the codes of that system. Additional designations might support language translation or display names that are more suitable to a type of user-interface or usage context. Additional properties and relationships are helpful when there is a need to surface computable information about existing codes for value set generation and/or software logic that are not present in the original code system.

In most cases, it is desirable to get such additional information into the original code system. However, the maintainer of the code system might not be interested in undertaking such work. They may not be interested in such use-cases or may not feel able to maintain the information. (For example, the maintainer may not have the knowledge to maintain the language translations or bandwidth to keep up with maintaining relationships to a rapidly evolving external system.)

Supplements are only useful if the systems that need the additional information are able to leverage the supplement. Supplements are a new concept and many software systems do not yet have support for them.
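As an illustration, the following sketch shows a supplement that adds a French designation to one code of a base code system, plus a lookup that prefers the supplement and falls back to the base display. Both code systems, their URLs, the French wording, and the function display_for are hypothetical; the 'content' value of 'supplement' and the 'supplements' element are the parts FHIR defines.

```python
BASE = {
    "resourceType": "CodeSystem",
    "url": "http://example.org/fhir/CodeSystem/order-state",  # hypothetical
    "content": "complete",
    "concept": [
        {"code": "held", "display": "held"},
        {"code": "stopped", "display": "stopped"},
    ],
}

SUPPLEMENT = {
    "resourceType": "CodeSystem",
    "url": "http://example.org/fhir/CodeSystem/order-state-fr",  # hypothetical
    "content": "supplement",        # defines no codes of its own
    "supplements": BASE["url"],     # the code system being supplemented
    "concept": [
        {"code": "held", "designation": [{"language": "fr", "value": "suspendu"}]},
    ],
}


def display_for(code, language, base, supplements):
    """Prefer a designation from a matching supplement, else the base display."""
    for supplement in supplements:
        if supplement.get("content") != "supplement":
            continue
        if supplement.get("supplements") != base["url"]:
            continue
        for concept in supplement.get("concept", []):
            if concept["code"] != code:
                continue
            for designation in concept.get("designation", []):
                if designation.get("language") == language:
                    return designation["value"]
    for concept in base["concept"]:
        if concept["code"] == code:
            return concept.get("display")
    return None
```

A system that does not understand supplements would skip the first loop entirely and show only the base display, which is the practical risk described above.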

Value Set questions

Value sets select the codes from one or more code systems that are needed for a particular purpose. Common specification design questions related to value sets include:

When should I re-use a value set definition?

When evaluating using an existing value set definition rather than creating your own, there are several questions to answer:

  • Does the defined purpose of the value set align with my intended use?
    If not, even if the current set of codes seems to match your needs, it is likely that the value set will evolve in the future in a way that no longer meets your needs, or that when you have a desire to evolve the content of the value set, the custodian will be unwilling to make the desired changes.
  • Does the set of codes in the existing value set meet my needs?
    If the answer is 'no', that does not necessarily mean that the value set cannot be used - but it will mean going through the change process for the value set to try to adjust it. The willingness of the custodian to make the desired changes will then determine if it is appropriate.
  • Are you confident that the governance process associated with the value set and the declared purpose of use will mean that the value set content will align with your needs in the future?
    In some cases, you may not want to give authority to others to change the set of codes that are allowed in your interface, or may not trust that the changes you might need in the future will be accepted by the custodian.
  • How complex is the value set definition and is there a saving in allowing someone else to maintain it?
    Some value sets are extremely easy to define (e.g. "all ICD10 codes") and there is little cost to there being 10 or 1000 different value set definitions that assert the same set of codes. While consistency around what code system identifiers are used is critical to interoperability, multiple value set identifiers with the same definition do not impede interoperability. However, if a value set requires ongoing curation, either to maintain an extensional list of codes, or to continue to adjust complex filters on an intensional definition as the underlying code system(s) evolve, there may be a benefit in having that work only done in one place.
  • Will I benefit from using the same value set definition as another specification?
    If there is any chance of the value set evolving over time, it may be important that the evolution in your spec be the same as what occurs in a different specification - meaning that both specifications using the same definition will be beneficial.

Where should my value sets be maintained?

If you choose to maintain your own value set rather than using one that already exists, the next question is "where should the value set be hosted?" There are several choices:

  1. Keep the value set in your own implementation guide and host it alongside the other IG content.
    Benefits:
    • Maintains full control over the definition of the value set.
    • Re-use is still possible, by declaring dependency on the IG.
    • Ensures the value set definition will be stable for a particular release (which may be important for interoperability).
    Drawbacks:
    • Changing the value set design requires publishing a new release of the IG (which may be impractical depending on how often the filters or enumerations need to change)
    • The value set is less likely to be perceived as a "shared artifact" and may see less re-use.
    • Re-use requires asserting an IG dependency, which some authors may be reluctant to do for just one value set.
  2. Place the value set into an international shared repository, such as terminology.hl7.org or a country-specific repository such as the U.S. NLM Value Set Authority Center (VSAC) (requires registration to access content).
    Benefits:
    • Makes value sets broadly discoverable.
    • May provide tooling to support value set maintenance.
    • Ensures a degree of governance around value set change.
    • Encourages value set re-use.
    • Maintenance of value sets becomes independent of maintaining the specifications that reference them.
    Drawbacks:
    • There may be limits on what types of value sets a shared repository is willing to host.
    • The governance process for the shared repository may create delays (and sometimes barriers) to making desired changes.
    • The value set may evolve based on feedback from others in ways that are undesirable for your specification.
    • The value set definition may change at any time, meaning that if the specification value set references are not versioned, implementers will be faced with evolving content expectations. (This may be necessary, in some cases.)

HL7 International is developing policies for value set location for IGs it publishes. Other organizations may create similar policies.

What is the difference between composing and expanding a value set?

Composing a value set means defining the rules for the value set. However, the rules may be expressed in such a way that the answer to "what codes are allowed by this value set" may be different at different times (and in rare situations, in different places). These changes in 'what codes are allowed' happen because of evolution in the content that the value set references:

  • Existing codes may go from 'draft' to 'active' status, or from 'active' to 'deprecated' or 'retired'.
  • New codes might be added to the underlying code system, or the properties and relationships of existing codes might be corrected, which will change whether they meet the rules of the value set.
  • Referenced value sets may have their own definitions changed.

When creating or maintaining a value set, authors should specify the 'compose' portion and leave the 'expansion' portion empty, because the focus is on defining the rules, not evaluating them. The 'expansion' process happens later, closer to implementation (and even multiple times a day) as the implementer needs to know which set of codes are allowed when validating, or when presenting options to a user.

Expansion may be done through local logic, or through the invocation of an $expand operation. This operation considers the current versions of the code system and underlying value sets available at the time the operation is run (or set via configuration parameters) and evaluates the valueset 'rules' based on those versions. In some cases, the expansion might be a limited expansion that only returns a subset of codes, optionally filtered by a string, for example to help build a drop-down box based on content the user has specified.

While expansions CAN be included as part of a published ValueSet, this is rarely done because expansions quickly become out-of-date. Also, some tooling will ignore provided expansions and generate their own. If the desire is to lock a value set to a specific set of codes unaffected by underlying changes, it is better to do this by composing an extensional value set bound to specific code system versions.
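For implementers relying on a terminology service, a limited expansion request can be sketched as follows. The helper only builds the request URL for the type-level $expand operation using its 'url', 'filter', and 'count' parameters; the server base address, the value set URL, and the function name build_expand_url are hypothetical.

```python
from urllib.parse import urlencode


def build_expand_url(base_url, value_set_url, text_filter=None, count=None):
    """Build a GET URL for ValueSet/$expand on a FHIR terminology server."""
    params = {"url": value_set_url}
    if text_filter:
        # Restrict the expansion to codes matching what the user has typed.
        params["filter"] = text_filter
    if count is not None:
        # Cap the number of codes returned, e.g. for a drop-down box.
        params["count"] = str(count)
    return f"{base_url.rstrip('/')}/ValueSet/$expand?{urlencode(params)}"


# Hypothetical server and value set, shown only to illustrate the call shape.
request_url = build_expand_url(
    "https://tx.example.org/fhir",
    "http://example.org/fhir/ValueSet/key-diagnoses",
    text_filter="diab",
    count=10,
)
```

The response is a ValueSet whose 'expansion' element reflects the code system versions available to the server when the request was made, so the same request may return different codes on a later day.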

What is the difference between an 'intensional' and 'extensional' value set?

  • An 'extensional' value set is a value set where the definition is an enumerated list of codes. An extensionally defined value set for all the child concepts of 'Nutritional finding' would need to list every single child code in the value set definition - and would need to keep that list up to date as SNOMED CT evolves. (SNOMED CT is more challenging because there are multiple editions, each having their own set of codes, such that the correct set of codes can vary by jurisdiction.)
  • An 'intensional' value set is a value set where the definition is a computable set of rules that can be resolved to the desired list of codes. For example, "All SNOMED CT concepts that are children of the SNOMED CT concept 'Nutritional finding (finding).'". Intensional value sets MAY have some level of code enumeration as well. However, if any of the value set rules involve filters, the value set is 'intensional'. Intensional value sets may also be created by basing a value set on the inclusion or exclusion of codes from other value sets.

One of the main differences between intensional and extensional value sets is that the expansions of the value set SHOULD be identical for a single version of an extensional value set, while they may differ over time for intensional value sets as the underlying code systems and/or value sets evolve. (The expansion for an extensional value set CAN change if listed codes are deprecated or retired and the expansion operation is configured to exclude deprecated or retired codes.)
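The distinction shows up directly in the 'compose' element. The sketch below contrasts an enumerated include with a filter-based include and classifies a value set accordingly. The code system URL and codes are hypothetical, and the rule that an include with no enumerated concepts counts as intensional is an assumption made for this example (it covers 'all codes from a system' definitions as well as filters and value set references).

```python
SYSTEM = "http://example.org/fhir/CodeSystem/findings"  # hypothetical

EXTENSIONAL = {
    "resourceType": "ValueSet",
    "compose": {
        "include": [
            {
                "system": SYSTEM,
                # Every allowed code is listed explicitly.
                "concept": [{"code": "underweight"}, {"code": "overweight"}],
            }
        ]
    },
}

INTENSIONAL = {
    "resourceType": "ValueSet",
    "compose": {
        "include": [
            {
                "system": SYSTEM,
                # A rule: the named concept and everything beneath it.
                "filter": [
                    {"property": "concept", "op": "is-a", "value": "nutritional-finding"}
                ],
            }
        ]
    },
}


def is_intensional(value_set):
    """True if any include/exclude rule is computed rather than enumerated."""
    compose = value_set.get("compose", {})
    rules = compose.get("include", []) + compose.get("exclude", [])
    return any(
        bool(rule.get("filter"))
        or bool(rule.get("valueSet"))
        or not rule.get("concept")
        for rule in rules
    )
```

Adding a new child code to the hypothetical code system changes the expansion of INTENSIONAL without any edit to the value set, whereas EXTENSIONAL stays the same until someone edits its list.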

When should I use an intensional vs. an extensional value set?

The choice of whether to use an intensional or extensional value set definition comes down to two things:

  1. Is the set of codes too large and/or too dynamic to enumerate them all?
  2. Is there a need for the allowed set of codes to change automatically as the underlying code system changes (e.g. if new drug codes are added to the underlying code system, should they automatically be allowed as part of the value set without needing to formally publish a new value set version?)

If the answer to either of these is 'yes', then the value set should be defined intensionally. However, there is a third consideration, which is "are the necessary properties and/or relationships present in the underlying code system(s) to allow filtering to the desired set of codes?". If the answer to this is 'no', it means that extensional definition might be necessary, though another option is to use a Code System Supplement to inject the needed properties and/or relationships. (However, the maintenance effort for the supplement will likely be equivalent to or even more than the effort to maintain an enumerated value set definition, so typically this only makes sense if it can be used to support multiple distinct value sets or is useful for other purposes.)

One drawback of intensional value sets is that they require implementers to have access to an expansion of the value set to the 'current' set of codes that meet the filter requirements of the intensional definition. This could be done by distributing a version of the ValueSet with a built-in expansion element, by the implementer being able to perform the expansion themselves, or by the implementer having access to a terminology service that supports the $expand operation.

When should I flag a value set as immutable?

The ValueSet.immutable flag on a value set indicates that the formal definition of the value set is 'frozen'. That means that for extensional definitions, no codes can be added or removed and for intensional definitions, the set of filters describing the allowed codes cannot be modified. Changes might still be made to metadata about the value set, such as updating the description, purpose, comments, etc. or even changing the status of the value set.

Note that setting the value set flag to immutable does not necessarily mean that the expansion cannot change over time. Changes to the underlying code system(s) and/or value-set(s) may still impact the set of codes considered to be valid on any given day.

The primary purpose of marking a value set as immutable is to create trust in downstream users of the value set. A value set called "Key Diagnoses" that draws from ICD10 could, in theory, be updated at some point to instead draw from SNOMED CT or to exclude codes that had previously been included. The risk of such a change might make other designers reluctant to point to a value set. If the immutable flag is set, then such changes cannot happen.

However, the other side of the immutable flag is that, if set, the value set definition is permanently frozen. If there is a need to change the filters or codes, the only option is to define a new value set with a new URL, new name, etc. (and possibly deprecate the old one), then update all models to point to the new value set and get implementers to make the change.

This is most typically done with value sets that, by definition, are frozen. E.g. "all LOINC codes" can safely be frozen because the purpose can easily be expressed by a simple definition and there is no need for that definition to change.
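A minimal sketch of what the flag means for maintenance tooling is shown below: metadata edits pass, while any change to 'compose' is rejected once 'immutable' is true. The value set URL and the function validate_update are hypothetical, and a real authoring tool may enforce the rule differently.

```python
ALL_LOINC = {
    "resourceType": "ValueSet",
    "url": "http://example.org/fhir/ValueSet/all-loinc",  # hypothetical
    "status": "active",
    "immutable": True,
    "description": "All LOINC codes.",
    # No concept list and no filter: every code in the system is included.
    "compose": {"include": [{"system": "http://loinc.org"}]},
}


def validate_update(current, proposed):
    """Reject changes to the formal definition of an immutable value set."""
    if current.get("immutable") and current.get("compose") != proposed.get("compose"):
        raise ValueError(
            "value set is immutable: define a new value set with a new URL instead"
        )


# Metadata changes remain possible.
validate_update(ALL_LOINC, {**ALL_LOINC, "description": "Every code defined by LOINC."})

# Changing the definition is not.
try:
    validate_update(
        ALL_LOINC,
        {**ALL_LOINC, "compose": {"include": [{"system": "http://snomed.info/sct"}]}},
    )
    definition_change_rejected = False
except ValueError:
    definition_change_rejected = True
```

Note that the expansion of this value set still grows whenever LOINC publishes new codes; only the rule itself is frozen.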

If I define my own value set, what should the metadata say?

The answer here is the same as it was for code systems.

Should value sets be tied to specific code system versions?

Making a value set reference version-specific helps to ensure that the expansions are less likely to change. (To completely avoid change, intensional value sets that reference other value sets would need to ensure that value set references are also version-specific and that all value sets traced in the dependency chain also use version-specific references to code systems and value sets.)

Locking the set of codes down makes implementation easier for implementers. If they need to map, they can map once during the design process and not worry about dealing with changes to the set of allowed codes (at least not until they update). Implementers can also write their logic presuming an understanding of the complete list of permitted codes. Elements of type 'code' with value sets defined this way in the core specification can be enumerated in schemas.

The downside is that if business requirements drive a need for the codes to change, the only way to accommodate that need is to publish a new release of the standard with an updated value set and then migrate all implementations to use the updated version of the standard. This means that the model needs to be extremely stable, the value set needs to include a safety valve such as 'other', or the binding strength needs to be loose enough (i.e. 'extensible' or looser) that new needs can be met without updating the value set.
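Whether a value set is fully version-locked can be checked by inspecting its 'compose' rules. The sketch below lists code system references that lack a 'version' and value set references that lack a '|version' suffix. The URLs, version strings, and the function unpinned_references are hypothetical, and the check covers only the value set itself, not the value sets it references further down the dependency chain.

```python
def unpinned_references(value_set):
    """List code system and value set references that are not version-specific."""
    unpinned = []
    compose = value_set.get("compose", {})
    for rule in compose.get("include", []) + compose.get("exclude", []):
        if "system" in rule and "version" not in rule:
            unpinned.append(rule["system"])
        for reference in rule.get("valueSet", []):
            # A version-specific canonical reference looks like "url|version".
            if "|" not in reference:
                unpinned.append(reference)
    return unpinned


PARTLY_PINNED = {
    "resourceType": "ValueSet",
    "compose": {
        "include": [
            {
                "system": "http://example.org/fhir/CodeSystem/order-state",
                "version": "1.0.0",
                "concept": [{"code": "held"}],
            },
            {"system": "http://example.org/fhir/CodeSystem/priority"},
            {
                "valueSet": [
                    "http://example.org/fhir/ValueSet/base|2.1.0",
                    "http://example.org/fhir/ValueSet/extras",
                ]
            },
        ]
    },
}
```

Here unpinned_references(PARTLY_PINNED) points at the 'priority' code system and the 'extras' value set as the two references whose content could still shift under the value set.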

What guidelines apply to drawing from multiple code systems in a single value set?

If the needed set of codes for an element cannot be found in a single code system, but there are other code systems that have the needed codes, it is completely fine to draw from multiple systems. However, there are a few considerations to keep in mind:

  • If binding to a 'code' element, the codes MUST be unique across the complete expansion of codes - including if the code systems evolve after the establishment of the value set (if the binding is not version-specific). Obviously making the binding version-specific is one way of fixing this, though the code system conventions around code value syntax might also eliminate any possibility of overlap.
  • It is best practice to avoid having codes from multiple code systems that have overlapping meanings. There generally will not be specialization relationships across code systems, which makes it difficult to reason across systems.

Drawing concepts from multiple code systems does mean that there can be differences in styles around display names, definitions, properties, etc. that may also impact decisions on whether combining from multiple sources will work well - though code system supplements may be able to relieve some of these concerns.
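The uniqueness requirement for 'code' elements can be tested against an expansion. This sketch takes the entries of an expansion's 'contains' list and reports any code string that appears under more than one system; the two code system URLs and the function code_collisions are hypothetical.

```python
from collections import defaultdict


def code_collisions(contains):
    """Map each code that occurs in more than one system to those systems."""
    systems_by_code = defaultdict(set)
    for entry in contains:
        systems_by_code[entry["code"]].add(entry["system"])
    return {
        code: sorted(systems)
        for code, systems in systems_by_code.items()
        if len(systems) > 1
    }


EXPANSION_CONTAINS = [
    {"system": "http://example.org/fhir/CodeSystem/a", "code": "active"},
    {"system": "http://example.org/fhir/CodeSystem/a", "code": "held"},
    {"system": "http://example.org/fhir/CodeSystem/b", "code": "held"},
    {"system": "http://example.org/fhir/CodeSystem/b", "code": "unknown"},
]
```

An element of type 'code' carries no system, so a receiver could not tell which 'held' was meant. Running such a check in the IG build, and again whenever an unversioned code system is updated, catches the problem before implementers do.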

Should value sets include 'Other', 'Unknown', etc.?

In CDA and other v3-based specifications, exceptional values such as Other, Unknown, Not Applicable, etc. are conveyed in a separate data element from regular coded data - nullFlavor. Any concepts represented in nullFlavor SHOULD NOT be included in value sets intended for use by v3 implementations.

However, in FHIR and v2, such concepts are sent alongside coded data and, when needed, SHOULD be sent as part of the value sets alongside other codes (though they might be drawn from other systems). While it is possible for implementers to use extensions such as data-absent-reasons for FHIR coded elements or companion Z-segments for v2, this is discouraged. The only exception to embedding 'exception' codes as part of a value set is when there is an explicit element in the data model for capturing exceptional values (e.g. Observation.dataAbsentReason).

Exceptional values are most critical when a value set is likely to be required. In these situations, the element cannot be sent without choosing a code from the value set, so if there's any chance the value might be unknown or not within the list, the 'unknown' and 'other' concepts are appropriate. Something like 'not applicable' becomes relevant if it is also likely that the element will be 1..x.

That said, there will be use-cases where these exceptional values are not appropriate - where a code must be known, or where - by definition - the conceptual space is known to be covered, or where for the use-case, a value must always be applicable. In these cases, exceptional values should be omitted. However, be careful to account for legacy data, externally sourced data, and various uncommon but still possible edge-cases before deciding to exclude the safety valve that exceptional values provide. Ask the question "Would I rather not receive this element (or even the entire instance) at all if an exceptional value is necessary?".

In some cases, a value set will be developed for widespread use where the specific context will not be known - and thus it will be hard to determine which (if any) exceptional values should be included. In this situation, it is best if the value set does not include any exceptional values. Downstream designers can then define their own value set that includes yours along with any exceptional codes needed for their use-case.
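The wrapping approach can be sketched as a small helper that takes a general-purpose value set and produces a use-case-specific one with exceptional codes added. The function name, the `example.org` URLs, and the choice of NullFlavor codes are all assumptions made for illustration.

```python
# HL7 v3 NullFlavor code system, used here as one possible source of exceptional codes.
NULL_FLAVOR = "http://terminology.hl7.org/CodeSystem/v3-NullFlavor"


def with_exceptions(base_url, wrapper_url, exception_codes):
    """Build a ValueSet that includes every code in the base value set
    plus the listed exceptional codes (hypothetical helper)."""
    return {
        "resourceType": "ValueSet",
        "url": wrapper_url,
        "status": "draft",
        "compose": {
            "include": [
                # Everything from the general-purpose value set, by reference.
                {"valueSet": [base_url]},
                # Exceptional codes needed only for this use-case.
                {
                    "system": NULL_FLAVOR,
                    "concept": [{"code": code} for code in exception_codes],
                },
            ]
        },
    }


wrapper = with_exceptions(
    base_url="http://example.org/fhir/ValueSet/body-sites",  # hypothetical
    wrapper_url="http://example.org/fhir/ValueSet/body-sites-or-unknown",  # hypothetical
    exception_codes=["UNK", "OTH"],
)
print(wrapper["compose"]["include"])
```

The general-purpose value set stays free of exceptional values, and each consuming specification decides for itself which ones it needs.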

When should value sets draw from code system supplements?

Value sets only need to reference code system supplements if they are intensionally defined (i.e. defined by filters or rules rather than by an enumerated list of codes). Making a value set depend on a supplement adds complexity and reduces the number of systems that will know how to expand the value set. (Not all terminology servers or other types of systems know how to deal with code system supplements.) However, if the only reasonable way to define and maintain a value set is by using properties and/or relationships that are introduced by the supplement, then the value set will have to reference that supplement.
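A rough sketch of such a dependency is shown below: an intensional value set whose filter relies on a property that only a supplement defines. The code system, supplement, and `risk-level` property are hypothetical. The extension URL is the value set supplement extension as understood from the FHIR extensions registry. Treat it as an assumption and confirm it against the FHIR version you are targeting.

```python
# Assumed URL of the extension declaring that a value set depends on a supplement.
SUPPLEMENT_EXT = "http://hl7.org/fhir/StructureDefinition/valueset-supplement"

value_set = {
    "resourceType": "ValueSet",
    "url": "http://example.org/fhir/ValueSet/high-risk-medications",  # hypothetical
    "status": "draft",
    "extension": [
        {
            "url": SUPPLEMENT_EXT,
            # Hypothetical supplement that adds a 'risk-level' property.
            "valueCanonical": "http://example.org/fhir/CodeSystem/meds-risk-supplement",
        }
    ],
    "compose": {
        "include": [
            {
                "system": "http://example.org/fhir/CodeSystem/meds",  # hypothetical
                # Intensional definition: select codes by a property value.
                # 'risk-level' is assumed to exist only in the supplement.
                "filter": [{"property": "risk-level", "op": "=", "value": "high"}],
            }
        ]
    },
}


def required_supplements(vs):
    """List the supplement canonicals a value set declares a dependency on."""
    return [
        ext["valueCanonical"]
        for ext in vs.get("extension", [])
        if ext.get("url") == SUPPLEMENT_EXT
    ]


print(required_supplements(value_set))
```

A system that does not have the supplement, or does not understand supplements at all, cannot evaluate the `risk-level` filter, which is the cost the paragraph above describes.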

In what circumstances should value sets incorporate other value sets?

Defining a value set by referencing other value sets does two things - it helps save maintenance effort, and it also helps ensure alignment. The first benefit is obvious - if someone else has defined the enumeration or computable filter defining a portion of the codes you want to talk about, it is easier to point to that definition than define and maintain it yourself. However, the second benefit is also quite powerful:

  • Sometimes you want to include all the codes from a given value set but want to also allow some additional codes (but it is not appropriate to edit the definition of the base value set to add those codes). Solution: Define a value set that includes the base value set, but also includes the additional codes. If the base value set changes, your value set will automatically include any new codes added and exclude any codes that were removed.
  • Similarly, if you have a value set where you want most of the codes, but want to exclude a subset of the codes (and again, it is not appropriate to remove them from the base value set), define your value set as including the referenced value set, then add excludes to filter out the undesired codes. If the base value set changes, your value set will automatically reflect those modifications.
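Both patterns in the list above can be illustrated with a toy expansion routine. This is a deliberately simplified sketch, not how a terminology server works: it handles only enumerated concepts and value set references, and it assumes each include or exclude entry carries either `valueSet` or `system` plus `concept`, never both. All URLs and codes are hypothetical.

```python
SYS = "http://example.org/fhir/CodeSystem/colors"  # hypothetical
BASE = "http://example.org/fhir/ValueSet/base"
EXTENDED = "http://example.org/fhir/ValueSet/base-plus-extras"
TRIMMED = "http://example.org/fhir/ValueSet/base-minus-some"

registry = {
    BASE: {"compose": {"include": [
        {"system": SYS, "concept": [{"code": "red"}, {"code": "green"}, {"code": "blue"}]},
    ]}},
    # Pattern 1: everything in the base value set plus additional codes.
    EXTENDED: {"compose": {"include": [
        {"valueSet": [BASE]},
        {"system": SYS, "concept": [{"code": "violet"}]},
    ]}},
    # Pattern 2: everything in the base value set except some codes.
    TRIMMED: {"compose": {
        "include": [{"valueSet": [BASE]}],
        "exclude": [{"system": SYS, "concept": [{"code": "blue"}]}],
    }},
}


def expand(url):
    """Toy expansion: union of all includes minus union of all excludes."""
    compose = registry[url]["compose"]

    def collect(entries):
        found = set()
        for entry in entries:
            for ref in entry.get("valueSet", []):
                found |= expand(ref)  # resolve the referenced value set
            for concept in entry.get("concept", []):
                found.add((entry["system"], concept["code"]))
        return found

    return collect(compose.get("include", [])) - collect(compose.get("exclude", []))


def codes(url):
    return sorted(code for _, code in expand(url))


print(codes(EXTENDED))  # ['blue', 'green', 'red', 'violet']
print(codes(TRIMMED))   # ['green', 'red']

# The maintainer of the base value set adds a code; the dependents are not edited.
registry[BASE]["compose"]["include"][0]["concept"].append({"code": "yellow"})

print(codes(EXTENDED))  # ['blue', 'green', 'red', 'violet', 'yellow']
print(codes(TRIMMED))   # ['green', 'red', 'yellow']
```

Neither dependent definition was touched, yet both pick up the new `yellow` code the next time they are expanded, which is the alignment benefit described above.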