Molecular Definition Implementation Guide for Molecular Data Types
1.0.0-ballot1 - ci-build International

Molecular Definition Implementation Guide for Molecular Data Types, published by HL7 International / Clinical Genomics. This guide is not an authorized publication; it is the continuous build for version 1.0.0-ballot1 built by the FHIR (HL7® FHIR® Standard) CI Build. This version is based on the current content of https://github.com/HL7/molecular-definition-data-types/ and changes regularly. See the Directory of published versions.

Use Cases

Page standards status: Informative

This implementation guide is not complete. The included artifacts are marked as experimental, but they are ready for review, testing, and validation.

This implementation guide and its included artifacts are designed to address practical use cases involving the exchange of genomic molecules and their associated data across diverse institutions and systems. By providing standardized frameworks and code systems, the guide supports interoperability and accurate data representation. This section shows how the guide can be used to represent real genomic concepts as FHIR resources in real-world genomic data exchange scenarios. The following subsections provide example instances of MolecularDefinition resources and their corresponding profiles.

Stakeholders are encouraged to contribute their use cases and examples to the MolecularDefinition Datatype Implementation Guide to improve its practical applicability and robustness. Contributions can be made by raising a JIRA ticket, posting comments on the Genomics Channel-Information Modeling at chat.fhir.org, or contacting any of the co-chairs of the HL7 Clinical Genomics Workgroup directly. These inputs are vital for refining the guide, supporting standardized and interoperable genomic data exchange across diverse healthcare and research environments, and ensuring that the guide addresses real-world genomic data scenarios.

Representing a Molecular Sequence as a Literal String

The following examples demonstrate how MolecularDefinition resources can represent a sequence as a literal string using the Sequence profile. The moleculeType and encoding attributes allow the sequence value to be interpreted unambiguously.
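To see why moleculeType matters, note that a bare string such as "ACGT" is a valid DNA literal and also a valid protein literal (Ala-Cys-Gly-Thr). The following is a minimal sketch, not part of the IG, that checks a literal against an alphabet chosen by molecule type. The type labels ("dna", "rna", "protein") and the alphabets are assumptions for illustration. They do not reproduce the IG's moleculeType or encoding value sets.

```python
# Hypothetical alphabets keyed by an assumed molecule-type label. The IG's
# actual moleculeType and encoding value sets are not reproduced here.
ALPHABETS = {
    "dna": set("ACGTN"),
    "rna": set("ACGUN"),
    "protein": set("ACDEFGHIKLMNPQRSTVWYX*"),
}


def interpret_literal(value: str, molecule_type: str) -> str:
    """Normalize a literal sequence value and check it against its molecule type."""
    alphabet = ALPHABETS.get(molecule_type)
    if alphabet is None:
        raise ValueError(f"unknown molecule type: {molecule_type!r}")
    normalized = value.strip().upper()
    invalid = sorted(set(normalized) - alphabet)
    if invalid:
        raise ValueError(f"characters {invalid} are not valid for {molecule_type}")
    return normalized


# The same string is valid as DNA and as protein, so the molecule type
# has to travel with the value for it to be interpreted correctly.
print(interpret_literal("acgt", "dna"))      # prints "ACGT"
print(interpret_literal("ACGT", "protein"))  # prints "ACGT"
```

A value containing U would be rejected as DNA but accepted as RNA. This is the kind of ambiguity the moleculeType and encoding attributes are meant to remove.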

Representing a Molecular Sequence Using Accession Number as a Code

The following examples demonstrate how MolecularDefinition resources can use the Sequence profile to represent a sequence as a code based on its accession number.

Molecular Sequence from a Resolvable URL

The following examples demonstrate how MolecularDefinition resources can use the Sequence profile to represent a sequence as a resolvable URL. This example uses the DocumentReference resource to represent the URL.

Molecular Sequence from a File

The following examples demonstrate how MolecularDefinition resources can represent a sequence as an attached file.

Molecular Sequence extracted from another Molecular Sequence

The following examples demonstrate how MolecularDefinition resources can represent a sequence as a subsequence extracted from a “parent” sequence. In these examples, a sequence representing the CYP2C19 genetic locus serves as the “parent”. Three subsequences are extracted from it, corresponding to the upstream region, the gene region, and the downstream region.
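Conceptually, extraction slices the parent sequence over a coordinate interval. The following is a minimal sketch that assumes 0-based, half-open (interbase) coordinates. It uses a short toy parent in place of the real CYP2C19 locus, and the function name extract is hypothetical.

```python
def extract(parent: str, start: int, end: int) -> str:
    """Return the subsequence of parent in the 0-based, half-open interval [start, end)."""
    if not 0 <= start <= end <= len(parent):
        raise ValueError(f"interval [{start}, {end}) lies outside the parent sequence")
    return parent[start:end]


# Toy stand-in for the CYP2C19 locus: 4 upstream bases, an 8-base "gene",
# and 4 downstream bases. These are not the real locus coordinates.
parent = "AAAA" + "CCCCGGGG" + "TTTT"
upstream = extract(parent, 0, 4)
gene = extract(parent, 4, 12)
downstream = extract(parent, 12, 16)
print(upstream, gene, downstream)  # prints "AAAA CCCCGGGG TTTT"
```

With half-open intervals, adjacent regions share a boundary value (4 and 12 here), so the three subsequences tile the parent without gaps or overlaps.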

Molecular Sequence constructed as a concatenation of several other Molecular Sequence instances

The following examples demonstrate how MolecularDefinition resources can represent a sequence as a concatenation of other sequence instances. In this example, the three subsequences from the extracted example are reassembled into the full genetic locus.
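A concatenation needs an explicit order for its elements. The following minimal sketch assumes that each element carries a 1-based ordinal index and joins the elements in that order, whatever order they are listed in. The function name and the (index, sequence) pair shape are hypothetical, and the sequences are toy values.

```python
def concatenate(elements: list[tuple[int, str]]) -> str:
    """Join (ordinal_index, sequence) pairs in ascending ordinal order.

    Assumes ordinal indices run 1..n with no gaps or duplicates.
    """
    indices = sorted(index for index, _ in elements)
    if indices != list(range(1, len(elements) + 1)):
        raise ValueError("ordinal indices must be 1..n with no gaps or duplicates")
    ordered = sorted(elements, key=lambda element: element[0])
    return "".join(sequence for _, sequence in ordered)


# Toy subsequences supplied out of order. The ordinal index, not the list
# position, determines where each one lands in the reassembled sequence.
full = concatenate([(2, "CCCCGGGG"), (3, "TTTT"), (1, "AAAA")])
print(full)  # prints "AAAACCCCGGGGTTTT"
```

Validating the indices up front means a missing or duplicated element fails loudly instead of producing a silently truncated locus.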

Molecular Sequence constructed as a Repeated Motif

The following examples demonstrate how MolecularDefinition resources can represent a sequence as a repeated sequence motif. In this use case, the CGG trinucleotide repeat from the FMR1 gene is represented in a compressed form that emphasizes the copyCount, which is convenient for use cases where the number of repeats is important.
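The compressed form stores a motif and a copy count instead of the full string. The following minimal sketch, with hypothetical function names, expands such a pair and performs the reverse check: it recovers the copy count only when a sequence is an exact tandem repeat of the motif.

```python
from typing import Optional


def expand_repeat(motif: str, copy_count: int) -> str:
    """Expand a compressed (motif, copy count) pair into the literal sequence."""
    if not motif or copy_count < 0:
        raise ValueError("motif must be non-empty and copy_count non-negative")
    return motif * copy_count


def perfect_copy_count(sequence: str, motif: str) -> Optional[int]:
    """Return the copy count if sequence is an exact tandem repeat of motif, else None."""
    if not motif or len(sequence) % len(motif) != 0:
        return None
    count = len(sequence) // len(motif)
    return count if motif * count == sequence else None


expanded = expand_repeat("CGG", 20)
print(len(expanded), perfect_copy_count(expanded, "CGG"))  # prints "60 20"
```

perfect_copy_count returns None for an interrupted repeat such as "CGGAGG". An imperfect repeat cannot be captured by a motif and a count alone.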

Molecular Sequence constructed as an Edit on another Molecular Sequence

The following examples demonstrate how MolecularDefinition resources can represent a sequence as a relative sequence, which applies an edit to a starting sequence to create the sequence of interest. In this example, the starting sequence is a perfect CGG trinucleotide repeated 20 times (see the repeated motif example). The sequence of interest is not a perfect repeat, however, so a single nucleotide must be edited to yield it. The result of this edit represents an actual CGG repeat region found in the FMR1 gene.
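An edit can be understood as replacing the content of an interval in the starting sequence. The following minimal sketch assumes 0-based, half-open coordinates and optionally checks the replaced state before applying the edit. The function name is hypothetical. The edit position used below (the first base of the 11th triplet, C changed to A) is an illustrative assumption and is not the position used in the IG's example.

```python
from typing import Optional


def apply_edit(
    starting: str,
    start: int,
    end: int,
    replacement: str,
    expected_replaced: Optional[str] = None,
) -> str:
    """Replace starting[start:end] (0-based, half-open) with replacement.

    If expected_replaced is given, the edit is rejected when the starting
    sequence does not contain that state at the interval.
    """
    if not 0 <= start <= end <= len(starting):
        raise ValueError(f"interval [{start}, {end}) lies outside the starting sequence")
    replaced = starting[start:end]
    if expected_replaced is not None and replaced != expected_replaced:
        raise ValueError(f"expected {expected_replaced!r} at [{start}, {end}), found {replaced!r}")
    return starting[:start] + replacement + starting[end:]


# Perfect (CGG)x20 starting sequence. The edit position is hypothetical.
starting = "CGG" * 20
edited = apply_edit(starting, 30, 31, "A", expected_replaced="C")
print(edited[27:36])  # prints "CGGAGGCGG"
```

Checking the replaced state guards against applying an edit to the wrong starting sequence, which is easy to do when the starting sequence is only referenced, not inlined.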

Molecular Sequence including a Contained Referenced resource

Many instances of MolecularDefinition reference other instances of MolecularDefinition. When references within a message are not desired, contained resources can be used instead. This example shows how contained resources can be used to create a standalone message. It has the same content as the relative (edit) example, but the references have been replaced with contained resources.
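In FHIR JSON, a contained resource is listed in the container's "contained" array and is referenced locally as "#" followed by its id. The following minimal sketch checks whether a resource is standalone by listing every reference that does not resolve to a contained resource. The function names are hypothetical. The nesting under "representation" in the sample is an assumption for illustration and is not copied from the IG's example.

```python
from typing import Any


def collect_references(node: Any) -> list[str]:
    """Recursively collect every Reference.reference string in a FHIR JSON structure."""
    found: list[str] = []
    if isinstance(node, dict):
        reference = node.get("reference")
        if isinstance(reference, str):
            found.append(reference)
        for value in node.values():
            found.extend(collect_references(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_references(item))
    return found


def unresolved_references(resource: dict) -> list[str]:
    """List references that are not local '#id' pointers to a contained resource."""
    contained_ids = {entry.get("id") for entry in resource.get("contained", [])}
    return [
        reference
        for reference in collect_references(resource)
        if not (reference.startswith("#") and reference[1:] in contained_ids)
    ]


# Illustrative resource. The nesting under "representation" is assumed.
standalone = {
    "resourceType": "MolecularDefinition",
    "contained": [{"resourceType": "MolecularDefinition", "id": "cgg-repeat"}],
    "representation": [{"relative": {"startingMolecule": {"reference": "#cgg-repeat"}}}],
}
print(unresolved_references(standalone))  # prints "[]"
```

An empty result means every reference can be satisfied from within the message itself. Any remaining entries would require a lookup outside the message.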

Please see the complete list of Sequence examples for more examples.

Allele as a MolecularDefinition

The following examples illustrate instances of the Allele profile. In this example, the asserted state of the allele differs from the state of the context sequence at the given location. Note: the Clinical Genomics (CG) work group is still determining how best to represent named alleles, so the reference to the star allele in this example should be considered preliminary and subject to change.
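An allele asserts a state at a location on a context sequence, and that state may or may not match the context. The following minimal sketch, assuming 0-based, half-open coordinates, reads the context state at the interval and reports whether the asserted state differs from it. The function name and the toy sequence are hypothetical and are not taken from the IG's examples.

```python
def compare_allele_to_context(
    context: str, start: int, end: int, asserted_state: str
) -> tuple[str, bool]:
    """Return the context state at [start, end) (0-based, half-open) and
    whether the asserted allele state differs from it."""
    if not 0 <= start <= end <= len(context):
        raise ValueError(f"interval [{start}, {end}) lies outside the context sequence")
    context_state = context[start:end]
    return context_state, context_state != asserted_state


# Toy context sequence and allele. These values are not from the IG.
print(compare_allele_to_context("ACGTACGT", 2, 3, "A"))  # prints "('G', True)"
```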

Please see the complete list of Allele examples for more examples.

Variation as a MolecularDefinition

The following examples illustrate instances of the Variation profile. In this example, the state of the alternate allele is defined as different from the state of the reference allele, but the same structure could also represent a variation in which the two alleles are the same. Note that this example uses a 0-based interval coordinate system.
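Because this example uses 0-based interval coordinates, its positions are offset by one from the 1-based, inclusive numbering that many tools report. The following minimal sketch converts between the two systems. It assumes that "0-based interval" means interbase (half-open) coordinates. The function names are hypothetical.

```python
def interbase_to_one_based(start: int, end: int) -> tuple[int, int]:
    """Convert a 0-based, half-open interval [start, end) to 1-based inclusive (first, last)."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    if end == start:
        # A zero-length interval marks a point between residues (e.g. an
        # insertion site) and has no 1-based inclusive equivalent.
        raise ValueError("zero-length interval has no 1-based inclusive form")
    return start + 1, end


def one_based_to_interbase(first: int, last: int) -> tuple[int, int]:
    """Convert 1-based inclusive positions (first, last) to a 0-based, half-open interval."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return first - 1, last


print(interbase_to_one_based(9, 10))   # prints "(10, 10)", the 10th residue
print(one_based_to_interbase(10, 10))  # prints "(9, 10)"
```

The start coordinate shifts by one and the end coordinate does not. Mixing the two systems is a common source of off-by-one errors when variation data moves between systems.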

The following example illustrates a tri-allelic polymorphism. In this example, the reference and alternate alleles must be specified unambiguously, because neither of them necessarily matches the state of the context sequence at the specified location. The slices on the representation element are needed to support this use case.

Please see the complete list of Variation examples for more examples.

Two Aggregate Use Cases Showing How the Sequence, Allele, Haplotype and Genotype Profiles Work Together to Represent Genotypes

HLA Genotype

To illustrate how the various MolecularDefinition profiles interact, we begin with a foundational example: an instance of the Sequence profile representing the raw coding sequence of HLA00001.1, which corresponds to the HLA-A*01:01:01:01 allele. Building on this, two distinct sets of Allele profile instances are introduced, each containing five individual alleles derived from the HLA-A*01:01:01:01 and HLA-A*01:02:01:01 groups, respectively. Each Allele set is then aggregated into a corresponding Haplotype instance, which captures the linkage of alleles on a single chromosome. Finally, the two Haplotype instances are combined into an instance of the Genotype profile, which represents the combined allelic composition across both chromosomes at the HLA-A locus. This example shows how raw sequence data can be built up through alleles and haplotypes into a complete genotype. The following is the set of MolecularDefinition instances that represent this use case:
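As a rough illustration of the layering described above, and not a reproduction of the IG's instances, the following sketch models Sequence, Allele, Haplotype and Genotype as plain Python dataclasses. The class and field names are hypothetical and do not correspond to the FHIR element names. All sequence data and allele states are toy placeholders, not HLA data. Only the shape (one context sequence, five alleles per haplotype, two haplotypes per genotype) follows the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    id: str
    literal: str


@dataclass(frozen=True)
class Allele:
    id: str
    context: Sequence  # the sequence the allele's location refers to
    start: int         # 0-based, half-open interval on the context (assumed)
    end: int
    state: str


@dataclass(frozen=True)
class Haplotype:
    id: str
    alleles: tuple[Allele, ...]  # alleles linked on a single chromosome


@dataclass(frozen=True)
class Genotype:
    id: str
    haplotypes: tuple[Haplotype, ...]  # one haplotype per chromosome copy

    def allele_count(self) -> int:
        return sum(len(h.alleles) for h in self.haplotypes)

    def context_ids(self) -> set[str]:
        return {a.context.id for h in self.haplotypes for a in h.alleles}


# Toy data only: a short stand-in for the HLA00001.1 coding sequence and
# placeholder allele states, with five alleles per haplotype.
context = Sequence("toy-hla-a-cds", "ACGTACGTAC")
hap1 = Haplotype("hap-1", tuple(Allele(f"hap1-allele-{i}", context, i, i + 1, "T") for i in range(5)))
hap2 = Haplotype("hap-2", tuple(Allele(f"hap2-allele-{i}", context, i, i + 1, "G") for i in range(5)))
genotype = Genotype("toy-hla-a-genotype", (hap1, hap2))
print(genotype.allele_count(), genotype.context_ids())  # prints "10 {'toy-hla-a-cds'}"
```

Each layer holds references to the layer below rather than copies, so every allele in the genotype can be traced back to the same context sequence.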

CYP2C19 Genotype

The CYP2C19 gene encodes an enzyme essential for metabolizing several medications, including anticoagulants, antidepressants, and proton pump inhibitors. Variations in an individual’s CYP2C19 genotype can significantly influence drug response, affecting both efficacy and the risk of adverse effects. In this example, the genotype instance is represented as a composite of two haplotypes, CYP2C19*1.002 and CYP2C19*3.002, each defined by two alleles located at positions 661 and 1016 within the reference sequence context. Representing this genomic information requires a series of interconnected instances of the MolecularDefinition profiles (Genotype, Haplotype, Allele and Sequence) for the haplotypes and their constituent alleles. The following nested list shows this example and the corresponding MolecularDefinition instances.

Please see the complete list of Haplotype examples and the complete list of Genotype examples for more examples.