Molecular Definition Implementation Guide for Molecular Data Types, published by HL7 International / Clinical Genomics. This guide is not an authorized publication; it is the continuous build for version 1.0.0-ballot1 built by the FHIR (HL7® FHIR® Standard) CI Build. This version is based on the current content of https://github.com/HL7/molecular-definition-data-types/ and changes regularly. See the Directory of published versions
Page standards status: Informative
This implementation guide is not complete. The included artifacts are marked as experimental, but they are ready for review, testing, and validation.
This implementation guide and its artifacts are designed to address practical use cases involving the exchange of genomic molecules and their associated data across institutions and systems. By providing standardized frameworks and code systems, the guide supports interoperability and accurate data representation. This section shows how the guide can be used to represent real genomic concepts as FHIR resources in real-world genomic data exchange scenarios. The following subsections provide example instances of the MolecularDefinition resource and its corresponding profiles.
Stakeholders are encouraged to contribute use cases and examples in support of the MolecularDefinition Datatype Implementation Guide to improve its practical applicability and robustness. Contributions can be made by raising a Jira ticket, posting comments on the Genomics Channel-Information Modeling at chat.fhir.org, or directly contacting any of the co-chairs of the HL7 Clinical Genomics Workgroup. These inputs are vital to refining the guide, supporting standardized and interoperable genomic data exchange across healthcare and research environments, and ensuring that the guide addresses real-world genomic data scenarios.
The following examples demonstrate how MolecularDefinition resources can represent a sequence as a literal using the Sequence profile. The moleculeType and encoding attributes enable unambiguous interpretation of the sequence value.
The following examples demonstrate how MolecularDefinition resources can represent a sequence as a code (an accession number) using the Sequence profile.
The following examples demonstrate how MolecularDefinition resources can represent a sequence as a resolvable URL using the Sequence profile. This example uses the DocumentReference resource to represent the URL.
The following examples demonstrate how MolecularDefinition resources can represent a sequence as an attached file.
The following examples demonstrate how MolecularDefinition resources can represent a sequence as a subsequence extracted from a “parent” sequence. In these examples, a sequence representing the CYP2C19 genetic locus is used as the “parent”, from which three subsequences are extracted (corresponding to the upstream region, gene region, and downstream region).
The following examples demonstrate how MolecularDefinition resources can represent a sequence as a concatenation of sequence instances. In this example, the three subsequences from the Extracted example are reassembled into the full genetic locus.
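The relationship between extraction and concatenation can be illustrated with a minimal Python sketch. It is not taken from the guide: the parent string is a short made-up stand-in rather than real CYP2C19 sequence, the region boundaries are hypothetical, and the 0-based, end-exclusive interval convention is an assumption chosen for the illustration.

```python
# Hypothetical parent sequence standing in for a genetic locus
# (made-up bases, NOT real CYP2C19 data).
parent = "AAACCCGGGTTTACGT"

# Hypothetical region boundaries, assumed to be 0-based and end-exclusive.
regions = {
    "upstream": (0, 3),
    "gene": (3, 12),
    "downstream": (12, 16),
}


def extract(sequence: str, start: int, end: int) -> str:
    """Return the subsequence covering [start, end) of the parent sequence."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("interval lies outside the parent sequence")
    return sequence[start:end]


def concatenate(parts: list[str]) -> str:
    """Join sequence parts in the order given."""
    return "".join(parts)


# Extract the three regions, then reassemble them in their original order.
subsequences = {name: extract(parent, s, e) for name, (s, e) in regions.items()}
reassembled = concatenate(
    [subsequences[name] for name in ("upstream", "gene", "downstream")]
)
```

Because the three hypothetical regions are adjacent and cover the whole parent, concatenating them in order reproduces the parent sequence exactly.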
The following examples demonstrate how MolecularDefinition resources can represent a sequence as a repeated sequence motif. In this use case, the CGG trinucleotide repeat from the FMR1 gene is represented in a compressed form that emphasizes the copyCount (convenient for use cases where the number of repeats is important).
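As a rough illustration of why the compressed form is convenient, the Python sketch below keeps the motif and its copy count as separate values and expands them only on demand. The `RepeatedMotif` class and its field names are hypothetical and do not mirror the resource structure; the copy count of 20 matches the perfect CGG repeat referred to in the relative sequence example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepeatedMotif:
    """Hypothetical compressed form: a motif plus the number of copies."""

    motif: str
    copy_count: int

    def expand(self) -> str:
        """Return the full literal sequence described by the compressed form."""
        return self.motif * self.copy_count


cgg_repeat = RepeatedMotif(motif="CGG", copy_count=20)
expanded = cgg_repeat.expand()
```

The copy count stays directly readable as `cgg_repeat.copy_count`, whereas a literal sequence would have to be parsed to recover it.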
The following examples demonstrate how MolecularDefinition resources can represent a sequence as a relative sequence, which applies an edit to a starting sequence to create the sequence of interest. In this example, the starting sequence is a perfect CGG trinucleotide repeat of 20 copies (see the repeated motif example). The desired sequence is not a perfect repeat, however, and a single nucleotide must be edited to yield the sequence of interest. The result of this edit operation represents an actual CGG repeat region that is found in the FMR1 gene.
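The edit operation can be sketched in Python as a replacement over an interval of the starting sequence. This is only an illustration: the `apply_edit` helper, the 0-based end-exclusive coordinates, and in particular the edited position and replacement base are assumptions, since the text above does not state which nucleotide is changed.

```python
def apply_edit(sequence: str, start: int, end: int, replacement: str) -> str:
    """Replace the bases in [start, end) of the sequence with the replacement."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("edit interval lies outside the starting sequence")
    return sequence[:start] + replacement + sequence[end:]


# Starting sequence: a perfect CGG repeat of 20 copies.
starting = "CGG" * 20

# Hypothetical single-nucleotide edit: change the first base of the
# tenth repeat unit (0-based index 27). The real example may edit a
# different position or base.
edited = apply_edit(starting, 27, 28, "A")
```

Only the edit needs to be stated alongside a pointer to the starting sequence; the sequence of interest is then derived rather than spelled out in full.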
Many instances of MolecularDefinition reference other instances of MolecularDefinition. When references within a message are not desired, contained resources can be used. This example shows how contained resources can be used to create a standalone message. It is the same content that was used in the relative (edit) example, but the references have been changed to contained resources.
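The conversion from external references to contained resources can be sketched in Python over plain dictionaries. The sketch relies only on two general FHIR conventions: a resource carries inlined resources in its `contained` array, and it refers to them with a local reference of the form `#id`. Everything else is assumed for illustration: the `inline_references` helper, the `store` lookup, the resource ids, and the `startingMolecule` key, which is a simplified stand-in rather than the actual element path. Nested containment is not handled.

```python
import copy


def inline_references(resource: dict, store: dict) -> dict:
    """Return a copy of the resource in which every reference that can be
    resolved in the store is replaced by a contained resource."""
    result = copy.deepcopy(resource)
    contained = {}  # keyed by original reference to avoid duplicates

    def walk(node):
        if isinstance(node, dict):
            ref = node.get("reference")
            if isinstance(ref, str) and ref in store:
                target = contained.setdefault(ref, copy.deepcopy(store[ref]))
                # Local references to contained resources use "#" + id.
                node["reference"] = "#" + target["id"]
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(result)
    if contained:
        result["contained"] = list(contained.values())
    return result


# Hypothetical resources; ids and the "startingMolecule" key are illustrative.
store = {
    "MolecularDefinition/example-repeat": {
        "resourceType": "MolecularDefinition",
        "id": "example-repeat",
    }
}
relative = {
    "resourceType": "MolecularDefinition",
    "id": "example-relative",
    "startingMolecule": {"reference": "MolecularDefinition/example-repeat"},
}

standalone = inline_references(relative, store)
```

The input resource is left unchanged, so the referenced and contained forms can be produced from the same source data.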
Please see the complete list of Sequence examples for more.
The following examples illustrate instances of allele. In this example, the asserted state of the allele is different from the state of the context sequence at the given location. Note: the Clinical Genomics Work Group is still determining how best to represent named alleles; therefore, the reference to the star allele in this example should be considered preliminary and subject to change.
Please see the complete list of Allele examples for more.
The following examples illustrate instances of variation. In this example, the state of the alternate allele is defined as being different from the state of the reference allele, but the same structure could be used to represent a variation where the two alleles are the same. Note that this example uses a 0-based interval coordinate system.
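The coordinate system matters when a location is read or converted. The Python sketch below assumes that a 0-based interval counts positions from zero and excludes the end position, which is the usual meaning of the term; the helper names and the context string are hypothetical.

```python
def interval_to_one_based(start: int, end: int) -> tuple[int, int]:
    """Convert a non-empty 0-based, end-exclusive interval to 1-based,
    end-inclusive coordinates."""
    if start < 0 or end <= start:
        raise ValueError("expected a non-empty interval with start >= 0")
    return start + 1, end


def state_at(context: str, start: int, end: int) -> str:
    """Return the bases of the context sequence covered by [start, end)."""
    if not 0 <= start <= end <= len(context):
        raise ValueError("interval lies outside the context sequence")
    return context[start:end]


# Made-up context sequence used only to show the indexing.
context = "ACGTACGT"
covered = state_at(context, 4, 5)
one_based = interval_to_one_based(4, 5)
```

Under this assumption, the interval with start 4 and end 5 covers exactly one base, which is the fifth base when counting from one.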
The following example illustrates a tri-allelic polymorphism. In this example, it is necessary to unambiguously specify the reference and alternate alleles, neither of which might match the state of the context sequence at the specified location. The slices on the representation element are needed to support this use case.
Please see the complete list of Variation examples for more.
To illustrate the interaction of various MolecularDefinition profiles, we begin with a foundational example: an instance of the Sequence profile representing the raw coding sequence of HLA00001.1, which corresponds to the HLA-A*01:01:01:01 allele. Building upon this, two distinct sets of Allele profile instances are introduced, each comprising five individual alleles derived from the HLA-A*01:01:01:01 and HLA-A*01:02:01:01 groups, respectively. Each Allele set is then aggregated into a corresponding Haplotype instance, capturing the linkage of alleles on a single chromosome. Finally, these two Haplotype instances are combined in an instance of the Genotype profile, representing the allelic composition across both chromosomes at the HLA-A locus. This example shows how raw sequence data can be built up through alleles and haplotypes into a complete genotype. The following is the set of MolecularDefinition instances that represent this use case:
The CYP2C19 gene encodes an enzyme essential for metabolizing several medications, including anti-coagulants, anti-depressants, and proton pump inhibitors. Variations in an individual’s CYP2C19 genotype can significantly influence drug response, affecting efficacy and risk of adverse effects. In this example, the genotype instance is represented as a composite of two haplotypes, CYP2C19*1.002 and CYP2C19*3.002, each defined by two alleles located at positions 661 and 1016 within the reference sequence context. Representing this genomic information involves describing the haplotypes and their constituent alleles with the MolecularDefinition profiles (Genotype, Haplotype, Allele, and Sequence) through a series of interconnected profile instances. The following nested list shows this example and corresponding MolecularDefinition instances.
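The same composition can also be sketched as nested Python objects. This is an illustration of the nesting only: the class and field names are hypothetical and do not mirror the profile elements, and the allele states are left out because they are not given in the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Allele:
    position: int  # location within the reference sequence context


@dataclass
class Haplotype:
    name: str
    alleles: list[Allele] = field(default_factory=list)


@dataclass
class Genotype:
    haplotypes: list[Haplotype] = field(default_factory=list)


def outline(genotype: Genotype) -> list[str]:
    """Render the genotype as an indented outline, one line per instance."""
    lines = ["Genotype"]
    for haplotype in genotype.haplotypes:
        lines.append(f"  Haplotype {haplotype.name}")
        for allele in haplotype.alleles:
            lines.append(f"    Allele at position {allele.position}")
    return lines


genotype = Genotype(
    haplotypes=[
        Haplotype("CYP2C19*1.002", [Allele(661), Allele(1016)]),
        Haplotype("CYP2C19*3.002", [Allele(661), Allele(1016)]),
    ]
)

print("\n".join(outline(genotype)))
```

Each level holds only the level directly beneath it, which is the same pattern the interconnected profile instances follow through references.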
Please see the complete list of Haplotype examples and the complete list of Genotype examples for more.