Note to Implementers:
The Molecular Definition resource will replace the Molecular Sequence resource. The current Molecular Sequence page remains temporarily available for reference and review purposes.
10.8.1 Scope and Usage
The MolecularDefinition resource is designed for representing genetic molecules (e.g., sequences). It can represent a genetic molecule in different ways, allowing implementations to adopt the most effective representation for their use cases.
The MolecularDefinition resource is designed to represent a single sequence or a composite of genetic sequences (e.g., a haplotype). Each genetic molecule might have multiple representations, but implementers SHALL ensure all representations are for the same molecule. This means that if a single MolecularDefinition instance contains a literal, two formatted files, and a relative, all four of those representations must represent the same genetic molecule (e.g., sequence). This can be a challenge across systems, as semantic equivalency of sequences cannot be guaranteed unless there is an agreed-upon standard between sending and receiving systems.
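The consistency rule above can be illustrated with a minimal Python sketch. It assumes that every representation has already been resolved to a plain sequence string and that sender and receiver have agreed that case-insensitive string equality is the equivalence test. Both are assumptions made for illustration, and the function name representations_agree is hypothetical.

```python
def representations_agree(resolved: list[str]) -> bool:
    """Return True when all resolved representations spell the same sequence.

    Assumption: equivalence means identical strings after trimming
    whitespace and upper-casing. Real systems may need a stricter or
    looser agreed-upon rule.
    """
    normalized = {s.strip().upper() for s in resolved}
    return len(normalized) <= 1
```

A sender could run such a check before emitting an instance that carries several representations, and reject the instance when the check fails.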
10.8.2 Boundaries and Relationships
The MolecularDefinition resource should only be used to capture molecular representations of genetic concepts such as sequence, allele, and haplotype. It will not be used for observational data related to a specific patient. Those concepts will be captured in Observation profiles found in the Genomics Reporting Implementation Guide. The molecule that was observed and that led to the identification of those concepts can be delivered with this resource, and will be referenced by those observations.
MolecularDefinition will not be used to capture data such as precise reads of DNA sequences or sequence alignments; such data may be accessible through references to the GA4GH (Global Alliance for Genomics and Health) API, and may be referenced by the literal element.
The coordinate system used to define the edited intervals on the starting sequence. Coordinate systems are usually 0- or 1-based Binding: LL5323-2 (Extensible)
@prefix fhir: <http://hl7.org/fhir/> .
[ a fhir:MolecularDefinition;
fhir:nodeRole fhir:treeRoot; # if this is the parser root
# from Resource: .id, .meta, .implicitRules, and .language
# from DomainResource: .text, .contained, .extension, and .modifierExtension
fhir:identifier ( [ Identifier ] ... ) ; # 0..* Unique ID for this particular resource
fhir:type[ code ] ; # 0..1 aa | dna | rna
fhir:location( [ # 0..* Location of this molecule
fhir:sequenceLocation[ # 0..1 Location of this molecule in context of a sequence
fhir:sequenceContext[ Reference(MolecularDefinition) ] ; # 1..1 Reference sequence
fhir:coordinateInterval[ # 0..1 Coordinate Interval for this location
fhir:numberingSystem[ CodeableConcept ] ; # 0..1 Coordinate System
# start[x]: 0..1 Start. One of these 2
fhir:start[ a fhir:Quantity ; Quantity ]
fhir:start[ a fhir:Range ; Range ]
# end[x]: 0..1 End. One of these 2
fhir:end[ a fhir:Quantity ; Quantity ]
fhir:end[ a fhir:Range ; Range ]
] ;
fhir:strand[ CodeableConcept ] ; # 0..1 Forward or Reverse
] ;
fhir:cytobandLocation[ # 0..1 Location of this molecule in context of a cytoband
fhir:genomeAssembly[ # 1..1 Reference Genome
fhir:organism[ CodeableConcept ] ; # 0..1 Species of the organism
fhir:build[ CodeableConcept ] ; # 0..1 Build number
fhir:accession[ CodeableConcept ] ; # 0..1 Accession
# description[x]: 0..1 Genome assembly description. One of these 2
fhir:description[ a fhir:markdown ; markdown ]
fhir:description[ a fhir:string ; string ]
] ;
fhir:cytobandInterval[ # 1..1 Cytoband Interval
fhir:chromosome[ CodeableConcept ] ; # 1..1 Chromosome
fhir:startCytoband[ # 0..1 Start
# arm[x]: 0..1 Arm. One of these 2
fhir:arm[ a fhir:code ; code ]
fhir:arm[ a fhir:string ; string ]
# region[x]: 0..1 Region. One of these 2
fhir:region[ a fhir:code ; code ]
fhir:region[ a fhir:string ; string ]
# band[x]: 0..1 Band. One of these 2
fhir:band[ a fhir:code ; code ]
fhir:band[ a fhir:string ; string ]
# subBand[x]: 0..1 Sub-band. One of these 2
fhir:subBand[ a fhir:code ; code ]
fhir:subBand[ a fhir:string ; string ]
] ;
fhir:endCytoband[ # 0..1 End
# arm[x]: 0..1 Arm. One of these 2
fhir:arm[ a fhir:code ; code ]
fhir:arm[ a fhir:string ; string ]
# region[x]: 0..1 Region. One of these 2
fhir:region[ a fhir:code ; code ]
fhir:region[ a fhir:string ; string ]
# band[x]: 0..1 Band. One of these 2
fhir:band[ a fhir:code ; code ]
fhir:band[ a fhir:string ; string ]
# subBand[x]: 0..1 Sub-band. One of these 2
fhir:subBand[ a fhir:code ; code ]
fhir:subBand[ a fhir:string ; string ]
] ;
] ;
] ;
fhir:featureLocation( [ # 0..* Location in context of a feature
fhir:geneId ( [ CodeableConcept ] ... ) ; # 0..* Gene Id
] ... ) ;
] ... ) ;
fhir:memberState ( [ Reference(MolecularDefinition) ] ... ) ; # 0..* Member
fhir:representation( [ # 0..* Representation
fhir:focus[ CodeableConcept ] ; # 0..1 The focus of the representation
fhir:code ( [ CodeableConcept ] ... ) ; # 0..* A code of the representation
fhir:literal[ # 0..1 A literal representation
fhir:encoding[ CodeableConcept ] ; # 0..1 The encoding used for the expression of the primary sequence
fhir:value[ string ] ; # 1..1 The primary (linear) sequence, expressed as a literal string
] ;
fhir:resolvable[ Attachment ] ; # 0..1 A resolvable representation of a molecule that optionally contains formatting in addition to the specification of the primary sequence itself
fhir:extracted[ # 0..1 A Molecular Sequence that is represented as an extracted portion of a different Molecular Sequence
fhir:startingMolecule[ Reference(MolecularDefinition) ] ; # 1..1 The Molecular Sequence that serves as the parent sequence, from which the intended sequence will be extracted
fhir:start[ integer ] ; # 1..1 The start coordinate (on the parent sequence) of the interval that defines the subsequence to be extracted
fhir:end[ integer ] ; # 1..1 The end coordinate (on the parent sequence) of the interval that defines the subsequence to be extracted
fhir:coordinateSystem[ CodeableConcept ] ; # 1..1 The coordinate system used to define the interval that defines the subsequence to be extracted. Coordinate systems are usually 0- or 1-based
fhir:reverseComplement[ boolean ] ; # 0..1 A flag that indicates whether the extracted sequence should be reverse complemented
] ;
fhir:repeated[ # 0..1 A Molecular Sequence that is represented as a repeated sequence motif
fhir:sequenceMotif[ Reference(MolecularDefinition) ] ; # 1..1 The sequence that defines the repeated motif
fhir:copyCount[ integer ] ; # 1..1 The number of repeats (copies) of the sequence motif
] ;
fhir:concatenated[ # 0..1 A Molecular Sequence that is represented as an ordered concatenation of two or more Molecular Sequences
fhir:sequenceElement( [ # 1..* One element of a concatenated Molecular Sequence
fhir:sequence[ Reference(MolecularDefinition) ] ; # 1..1 The Molecular Sequence corresponding to this element
fhir:ordinalIndex[ integer ] ; # 1..1 The ordinal position of this sequence element within the concatenated Molecular Sequence
] ... ) ;
] ;
fhir:relative[ # 0..1 A Molecular Definition that is represented as an ordered series of edits on a specified starting sequence
fhir:startingMolecule[ Reference(MolecularDefinition) ] ; # 1..1 The Molecular Sequence that serves as the starting sequence, on which edits will be applied
fhir:edit( [ # 0..* An edit (change) made to a sequence
fhir:editOrder[ integer ] ; # 0..1 The order of this edit, relative to other edits on the starting sequence
fhir:coordinateSystem[ CodeableConcept ] ; # 1..1 The coordinate system used to define the edited intervals on the starting sequence. Coordinate systems are usually 0- or 1-based
fhir:start[ integer ] ; # 1..1 The start coordinate of the interval that will be edited
fhir:end[ integer ] ; # 1..1 The end coordinate of the interval that will be edited
fhir:replacementMolecule[ Reference(MolecularDefinition) ] ; # 1..1 The sequence that defines the replacement sequence used in the edit operation
fhir:replacedMolecule[ Reference(MolecularDefinition) ] ; # 0..1 The sequence on the 'starting' sequence for the edit operation, defined by the specified interval, that will be replaced during the edit
] ... ) ;
] ;
] ... ) ;
]
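The derived representations in the template above (extracted, repeated, and concatenated) can each be resolved to a literal string once the referenced molecules are available. The following Python sketch shows one way this might look. It assumes references have already been resolved to plain strings, that the extracted interval is 0-based and end-exclusive (the coordinateSystem element is not consulted here), and that reverse complementing happens after extraction. All function names are hypothetical.

```python
# Hypothetical helpers; Reference(MolecularDefinition) values are assumed
# to be already resolved to plain nucleotide strings.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def resolve_extracted(parent: str, start: int, end: int,
                      reverse_complement: bool = False) -> str:
    """Extract parent[start:end], assuming a 0-based, end-exclusive interval."""
    sub = parent[start:end]
    if reverse_complement:
        # Complement each base, then reverse the string.
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub


def resolve_repeated(motif: str, copy_count: int) -> str:
    """Repeat the motif copyCount times."""
    return motif * copy_count


def resolve_concatenated(elements: list[tuple[int, str]]) -> str:
    """Join (ordinalIndex, sequence) pairs in ascending ordinalIndex order."""
    return "".join(seq for _, seq in sorted(elements, key=lambda e: e[0]))
```

For example, resolve_repeated("CAG", 3) yields a three-copy CAG repeat, and resolve_concatenated accepts its elements in any order because it sorts by ordinalIndex.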
This resource supports three patterns for representing a sequence of interest:
By providing a literal string of IUPAC codes representing nucleotides or amino acids.
By linking to a formatted file or link containing the sequence information (e.g. FASTA file or GA4GH sequence repository).
By providing a list of edits from a starting sequence.
The MolecularSequence resource is designed to represent a single sequence in an instance. Each sequence might have multiple representations, but implementers SHALL ensure all representations are for the same sequence.
10.8.5.1.1 Sequence as a literal string
literal: This string element can be used to hold the sequence as a string of characters.
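Because a literal is described as a string of IUPAC codes, a receiver may want to check the characters before using the value. The Python sketch below is one possible check. It assumes the accepted alphabets are the IUPAC nucleotide codes (including ambiguity codes) for dna and rna, and the 20 standard amino acid letters plus B, Z, and X for aa. The exact accepted set, the names ALPHABETS and invalid_characters, and the case-insensitive handling are assumptions for illustration.

```python
# IUPAC nucleotide ambiguity codes shared by DNA and RNA.
AMBIGUITY = "RYSWKMBDHVN"

# Assumed alphabets per molecule type (aa | dna | rna).
ALPHABETS = {
    "dna": set("ACGT" + AMBIGUITY),
    "rna": set("ACGU" + AMBIGUITY),
    "aa": set("ACDEFGHIKLMNPQRSTVWY" + "BZX"),
}


def invalid_characters(value: str, molecule_type: str) -> list[str]:
    """Return the sorted characters of value that fall outside the alphabet."""
    allowed = ALPHABETS[molecule_type]
    return sorted(set(value.upper()) - allowed)
```

An empty result means every character in the literal belongs to the assumed alphabet for that molecule type.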
10.8.5.1.2 Sequence as a file or URL
formatted: This Attachment is used to refer to the sequence as embedded file content or via a URL reference.
This method can be used to refer to sequence data from an external source. If the sequence refers to a GA4GH repository, the formatted.url should point to a GA4GH-compliant endpoint that conforms to GA4GH data models.
10.8.5.1.3 Sequence as a series of edits from a known sequence
relative: This complex element is used for encoding a sequence as a series of edits. When the starting sequence and the edits are both provided, the observed sequence can be derived by applying the edits to the starting sequence.
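Deriving the observed sequence from a starting sequence and its edits can be sketched in Python as follows. The sketch assumes each edit is a (start, end, replacement) triple with a 0-based, end-exclusive interval expressed on the original starting sequence, and that edits do not overlap. Under those assumptions, applying edits from the highest start to the lowest keeps the remaining coordinates valid. The function name apply_edits is hypothetical.

```python
def apply_edits(starting: str, edits: list[tuple[int, int, str]]) -> str:
    """Apply non-overlapping (start, end, replacement) edits to starting.

    Assumption: intervals are 0-based and end-exclusive, and all
    coordinates refer to the unedited starting sequence.
    """
    result = starting
    # Work right-to-left so earlier coordinates are not shifted.
    for start, end, replacement in sorted(edits, key=lambda e: e[0], reverse=True):
        if not 0 <= start <= end <= len(starting):
            raise ValueError(f"interval [{start}, {end}) is out of bounds")
        result = result[:start] + replacement + result[end:]
    return result
```

A substitution replaces a non-empty interval, a deletion uses an empty replacement, and an insertion uses an empty interval (start equal to end).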
10.8.5.1.3.1 Composing multiple relative sequences into one new sequence
relative.ordinalPosition: Indicates the order in which the sequence should be considered when putting multiple relative instances together.
relative.sequenceRange: Indicates the nucleotide range in the composed sequence when multiple relative instances are used together.
These attributes help to clarify which sequence is being represented, with less computation or inference on the recipient side. Implementers SHOULD use sequenceRange first to determine order, as it is the most reliable. If sequenceRange is not present, then ordinalPosition SHOULD be used. Finally, if both sequenceRange and ordinalPosition are absent, then the order of the relative data elements SHOULD be used to calculate a composition. It is the responsibility of the data sender to ensure the message can be consistently understood. Additionally, gaps in sequenceRange are considered intentional (i.e., the composed sequence contains a run of N's, the placeholder nucleotide, for the gap range).
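The ordering precedence described above can be sketched in Python. The sketch assumes each relative instance has been reduced to a dict with a resolved "seq" string and optional "range" (a 0-based, end-exclusive (start, end) pair in the composed sequence) and "ordinal" keys. It treats an attribute as present only when every fragment carries it, and it fills gaps only between fragments, not before the first one. These choices, the dict layout, and the name compose are assumptions for illustration.

```python
def compose(fragments: list[dict]) -> str:
    """Compose fragments using sequenceRange, then ordinalPosition, then list order."""
    if fragments and all("range" in f for f in fragments):
        ordered = sorted(fragments, key=lambda f: f["range"][0])
        parts = []
        pos = ordered[0]["range"][0]
        for f in ordered:
            start, end = f["range"]
            # Gaps between ranges are intentional and filled with N.
            parts.append("N" * (start - pos))
            parts.append(f["seq"])
            pos = end
        return "".join(parts)
    if all("ordinal" in f for f in fragments):
        ordered = sorted(fragments, key=lambda f: f["ordinal"])
    else:
        # Fall back to the order in which the elements were supplied.
        ordered = fragments
    return "".join(f["seq"] for f in ordered)
```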
10.8.5.1.3.2 Representing the Starting Sequence
relative.startingSequence: The starting sequence can be represented in the MolecularSequence resource in any of the following optional ways:
relative.startingSequence.sequenceCodeableConcept: Starting sequence id in public database;
relative.startingSequence.sequenceReference: Reference to starting sequence stored in another sequence entity;
relative.startingSequence.genomeAssembly, relative.startingSequence.chromosome: The combination of genome assembly and chromosome.
The relative.startingSequence.windowStart and relative.startingSequence.windowEnd elements define a range on the starting sequence; the subsequence within that range is used as the starting sequence.
10.8.5.1.3.3 Coordinate System
When sequence information is stored, the positions of the nucleic acid are numbered in order. Some representations use a 0-based system (e.g. GA4GH API, BAM files) while some use a 1-based system (e.g. VCF file format). The element coordinateSystem records which system is in use.
relative.coordinateSystem binds to a LOINC answer list (LL5323-2); implementers should review those answers together with their detailed descriptions.
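The practical difference between the two numbering systems can be shown with a short Python sketch. It assumes the common convention that a 0-based interval is end-exclusive while a 1-based interval is end-inclusive. That convention is an assumption here and should be checked against the bound answer list. The function names are hypothetical.

```python
def to_zero_based(start: int, end: int) -> tuple[int, int]:
    """Convert a 1-based inclusive [start, end] to 0-based end-exclusive."""
    return start - 1, end


def to_one_based(start: int, end: int) -> tuple[int, int]:
    """Convert a 0-based end-exclusive [start, end) to 1-based inclusive."""
    return start + 1, end


def subsequence(seq: str, start: int, end: int, one_based: bool) -> str:
    """Slice seq using coordinates given in either system."""
    if one_based:
        start, end = to_zero_based(start, end)
    return seq[start:end]
```

Under these assumptions, the 1-based interval 3 to 5 and the 0-based interval 2 to 5 select the same three bases, which is why a receiver needs coordinateSystem before interpreting start and end values such as a window on the starting sequence.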
10.8.5.1.3.4 Choice of Strand
There are many considerations concerning the directionality of DNA or RNA. Here we use relative.startingSequence.orientation and relative.startingSequence.strand. Orientation represents the sense of the sequence, which has different meanings depending on the type. Strand represents the order in which the sequence is written. The Watson strand refers to the 5' to 3' top strand (5' -> 3'), whereas the Crick strand refers to the 5' to 3' bottom strand (3' <- 5').
The strand element can take only two values, watson and crick. Since the directionality of a sequence string might be expressed differently in different omics scenarios, below are examples of how to map other expressions to the corresponding value:
Watson | Crick
5′-to-3′ direction | 3′-to-5′ direction
+1 | -1
Sense | Antisense
Positive | Negative
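The mapping above can be expressed as a small lookup in Python. The sketch assumes that matching is case-insensitive and that the prime character (′) and the ASCII apostrophe are interchangeable. Only the expressions listed above are recognized. The names STRAND_SYNONYMS and normalize_strand are hypothetical.

```python
# Expressions listed above, keyed in lower case with ASCII apostrophes.
STRAND_SYNONYMS = {
    "watson": "watson",
    "5'-to-3' direction": "watson",
    "+1": "watson",
    "sense": "watson",
    "positive": "watson",
    "crick": "crick",
    "3'-to-5' direction": "crick",
    "-1": "crick",
    "antisense": "crick",
    "negative": "crick",
}


def normalize_strand(expression: str) -> str:
    """Map a strand expression to 'watson' or 'crick'."""
    key = expression.strip().lower().replace("\u2032", "'")
    try:
        return STRAND_SYNONYMS[key]
    except KeyError:
        raise ValueError(f"unrecognized strand expression: {expression!r}") from None
```

Raising an error for unknown expressions, rather than guessing, keeps the sender responsible for supplying an unambiguous strand.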
10.8.5.2 Character usage for sequence as strings
There are attributes where the sequence is represented as a string of characters.