Definitional content for a molecular entity, such as a nucleotide or protein sequence.
Note to Implementers:
The MolecularDefinition resource will replace the MolecularSequence resource. The current MolecularSequence page remains temporarily available for reference and review.
10.8.1 Scope and Usage
The MolecularDefinition resource represents molecular entities (e.g., nucleotide or protein sequences) for both clinical and non-clinical use cases, including translational research. The resource is definitional, in that it focuses on discrete, computable, and semantically expressive data structures that reflect the genomic domain. Because the resource focuses on the molecular entities rather than specimen source or annotated knowledge, it supports both patient/participant-specific use cases and population-based data, and both human and non-human data.
The MolecularDefinition resource itself is abstract, but it supports profiles for core molecular concepts, including Sequence (nucleotide and protein), Allele, Variation, Haplotype, and Genotype. Support for additional molecular types, such as structural variation, fusions, and biomarkers, will be considered in the future.
Use cases supported by this resource include but are not limited to:
Structured exchange of simple sequences of DNA, RNA, or amino acids (whole genome/exome sequencing)
Representation of clinically significant alleles that impact drug response (e.g., pharmacogenomic CDS)
Structured representation of simple and complex genetic variations for diagnostic purposes (clinical diagnosis or risk)
Expression of genotypes that have clinical or research significance (clinical decision making)
Representation of genomic variations that are stored within a public knowledge base
Expression of alleles that are used within risk calculators
10.8.1.1 Sequence Representation
Use cases often require expression of the same genomic concept in different ways. Since the concept is the same and only the serialization of it differs, the Molecular Definition resource supports multiple approaches to representing molecular sequences. This allows senders and receivers of messages to choose a sequence representation that is most intuitive for the particular use case.
It is important to note that all representations of a given sequence MUST resolve to the exact same primary sequence. Therefore, if a single instance of MolecularDefinition contains one literal, two resolvable files, and a code, all four of those representations must represent the same sequence. Note that this equivalence does not apply to metadata or annotations that are outside the scope of the Molecular Definition resource, since those data are not definitional to the molecule.
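The consistency rule above can be checked mechanically once each representation has been resolved to a primary sequence. The sketch below assumes resolution has already happened elsewhere and only compares the results; the labels such as "literal" or "resolvable[0]" and both function names are hypothetical.

```python
from typing import Mapping


def representations_agree(resolved: Mapping[str, str]) -> bool:
    """Return True if every resolved representation yields the same primary sequence.

    `resolved` maps a hypothetical label for each representation (e.g. "literal",
    "resolvable[0]", "code") to the sequence string it resolved to.
    """
    # Zero or one distinct sequence means the representations are consistent.
    return len(set(resolved.values())) <= 1


def disagreeing(resolved: Mapping[str, str], expected: str) -> list[str]:
    """List the labels whose resolved sequence differs from `expected`."""
    return sorted(label for label, seq in resolved.items() if seq != expected)
```

Metadata and annotations are deliberately not compared, matching the note that they are not definitional to the molecule.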
10.8.2 Boundaries and Relationships
The MolecularDefinition resource should be profiled and used to capture representations of molecular concepts such as sequence, allele, haplotype, and genotype.
This resource does not capture workflow (e.g., test ordering/resulting process), the method of obtaining or specifying the molecular content (e.g., the test or assay), or the interpretation of the results (e.g., clinical impact). Those concepts will be captured by profiles of Observation and by the Genomic Study resource. In particular, the Genomics Reporting Implementation Guide contains extensive support for the observation and reporting of clinical genomic results.
@prefix fhir: <http://hl7.org/fhir/> .
[ a fhir:MolecularDefinition;
fhir:nodeRole fhir:treeRoot; # if this is the parser root
# from Resource: .id, .meta, .implicitRules, and .language
# from DomainResource: .text, .contained, .extension, and .modifierExtension
fhir:identifier ( [ Identifier ] ... ) ; # 0..* Unique ID of an instance
fhir:description[ markdown ] ; # 0..1 Description of the Molecular Definition instance
fhir:moleculeType[ CodeableConcept ] ; # 0..1 The type of molecule (e.g., DNA, RNA, amino acid)
fhir:type ( [ CodeableConcept ] ... ) ; # 0..* Classification of the molecule into types other than those defined by moleculeType
fhir:topology ( [ CodeableConcept ] ... ) ; # 0..* The structural topology of the molecular entity (e.g., linear, circular)
fhir:member ( [ Reference(MolecularDefinition) ] ... ) ; # 0..* Constituents of an aggregate molecular concept (e.g., haplotype, genotype)
fhir:location( [ # 0..* A defined location on a molecular entity
fhir:sequenceLocation[ # 0..1 A coordinate-based location on a sequence
fhir:sequenceContext[ Reference(MolecularDefinition) ] ; # 1..1 The sequence on which the location is defined
fhir:coordinateInterval[ # 0..1 An interval on a sequence
fhir:coordinateSystem[ # 0..1 The coordinate system used to define the location
fhir:system[ CodeableConcept ] ; # 0..1 The type of coordinate system used
fhir:origin[ CodeableConcept ] ; # 0..1 The location of the origin of the coordinate system
fhir:normalizationMethod[ CodeableConcept ] ; # 0..1 The normalization method used for determining a location within the coordinate system
] ;
# start[x]: 0..1 The start location of the interval. One of these 2
fhir:start[ a fhir:Quantity ; Quantity ]
fhir:start[ a fhir:Range ; Range ]
# end[x]: 0..1 The end location of the interval. One of these 2
fhir:end[ a fhir:Quantity ; Quantity ]
fhir:end[ a fhir:Range ; Range ]
] ;
fhir:strand[ CodeableConcept ] ; # 0..1 The strand at the coordinateInterval
] ;
fhir:cytobandLocation[ # 0..1 A cytoband-based location on a sequence
fhir:genomeAssembly[ # 1..1 Reference Genome
fhir:organism[ CodeableConcept ] ; # 0..1 Species of the organism
fhir:build[ CodeableConcept ] ; # 0..1 Build number
fhir:accession[ CodeableConcept ] ; # 0..1 Accession
# description[x]: 0..1 Genome assembly description. One of these 2
fhir:description[ a fhir:markdown ; markdown ]
fhir:description[ a fhir:string ; string ]
] ;
fhir:cytobandInterval[ # 1..1 Cytoband Interval
fhir:chromosome[ CodeableConcept ] ; # 1..1 Chromosome
fhir:startCytoband[ # 0..1 Start
# arm[x]: 0..1 Arm. One of these 2
fhir:arm[ a fhir:code ; code ]
fhir:arm[ a fhir:string ; string ]
# region[x]: 0..1 Region. One of these 2
fhir:region[ a fhir:code ; code ]
fhir:region[ a fhir:string ; string ]
# band[x]: 0..1 Band. One of these 2
fhir:band[ a fhir:code ; code ]
fhir:band[ a fhir:string ; string ]
# subBand[x]: 0..1 Sub-band. One of these 2
fhir:subBand[ a fhir:code ; code ]
fhir:subBand[ a fhir:string ; string ]
] ;
fhir:endCytoband[ # 0..1 End
# arm[x]: 0..1 Arm. One of these 2
fhir:arm[ a fhir:code ; code ]
fhir:arm[ a fhir:string ; string ]
# region[x]: 0..1 Region. One of these 2
fhir:region[ a fhir:code ; code ]
fhir:region[ a fhir:string ; string ]
# band[x]: 0..1 Band. One of these 2
fhir:band[ a fhir:code ; code ]
fhir:band[ a fhir:string ; string ]
# subBand[x]: 0..1 Sub-band. One of these 2
fhir:subBand[ a fhir:code ; code ]
fhir:subBand[ a fhir:string ; string ]
] ;
] ;
] ;
] ... ) ;
fhir:representation( [ # 0..* A representation of a molecular entity
fhir:focus[ CodeableConcept ] ; # 0..1 The domain concept that is the focus of a given instance of the representation
fhir:code ( [ CodeableConcept ] ... ) ; # 0..* A code (e.g., sequence accession number) used to represent a molecular entity
fhir:literal[ # 0..1 A molecular entity defined as a string literal
fhir:encoding[ CodeableConcept ] ; # 0..1 The encoding used in the value
fhir:value[ string ] ; # 1..1 A string literal representation of the molecular entity, using the encoding specified in encoding
] ;
fhir:resolvable[ Reference(DocumentReference) ] ; # 0..1 A resolvable representation of a molecular entity (e.g., URI, attached and formatted file)
fhir:extracted[ # 0..1 A molecular entity that is represented as a portion of a different entity
fhir:startingMolecule[ Reference(MolecularDefinition) ] ; # 1..1 The molecular entity that serves as the conceptual 'parent' from which the intended entity is derived
fhir:coordinateInterval[ # 0..1 The interval on startingMolecule that defines the portion to be extracted to produce the intended entity
fhir:coordinateSystem[ # 0..1 The coordinate system used to define the location
fhir:system[ CodeableConcept ] ; # 0..1 The type of coordinate system used
fhir:origin[ CodeableConcept ] ; # 0..1 The location of the origin of the coordinate system
fhir:normalizationMethod[ CodeableConcept ] ; # 0..1 The normalization method used for determining a location within the coordinate system
] ;
# start[x]: 0..1 The start location of the interval. One of these 2
fhir:start[ a fhir:Quantity ; Quantity ]
fhir:start[ a fhir:Range ; Range ]
# end[x]: 0..1 The end location of the interval. One of these 2
fhir:end[ a fhir:Quantity ; Quantity ]
fhir:end[ a fhir:Range ; Range ]
] ;
fhir:reverseComplement[ boolean ] ; # 0..1 A flag that indicates whether the extracted sequence should be reverse complemented
] ;
fhir:repeated[ # 0..1 A representation as a repeated motif
fhir:sequenceMotif[ Reference(MolecularDefinition) ] ; # 1..1 The motif that is repeated
fhir:copyCount[ integer ] ; # 1..1 The number of copies of the motif
] ;
fhir:concatenated[ # 0..1 An ordered concatenation of molecular entities
fhir:sequenceElement( [ # 1..* One of the concatenated entities
fhir:sequence[ Reference(MolecularDefinition) ] ; # 1..1 A reference to the sequence that defines this specific concatenated element
fhir:ordinalIndex[ integer ] ; # 1..1 The ordinal index of the element within the concatenated representation
] ... ) ;
] ;
fhir:relative[ # 0..1 A molecular entity represented as an ordered series of edits on a specified starting entity
fhir:startingMolecule[ Reference(MolecularDefinition) ] ; # 1..1 The molecular entity on which edits will be applied
fhir:edit( [ # 0..* A defined edit (change) to be applied
fhir:editOrder[ integer ] ; # 0..1 Defines the order of edits when multiple edits are to be applied to the startingMolecule
fhir:coordinateInterval[ # 0..1 The interval on startingMolecule that defines the portion to be replaced by the replacementMolecule
fhir:coordinateSystem[ # 0..1 The coordinate system used to define the location
fhir:system[ CodeableConcept ] ; # 0..1 The type of coordinate system used
fhir:origin[ CodeableConcept ] ; # 0..1 The location of the origin of the coordinate system
fhir:normalizationMethod[ CodeableConcept ] ; # 0..1 The normalization method used for determining a location within the coordinate system
] ;
# start[x]: 0..1 The start location of the interval. One of these 2
fhir:start[ a fhir:Quantity ; Quantity ]
fhir:start[ a fhir:Range ; Range ]
# end[x]: 0..1 The end location of the interval. One of these 2
fhir:end[ a fhir:Quantity ; Quantity ]
fhir:end[ a fhir:Range ; Range ]
] ;
fhir:replacementMolecule[ Reference(MolecularDefinition) ] ; # 1..1 The molecular entity that serves as the replacement in the edit operation
fhir:replacedMolecule[ Reference(MolecularDefinition) ] ; # 0..1 The portion of the molecular entity that is replaced by the replacementMolecule
] ... ) ;
] ;
] ... ) ;
]
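As a reading aid for the template, the sketch below checks one representation entry, written as FHIR JSON, for the required (1..1 or 1..*) children listed in the template. It covers only the representation backbone, and the function name is hypothetical; it is not a substitute for a conformant FHIR validator.

```python
# Required children of each representation option, transcribed from the template.
# Keys are FHIR JSON element names.
REQUIRED: dict[str, list[str]] = {
    "literal": ["value"],
    "extracted": ["startingMolecule"],
    "repeated": ["sequenceMotif", "copyCount"],
    "concatenated": ["sequenceElement"],
    "relative": ["startingMolecule"],
}


def missing_required(representation: dict) -> list[str]:
    """List required child paths that are absent from one representation entry."""
    missing = []
    for option, children in REQUIRED.items():
        if option in representation:
            for child in children:
                if child not in representation[option]:
                    missing.append(f"{option}.{child}")
    # Each concatenated sequenceElement needs a sequence and an ordinalIndex (1..1 each).
    elements = representation.get("concatenated", {}).get("sequenceElement", [])
    for i, element in enumerate(elements):
        for child in ("sequence", "ordinalIndex"):
            if child not in element:
                missing.append(f"concatenated.sequenceElement[{i}].{child}")
    # Each relative edit needs a replacementMolecule (1..1).
    for i, edit in enumerate(representation.get("relative", {}).get("edit", [])):
        if "replacementMolecule" not in edit:
            missing.append(f"relative.edit[{i}].replacementMolecule")
    return missing
```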
Molecular sequences are represented using numerous encodings, which are not always explicitly specified. The representation.literal.encoding attribute captures this information directly, so that implementors can validate the content of messages and computationally determine how a particular sequence should be interpreted.
The examples below illustrate different encodings, which could be used to create terms for this attribute. They are based on the IUPAC symbols for nucleotide and amino acid sequences.
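To illustrate how an explicit encoding enables validation, the sketch below checks a literal against the 1-letter symbol sets from the tables in this section. The encoding labels are hypothetical and are not codes defined by FHIR; 3-letter amino acid encodings are not per-character and are left out of this sketch.

```python
# Symbol sets from the IUPAC-based tables in this section.
# The keys are hypothetical labels, not terminology codes.
ALPHABETS: dict[str, frozenset[str]] = {
    "dna-1-letter": frozenset("GATC"),
    "rna-1-letter": frozenset("GAUC"),
    "dna-1-letter-n": frozenset("GATCN"),
    "dna-1-letter-ambiguity": frozenset("GATCRYMKSWHBVDN"),
    "aa-1-letter": frozenset("ACDEFGHIKLMNPQRSTVWY"),
    "aa-1-letter-ambiguity": frozenset("ABCDEFGHIKLMNPQRSTUVWXYZ"),
}


def invalid_symbols(value: str, encoding: str) -> list[tuple[int, str]]:
    """Return (0-based position, symbol) pairs in `value` not allowed by `encoding`.

    Raises KeyError if the encoding label is unknown.
    """
    allowed = ALPHABETS[encoding]
    return [(i, ch) for i, ch in enumerate(value) if ch not in allowed]


def is_valid_literal(value: str, encoding: str) -> bool:
    return not invalid_symbols(value, encoding)
```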
10.8.5.1.1 Nucleotide Symbols (1-letter, no ambiguity, DNA residues)
Symbol | Meaning | Origin of designation
G | Guanine | G
A | Adenine | A
T | Thymine | T
C | Cytosine | C
10.8.5.1.2 Nucleotide Symbols (1-letter, no ambiguity, RNA residues)
Symbol | Meaning | Origin of designation
G | Guanine | G
A | Adenine | A
U | Uracil | U
C | Cytosine | C
10.8.5.1.3 Nucleotide Symbols (1-letter, no ambiguity except N, DNA residues)
Symbol | Meaning | Origin of designation
G | Guanine | G
A | Adenine | A
T | Thymine | T
C | Cytosine | C
N | G or A or T or C | aNy
10.8.5.1.4 Nucleotide Symbols (1-letter, with ambiguity, DNA residues)
Symbol | Meaning | Origin of designation
G | Guanine | G
A | Adenine | A
T | Thymine | T
C | Cytosine | C
R | G or A | puRine
Y | T or C | pYrimidine
M | A or C | aMino
K | G or T | Keto
S | G or C | Strong interaction (3 H bonds)
W | A or T | Weak interaction (2 H bonds)
H | A or C or T | not-G, H follows G in the alphabet
B | G or T or C | not-A, B follows A
V | G or C or A | not-T (not-U), V follows U
D | G or A or T | not-C, D follows C
N | G or A or T or C | aNy
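The ambiguity codes above each denote a set of bases, so a literal that uses them stands for a set of unambiguous sequences. The sketch below, with hypothetical function names, tests whether an unambiguous G/A/T/C sequence is one of the sequences a pattern denotes and counts how many sequences the pattern covers.

```python
# IUPAC nucleotide ambiguity codes from the table above (DNA residues).
IUPAC_DNA: dict[str, frozenset[str]] = {
    "G": frozenset("G"), "A": frozenset("A"), "T": frozenset("T"), "C": frozenset("C"),
    "R": frozenset("GA"), "Y": frozenset("TC"), "M": frozenset("AC"), "K": frozenset("GT"),
    "S": frozenset("GC"), "W": frozenset("AT"), "H": frozenset("ACT"), "B": frozenset("GTC"),
    "V": frozenset("GCA"), "D": frozenset("GAT"), "N": frozenset("GATC"),
}


def matches(pattern: str, sequence: str) -> bool:
    """True if the unambiguous `sequence` is one of the sequences `pattern` denotes.

    Raises KeyError if `pattern` contains a symbol outside the table.
    """
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC_DNA[code] for code, base in zip(pattern, sequence))


def expansion_count(pattern: str) -> int:
    """Number of distinct unambiguous sequences denoted by `pattern`."""
    count = 1
    for code in pattern:
        count *= len(IUPAC_DNA[code])
    return count
```

For example, the pattern RYN (a purine, then a pyrimidine, then any base) matches ATG but not CTG, and covers 2 × 2 × 4 = 16 sequences.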
10.8.5.1.5 Amino Acid Symbols (1-letter, no ambiguity, 20 common)
Symbol | Amino acid
A | alanine
C | cysteine
D | aspartic acid
E | glutamic acid
F | phenylalanine
G | glycine
H | histidine
I | isoleucine
K | lysine
L | leucine
M | methionine
N | asparagine
P | proline
Q | glutamine
R | arginine
S | serine
T | threonine
V | valine
W | tryptophan
Y | tyrosine
10.8.5.1.6 Amino Acid Symbols (3-letter, no ambiguity, 20 common)
Symbol | Amino acid
Ala | alanine
Cys | cysteine
Asp | aspartic acid
Glu | glutamic acid
Phe | phenylalanine
Gly | glycine
His | histidine
Ile | isoleucine
Lys | lysine
Leu | leucine
Met | methionine
Asn | asparagine
Pro | proline
Gln | glutamine
Arg | arginine
Ser | serine
Thr | threonine
Val | valine
Trp | tryptophan
Tyr | tyrosine
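Because the 1-letter and 3-letter tables list the same 20 amino acids, a literal can be converted between the two encodings. The sketch below assumes a 3-letter literal is written as concatenated symbols with no separators (e.g. "MetAlaThr"); that layout and the function names are assumptions, not something this page specifies.

```python
# 1-letter to 3-letter amino acid symbols, from the two tables above (20 common).
ONE_TO_THREE: dict[str, str] = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}
THREE_TO_ONE: dict[str, str] = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(seq: str) -> str:
    """Convert a 1-letter literal (N- to C-terminus) to concatenated 3-letter symbols."""
    return "".join(ONE_TO_THREE[aa] for aa in seq)


def to_one_letter(seq: str) -> str:
    """Convert concatenated 3-letter symbols (e.g. 'MetAla') to 1-letter form."""
    if len(seq) % 3 != 0:
        raise ValueError("a 3-letter literal must have a length divisible by 3")
    return "".join(THREE_TO_ONE[seq[i:i + 3]] for i in range(0, len(seq), 3))
```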
10.8.5.1.7 Amino Acid Symbols (1-letter, with ambiguity)
Symbol | Amino acid
A | alanine
B | aspartic acid or asparagine
C | cysteine
D | aspartic acid
E | glutamic acid
F | phenylalanine
G | glycine
H | histidine
I | isoleucine
K | lysine
L | leucine
M | methionine
N | asparagine
P | proline
Q | glutamine
R | arginine
S | serine
T | threonine
U | selenocysteine
V | valine
W | tryptophan
X | unknown or 'other' amino acid
Y | tyrosine
Z | glutamic acid or glutamine
10.8.5.1.8 Amino Acid Symbols (3-letter, with ambiguity)
Symbol | Amino acid
Ala | alanine
Asx | aspartic acid or asparagine
Cys | cysteine
Asp | aspartic acid
Glu | glutamic acid
Phe | phenylalanine
Gly | glycine
His | histidine
Ile | isoleucine
Lys | lysine
Leu | leucine
Met | methionine
Asn | asparagine
Pro | proline
Gln | glutamine
Arg | arginine
Ser | serine
Thr | threonine
Sec | selenocysteine
Val | valine
Trp | tryptophan
Xaa | unknown or 'other' amino acid
Tyr | tyrosine
Glx | glutamic acid or glutamine
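The "with ambiguity" tables add symbols that stand for more than one residue (B/Asx, Z/Glx) or for an unspecified one (X/Xaa), plus selenocysteine (U/Sec). The sketch below checks whether a possibly ambiguous 1-letter symbol can denote a given residue. Treating X as compatible with any residue is this sketch's reading of "unknown or 'other' amino acid", and the names are hypothetical.

```python
# Ambiguous 1-letter symbols and the residues each can stand for.
AMBIGUOUS_AA: dict[str, frozenset[str]] = {
    "B": frozenset("DN"),  # Asx: aspartic acid or asparagine
    "Z": frozenset("EQ"),  # Glx: glutamic acid or glutamine
}

# 3-letter symbols that appear only in the "with ambiguity" table.
THREE_TO_ONE_EXTRA: dict[str, str] = {"Asx": "B", "Glx": "Z", "Sec": "U", "Xaa": "X"}


def residue_compatible(symbol: str, residue: str) -> bool:
    """True if the 1-letter `symbol` (possibly ambiguous) can denote `residue`."""
    if symbol == "X":
        # Assumption: an unknown residue is compatible with every residue.
        return True
    # Unambiguous symbols only denote themselves.
    return residue in AMBIGUOUS_AA.get(symbol, frozenset(symbol))
```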
10.8.5.2 Molecular Representations
The Molecular Definition resource supports several different methods for representing a molecule. Some of the elements described below may apply only to sequences, and different elements may be added to support other types of molecular concepts.
Native representations: The literal, code, and resolvable elements are native representations, meaning they represent a sequence “as-is” without any additional computation.
Derived representations: The extracted, concatenated, repeated, and relative representations are derived representations, meaning they require one or more computational operations to be performed to create the sequence that is being represented.
10.8.5.2.1 Literal
The literal element can be used to represent a sequence as a string of characters. By convention, nucleotide sequences are expressed 5’ to 3’ and protein sequences are expressed from the N-terminus to the C-terminus. The encoding element can optionally be used to specify the encoding used for the sequence literal. The encoding can be important in disambiguating sequences that share alphabets (for example, ATG might represent a translation start codon in DNA, but it could also represent a peptide containing 3 amino acids).
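The ATG example can be made concrete: without an encoding, a receiver can only list the encodings whose alphabet covers every character in the literal. The sketch below uses a few 1-letter alphabets from the encoding tables; the labels and the function name are hypothetical.

```python
# Hypothetical encoding labels mapped to 1-letter alphabets from the encoding tables.
ALPHABETS: dict[str, frozenset[str]] = {
    "dna": frozenset("GATC"),
    "rna": frozenset("GAUC"),
    "protein": frozenset("ACDEFGHIKLMNPQRSTVWY"),
}


def candidate_encodings(value: str) -> list[str]:
    """Encodings whose alphabet can express every character of `value`."""
    return sorted(name for name, alphabet in ALPHABETS.items() if set(value) <= alphabet)
```

Here "ATG" is consistent with both "dna" and "protein", which is exactly the ambiguity that representation.literal.encoding removes, while "AUG" fits only "rna".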
10.8.5.2.2 Code
The code element can be used to represent a sequence by reference, using an accession number that identifies a specific sequence within a repository. The code, system, and version elements of the Coding data type can be used to fully disambiguate one code from another. Note that the code element does not guarantee that the repository is publicly accessible or that the sequence referenced by the code can be retrieved; it only identifies the sequence using a code that can be exchanged. Thus, this element can be used for both a public sequence repository (e.g., GenBank) and a private database (e.g., a biobank).
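Since representation.code is a list of CodeableConcepts, each holding Codings, the disambiguating key for a code is its (system, version, code) triple. The sketch below collects those triples from a representation written as FHIR JSON. The repository URLs in the usage example are placeholders, and the helper names are hypothetical.

```python
from typing import NamedTuple, Optional


class CodingKey(NamedTuple):
    """The (system, version, code) triple that disambiguates a Coding."""
    system: Optional[str]
    version: Optional[str]
    code: Optional[str]


def coding_keys(representation: dict) -> set[CodingKey]:
    """Collect disambiguating keys from every Coding in representation.code."""
    keys = set()
    for concept in representation.get("code", []):
        for coding in concept.get("coding", []):
            keys.add(CodingKey(coding.get("system"), coding.get("version"), coding.get("code")))
    return keys
```

Two representations that share an accession string but name different systems (for example "https://example.org/repo-a" and "https://example.org/repo-b") yield different keys, so they are not treated as the same code.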
10.8.5.2.3 Resolvable
The resolvable element can be used to represent a sequence by reference, but it also implies that the sequence is accessible and SHOULD be resolvable (although a security layer may be present). This element makes use of the DocumentReference resource, which contains the content.attachment element. The Attachment datatype can be used to represent sequences that are captured as a formatted file (using .contentType and .data) or as a URL (using .contentType and .url).
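For the formatted-file case, Attachment.data carries base64-encoded content. The sketch below decodes it and extracts the residues from a single-record FASTA file. FASTA as the file format and "text/x-fasta" as the contentType are assumptions made for illustration, and the .url case (fetching a remote file) is not handled.

```python
import base64


def sequence_from_attachment(attachment: dict) -> str:
    """Decode Attachment.data (base64) and return the residues of one FASTA record.

    The accepted contentType value is an assumption of this sketch.
    """
    if attachment.get("contentType") != "text/x-fasta":
        raise ValueError(f"unsupported contentType: {attachment.get('contentType')!r}")
    text = base64.b64decode(attachment["data"]).decode("ascii")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("expected exactly one FASTA record")
    # Residue lines may be wrapped; join them into a single literal.
    return "".join(lines[1:])
```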
10.8.5.2.4 Extracted
The extracted element can be used to represent a sequence that is derived from another, longer sequence. The startingMolecule element refers to the “parent” sequence, and is itself an instance of Molecular Definition (with its own representation). The coordinateInterval element specifies a precise interval on the “parent” sequence, which is to be extracted (conceptually or literally) and optionally reverse-complemented. This element provides a way to conveniently reference regions of very long molecules (e.g., chromosomes) without requiring either the “parent” or the extracted sequence to be serialized. Conceptually, this representation is the inverse operation of the concatenated representation.
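The extracted operation amounts to slicing the parent sequence and, if reverseComplement is true, reverse-complementing the result. The sketch below assumes 0-based, half-open ("interbase") coordinates. That is only one of the systems coordinateSystem can describe, so a receiver has to honor whatever system the instance declares. The complement table covers unambiguous DNA plus N.

```python
# Complement for unambiguous DNA plus N; other encodings need their own table.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def extract(parent: str, start: int, end: int, reverse_complement: bool = False) -> str:
    """Extract parent[start:end] and optionally reverse-complement it.

    Assumes 0-based, half-open coordinates.
    """
    if not 0 <= start <= end <= len(parent):
        raise ValueError(f"interval [{start}, {end}) is outside the parent sequence")
    segment = parent[start:end]
    if reverse_complement:
        # Complement each base, then reverse so the result reads 5' to 3'.
        segment = segment.translate(_COMPLEMENT)[::-1]
    return segment
```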
10.8.5.2.5 Concatenated
The concatenated element can be used to represent a sequence that is composed of other sequences concatenated together to form the intended sequence. Each sequenceElement refers (through its sequence element) to an instance of Molecular Definition, and each of those instances has its own representation. The order of concatenation is explicitly defined using the ordinalIndex element. Conceptually, this representation is the inverse operation of the extracted representation.
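A minimal sketch of the concatenation rule, assuming each sequenceElement has already been resolved to a string and is given as an (ordinalIndex, sequence) pair. Rejecting duplicate indices is this sketch's choice, because the order would otherwise be ambiguous.

```python
def concatenate(elements: list[tuple[int, str]]) -> str:
    """Join (ordinalIndex, sequence) pairs in ascending ordinalIndex order."""
    if not elements:
        raise ValueError("at least one sequenceElement is required (1..*)")
    indices = [index for index, _ in elements]
    if len(indices) != len(set(indices)):
        raise ValueError("ordinalIndex values must be unique")
    # Indices are unique, so sorting the pairs orders them by ordinalIndex alone.
    return "".join(seq for _, seq in sorted(elements))
```

Splitting a sequence into consecutive pieces and concatenating them again returns the original sequence, which is the inverse relationship with extraction described above.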
10.8.5.2.6 Repeated
The repeated element can be used to represent a sequence that is composed of a sequence motif repeated a specified number of times. The sequenceMotif is an instance of Molecular Definition (and has its own representation), and copyCount specifies the number of times the motif is copied in tandem. Conceptually, this representation is a special case of the concatenated representation, where each element is an identical copy of a given motif.
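The sketch below shows a tandem repeat together with its expansion into equivalent concatenated elements, making the "special case of concatenation" relationship explicit. Rejecting a copyCount below 1 is an assumption of this sketch, and the function names are hypothetical.

```python
def repeat(motif: str, copy_count: int) -> str:
    """Tandem-repeat `motif` copy_count times."""
    if copy_count < 1:
        raise ValueError("this sketch requires copyCount >= 1")
    return motif * copy_count


def as_concatenation(motif: str, copy_count: int) -> list[tuple[int, str]]:
    """The same sequence expressed as (ordinalIndex, sequence) elements."""
    return [(index, motif) for index in range(1, copy_count + 1)]
```

For a CAG motif with a copyCount of 3, both forms yield CAGCAGCAG.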
10.8.5.2.7 Relative
The relative element can be used to represent a sequence in relation to another sequence, where the difference between the two sequences can be expressed as an ordered series of edit operations. This representation can be used to conveniently represent minor but meaningful differences between long or complex sequences (e.g., HLA alleles). Algorithmically, the relative representation defines a sequence by beginning with a startingMolecule (an instance of Molecular Definition) and performing at least one edit operation on it. Each edit operation is performed in the order given by editOrder and replaces the sequence (the replacedMolecule) at a defined coordinateInterval with the sequence specified by the replacementMolecule. The resulting sequence after all edits have been performed is the sequence referenced by this representation element.
Note that the edits specified in this representation are operations and NOT variations. A variation is defined as a specific comparison between two states (a reference and an alternative). Although variations are sometimes called “changes” and might therefore be confused with edit operations, the two are semantically distinct concepts.
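The relative algorithm can be sketched as a loop over edits sorted by editOrder. The sketch makes several assumptions. Sequences are already resolved to strings. Coordinates are 0-based and half-open. Each edit's interval is read against the sequence produced by the preceding edit. Every edit carries an editOrder. The Edit class is a hypothetical shape, not the FHIR element. When replacedMolecule is present, it is used as a check on the portion being replaced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Edit:
    """One relative.edit entry with already-resolved sequences (hypothetical shape)."""
    edit_order: int
    start: int  # 0-based, half-open interval (an assumption of this sketch)
    end: int
    replacement: str
    replaced: Optional[str] = None  # optional check on the replaced portion


def apply_edits(starting: str, edits: list[Edit]) -> str:
    """Apply edits in ascending editOrder, each read against the current sequence."""
    sequence = starting
    for edit in sorted(edits, key=lambda e: e.edit_order):
        if not 0 <= edit.start <= edit.end <= len(sequence):
            raise ValueError(f"edit {edit.edit_order}: interval out of range")
        current = sequence[edit.start:edit.end]
        if edit.replaced is not None and edit.replaced != current:
            raise ValueError(f"edit {edit.edit_order}: expected {edit.replaced!r}, found {current!r}")
        sequence = sequence[:edit.start] + edit.replacement + sequence[edit.end:]
    return sequence
```

An empty interval (start equal to end) acts as an insertion and an empty replacement acts as a deletion. These are still edit operations, not variations.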
10.8.5.3 Combining Representations
Since the derived representations (extracted, concatenated, repeated, and relative) each reference Molecular Definition, representations can be combined to support complex use cases. For example:
An extracted representation can use as its startingMolecule a chromosome sequence that is specified using an accession number (represented as a code).
A repeated representation can define the sequenceMotif using a literal.
A concatenated representation for an assembled contig can include each sequenceElement as an attached, formatted file via resolvable.
A relative representation can specify the startingMolecule using a code, and the replacementMolecule for each edit could be defined using a literal.
It is possible to create arbitrarily deep structures using derived representations. While there may be reasons to do so, implementations should avoid overly complex representation structures.
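Because every derived representation points to other Molecular Definitions, resolving a combined structure is naturally recursive. The sketch below is a simplified stand-in, not FHIR JSON. Representations are small dicts with a hypothetical shape, codes are resolved through a caller-supplied lookup function, and coordinates are assumed to be 0-based and half-open. The max_depth guard is one way an implementation might enforce the advice against overly deep structures.

```python
from typing import Callable, Mapping

# Simplified, hypothetical representation shapes (not FHIR JSON):
#   {"literal": "ACGT"}
#   {"code": "ACC.1"}                                   resolved via lookup_code
#   {"extracted": {"from": rep, "start": i, "end": j, "revcomp": bool}}
#   {"repeated": {"motif": rep, "count": n}}
#   {"concatenated": [(ordinal, rep), ...]}
#   {"relative": {"from": rep, "edits": [(order, start, end, rep), ...]}}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def resolve(rep: Mapping, lookup_code: Callable[[str], str],
            max_depth: int = 8, _depth: int = 0) -> str:
    """Resolve a (possibly nested) representation to a literal sequence."""
    if _depth > max_depth:
        raise RecursionError("representation nesting exceeds max_depth")

    def sub(child: Mapping) -> str:
        return resolve(child, lookup_code, max_depth, _depth + 1)

    if "literal" in rep:
        return rep["literal"]
    if "code" in rep:
        return lookup_code(rep["code"])
    if "extracted" in rep:
        e = rep["extracted"]
        seq = sub(e["from"])[e["start"]:e["end"]]
        return seq.translate(_COMPLEMENT)[::-1] if e.get("revcomp") else seq
    if "repeated" in rep:
        r = rep["repeated"]
        return sub(r["motif"]) * r["count"]
    if "concatenated" in rep:
        ordered = sorted(rep["concatenated"], key=lambda pair: pair[0])
        return "".join(sub(child) for _, child in ordered)
    if "relative" in rep:
        rel = rep["relative"]
        seq = sub(rel["from"])
        for _, start, end, replacement in sorted(rel["edits"], key=lambda e: e[0]):
            seq = seq[:start] + sub(replacement) + seq[end:]
        return seq
    raise ValueError(f"unrecognized representation keys: {sorted(rep)}")
```

This mirrors the combinations listed above. An extracted representation can start from a code, a repeated motif can be a literal, and a concatenation can mix both.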
10.8.5.4 Equivalence and Identity
Every representation, regardless of its complexity, can be resolved to a literal. Two instances of MolecularDefinition are considered equivalent if they define the same entity. For molecular sequences, this means that for two instances of MolecularDefinition to be equivalent they must resolve to the same literal sequence. Two instances are identical if their serializations are identical: they must contain the same elements, and each corresponding element must have the same value.
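The two notions can be contrasted in a short sketch. Identity is read here as element-by-element equality of the parsed JSON, so object key order is ignored. Equivalence compares resolved literals through a caller-supplied resolver. The literal_of helper is a hypothetical toy resolver that only handles instances whose first representation is a literal.

```python
import json
from typing import Callable


def identical(a_json: str, b_json: str) -> bool:
    """Identity: same elements with the same values (object key order ignored)."""
    return json.loads(a_json) == json.loads(b_json)


def equivalent(a: dict, b: dict, resolve: Callable[[dict], str]) -> bool:
    """Equivalence: both instances resolve to the same literal sequence."""
    return resolve(a) == resolve(b)


def literal_of(md: dict) -> str:
    """Toy resolver: assumes the first representation is a literal."""
    return md["representation"][0]["literal"]["value"]
```

Adding a description to an otherwise unchanged instance breaks identity but leaves equivalence intact, because the description is not definitional to the sequence.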
10.8.5.5 Profiling MolecularDefinition
10.8.5.5.0.1 Support for Molecular Concepts
The Molecular Definition resource supports several profiles that represent molecular concepts:
Sequence: a primary sequence
Allele: a Sequence at a Location on a larger, contextual Sequence
Variation: a defined comparison between a specified reference Allele and an alternative Allele, both at a given Location on a larger, contextual Sequence
In addition, profiles have been drafted to represent the concepts of Haplotype and Genotype, although they have not been exercised as deeply as the profiles listed above. Finally, preliminary work has demonstrated that the Molecular Definition resource could be used to represent concepts related to structural variation, including Adjacency and Fusion. It is anticipated that profiles to support these concepts will be developed over time.
10.8.5.5.0.2 Modular Semantics and Schemas
The MolecularDefinition resource is an abstract resource that provides building blocks for creating semantically robust, computable structures that define molecular entities. The two most complex backbone elements, location and representation, support the concept of molecular sequences but they might not be relevant to other types of entities. Conversely, other entities may require different backbone elements. As such, it is expected that these high-level backbone elements will serve as modular schemas that can be profiled as needed for a given molecular entity. Profiling could include constraints on cardinality (e.g., the Sequence profile has 0..0 location, while Allele has 1..1 location) and slicing.
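The cardinality constraints quoted above can be expressed as a small rule table. The sketch below includes only the two profiles whose location cardinality the text states (Sequence 0..0, Allele 1..1). The function name is hypothetical, and real conformance checking would go through the published profiles.

```python
# (min, max) cardinality of MolecularDefinition.location, as stated in the text.
LOCATION_CARDINALITY: dict[str, tuple[int, int]] = {
    "Sequence": (0, 0),
    "Allele": (1, 1),
}


def check_location_cardinality(profile: str, instance: dict) -> list[str]:
    """Return error messages if instance['location'] violates the profile's cardinality."""
    low, high = LOCATION_CARDINALITY[profile]
    count = len(instance.get("location", []))
    if not low <= count <= high:
        return [f"{profile}: location has {count} entries, expected {low}..{high}"]
    return []
```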
10.8.5.5.0.3 Slicing the Representation Element
The representation backbone element provides a series of methods for specifying the value of a sequence. As a result, the entire structure can be used any time a sequence is referenced, and this is accomplished by slicing. For example, the current sequence-based profiles of MolecularDefinition slice the representation element as follows:
Profile | Cardinality | Focus (slice) | Semantic meaning
Sequence | 1..1 | Primary Sequence | The primary sequence of the molecule
Allele | 1..1 | Allele | The sequence of the Allele at the specified Location
Allele | 0..1 | Context | The sequence of the contextual sequence at the specified Location
Variation | 1..1 | Reference | The sequence defined as the reference allele (at the specified Location)
Variation | 1..1 | Alternate | The sequence defined as the alternate allele (at the specified Location)
Variation | 0..1 | Context | The sequence of the contextual sequence at the specified Location
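The slicing table can be read as per-profile rules on how many representation entries carry each focus. The sketch below checks a list of focus values against those rules. It makes two simplifications: a plain focus string stands in for the focus CodeableConcept, and foci not listed for a profile are flagged, which treats the slicing as closed. The table values themselves are transcribed from the table above.

```python
from collections import Counter

# (min, max) occurrences of each representation slice by focus, from the table above.
SLICES: dict[str, dict[str, tuple[int, int]]] = {
    "Sequence": {"Primary Sequence": (1, 1)},
    "Allele": {"Allele": (1, 1), "Context": (0, 1)},
    "Variation": {"Reference": (1, 1), "Alternate": (1, 1), "Context": (0, 1)},
}


def check_slices(profile: str, foci: list[str]) -> list[str]:
    """Return error messages for slice counts outside the profile's cardinalities."""
    rules = SLICES[profile]
    counts = Counter(foci)
    errors = []
    for focus, (low, high) in rules.items():
        if not low <= counts[focus] <= high:
            errors.append(f"{profile}: {counts[focus]} '{focus}' slice(s), expected {low}..{high}")
    # Assumption: slicing is treated as closed, so unlisted foci are reported.
    for focus in sorted(counts.keys() - rules.keys()):
        errors.append(f"{profile}: unexpected focus '{focus}'")
    return errors
```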