Terminology Change Set Exchange
1.0.0-ballot - CI Build International flag

Terminology Change Set Exchange, published by HL7 International - Termionology Infrastructure Work Group. This guide is not an authorized publication; it is the continuous build for version 1.0.0-ballot built by the FHIR (HL7® FHIR® Standard) CI Build. This version is based on the current content of https://github.com/HL7/termchangeset-ig/ and changes regularly. See the Directory of published versions

Tinkar Model Concepts

Page standards status: Informative

The Tinkar Reference Model is a logical model described herein using the Object Management Group (OMG) Unified Modeling Language (UML) 2.0 notation to describe the structure of integrated data representation and change management for biomedical terminologies. Tinkar provides an architecture that delivers integrated terminology to the enterprise and its information systems. In doing so, it addresses the differences in management and structure across reference terminology, local concepts, and code lists/value sets. This section describes classes of objects that support a common foundational framework for terminology and knowledge base systems (e.g., SNOMED CT®, LOINC®, RxNorm, HL7). An implementation of Tinkar can provide a single representation for all terminologies required in the U.S. and other countries, while also providing a better foundation for managing change. Tinkar could support the operation of a variety of systems intended to deliver knowledge management for terminology to vendors providers, and standards-development organizations like HL7.

Standard Class Model

Class Version

Versioned Component

Figure 3.1. Versioned Component

The Tinkar Reference Model fulfills the requirement of capturing a complete record of all changes, including relevant context information. This is captured via the STAMP class using the following fields:

  1. Status: A status is identified by a concept, which may be annotated with other identifying information. For example: active or inactive (Requirement 19)
  2. Time: Timestamps must employ a common standard, which must support precision and time zone. (Requirement 20)
  3. Author: An author is identified by a concept, which may be annotated with other identifying information as required. (Requirement 21)
  4. Module: Assignment to the appropriate terminology (e.g., LOINC) or terminology component (e.g., SNOMED CT®, US Extension). A module is identified by a concept, which may be annotated with other identifying information. (Requirement 22)
  5. Path: Specification of an object under version control within a terminology development lifecycle, e.g., for distributed development, testing, staging, or production. A path is a common synomyn for “branch” as used in current software version control best practices/literature. A path is identified by a concept, which may be annotated with other identifying information. A core set of paths is necessary to support publication to external organizations. (Requirement 23)

These elements together are referred to by the acronym “STAMP,” as described previously. Every new assertion, whether a new component or a change to an existing component, must have a STAMP to determine when it is to be used. The STAMP properties support the ability to apply terminology components for specific purposes. For example,

  • “Path” can be used to test provisional content without physically swapping out systems.
  • “Module” can be used to filter out work that has not been authorized by the enterprise.
  • “Time” supports the ability to apply CDS rules as they would have looked in the past.

The Tinkar Reference Model does not merely support the ability to “STAMP” components; it asserts a requirement that all changes have a STAMP. STAMP assertions are unversioned IdentifiedComponents that are referenced by the components they scope. Since STAMP uses versioned concepts (that have a STAMP), having the STAMP as a versioned component would lead to an infinite regress.

Not all terminology systems contain all the information recorded in STAMP, but defaults can be used in cases where it is missing. For example, SNOMED CT contains the Status, Time, and Module but do not distribute the Path or Author. Since most terminologies only release a Production path, the Path could be defaulted to Production Path and the Author could be defaulted to SNOMED CT Author.

All IdentifedComponents in the knowledge base will consist of a series of change records, called ComponentVersions, (beginning with the “Create” version), all associated to an underlying ComponentChronology.

A Components Chronology only has properties attributed to it by its versions. Looking at the IdentifiedComponent through different sets of changes (published version, geographically defined set of modules, historical timestamp) may reveal substantially different IdentifedComponents.

Component Types

Component Types

Component Types

Figure 3.2. Component Types

All Components in Tinkar are uniquely identified using UUIDs. A Component will be represented by an array of UUIDs with at least one UUID, but can be represented by more than one UUID in the case of a concept being derived from multiple sources. For example, the concept Acetaminophen (which exists in SNOMED CT®, LOINC, and RxNorm) could have a UUID from each terminology and be represented as an array of UUIDs for this single concept within a Tinkar implementation.

A Concept is identified using UUIDs and contains no information. To assemble groups of assertions and to provide information about Concepts, Tinkar uses a construct called a Semantic. A Semantic is a class containing a set of predicates and objects about a subject. A semantic adds meaning to the components it references, through the fields it contains. A Semantic supports the specification of value sets, compositional definitions, and other components requiring internal structure, and it specifies the nature of the compositional relationship explicitly.

The Semantic class uses a Concept to define the relationship between the value(s) and the Concept; the value itself may be either a concept or some other kind of data type, such as a string. This creates the ability to assemble assertions into more complex structures.

Compositional Semantics

Compositional Semantics

Figure 3.3. Compositional Semantics

As discussed earlier, if an author makes a change to an IdentifiedComponent, the prior Version is unchanged, but a new version - with the appropriate STAMP information - is recorded. Users viewing the Concept and associated Semantics in the prior context (i.e., as of the prior time, if no other STAMP element has changed) will see the old values; users viewing the Concept and associated Semantics in the new context will see the new values.

Since it is versioned, a Semantic is manifested as a SemanticChronology, containing a set of SemanticVersions. SemanticVersion is a single instance of a Semantic with a STAMP, and a SemanticChronology is the set of versions having a STAMP for a Semantic. Concepts, too, are manifest as collections: a ConceptChronology consisting of a set of ConceptVersions. ConceptVersion is a single instance of an identifier for a concept with a STAMP and the ConceptChronology is the set of versions having a STAMP for a concept. A concept identifier specifies a ConceptChronology; specifying a ConceptVersion requires a rule or parameter for selecting among STAMP values.

If other IdentifiedComponents depend on the changed concept, these IdentifiedComponents can be identified by relationships in the Semantics. The Semantics can assert rules for how to manage these changes. A Semantic defining a value set for data entry might automatically accept any deactivations from the source system authority, while a Semantic defining a value set for research might automatically decline to adopt deactivations, or do so based on whether there are extant operational values. Escalating such decisions for human adjudication or review at multiple levels is also always an option. Systems might adopt any number of methods for dealing with identified changes: the important thing is to ensure the changes can be identified consistently.

Field Data Types

Tinkar supports the following field data types for use with Semantics.

  1. String - a sequence of characters, either as a literal constant or as a variable. Strings could be used to represent terms from code systems or URLs, textual definitions, etc.
  2. Integer - data type that represents some range of mathematical integers.
  3. Float - represents values as high-precision fractional values.
  4. Boolean - represents the values true and false.
  5. Byte Array - an array of 8-bit signed two's complement integers.
  6. Directed Graph or Digraph - a graph whose edges are ordered pairs of vertices. Each edge can be followed from one vertex to another vertex.
  7. Instant - models a single instantaneous point on a timeline.
  8. Planar Point - position in a two-dimensional space (a plane).
  9. Spatial Point - position in a three-dimensional space.
  10. Component ID List - an ordered list of Component IDs.
  11. Component ID Set - an unordered list of Component IDs.
  12. UUID (Universally Unique Identifier) - A 128-bit number used to identify information in computer systems.
  13. Directed Tree or Ditree - a graph obtained from an undirected tree by replacing each undirected edge by two directed edges with opposite directions.
  14. DiGraph - A graph in which a set of objects are connected where all the edges are directed from one vertex to another.
  15. Vertex - The fundamental unit of data that makes up a graph or tree.
    1. In Tinkar, property graphs are used as a general-purpose data pattern to represent an abstract syntax tree (AST), such as OWL EL++. This allows for data types without requiring custom nodes.
      1. An AST may be used “during semantic analysis, where the compiler checks for correct usage of the elements of the program and the language. The compiler also generates symbol tables based on the AST during semantic analysis. A complete traversal of the tree allows verification of the correctness of the program. After verifying correctness, the AST serves as the base for code generation. The AST is often used to generate an intermediate representation, sometimes called an intermediate language, for the code generation.” [15]
      2. An AST is made up of nodes and branches. In Tinkar, every tree will always have roots, but they are specific: “An OWL EL root” vs. a “BPMN root”, etc. Each node must have 0 or more children.
    2. Here is an example of Tinkar output of semantics that reference multiple concepts. Vertex Example In this output, one can see a sufficient set and necessary set. Bulleted items are properties in the node. The output is printed as a “depth first search.” Each depth adds 3 characters of padding and shows how OWL EL++ definitions, using only terminology and a standard property graph data structure, are represented. The 1st one is node index 0 which has a child of node index 8. Node index 0 is the OWL EL++ definition root. Node 8 points to Node 7, and the meaning of Node 8 is that it is a necessary set. Node 7 is 'And' and points to Node 5,1,6. Node 5's meaning is 'Role Type', Value is 'Role group', and its other property is 'Role Operator.' Node 5 points to Node 4. Node 4 is 'And.' Node 3 is 'Role Type.' Node 2 is Concept Reference. 7 also points to 1 and 6 (Concept References).
    3. The property graph model demonstrates that each vertex has a meaning. Tinkar can use concepts to represent anything end users might need at nodes. This allows for data types without requiring custom nodes. With no changes to the underlying data structures, Tinkar can represent more than OWL EL++. With updates to terminology, Tinkar can represent any parsable standard, such as BPMN and DMN, using this property graph model and a proper set of terminology concepts and semantics represented using Tinkar.

Pattern (For Semantics)

The Tinkar Reference Model defines a first-class feature of the model, the Pattern (PatternVersion and PatternChronology). A Pattern is a class defining a set of predicates and object types that can be asserted about a class of subjects. All Semantics follow Patterns. A PatternVersion is a single instance of a pattern with a STAMP and a PatternChronology is the set of versions having a STAMP for a pattern. This feature asserts patterns that Semantic components can follow, like an XML or RDF Schema.

Semantic Definition

Pattern

Figure 3.4. Pattern

Using the Pattern, Semantics with varying fields and data types can be specified to represent any structure needed to provide meaning to a concept. For example, if a field within a semantic is used to describe an SDO's website, the Meaning would be “URL,” DataType of “String,” and Purpose of “Website.” The Pattern would then contain an array of these FieldDefinitions.

Overall Tinkar Architecture

Tinkar Architecture

Overall Tinkar Architecture

Figure 3.5. Overall Tinkar Architecture

Class Definitions

Coordinate

The Tinkar Reference Model supports and encourages the storage of time series data that utilizes multiple coordinates, for example, STAMP, Language, Dialect, clinical domains, etc. The ability to efficiently search, display, and navigate concepts and semantics requires the ability to calculate combinations of content based on one or more of these different coordinates.

In order to facilitate the computability of various, complex coordinates, including time series data, a graph structure is commonly used in software versioned control systems. A particular type of graph structure that is commonly used is a “version graph,” such as a directed acyclic graph. A version graph would enable a Tinkar implementation to recover the state of the graph at a particular point in time. Most graph databases do not support versioning as a first-class concept. It is possible, however, to create a versioning scheme inside the graph model whereby nodes and relationships are timestamped and archived whenever they are modified. The downside of such versioning schemes is that they leak into any queries written against the graph, adding a layer of complexity to even the simplest query.

Types of Coordinates:

  1. STAMP coordinates are the most basic type of coordinate on which all content should be filtered. Examples of STAMP coordinates are:
    1. Most recent version
    2. Set of data from several versions
    3. All active components only
  2. STAMP coordinates are the most basic type of coordinate on which all content should be filtered. Examples of STAMP coordinates are:
    1. Displaying terms based on a language and/or dialect
    2. Prioritized list of synonyms based on a particular clinical domain
  3. STAMP coordinates are the most basic type of coordinate on which all content should be filtered. Examples of STAMP coordinates are:
    1. Result from various Description Logic Classifiers
  4. STAMP coordinates are the most basic type of coordinate on which all content should be filtered. Examples of STAMP coordinates are:
    1. Stated vs. inferred relationships from SNOMED CT®
    2. Concepts inclusion/exclusion for a particular domain

As the Tinkar specification evolves towards a DSTU and Connectathons, more coordinates and detailing will be provided.

Calculating Coordinates

The ComponentChronology contains all the versions of a component from the date it was instantiated until the most recent version. Components only get a new version whenever something about the component changes. To calculate the latest version requires the ability to find the most recent version of each component. Utilizing the STAMP Coordinates supports calculating all other coordinates:

  1. Identify the Module(s) the user would like to view/search/modify.
  2. Identify the Path the user would like to view/search/modify.
  3. Identify the Status or Statuses the user would like to view/search/modify.
  4. If relevant, identify the Author(s) the user would like to view/search/modify.
  5. The last piece of the STAMP coordinate (time) is the most difficult to calculate. In most cases the user will need to find the most recent version of the component as of the current time to calculate this point of the coordinate. However, since Tinkar supports and encourages the representation of historical, the user may need to calculate the most recent version as of a different point in time.

After the STAMP Coordinates have been calculated, additional coordinates can then be applied as well. For example, applying a language and dialect coordinate will be important not only for viewing and searching, but also to determine the appropriate preferred name for displaying a hierarchy.

Tinkar Starter Data Model Concepts

Fully Qualified Name Definition Origin
Acceptable (foundation metadata concept) Specifies that a description is acceptable, but not preferred within a language or dialect. DESCRIPTION_ACCEPTABILITY
Active status Concept used to represent a status for components that are active. STATUS_VALUE
Author ROOT_VERTEX
Canceled status Concept used to represent a status for components that are canceled STATUS_VALUE
Component field A display field type that references a concept ID. DISPLAY_FIELDS
Component Id list A display field that references an ordered list of Concept IDs. DISPLAY_FIELDS
Component Id set field A display field that references an unordered list of Concept IDs. DISPLAY_FIELDS
Decription not case sensitive Value which designate character as not sensitive for a given description DESCRIPTION_CASE_SIGNIFICANCE
Definition description type Semantic value describing the description type for the description pattern is a definition DESCRIPTION_TYPE
Description Human readable text for a concept DESCRIPTION_PATTERN
Description acceptability Whether a given human readable text for a concept is permissible US_DIALECT_PATTERN; GB_DIALECT_PATTERN
Description case sensitive Assumes the description is dependent on capitalization DESCRIPTION_CASE_SIGNIFICANCE
Description case significance Specifies how to handle the description text interms of case sensitivity DESCRIPTION_PATTERN
Description pattern Contains all metadata and human readable text that describes the concept PATTERN
Description semantic Purpose and meaning for the description pattern and dialect patterns DESCRIPTION_PATTERN; US_DIALECT_PATTERN; GB_DIALECT_PATTERN
Description type Specifying what type of description it is i.e. is it fully qualified or regular and etc. DESCRIPTION_PATTERN
Descriptum The semantic meaning for the EL++ patterns EL_PLUS_PLUS_INFERRED_AXIOMS_PATTERN; EL_PLUS_PLUS_STATED_AXIOMS_PATTERN
Development path A path that specifies that the components are currently under development PATH
Dialect Specifies the dialect of the language. MODEL_CONCEPT
DiGraph field A display field that references a di-graph whose edges are ordered pairs of vertices. Each edge can be followed from one vertex to another vertex. DISPLAY_FIELDS
Display Fields Captures the human readable terms PATTERN
DiTree field A display field that references a graph obtained from an undirected tree by replacing each undirected edge by two directed edges with opposite directions. DISPLAY_FIELDS
EL++ Inferred Axioms Pattern PATTERN
EL++ Inferred terminological axioms EL_PLUS_PLUS_INFERRED_AXIOMS_PATTERN
EL++ Stated Axioms Pattern PATTERN
EL++ Stated terminological axioms EL_PLUS_PLUS_STATED_AXIOMS_PATTERN
English Dialect Specifies the dialect of theEnglish language DIALECT
English Language Value for description language LANGUAGE
Existential restriction Existential restrictions describe objects that participate in at least one relationship along a specified property to objects of a specified class. ROLE_OPERATOR
Float field Represents values as high-precision fractional values. DISPLAY_FIELDS
Fully qualified name description type Fully qualified name is a description that uniquely identifies and differentiates it from other concepts with similar descriptions DESCRIPTION_TYPE
GB Dialect Pattern Particular form of language specific form of English language, particular to Great Britian PATTERN
Great Britian English dialect Great Britain: English Langauge reference set ENGLISH_DIALECT; GB_DIALECT_PATTERN
Identifier Pattern An identifier pattern is used to identity a concept which containts the identifier source and the actual value PATTERN
Identifier Source An identifier used to label the identity of a unique component. IDENTIFIER_PATTERN
Inactive Status Concept used to represent a status for components that are no longer active STATUS_VALUE
Inferred Navigation The origins and destinations for concepts based on the reasoner generated inferred terminological axioms INFERRED_NAVIGATION_PATTERN
Inferred Navigation Pattern A pattern specifying the origins and destinations for concepts based on the inferred terminological axioms PATTERN
Integer Field data type that represents some range of mathematical integers DISPLAY_FIELDS
Integrated Knowledge Management
Is-a Designates the parent child relationship INFERRED_NAVIGATION_PATTERN; STATED_NAVIGATION_PATTERN
Language Specifies the language of the description text. DESCRIPTION_PATTERN
Language for Description The semantic value indicating which language is used in the description text DESCRIPTION_PATTERN
Logical Definition The semantic value describing the purpose of the stated and inferred terminological axioms. EL_PLUS_PLUS_INFERRED_AXIOMS_PATTERN; EL_PLUS_PLUS_STATED_AXIOMS_PATTERN
Master path A default path for components PATH
Meaning The interpretation or explanation field for a pattern/semantics PATTERN
Model concept ROOT_VERTEX
Module ROOT_VERTEX
Necessary set A set of relationships that is always true of a concept. A concept that only contains necessary conditions is considered primitive EL_PLUS_PLUS_STATED_TERMINOLOGICAL_AXIOMS; EL_PLUS_PLUS_INFERRED_TERMINOLOGICAL_AXIOMS
Path A set of assets under version control that can be managed distinctly from other assets. Paths “branch” from other paths when established, and can be “merged” with other paths as well. ROOT_VERTEX
Pattern A Pattern is a set of display fields and values that can be asserted about a concept ROOT_VERTEX
Preferred Specifies that a description is preferred within a language or dialect. There will be one preferred description for each description type. DESCRIPTION_ACCEPTABILITY
Primordial module MODULE
Primordial path PATH
Primordial status Concept used to represent a status for components that have not yet been released and exist in their most basic form. STATUS_VALUE
Purpose The reason for which a Tinkar value in a pattern was created or for which it exist. PATTERN
Regular name description type There may be descriptions/synonyms marked as “regular.” DESCRIPTION_TYPE
Relationship destination Signifies path to child concepts which are more specific than the Tinkar term INFERRED_NAVIGATION_PATTERN; STATED_NAVIGATION_PATTERN
Relationship origin Signifies path to parent concepts which are more general than the Tinkar term INFERRED_NAVIGATION_PATTERN; STATED_NAVIGATION_PATTERN
Role Is an abstract representation of a high-level role for a therapeutic medicinal product; the concepts are not intended to describe a detailed indication for therapeutic use nor imply that therapeutic use is appropriate in all clinical situations. ROLE_GROUP; EL_PLUS_PLUS_STATED_TERMINOLOGICAL_AXIOMS; EL_PLUS_PLUS_INFERRED_TERMINOLOGICAL_AXIOMS
Role group An association between a set of attribute or axiom value pairs that causes them to be considered together within a concept definition or postcoordinated expression. EL_PLUS_PLUS_STATED_TERMINOLOGICAL_AXIOMS; EL_PLUS_PLUS_INFERRED_TERMINOLOGICAL_AXIOMS
Role operator Concept that is used to describe universal vs existential restrictions. ROLE
Role restriction ROLE
Role type Refers to a concept that represents a particular kind of realtionship that can exist between two entities. It defines the specific function or responsibility that one entity plays in relation to another. ROLE
Sandbox path A path for components under testing. PATH
Spanish langauge Value for the description language dialect LANGUAGE
Stated Navigation The origins and destinations for concepts based on the reasoner generated stated terminological axioms STATED_NAVIGATION_PATTERN
Stated Navigation Pattern A pattern specifying the origins and destinations for concepts based on the stated terminological axioms PATTERN
Status value The status of the STAMP Coordinate(Active, Cancelled, Inactive, Primordial) ROOT_VERTEX
String a sequence of characters, either as a literal constant or as a variable. Strings could be used to represent terms from code systems or URLs, textual definitions, etc. DISPLAY_FIELDS
Sufficient set A set of relationships that differentiate a concept and its subtypes from all other concepts. A concept that contains at least one set of necessary and sufficient conditions is considered defined. EL_PLUS_PLUS_STATED_TERMINOLOGICAL_AXIOMS; EL_PLUS_PLUS_INFERRED_TERMINOLOGICAL_AXIOMS
Text for description Captures the human readable text for a description in Komet DESCRIPTION_PATTERN
United States of America English dialect Particular form of language specific form of English language, particular to US ENGLISH_DIALECT; US_DIALECT_PATTERN
Universal Restriction Universal restrictions constrain the relationships along a given property to concepts that are members of a specific class. ROLE_OPERATOR
UNIVERSALLY_UNIQUE_IDENTIFIER A universally unique identifier that uniquely represents a concept in Tinkar IDENTIFIER_SOURCE
US Dialect Pattern PATTERN
Withdrawn state Concept used to represent a status for components that are withdrawn. STATUS_VALUE

[15] Abstract syntax tree. Wikipedia; 2020. Available from: https://en.wikipedia.org/wiki/Abstract_syntax_tree.