CMS FHIR Prototype Measure Calculation Tool IG
0.1.0 - CI Build

CMS FHIR Prototype Measure Calculation Tool IG, published by HL7 International - [Some] Work Group. This guide is not an authorized publication; it is the continuous build for version 0.1.0 built by the FHIR (HL7® FHIR® Standard) CI Build. This version is based on the current content of https://github.com/cqframework/mct-ig/ and changes regularly. See the Directory of published versions

Test Plan

This page documents the test plan for the Measure Calculation Tool (MCT) prototype. The test plan is intended to demonstrate:

  1. Functionality of the Measure Calculation Tool and validation, certification, and testing content
  2. Correctness of a provider implementation of the Measure Calculation Tool

Test data was prepared by constructing random datasets based on the data requirements for the specific measure under test. Although this is a reasonable approach to functional testing, it is not necessarily representative of real-world test data. Additional testing should be performed using:

  1. Larger data sets
  2. More sophisticated data generation techniques, such as the Synthea tool
  3. Real-world data using deidentified data sets

Whenever possible, automated testing approaches should be used to enable more streamlined testing of the measure calculation tool. For tests that are intended to demonstrate functionality of the prototype, this automation can be accomplished using continuous integration and delivery pipelines. For tests that are intended to demonstrate validity and capability of an integration, this automation can be accomplished through integration testing tools such as Postman.

Content Tests

These tests are performed as part of prototype development and testing to ensure that the measure content for the Validation Measure and for CMS104 is correctly evaluating given known input data.

NOTE: These tests cover proportion measure calculation only. Other calculation features would need to be tested specifically, including: ratio, continuous-variable, and composite calculation and stratifiers.

  1. Test Validation Measure
    1. Test data is present for each data element
      1. Ineligible - data is missing and the validation result indicates that it is missing
      2. Invalid - data is present but invalid for each data element and the validation result provides validation messages
      3. Valid - data is present for each data element
    2. Test measure score is successful for
      1. Ineligible
      2. Initial population
      3. Denominator
      4. Denominator Exception
      5. Denominator Exclusion
      6. Numerator
  2. Test CMS104
    1. Test data is present for each data element
      1. Ineligible - data is missing and the validation result indicates that it is missing
      2. Invalid - data is present but invalid for each data element and the validation result provides validation messages
      3. Valid - data is present for each data element
    2. Test measure score is successful for
      1. Ineligible
      2. Initial population
      3. Denominator
      4. Denominator Exception
      5. Denominator Exclusion
      6. Numerator

NOTE: The content unit tests are all patient-specific, rather than population level. Population level testing is performed as part of integration tests.

Content Data Elements

The Validation/Certification measure contains expressions to support validation of all QICore profiles. However, this prototype focuses on the data elements involved in the CMS104 Measure:

  1. Encounter: Non-Elective Inpatient Encounter
  2. Condition: Diagnosis per Encounter
  3. ServiceRequest: Comfort Measures
  4. Procedure: Comfort Measures
  5. MedicationRequest: Antithrombotic Therapy
  6. MedicationRequest: Pharmacological Contraindications For Antithrombotic Therapy
  7. MedicationNotRequested: Antithrombotic Therapy

Integration Tests

These tests are performed as part of prototype development and testing to ensure that the Measure Calculation Tool is performing as expected in the prototype environment with known configuration and input data served through expected server behavior.

  1. Test CCN Configuration
    1. Validate the MeasureReport is produced with the configured CCN identifier
  2. Test Organization/Facility Configuration
    1. Validate the MeasureReport is produced with the configured reporter Organization, and location extensions for each configured facility
  3. Test Validation Measure
    1. Test data is present for each data element
    2. Test missing data produces expected validation messages
    3. Test invalid data produces expected validation messages
    4. Test measure score is successful for each test case (1..7)
  4. Test CMS104
    1. Test data is present for each data element
    2. Test missing data produces expected validation messages
    3. Test invalid data produces expected validation messages
    4. Test measure score is successful for each test case (1..7)

Validation Tests

These tests are performed at an implementing site to ensure that the prototype is installed and configured correctly and that it performs as expected within the site environment.

  1. Test Validation Measure Data
    1. MeasureReport has the correct CCN
    2. MeasureReport has the correct reporter Organization
    3. MeasureReport has the correct reported Location(s)
    4. MeasureReport has data for each element
    5. MeasureReport has expected validation messages for missing data
    6. MeasureReport has expected validation messages for invalid data
  2. Test Validation Measure Calculation
    1. MeasureReport has expected population count and score for each population test (1..7)
    2. MeasureReport has expected supplemental data
  3. Test Validation Measure Submission
    1. Validate submitted MeasureReport has correct:
      1. CCN
      2. Organization
      3. Reported location(s)
    2. Validate submitted MeasureReport has expected population count and score for each population (1..7)
    3. Validate submitted MeasureReport has expected data references
    4. Validate all expected data is submitted
    5. Validate no unexpected data is submitted

Submission Tests

These tests are performed at an implementing site to demonstrate calculation and submission of the CMS104 measure.

  1. Test CMS104 Measure Data
    1. MeasureReport has data for each element
    2. MeasureReport has expected validation messages for missing data
    3. MeasureReport has expected validation messages for invalid data
  2. Test CMS104 Measure Calculation
    1. MeasureReport has expected population count and score for each population (1..7)
    2. MeasureReport has expected supplemental data
  3. Test CMS104 Measure Submission
    1. Validate submitted MeasureReport has expected population count and score for each population (1..7)
    2. Validate submitted MeasureReport has expected data references
    3. Validate all expected data is submitted
    4. Validate no unexpected data is submitted

Performance Tests

These tests are performed as part of prototype development and testing and provide baseline performance characteristics in a known solution environment.

  1. Test Validation Measure Evaluation Performance
    1. Unit Test - 1, 10, 50, 100, and 200 Patients
    2. Integration Test - 1, 10, 50, 100, and 200 Patients
  2. Test CMS104 Measure Evaluation Performance
    1. Unit Test - 1, 10, 50, 100, and 200 Patients
    2. Integration Test - 1, 10, 50, 100, and 200 Patients

CMS104 Measure Evaluation Performance

The following is an analysis of the measure evaluation performance of the prototype using the CMS104 measure as the subject. For this analysis, the following three processes will be profiled:

  1. Gathering the patient data
  2. Validating the patient data gathered in step 1
  3. Evaluating the measure referencing the data gathered in step 1

Gathering Patient Data

The first step of gathering the patient data includes an analysis of the data requirements for the measure. The data requirements identify the resources and data elements used to evaluate the measure logic. The prototype uses the data requirements to generate FHIR REST queries, which are then executed across the specified facilities registered with an organization.
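The query-generation step described above can be sketched as follows. This is an illustrative shape only, not the prototype's actual implementation: the dict mirrors the FHIR DataRequirement datatype, and the value set URL is a made-up placeholder.

```python
# Minimal sketch (assumed shape): turning a FHIR DataRequirement into a
# FHIR REST search string. The real prototype may generate different queries.
def data_requirement_to_query(req: dict) -> str:
    """Translate a DataRequirement-like dict into a FHIR search string."""
    params = []
    for code_filter in req.get("codeFilter", []):
        path = code_filter["path"]
        if "valueSet" in code_filter:
            # Match any code in the value set using the :in search modifier
            params.append(f"{path}:in={code_filter['valueSet']}")
        for coding in code_filter.get("code", []):
            params.append(f"{path}={coding['system']}|{coding['code']}")
    return req["type"] + ("?" + "&".join(params) if params else "")

# Hypothetical data requirement for the Non-Elective Inpatient Encounter element
requirement = {
    "type": "Encounter",
    "codeFilter": [
        {"path": "type",
         "valueSet": "http://example.org/ValueSet/non-elective-inpatient"}
    ],
}
query = data_requirement_to_query(requirement)
```

Each generated query would then be executed against every facility registered with the organization, and the results collected into a patient data bundle.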

Validating Patient Data

The data validation step operates on the gathered patient data to ensure that the data adheres to a specified set of profiles (in this case QICore version 4.1.1). Inconsistencies between the gathered patient data and the specified profiles are documented within the patient data as contained resources. Any missing data requirements will also be documented within the returned patient data bundle (see the $gather operation specification for more information).
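The shape of that validation output can be illustrated with a deliberately simplified sketch. Real profile validation against QICore checks far more than element presence (cardinality, bindings, invariants); the `REQUIRED` map below is a hypothetical stand-in used only to show how issues are recorded in an OperationOutcome-style resource.

```python
# Illustrative sketch only: recording validation issues for gathered data.
# Real QICore profile validation is much richer than this presence check.
def validate_resource(resource: dict, required: dict) -> dict:
    """Return an OperationOutcome-style resource listing missing elements."""
    issues = []
    resource_type = resource.get("resourceType", "")
    for element in required.get(resource_type, []):
        if element not in resource:
            issues.append({
                "severity": "error",
                "code": "required",
                "diagnostics": f"{resource_type}.{element} is missing",
            })
    return {"resourceType": "OperationOutcome", "issue": issues}

# Hypothetical required elements for the profile under test
REQUIRED = {"Encounter": ["status", "class", "type"]}
outcome = validate_resource(
    {"resourceType": "Encounter", "status": "finished"}, REQUIRED)
```

An outcome like this would then be carried as a contained resource alongside the gathered patient data, so downstream steps can report the validation messages.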

Evaluating the Measure

The measure evaluation occurs at both the patient level and the population level. The prototype is testing a proportion measure. The result of the evaluation returns individual and population reports detailing population group membership, a measure score, and the resources that were used during evaluation.
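For a proportion measure such as CMS104, the population-level score follows the standard FHIR quality measure formula: (numerator minus numerator exclusions) divided by (denominator minus denominator exclusions minus denominator exceptions). The counts below are illustrative only, not drawn from the test data in this guide.

```python
# Proportion measure scoring per the FHIR quality measure conventions:
# score = (numerator - numerator exclusions) /
#         (denominator - denominator exclusions - denominator exceptions)
def proportion_score(counts: dict) -> float:
    numerator = counts.get("numerator", 0) - counts.get("numerator-exclusion", 0)
    denominator = (counts.get("denominator", 0)
                   - counts.get("denominator-exclusion", 0)
                   - counts.get("denominator-exception", 0))
    return numerator / denominator if denominator else 0.0

# Illustrative counts only
score = proportion_score({
    "initial-population": 10,
    "denominator": 10,
    "denominator-exclusion": 1,
    "denominator-exception": 1,
    "numerator": 6,
})
# 6 / (10 - 1 - 1) = 0.75
```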

Methodology

The prototype operates on a linear scale: each of the processes outlined above is evaluated sequentially for each patient. Therefore, as the population or the resources within that population increase (i.e., more patients and/or more patient resources), the time to evaluate also increases.

The prototype was profiled using population sizes of 1, 10, 50, 100, and 200 patients (test cases) in order to provide a reasonable representation of the linear scaling and to represent several measure population groupings (i.e., simulate a real-world population). The patient data is randomly generated while adhering to certain requirements. The requirements include:

  • Each measure population group (Ineligible, Initial population, Denominator, Denominator Exception, Denominator Exclusion, and Numerator) must be represented whenever possible.
    • For the single patient population, a Numerator population group was profiled.
  • The population should have ~60% success rate for the Numerator measure population group.
  • The population should have ~80% success rate for the Initial population measure group.
  • The population must use valid patient data for the measure.
    • Some profile validation errors should appear for full coverage profiling, but those errors must not coincide with the data elements required to evaluate the measure.
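The population-group requirements above can be sketched as a test-case generator. This is a hypothetical illustration of how target groups might be assigned, not the prototype's actual generator; the exact rates and group names are interpretations of the bullets above.

```python
import random

GROUPS = ["ineligible", "initial-population", "denominator",
          "denominator-exception", "denominator-exclusion", "numerator"]

def assign_groups(n_patients: int, seed: int = 42) -> list:
    """Assign each synthetic patient a target population group, roughly
    matching the stated rates (~80% initial population, ~60% numerator)."""
    rng = random.Random(seed)
    assignments = []
    # Guarantee every group is represented when the population is large enough
    if n_patients >= len(GROUPS):
        assignments.extend(GROUPS)
    while len(assignments) < n_patients:
        if rng.random() >= 0.8:       # ~20% fall outside the initial population
            assignments.append("ineligible")
        elif rng.random() < 0.6:      # ~60% of eligible patients hit the numerator
            assignments.append("numerator")
        else:
            assignments.append(rng.choice(
                ["denominator", "denominator-exception", "denominator-exclusion"]))
    rng.shuffle(assignments)
    return assignments
```

Each assigned group would then drive which resources (and which validation errors) are generated for that synthetic patient.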

Metrics

Each population set was randomly generated and profiled 100 times; the following table records the average runtime for each process (times shown as [mm:]ss.mmm).

Number of Test Cases | Combined | Measure Evaluation | Patient Data Queries | Validation
1 | 01.113 | 00.657 | 00.401 | 00.056
10 | 08.623 | 05.088 | 03.104 | 00.431
50 | 43.477 | 25.651 | 15.652 | 02.174
100 | 01:24.834 | 50.052 | 30.540 | 04.242
200 | 02:44.587 | 01:37.106 | 59.251 | 08.229
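As a quick sanity check on the linearity claim, the Combined column can be reduced to a per-test-case runtime (values transcribed from the table above, with the minute-denominated entries converted to seconds):

```python
# Combined runtimes from the table, in seconds
# (01:24.834 -> 84.834, 02:44.587 -> 164.587)
combined_seconds = {1: 1.113, 10: 8.623, 50: 43.477, 100: 84.834, 200: 164.587}
per_test_case = {n: round(t / n, 3) for n, t in combined_seconds.items()}
# Per-test-case runtime stays roughly constant (~0.82-1.11 s),
# which is what sequential (linear) processing predicts.
```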

CMS104 Performance Graph

The following chart displays the runtime distribution for each of the profiled processes:

Performance Enhancements

Although the prototype could be implemented as-is and would perform reasonably well for smaller populations, it is not currently recommended as an enterprise-level solution. To scale the prototype for enterprise use, several enhancements could improve overall performance and user experience, including but not limited to:

  • Using parallel programming to carry out various processes simultaneously.
    • Could vastly improve performance when gathering patient data across multiple facilities.
    • Could enable evaluating multiple measures across multiple populations.
  • Using asynchronous programming to reduce/eliminate the limitations of sequential processing.
    • Asynchronous programming is non-blocking, meaning the program does not have to wait for the process to finish before performing other tasks.
    • Would be very impactful when processing large populations.
    • Would allow the user to perform other tasks while the measure is being evaluated.
  • Using the FHIR Bulk Data API to gather the patient data.
    • Patient data retrieval would be vastly improved, especially for facilities with large datasets.
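The first enhancement (parallel gathering across facilities) can be sketched with a thread pool. The `gather_facility_data` callable is a hypothetical stand-in for the prototype's per-facility query step.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of parallel data gathering: query each registered facility
# concurrently instead of one facility at a time.
def gather_all_facilities(facilities, gather_facility_data, max_workers=4):
    """Run the per-facility gather step in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(gather_facility_data, facilities))

# Hypothetical per-facility gather function returning a Bundle-like dict
bundles = gather_all_facilities(
    ["facility-a", "facility-b"],
    lambda facility: {"resourceType": "Bundle", "id": facility},
)
```

Because the per-facility queries are I/O-bound FHIR REST calls, even a thread-based pool (rather than separate processes) should recover most of the sequential overhead.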

CMS104 Test Cases

The following table outlines example test cases for each measure population group and the expected result the prototype should produce.

Population Group | Test Case | Expected Result
Ineligible | Ineligible Test Bundle | Ineligible Result
Initial Population | Initial Population Test Bundle | Initial Population Result
Denominator | Denominator Test Bundle | Denominator Result
Denominator Exception | Denominator Exception Test Bundle | Denominator Exception Result
Denominator Exclusion | Denominator Exclusion Test Bundle | Denominator Exclusion Result
Numerator | Numerator Test Bundle | Numerator Result

The following table provides larger test data sets to support facility-level testing. Two facilities are provided to facilitate both single-facility report testing and aggregate report testing.

Facility | Test Bundle | Expected Result
Facility A | Facility A Bundle | Facility A Result
Facility B | Facility B Bundle | Facility B Result
Facility A & Facility B | | Aggregate Result