API for the Exchange of Medicinal Product Information (APIX)
0.1.0 - ci-build
API for the Exchange of Medicinal Product Information (APIX), published by Gravitate Health Project. This guide is not an authorized publication; it is the continuous build for version 0.1.0 built by the FHIR (HL7® FHIR® Standard) CI Build. This version is based on the current content of https://github.com/cander2/recon-ig/ and changes regularly. See the Directory of published versions
The API for the Exchange of Medicinal Product Information (APIX) enables seamless exchange of regulatory submissions between stakeholders, including regulatory authorities (e.g., global health agencies) and pharmaceutical companies. To manage the high volume of submission data, such as electronic Product Information (ePI), Pharmaceutical Quality Information (PQI), and clinical datasets, APIX employs a streaming solution built on Apache Kafka and JSON streaming. This page outlines the streaming architecture, its integration with FHIR Transaction Bundles, and implementation guidance for processing large-scale regulatory data in FHIR-compliant JSON format.
This solution supports the high-throughput needs of regulatory workflows, handling thousands of daily submissions (e.g., ~100GB compressed JSON) and enabling real-time validation, storage, and querying. It aligns with global trends toward modern data formats and streaming architectures in regulatory exchange.
The APIX streaming solution leverages Apache Kafka, a distributed streaming platform, to process FHIR Transaction Bundles and Dataset-JSON payloads in real-time. Kafka’s high-throughput, fault-tolerant design ensures scalability and reliability for diverse regulatory and pharma workflows.
Submissions flow through dedicated Kafka topics (e.g., `APIX-submissions` for FHIR Bundles, `APIX-notifications` for status updates). Producers publish payloads to the `APIX-submissions` topic; consumers validate them and publish results to `APIX-notifications`. A topic-creation sketch follows the lists below.

#### Kafka Topics
- `APIX-submissions`: FHIR Bundles and Dataset-JSON.
- `APIX-validation-errors`: Failed validations.
- `APIX-notifications`: Submission status.

#### Streaming Libraries
- `ijson` for parsing large JSON datasets.
- `JSONStream` for incremental processing.
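As a minimal sketch, the three topics above could be created with the kafka-python admin client; the partition counts and replication factors shown are illustrative assumptions, not values prescribed by APIX.

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Illustrative only: partition and replication settings are assumptions,
# not values mandated by the APIX specification.
admin = KafkaAdminClient(bootstrap_servers=['kafka:9092'])

topics = [
    NewTopic(name='APIX-submissions', num_partitions=12, replication_factor=3),
    NewTopic(name='APIX-validation-errors', num_partitions=3, replication_factor=3),
    NewTopic(name='APIX-notifications', num_partitions=3, replication_factor=3),
]
admin.create_topics(new_topics=topics)
admin.close()
```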
#### Producer Example
A producer publishes a simplified ePI Bundle to the `APIX-submissions` topic using the kafka-python client:

```python
from kafka import KafkaProducer
import json

# Serialize Python dicts to UTF-8 JSON before sending to Kafka
producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

# FHIR Bundle (simplified ePI)
bundle = {
    "resourceType": "Bundle",
    "type": "document",
    "id": "ePI-123",
    "entry": [
        {
            "resource": {
                "resourceType": "Composition",
                "status": "final",
                "title": "ePI for [Medicinal Product]"
            }
        }
    ]
}

# Publish the Bundle and flush to guarantee delivery before exiting
producer.send('APIX-submissions', bundle)
producer.flush()
```
#### Consumer Example
A consumer subscribes to `APIX-submissions`, validates each payload against a JSON Schema, and routes failures to `APIX-validation-errors`:

```python
from kafka import KafkaConsumer, KafkaProducer
from jsonschema import validate
import json

# Minimal JSON Schema: accept only document-type FHIR Bundles
schema = {
    "type": "object",
    "properties": {
        "resourceType": {"const": "Bundle"},
        "type": {"const": "document"}
    },
    "required": ["resourceType", "type"]
}

consumer = KafkaConsumer(
    'APIX-submissions',
    bootstrap_servers=['kafka:9092'],
    value_deserializer=lambda v: json.loads(v.decode('utf-8'))
)

# Producer for routing validation errors (same configuration as above)
producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

for message in consumer:
    try:
        validate(instance=message.value, schema=schema)
        print(f"Valid Bundle: {message.value['id']}")
        # Store in database
    except Exception as e:
        print(f"Validation error: {e}")
        producer.send('APIX-validation-errors', {"error": str(e), "bundle": message.value})
```
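Continuing the consumer above, a status update can be published to `APIX-notifications` once validation completes; the helper and message fields below are illustrative assumptions, not an APIX-defined schema.

```python
# Illustrative notification payload; field names are assumptions,
# not an APIX-defined message schema.
def notify(producer, bundle_id, accepted, detail=None):
    status = {
        "submissionId": bundle_id,
        "status": "accepted" if accepted else "rejected",
        "detail": detail
    }
    producer.send('APIX-notifications', status)
```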
#### Performance
- **Throughput**: A 5-node Kafka cluster processes ~1GB/s, handling 100GB/day (typical compressed JSON volume for large regulatory workflows) in ~100 seconds.
- **Latency**: Sub-second validation and storage with parallel consumers.
- **Scalability**: Add brokers or partitions to manage peak submission periods (e.g., annual renewals); a consumer-group sketch follows this list.
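The parallel consumers and partition scaling mentioned above rely on Kafka consumer groups: partitions of a topic are divided among consumers that share a group ID. A minimal sketch, assuming kafka-python and an illustrative group name:

```python
from kafka import KafkaConsumer
import json

# Consumers sharing the same group_id split the topic's partitions among
# themselves, so adding processes increases throughput.
# The group name 'apix-validators' is illustrative.
consumer = KafkaConsumer(
    'APIX-submissions',
    group_id='apix-validators',
    bootstrap_servers=['kafka:9092'],
    value_deserializer=lambda v: json.loads(v.decode('utf-8'))
)
```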
#### Dataset-JSON Support
- Clinical datasets use CDISC Dataset-JSON, streamed as NDJSON for compatibility with clinical trial submissions.
- Validation leverages CDISC JSON Schema, integrated with the Schema Registry.
- Example: Stream SDTM dataset rows for real-time validation or analysis (see the sketch below).
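A minimal sketch of that dataset-streaming step, publishing simplified SDTM-like rows one record per message (the NDJSON convention of one JSON object per line); the field names and topic reuse are illustrative assumptions and not the full CDISC Dataset-JSON structure.

```python
from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    # One JSON object per message, i.e. one NDJSON record per line when persisted
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

# Simplified SDTM-like rows; illustrative only, not full CDISC Dataset-JSON
rows = [
    {"datasetId": "TRIAL-456", "domain": "DM", "USUBJID": "001", "AGE": 34},
    {"datasetId": "TRIAL-456", "domain": "DM", "USUBJID": "002", "AGE": 41},
]
for row in rows:
    producer.send('APIX-submissions', row)
producer.flush()
```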
#### Security and Compliance
- **Encryption**: Use HTTPS for API endpoints and TLS for Kafka communication to protect sensitive data (a TLS client configuration sketch follows this list).
- **Validation**: Enforce JSON Schema to comply with international standards (e.g., HL7 FHIR, CDISC, ICH).
- **Access Control**: Implement OAuth2 or mutual TLS for producer/consumer authentication.
- **Auditability**: Log all events with Provenance resources; retain logs to meet regulatory audit requirements (e.g., 21 CFR Part 11, EU GMP Annex 11).
- **Data Integrity**: Use Kafka’s exactly-once semantics to ensure no loss or duplication of submissions.
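A minimal sketch of a TLS-enabled producer with kafka-python, covering the encryption and mutual-TLS points above; the certificate paths and TLS port are deployment-specific assumptions.

```python
from kafka import KafkaProducer
import json

# TLS-encrypted connection with mutual authentication; certificate paths
# and the broker port are deployment-specific assumptions.
producer = KafkaProducer(
    bootstrap_servers=['kafka:9093'],
    security_protocol='SSL',
    ssl_cafile='/etc/apix/ca.pem',        # CA that signed the broker certificates
    ssl_certfile='/etc/apix/client.pem',  # client certificate (mutual TLS)
    ssl_keyfile='/etc/apix/client.key',   # client private key
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)
```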
#### Deployment Considerations
- **Cloud**: Deploy on managed services (e.g., AWS MSK, Confluent Cloud, Azure Event Hubs) or on-premises for flexibility.
- **Databases**: Use Elasticsearch for real-time querying or MongoDB for JSON storage.
- **Monitoring**: Prometheus for metrics (e.g., submission rate, error rate); ELK stack or similar for logging JSON payloads.
- **Legacy Integration**: Support legacy formats (e.g., SAS XPT) by converting to Dataset-JSON using open-source tools from CDISC initiatives; a conversion sketch follows this list.
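As one possible approach to that conversion step, the sketch below reads a SAS XPT file with pyreadstat and emits a simplified row-oriented JSON payload. The file names are illustrative, and a production pipeline would map the output to the official CDISC Dataset-JSON structure rather than this simplified shape.

```python
import json
import pyreadstat

# Read a legacy SAS transport file; the path is illustrative
df, meta = pyreadstat.read_xport('dm.xpt')

# Simplified row-oriented payload; map to official Dataset-JSON in production
payload = {
    "datasetId": "TRIAL-456",
    "label": meta.file_label,
    "variables": list(df.columns),
    "rows": df.to_dict(orient="records"),
}

with open('dm.json', 'w') as f:
    json.dump(payload, f)
```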
#### Example Use Case
**Scenario**: A pharmaceutical company submits an ePI Bundle and a Dataset-JSON clinical dataset to a regulatory authority via APIX.
1. **Submission**: The company’s portal publishes both to APIX-submissions as NDJSON.
2. **Validation**: A consumer validates against the APIX Bundle profile and the CDISC schema.
3. **Storage**: Valid data is indexed in Elasticsearch for querying.
4. **Notification**: Submission status (accepted or rejected) is published to APIX-notifications.
5. **Querying**: Regulators or sponsors query Elasticsearch for the ePI or dataset (see the query sketch below).
**Sample NDJSON:**
```ndjson
{"resourceType":"Bundle","type":"document","id":"ePI-123","entry":[{"resource":{"resourceType":"Composition","title":"ePI for [Medicinal Product]"}}]}
{"datasetId":"TRIAL-456","type":"SDTM","data":{"rows":[...]}}