Requirements Federated Learning and mUlti-party computation Techniques for prostatE cancer
0.1.0 - ci-build
Funded by the European Union

Requirements Federated Learning and mUlti-party computation Techniques for prostatE cancer, published by HL7 Europe. This guide is not an authorized publication; it is the continuous build for version 0.1.0 built by the FHIR (HL7® FHIR® Standard) CI Build. This version is based on the current content of https://github.com/hl7-eu/flute-requirements/ and changes regularly. See the Directory of published versions

Artifacts Summary

This page provides a list of the FHIR artifacts defined as part of this implementation guide.

Example: Example Instances

These are example instances that show what data produced and consumed by systems conforming with this implementation guide might look like.

Device
F-HUF-1

Researcher node has a JupyterLab interface.

F-HUF-10

Support for federated Grid Optimization.

F-HUF-11

Support for federated Generative Adversarial Networks (GAN).

F-HUF-12

Support for federated Variational auto-encoders (VAE).

F-HUF-13

Support for federated Diffusion models.

F-HUF-14

Support for at least one effective federated synthetic data generator learner (GAN, VAE, or DiffMod).

F-HUF-15

Support for multi-modal synthetic health data (both tabular and image).

F-HUF-16

Synthetic data generation module should allow for specifying what data (images, tabular …) should be generated.

F-HUF-17

Synthetic data generation module should allow for specifying population subsets, e.g., only with cancer.

F-HUF-18

Generation of synthetic 3D MRI images.

F-HUF-19

Data owner node has functional interface with local data owner database.

F-HUF-2

Researcher node offers all features provided by TRUMPET researcher node.

F-HUF-20

Data owner node has user interface for data owner users.

F-HUF-21

Data owner node has server interfacing with other nodes.

F-HUF-3

Support for federated Logistic Regression (LR).
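
The requirement does not say how the local models are combined. As a minimal sketch, assuming a weighted federated averaging step (one common choice, not confirmed by this guide), a central aggregator could merge the coefficient vectors of locally trained logistic regression models as follows. The function name and the shape of `updates` are hypothetical.

```python
def federated_average(updates):
    """Average model weights across nodes, weighted by local sample count.

    updates: list of (weights, n_samples) pairs, where weights is a list of
    floats of equal length at every node (an assumption of this sketch).
    """
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("at least one node must contribute samples")
    dim = len(updates[0][0])
    # Each coordinate is the sample-weighted mean of the local coordinates.
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


# Two nodes: one with 1 sample, one with 3 samples.
global_weights = federated_average([([1.0, 0.0], 1), ([3.0, 4.0], 3)])
```

Only the weight vectors and sample counts travel to the aggregator in this sketch; the training data stays at each node.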

F-HUF-4

Support for federated Decision Trees (DT).

F-HUF-5

Support for federated Random Forests (RF).

F-HUF-6

Support for federated Support Vector Machines (SVM).

F-HUF-7

Support for federated Deep Neural Networks (DNN).

F-HUF-8

Support for federated Convolutional Neural Networks (CNN).

F-HUF-9

Support for federated Bayesian Optimization.

F-IMSD-1

SD algorithm shall offer a CSV file with the required number of instances of tabular data, and each column should be in the expected format (e.g., categorical or numerical).
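
A check of this requirement could look like the sketch below, which verifies the row count and the format of each column in a generated CSV. The function name, the two kind names `"numerical"` and `"categorical"`, and the rule that a categorical value must be non-empty are assumptions made for illustration.

```python
import csv
import io


def validate_synthetic_csv(text, expected_rows, column_types):
    """Return a list of problems found in the CSV text (empty list = valid).

    column_types maps a column name to "numerical" or "categorical".
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in column_types if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {missing}"]
    rows = list(reader)
    problems = []
    if len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    for i, row in enumerate(rows, start=1):
        for name, kind in column_types.items():
            value = row[name] or ""  # short rows yield None for absent cells
            if kind == "numerical":
                try:
                    float(value)
                except ValueError:
                    problems.append(f"row {i}: {name}={value!r} is not numerical")
            elif kind == "categorical" and value == "":
                problems.append(f"row {i}: {name} is empty")
    return problems
```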

F-IMSD-10

An option for users to save hyperparameters as a draft and apply them at a later time.

F-IMSD-11

Images input and outputs will be in DICOM format.

F-IMSD-12

SD algorithm shall have the ability to save a database (DB) containing the current CSV file and the previously proposed CSV files.

F-IMSD-13

SD shall incorporate more than one SD algorithm to perform calculations based on customer choice.

F-IMSD-14

Data imputation should be considered when historical data is not available, and there is uncertainty or bad quality in the data.

F-IMSD-15

SD module will have a trained machine to generate synthetic data from new repositories shared by users.

F-IMSD-16

SD algorithm shall take into account that training for SD generation can involve a long waiting time.

F-IMSD-17

SD shall be implemented so that future modular extensions can be added.

F-IMSD-2

SD algorithm shall offer the possibility to modify some hyperparameters, and the GUI shall offer a reset option to set hyperparameters to their default values.

F-IMSD-3

Synthetic data should be evaluated using various methods and tools.

F-IMSD-4

Synthetic Images should be evaluated using various methods and tools including human expert validation.

F-IMSD-5

Ability to create an error message when an error occurs.

F-IMSD-6

A range of conditions can be forced for some features when synthetic tabular data is generated.

F-IMSD-7

Ability to add structured data by the user.

F-IMSD-8

SD algorithm shall offer a modular structure where each parameter is a module that can be enabled or disabled.

F-IMSD-9

SD should take into account that new users will probably need to change units or convert initial data according to specified standards.

F-PIL-1

mpMRI/bpMRI shall be performed within 1 year prior to the prostate biopsy.
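
The timing rule can be expressed as a simple date check. This is a sketch only: it reads "1 year" as 365 days and requires the MRI to be on or before the biopsy date, both of which are assumptions; the function name is hypothetical.

```python
from datetime import date, timedelta

MAX_INTERVAL = timedelta(days=365)  # assumption: "1 year" read as 365 days


def mri_within_window(mri_date, biopsy_date):
    """True if the MRI is on or before the biopsy and at most 365 days earlier."""
    gap = biopsy_date - mri_date
    return timedelta(0) <= gap <= MAX_INTERVAL


eligible = mri_within_window(date(2023, 1, 10), date(2023, 6, 1))
```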

F-PIL-10

Reduced sample datasets with the 7 clinical variables and associated images shall be shared to generate synthetic images and algorithms.

F-PIL-11

No sensitive information shall be revealed from messages exchanged between users (aggregates, model statistics, etc.).

F-PIL-12

Access to local FLUTE nodes shall be controlled/restricted.

F-PIL-13

Platform shall allow AI developers to train their models in accordance with their legal requirements and document such training.

F-PIL-14

The training requests sent to the FLUTE nodes shall specify the minimum/maximum resources needed to be executed.

F-PIL-15

The performance of the model shall be higher than the BCN1 and BCN2 models.

F-PIL-16

AI researchers shall be able to discover and select the datasets registered at each FLUTE node and obtain descriptive statistics about the datasets.

F-PIL-17

Jupyter notebooks shall be integrated with the FLUTE functionalities to ensure discoverability of the datasets.

F-PIL-18

Platform shall allow the AI researchers to search for the relevant dataset.

F-PIL-19

Platform shall provide space to add guidance documents and instructions on how to use the Platform and the datasets.

F-PIL-2

Cohorts shall consist of men with clinical suspicion of PCa based on a PSA > 3.0 ng/ml and/or abnormal DRE.
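
The inclusion rule is a logical OR over two findings, with a strict PSA threshold. A minimal sketch, with a hypothetical function name and parameter names:

```python
PSA_THRESHOLD_NG_ML = 3.0  # threshold stated in the requirement


def meets_cohort_criteria(psa_ng_ml, abnormal_dre):
    """True if PSA is strictly above 3.0 ng/ml and/or the DRE is abnormal."""
    return psa_ng_ml > PSA_THRESHOLD_NG_ML or abnormal_dre


included = meets_cohort_criteria(psa_ng_ml=4.2, abnormal_dre=False)
```

Note that a PSA of exactly 3.0 ng/ml with a normal DRE does not qualify, because the requirement uses ">" rather than "≥".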

F-PIL-20

Platform shall allow authentication of authorized individuals from Data owners and Data Users, with varied levels of access based on their defined roles.

F-PIL-21

Platform shall keep a record of Data Users and Data owners and log details of their activity in the Platform.

F-PIL-22

Platform shall ensure that the training data remains on the federated node and any processing, analysis and AI training is performed there. Data User shall not see, directly access or download the data, i.e. the AI model shall only be trained in the local node.

F-PIL-23

There shall be a security check of the uploaded AI model prior to its deployment in the FLUTE data.

F-PIL-24

AI models BCN1/BCN2 trained through the platform shall be packaged into software components and deployed at the clinical sites involved in validation activities.

F-PIL-25

The platform SHALL be able to generate synthetic data for mpMRI, bpMRI and tabular data for BCN1 and BCN2 case series.

F-PIL-26

The platform SHALL be able to train BCN1 and BCN2 models from augmented/balanced datasets created with synthetic data.

F-PIL-3

Lesions detected in mpMRI/bpMRI shall be reported using the Prostate Imaging-Reporting and Data System (PI-RADS), version 2.0 or higher.

F-PIL-4

Prostate biopsies shall be systematic and targeted in cases of PI-RADS ≥3 lesions.

F-PIL-5

The platform shall define the methodology to extract/load/transform data from clinical databases and data warehouse into the FLUTE data node.

F-PIL-6

Input data shall be anonymized or pseudonymized.

F-PIL-7

Clinical data and MRI (both raw and processed) shall be linked.

F-PIL-8

MRI imaging study shall comply with specific requirements of QP-Prostate tool provided for FLUTE project.

F-PIL-9

Data shall be labelled with class csPCa 0 or 1.

F-SRS-1

Platform should provide secure methods to access the system like multi-factor authentication.

F-SRS-10

FLUTE platform should allow the user to select whether the central aggregator has clear access to the local models.

F-SRS-2

Access to different platform features should be role-based.

F-SRS-3

User sessions should time out after a period of inactivity.
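
One way to implement an inactivity timeout is to store the time of the last user action and compare it with the current time on each request. The sketch below assumes a hypothetical `Session` class and a 15-minute limit chosen only for the example; the requirement does not fix a duration.

```python
import time


class Session:
    """Tracks the last activity time and expires after a period of inactivity."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._last_activity = clock()

    def is_expired(self):
        return self._clock() - self._last_activity >= self._timeout

    def touch(self):
        """Record user activity; returns False if the session had already expired."""
        if self.is_expired():
            return False
        self._last_activity = self._clock()
        return True


session = Session(timeout_seconds=15 * 60)  # 15 minutes is an assumed value
```

Using a monotonic clock avoids spurious expiry or extension when the system clock is adjusted.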

F-SRS-4

FLUTE platform should allow the user to select which protection techniques are used in a training run.

F-SRS-5

Local training algorithms should be run in the data owner infrastructure.

F-SRS-6

Locally trained models should be sent to the aggregator using TLS.
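
On the sending side, a client could be configured as in the sketch below, which uses Python's standard `ssl` module. The function name is hypothetical, and the minimum protocol version of TLS 1.2 is an assumption, since the requirement does not state one.

```python
import ssl


def make_aggregator_tls_context():
    """Build a client-side TLS context for connections to the aggregator."""
    # The default context verifies the server certificate and its hostname.
    context = ssl.create_default_context()
    # Assumption: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_aggregator_tls_context()
```

The resulting context can be passed to a standard client, for example `http.client.HTTPSConnection(host, context=context)`, where `host` stands for the aggregator address.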

F-SRS-7

Data owners should be able to select which fields of their data sets can be used for model training.

F-SRS-8

FLUTE platform should log every use of the data.

F-SRS-9

FLUTE platform should initiate local training when the data owner provides consent to use the data for that study.

F-STD-1

The FLUTE project SHOULD use the HL7 FHIR standard whenever possible.

F-STD-10

The FLUTE project SHOULD explore the possibility to model AI models using the HL7 standards FHIR and/or CQL.

F-STD-2

The FLUTE project SHOULD use SNOMED CT, LOINC and UCUM terminologies whenever possible.

F-STD-3

The FLUTE project SHOULD use DICOMweb (DICOM) for imaging evidence.

F-STD-4

A conceptual/logical model of the data that has to be exchanged SHALL be specified.

F-STD-5

Privacy Policies SHOULD be modelled and exchanged using the HL7 FHIR standard.

F-STD-6

Permission to access healthcare Data SHOULD be modelled and exchanged using the HL7 FHIR Permission resource.

F-STD-7

The prediction of whether a biopsy is needed SHOULD be modelled and exchanged using the HL7 FHIR standard.

F-STD-8

The FHIR exchange capabilities of each system SHALL be modelled and exchanged using the HL7 FHIR CapabilityStatement resource.
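
For illustration, a minimal CapabilityStatement can be assembled as a plain dictionary and serialized to JSON. This is a sketch, not the statement defined by this guide: the function name, the description text, the date, the FHIR version passed in, and the listed resource types and interactions are all placeholders.

```python
import json


def build_capability_statement(fhir_version, date, resource_types):
    """Return a minimal CapabilityStatement describing a read/search REST server."""
    return {
        "resourceType": "CapabilityStatement",
        "status": "draft",
        "date": date,
        "kind": "requirements",  # describes desired behaviour, not a deployed instance
        "description": "Placeholder description of the expected FHIR capabilities.",
        "fhirVersion": fhir_version,
        "format": ["json"],
        "rest": [
            {
                "mode": "server",
                "resource": [
                    {
                        "type": resource_type,
                        "interaction": [{"code": "read"}, {"code": "search-type"}],
                    }
                    for resource_type in resource_types
                ],
            }
        ],
    }


# Assumed values for the example only.
statement = build_capability_statement("5.0.0", "2024-01-01", ["Patient", "Observation"])
print(json.dumps(statement, indent=2))
```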

F-STD-9

Each Hospital SHALL expose its non-imaging data using the HL7 FHIR standard.

FLUTE Administrator
FLUTE Platform
NF-HUF-1

All output preserves privacy.

NF-HUF-2

All algorithm implementations should follow the platform guidelines (adopted & revised from TRUMPET), e.g., on privacy/security parameters.

NF-IMSD-1

SD algorithm shall take into account that training for SD generation can involve a long waiting time.

NF-IMSD-2

SD shall be implemented so that future modular extensions can be added.

NF-IMSD-3

Synthetic data maintains data privacy and cannot be correlated with patient data.

NF-IMSD-4

Synthetic data used in combination with real data (data augmentation) improves prediction performance compared with algorithms trained using only real data.

NF-IMSD-5

SD GUI shall be able to run several queries simultaneously to reduce total time.

NF-IMSD-6

A user manual and help text must be provided.

NF-IMSD-7

SD GUI shall incorporate an internal counter that records customer usage, to allow a possible pay-per-use subscription model.

NF-PIL-1

Units shall be harmonized.

NF-PIL-10

Platform shall monitor the use of the data in the Platform, to detect potential misuse. It shall implement measures for detection of data breaches and potential privacy threats/leaks.

NF-PIL-11

Platform shall ensure that the pseudonymized data can be amended or withdrawn after its sharing, if the data subject (patient) requests the modification.

NF-PIL-2

A common (FHIR) data model shall be defined to represent the clinical data used in the study.

NF-PIL-3

The platform shall provide validators that check whether the clinical data pushed to the local node complies with the common data model.

NF-PIL-4

The data shall be standardized to a common (FHIR) data model before ingestion into the FLUTE local node.

NF-PIL-5

Cryptographic methods like homomorphic encryption and differential privacy shall be used to aggregate statistics about the cohort without disclosing (leaking) sensitive data outside the local node.
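
As one example of the differential privacy side of this requirement, a local node could release a cohort count with Laplace noise added before it leaves the node. The sketch below assumes a counting query (sensitivity 1) and a privacy budget `epsilon` chosen by the caller; the function names are hypothetical, and the standard `random` module is used only to keep the example self-contained, so it is not suitable for production use.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_count(true_count, epsilon, rng=None):
    """Return the count plus Laplace noise calibrated to sensitivity 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    # A count changes by at most 1 when one patient is added or removed,
    # so the noise scale is sensitivity / epsilon = 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


noisy_cohort_size = dp_count(true_count=128, epsilon=0.5)
```

Smaller values of `epsilon` add more noise and give stronger privacy; only the noisy value would be sent outside the local node.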

NF-PIL-6

Platform shall keep and display FLUTE data catalogue with defined basic metadata that characterizes the datasets available through the FLUTE Platform.

NF-PIL-7

Platform shall display terms and conditions of use (T&C) and Privacy policy.

NF-PIL-8

Platform shall display the conditions of the use of each of the datasets, as specified by its owner or the data hub.

NF-PIL-9

Datasets that are not defined as open to all users of the platform shall only be available to users who request access to the dataset and are permitted to use it by the data owner or data hub.

NF-SRS-1

Platform should have password policies.

NF-SRS-2

FLUTE Platform should implement several PETs to protect data privacy.

NF-SRS-3

Administrators of FLUTE platform should keep the systems up-to-date and patched.

NF-SRS-4

There should be security policies to avoid the use of potentially vulnerable software.

NF-SRS-5

FLUTE Platform should guarantee data is not tampered with in training processes.

NF-STD-1

The definition of the different actors of the platform SHOULD be modelled and exchanged using the HL7 FHIR ActorDefinition resource.

NF-STD-2

The definition of the different requirements of the platform SHOULD be modelled and exchanged using the HL7 FHIR Requirements resource.

NF-STD-3

The example scenarios of the platform usage SHOULD be modelled and exchanged using the HL7 FHIR ExampleScenario resource.

NF-STD-4

The testing of the different requirements of the platform SHOULD be modelled and exchanged using the HL7 FHIR TestScript, TestPlan and TestReport resources.

URS-1

Data should never leave data owner infrastructure.

URS-10

All the federated learning processes should be logged to be able to conduct an audit in case of a security incident.

URS-11

The system should provide consent management mechanisms.

URS-12

The exchange of data between data owner nodes and the central aggregator should follow the principle of data minimization, sharing only the data necessary to train models effectively.

URS-13

FLUTE platform should be compliant with regulations.

URS-14

FLUTE platform should provide privacy in a semi-honest threat model (honest but curious parties).

URS-15

FLUTE platform should provide privacy in a threat model with malicious parties.

URS-2

Central aggregation of models should not leak any information of the data used to train local models.

URS-3

Access to the platform should be protected by a secure login with multi-factor authentication.

URS-4

Communication between system nodes should be encrypted.

URS-5

Personal and sensitive data should not be used in model training. If it is required, it should be properly protected, for example through anonymization.

URS-6

Users whose data is part of the training of a model should be protected against data reconstruction attacks.

URS-7

Users whose data is part of the training of a model should be protected against membership inference attacks.

URS-8

Users whose data is part of the training of a model should be protected against property inference attacks.

URS-9

Devices used in the Federated Learning process must be secure, regularly patched and protected against malware and other vulnerabilities.

User