FHIR-PYrate: a data science friendly Python package to query FHIR servers

BMC Health Serv Res. 2023 Jul 6;23(1):734.

doi: 10.1186/s12913-023-09498-1.

René Hosch^#^{1,2}, Giulia Baldini^#^{3,4}, Vicky Parmar^{1,2}, Katarzyna Borys^{1,2}, Sven Koitka^{1,2}, Merlin Engelke^{1,2}, Kamyar Arzideh^{2,5}, Moritz Ulrich^{2,5}, Felix Nensa^{1,2}

Affiliations

¹ Institute of Interventional and Diagnostic Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany.
² Institute for Artificial Intelligence in Medicine, University Hospital Essen, Girardetstraße 2, Essen, 45131, Germany.
³ Institute of Interventional and Diagnostic Radiology and Neuroradiology, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany. giulia.baldini@uk-essen.de.
⁴ Institute for Artificial Intelligence in Medicine, University Hospital Essen, Girardetstraße 2, Essen, 45131, Germany. giulia.baldini@uk-essen.de.
⁵ Central IT Department, Data Integration Center, University Hospital Essen, Hufelandstraße 55, Essen, 45147, Germany.

^# Contributed equally.

PMID: 37415138
PMCID: PMC10326955
DOI: 10.1186/s12913-023-09498-1

René Hosch et al. BMC Health Serv Res. 2023.

Abstract

Background: We present FHIR-PYrate, a Python package that handles the full clinical data collection and extraction process. The software is designed to be plugged into a modern hospital domain, where electronic patient records hold each patient's entire history. Most research institutes follow the same procedures to build study cohorts, but mostly in a non-standardized and repetitive way. As a result, researchers spend time writing boilerplate code that could instead be spent on more challenging tasks.

Methods: The package can improve and simplify existing processes in the clinical research environment. It gathers the required functionality into a straightforward interface that can be used to query a FHIR server, download imaging studies, and filter clinical documents. The full capacity of the search mechanism of the FHIR REST API is available to the user, which leads to a uniform querying process for all resources and simplifies customization for each use case. Additionally, features such as parallelization and filtering are included to improve performance.

Results: As an exemplary practical application, the package can be used to analyze the prognostic significance of routine CT imaging and clinical data in breast cancer with tumor metastases in the lungs. In this example, the initial patient cohort is first collected using ICD-10 codes. For these patients, the survival information is also gathered. Some additional clinical data is retrieved, and CT scans of the thorax are downloaded. Finally, the survival analysis can be computed using a deep learning model with the CT scans, the TNM staging and positivity of relevant markers as input. This process may vary depending on the FHIR server and available clinical data, and can be customized to cover even more use cases.

Conclusions: FHIR-PYrate makes it possible to quickly and easily retrieve FHIR data, download image data, and search medical documents for keywords from within a single Python package. With the demonstrated functionality, FHIR-PYrate offers an easy way to assemble research cohorts automatically.

Keywords: Dataframe; DICOM; Electronic patient record; FHIR; Information extraction; Python.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig. 1**
Resource statistics at our institution. The plot shows the distribution of the 1,498,863,142 resources present on our FHIR server as of 2023-03-13. The most common resource is ServiceRequest, which records requests for procedures, diagnostics, or any other service. The second most common is Observation, which stores measurements such as lab values and vital parameters, but also short texts such as clinical notes.

**Fig. 2**
Overview of the tool’s role in the healthcare/machine learning picture. The figure presents an exemplary hospital infrastructure with multiple applications: a FHIR server, a DICOM Web capable PACS (or alternatively an application that handles the communication with the medical imaging storage), and various AI applications. Within the research domain, the FHIR-PYrate package handles the communication between the FHIR server and the PACS (DICOM Web). The package also helps to create the data prerequisites needed for implementing machine learning solutions and for creating cohorts.

**Fig. 3**
A schematic view of the package structure. The *Ahoy* class handles the user's authentication and creates an HTTP session, which the other classes then use to interact with the FHIR server. The *Pirate* class handles the communication with the REST API, collects Bundles, and builds the *DataFrames*. The *Miner* class is meant to be used on a *DataFrame* (normally the result of a DiagnosticReport query); it outputs a report on whether a particular regular expression is present in a text, and in which parts of the documents. The *DICOMDownloader* class is used to download medical imaging scans in bulk and store information about them. The *Miner* and *DICOMDownloader* classes are connected with dashed lines, as they can be used as an optional step after retrieving the initial data.

**Fig. 4**
Schematic view of the process within the *Pirate* class. The *Pirate* class can be used with any existing resource, and the data retrieval process remains unchanged. In this figure, the querying process for the pictured resources is always the same: first, a query is defined and run; then, a FHIR Bundle is returned; finally, the Bundle is transformed into a *DataFrame*.
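
The last step of this process, turning a Bundle into tabular rows, can be sketched with the standard library alone. This is a simplified illustration and not FHIR-PYrate's implementation: `get_path` and `bundle_to_rows` are hypothetical helpers, the dotted-path lookup is a stand-in for a real path language, and the sample Bundle is invented.

```python
from typing import Any


def get_path(resource: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'subject.reference' through nested dicts."""
    value: Any = resource
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # missing attribute becomes an empty cell
        value = value[key]
    return value


def bundle_to_rows(bundle: dict[str, Any], paths: list[str]) -> list[dict[str, Any]]:
    """Produce one row per Bundle entry, with one column per requested path."""
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        rows.append({path: get_path(resource, path) for path in paths})
    return rows


# Invented sample data with the general shape of a FHIR search Bundle.
sample_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Condition", "id": "c1",
                      "subject": {"reference": "Patient/p1"}}},
        {"resource": {"resourceType": "Condition", "id": "c2",
                      "subject": {"reference": "Patient/p2"}}},
    ],
}

rows = bundle_to_rows(sample_bundle, ["id", "subject.reference"])
```

A list of dictionaries like `rows` can be handed to a dataframe library to obtain the tabular form described in the caption.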

**Fig. 5**
Retrieval of all CT studies for patients with scoliosis. Two FHIR-PYrate queries can be used to obtain all CT studies belonging to patients who suffer from scoliosis. The first query is for the Condition resource, while the second one is for the ImagingStudy resource. The request parameters for the FHIR server are specified using the request_params parameter. The df_constraints parameter is used to specify request parameters that should be constrained according to each row of the input *DataFrame*. The fhir_paths parameter selects which attributes of the resource should be returned.
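
The idea of constraining a second query by the rows of a first result can be illustrated as follows. This sketch only builds search URLs and does not call FHIR-PYrate; `build_queries`, `row_constraints`, the column name `patient_id`, and the server address are assumptions made for the example.

```python
from urllib.parse import urlencode


def build_queries(base_url, resource_type, request_params, rows, row_constraints):
    """Build one search URL per input row.

    request_params: parameters shared by every request.
    row_constraints: maps a search parameter name to the row column
        whose value should be used for that parameter.
    """
    urls = []
    for row in rows:
        params = dict(request_params)  # copy so the shared dict stays unchanged
        for param_name, column in row_constraints.items():
            params[param_name] = row[column]
        urls.append(f"{base_url}/{resource_type}?{urlencode(params)}")
    return urls


# Hypothetical output of a first query on Condition: one row per patient.
condition_rows = [{"patient_id": "p1"}, {"patient_id": "p2"}]

urls = build_queries(
    base_url="https://example.org/fhir",  # placeholder server address
    resource_type="ImagingStudy",
    request_params={"modality": "CT"},
    rows=condition_rows,
    row_constraints={"subject": "patient_id"},
)
```

Each row of the first result yields its own request, so the fixed parameters are written once and only the patient-specific part varies.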

**Fig. 6**
Using processing functions. The steal_bundles_to_dataframe function retrieves the required Observation resources that contain information about the blood pressure panel. Then, the obtained bundles are transformed into rows by using the get_blood_panel_info function. This function iterates through the entries of a bundle and collects the resource IDs, and then iterates through the component attribute, which contains multiple pieces of information about the blood panel status, in this case the systolic and the diastolic blood pressure. Each component attribute contains a display name (a natural language name of what is being evaluated), a quantity (the actual measured value), and a unit of measurement. For each piece of information, the display name becomes the column header, the quantity becomes the value, and the unit is stored in an additional column. The table below the figure presents an example output for this query.
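
A processing function of the kind described in this caption might look roughly like the sketch below. It is not the get_blood_panel_info function from the figure: `blood_panel_rows` is a hypothetical name, the bundle is treated as a plain dictionary, and the sample Observation and its values are invented.

```python
def blood_panel_rows(bundle):
    """Return one row per Observation, with one column per component."""
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry["resource"]
        row = {"id": resource["id"]}
        for component in resource.get("component", []):
            # Display name becomes the column header.
            name = component["code"]["coding"][0]["display"]
            quantity = component["valueQuantity"]
            row[name] = quantity["value"]
            # The unit goes into an additional column.
            row[f"{name} unit"] = quantity["unit"]
        rows.append(row)
    return rows


# Invented sample data shaped like a blood pressure panel Observation.
sample_bundle = {
    "entry": [{"resource": {
        "resourceType": "Observation",
        "id": "bp-1",
        "component": [
            {"code": {"coding": [{"display": "Systolic blood pressure"}]},
             "valueQuantity": {"value": 107, "unit": "mmHg"}},
            {"code": {"coding": [{"display": "Diastolic blood pressure"}]},
             "valueQuantity": {"value": 60, "unit": "mmHg"}},
        ],
    }}]
}

rows = blood_panel_rows(sample_bundle)
```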

**Fig. 7**
Including secondary resources. ImagingStudy resources also contain a reference to the Patient resource they belong to, and by using the _include parameter the corresponding Patient resources can be included in the output Bundle. The attributes that should be added to the final *DataFrame* can be specified with the fhir_paths parameter, where it is also possible to specify which resource each attribute should come from, to ensure clarity. Usually, one row of the output *DataFrame* represents one entry of a Bundle. By specifying the merge_on parameter, the rows that have the same patient_id are merged, producing a *DataFrame* similar to the table below the figure.
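
The merging step can be pictured with a small stand-alone sketch. It is an assumption about the general idea, not the package's actual merge logic: `merge_rows` is a hypothetical helper that keeps the first non-missing value per column, and the rows and column names are invented.

```python
from collections import defaultdict


def merge_rows(rows, merge_on):
    """Merge rows sharing the same value of `merge_on`, keeping the first
    non-missing value seen for every column."""
    merged = defaultdict(dict)
    for row in rows:
        target = merged[row[merge_on]]
        for column, value in row.items():
            if value is not None and column not in target:
                target[column] = value
    return list(merged.values())


# Invented rows: one from an ImagingStudy entry, one from the included Patient.
rows = [
    {"patient_id": "p1", "study_uid": "1.2.3", "birth_date": None},
    {"patient_id": "p1", "study_uid": None, "birth_date": "1970-01-01"},
]

merged = merge_rows(rows, merge_on="patient_id")
```

Because this sketch keeps only the first value per column, a patient with several studies would need a list-valued column or a different merge strategy.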

**Fig. 8**
An overview of the process to identify whether a text contains relevant sentences using the *Miner* class. First, if the document has a specific structure, all the information before a known keyword (e.g., “Findings”) is removed. Then, the sentences are identified using the SpaCy library. For each sentence, the input regular expression is matched against the text, and if the sentiment of the sentence is not negative, the sentence is considered a match.
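
The three steps in this caption can be approximated with the standard library. The sketch deliberately swaps in simpler techniques: a regular-expression sentence splitter replaces SpaCy, and a short list of English negation cues replaces the sentiment check. `relevant_sentences`, `NEGATION_CUES`, and the sample report are all assumptions.

```python
import re

# Crude stand-in for the negativity check: a few English negation words.
NEGATION_CUES = re.compile(r"\b(no|not|without)\b", re.IGNORECASE)


def relevant_sentences(text, pattern, start_keyword=None):
    """Return sentences that match `pattern` and contain no negation cue."""
    # Step 1: drop everything before the keyword, if it is present.
    if start_keyword is not None and start_keyword in text:
        start = text.index(start_keyword) + len(start_keyword)
        text = text[start:].lstrip(": ")
    # Step 2: naive sentence split on terminal punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Step 3: keep matching sentences that are not negated.
    regex = re.compile(pattern, re.IGNORECASE)
    return [s for s in sentences if regex.search(s) and not NEGATION_CUES.search(s)]


report = (
    "History: nodule suspected. "
    "Findings: No pleural effusion. A nodule is seen in the left lung."
)
matches = relevant_sentences(report, r"nodule|effusion", start_keyword="Findings")
```

The sentence from the history section is discarded in step 1, and the negated effusion sentence is discarded in step 3, so only the positive finding remains.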

**Fig. 9**
Example *Miner* Query. The *Miner* class is first initialized; then, a *DataFrame* containing the text documents in a specified column is processed using the nlp_on_dataframe function. However, the data may not be stored as readable text on the FHIR server (e.g., it may be stored as HTML or in encoded form). For this purpose, a function to preprocess the text may be specified, in this case decode_text.
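
What such a preprocessing function does depends on how a given server stores its documents. The sketch below assumes, purely for illustration, that the text is base64-encoded HTML; the body of `decode_text` here is hypothetical and is not the function shown in the figure.

```python
import base64
import html
import re


def decode_text(encoded: str) -> str:
    """Decode base64 content and strip HTML markup, leaving plain text."""
    raw = base64.b64decode(encoded).decode("utf-8")
    without_tags = re.sub(r"<[^>]+>", " ", raw)  # replace tags with spaces
    unescaped = html.unescape(without_tags)      # turn &amp; into & and so on
    return re.sub(r"\s+", " ", unescaped).strip()


# Invented document, encoded the way this sketch assumes it is stored.
encoded_report = base64.b64encode(
    b"<p>Findings: no fracture &amp; no effusion.</p>"
).decode("ascii")

plain = decode_text(encoded_report)
```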

**Fig. 10**
Example *DICOMDownloader* Query. The *DICOMDownloader* class is initialized with the desired output format, and the data is downloaded to the desired output directory using the download_data_from_dataframe function.

**Fig. 11**
An overview of the download of DICOM studies using the *DICOMDownloader*. A collection of StudyInstanceUID and SeriesInstanceUID values is given as input to the *DICOMDownloader* (1), which uses them to communicate with a DICOMweb instance (2) and stores the results in a predefined folder (3). Additionally, two *DataFrames* are returned (3). The first one lists all successfully downloaded series, while the second one lists all failed series and the kind of error that was produced.
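
The pattern of returning one table of successes and one table of failures can be sketched independently of any DICOMweb client. Everything named here is hypothetical: `download_series` is not the package's function, and `fake_fetch` merely simulates a request that fails for one series.

```python
def download_series(uid_pairs, fetch):
    """Try to fetch every (study_uid, series_uid) pair.

    Returns two lists of rows: successfully downloaded series, and failed
    series together with the error that was raised.
    """
    successful, failed = [], []
    for study_uid, series_uid in uid_pairs:
        try:
            path = fetch(study_uid, series_uid)
        except Exception as error:  # record the failure and continue
            failed.append({"study_uid": study_uid, "series_uid": series_uid,
                           "error": f"{type(error).__name__}: {error}"})
        else:
            successful.append({"study_uid": study_uid, "series_uid": series_uid,
                               "path": path})
    return successful, failed


def fake_fetch(study_uid, series_uid):
    """Stand-in for a DICOMweb request; fails for one hard-coded series."""
    if series_uid == "1.2.3.2":
        raise ConnectionError("series not reachable")
    return f"output/{study_uid}/{series_uid}"


ok, errors = download_series(
    [("1.2.3", "1.2.3.1"), ("1.2.3", "1.2.3.2")], fake_fetch
)
```

Collecting errors instead of stopping at the first one lets a bulk download finish and leaves a record of which series need to be retried.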

**Fig. 12**
Data collection using FHIR-PYrate for breast cancer patients. The process starts by collecting patients with specific ICD-10 codes (1), the results are filtered (2), and then the clinical (3) and imaging data (4) are retrieved. The resulting *DataFrames* can also be merged to obtain the complete cohort information. In the figure, the codes are abbreviated by omitting the URL of each coding system (e.g., LOINC instead of "http://loinc.org").

Cited by

Use of Real-World FHIR Data Combined with Context-Sensitive Decision Modeling to Guide Sentinel Biopsy in Melanoma.
Beckmann CL, Lodde G, Swoboda J, Livingstone E, Böckmann B. J Clin Med. 2024 Jun 6;13(11):3353. doi: 10.3390/jcm13113353. PMID: 38893064.
