BMC Health Serv Res. 2023 Jul 6;23(1):734.
doi: 10.1186/s12913-023-09498-1.

FHIR-PYrate: a data science friendly Python package to query FHIR servers

René Hosch et al. BMC Health Serv Res. 2023.

Abstract

Background: We present FHIR-PYrate, a Python package to handle the full clinical data collection and extraction process. The software is designed to be plugged into a modern hospital domain, where electronic patient records are used to handle the patient's entire history. Most research institutes follow the same procedures to build study cohorts, but mainly in a non-standardized and repetitive way. As a result, researchers spend time writing boilerplate code, time that could be used for more challenging tasks.

Methods: The package can improve and simplify existing processes in the clinical research environment. It collects all needed functionalities into a straightforward interface that can be used to query a FHIR server, download imaging studies and filter clinical documents. The full capacity of the search mechanism of the FHIR REST API is available to the user, leading to a uniform querying process for all resources, thus simplifying the customization of each use case. Additionally, valuable features like parallelization and filtering are included to make it more performant.

Results: As an exemplary practical application, the package can be used to analyze the prognostic significance of routine CT imaging and clinical data in breast cancer with tumor metastases in the lungs. In this example, the initial patient cohort is first collected using ICD-10 codes. For these patients, the survival information is also gathered. Some additional clinical data is retrieved, and CT scans of the thorax are downloaded. Finally, the survival analysis can be computed using a deep learning model with the CT scans, the TNM staging and positivity of relevant markers as input. This process may vary depending on the FHIR server and available clinical data, and can be customized to cover even more use cases.

Conclusions: FHIR-PYrate makes it possible to quickly and easily retrieve FHIR data, download image data, and search medical documents for keywords within a single Python package. With the demonstrated functionality, FHIR-PYrate offers an easy way to assemble research collectives automatically.

Keywords: Dataframe; Dicom; Electronic patient record; FHIR; Information extraction; Python.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Resource statistics at our institution. The plot shows the distribution of the 1,498,863,142 resources present on our FHIR server as of 2023-03-13. The most common resource is ServiceRequest, which is used for records of requests for procedures, diagnostics, or any other service. The second most common is Observation, which stores measurements like lab values and vital parameters, but also short texts like clinical notes
Fig. 2
Overview of the tool’s role in the healthcare/machine learning picture. Presentation of an exemplary hospital infrastructure including multiple apps: a FHIR server, a DICOM Web capable PACS (or alternatively an app that handles the communication to the medical imaging storage), and various AI applications. Within the research domain, the FHIR-PYrate package handles the communication between the FHIR server and the PACS (DICOM Web). The package also helps with creating the data prerequisites needed for implementing machine learning solutions and for creating cohorts
Fig. 3
A schematic view of the package structure. The Ahoy class handles the user's authentication and creates an HTTP session, which the other classes will then use to interact with the FHIR server. The Pirate class handles the communication with the REST API, collects Bundles, and builds the DataFrames. The Miner class is to be used on a DataFrame (normally the result of a DiagnosticReport query), and it will output a report on whether a particular regular expression is present in a text, and in which particular parts of the documents. The DICOMDownloader class is used to download medical imaging scans in bulk and store information about them. The Miner and the DICOMDownloader class are connected with dashed lines, as they can be used as an optional step after retrieving the initial data
Fig. 4
Schematic view of the process within the Pirate class. The Pirate class can be used with any existing resource, and data retrieval processes remain unchanged. In this figure, the querying process for the pictured resources is always the same: First, a query has to be defined and run, then, a FHIR Bundle is returned, and finally, the Bundle is transformed into a DataFrame
Fig. 5
Retrieval of all CT studies for patients with scoliosis. Two FHIR-PYrate queries can be used to obtain all CT studies belonging to patients who suffer from scoliosis. The first query is for the Condition resource, while the second one is for the ImagingStudy resource. The request parameters for the FHIR server are specified using the request_params parameter. The df_constraints parameter is used to specify request parameters that should be constrained according to each row of the input DataFrame. The fhir_paths parameter selects which attribute of the resource should be returned
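The idea of constraining a second query by the rows of a first result can be illustrated without the package itself. The following is a minimal sketch, not FHIR-PYrate's implementation: the helper build_search_urls, the server address, and the patient IDs are hypothetical, and only the Python standard library is used. It assumes that the first (Condition) query produced one row per patient and shows how fixed request parameters and a per-row constraint could be combined into one ImagingStudy search URL per row.

```python
from urllib.parse import urlencode


def build_search_urls(base_url, resource_type, request_params, rows, constraints):
    """Return one FHIR search URL per input row (hypothetical helper).

    `constraints` maps a FHIR search parameter to the row key that
    supplies its value, mirroring the role of df_constraints.
    """
    urls = []
    for row in rows:
        params = dict(request_params)  # fixed parameters shared by all requests
        for search_param, column in constraints.items():
            params[search_param] = row[column]  # value taken from the current row
        urls.append(f"{base_url}/{resource_type}?{urlencode(params)}")
    return urls


# Assumed output of the first (Condition) query: one row per patient.
condition_rows = [{"patient_id": "p1"}, {"patient_id": "p2"}]

imaging_urls = build_search_urls(
    base_url="https://fhir.example.org",  # placeholder server, not a real endpoint
    resource_type="ImagingStudy",
    request_params={"modality": "CT"},
    rows=condition_rows,
    constraints={"subject": "patient_id"},
)

for url in imaging_urls:
    print(url)
```

Each row yields its own request, which is also what makes this pattern easy to parallelize.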
Fig. 6
Using processing functions. The steal_bundles_to_dataframe function retrieves the required Observation resources that contain information about the blood pressure panel. Then, the obtained bundles are transformed to rows by using the get_blood_panel_info function. This function iterates through the entries of a bundle and collects the resource IDs, and then iterates through the component attribute, which contains multiple pieces of information about the blood panel status, in this case, the systolic and the diastolic blood pressure. Each component attribute contains a display name (a natural language name of what is being evaluated), a quantity (the actual measured value) and a unit of measurement. For each piece of information, the display name becomes the column header, the quantity becomes the value, and the unit is stored in an additional column. The table below the figure presents an example output for this query
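The row-building logic described in this caption can be sketched as follows. This is an illustration under assumptions, not the code from the figure: the bundle is a plain Python dict shaped like FHIR JSON (the package may expose bundles through a different object type), the column names "observation_id" and "<display> unit" are invented for the example, and the sample values are made up.

```python
def get_blood_panel_info(bundle):
    """Turn a Bundle of blood pressure Observations into a list of row dicts."""
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry["resource"]
        row = {"observation_id": resource["id"]}  # collect the resource ID
        for component in resource.get("component", []):
            display = component["code"]["coding"][0]["display"]
            quantity = component["valueQuantity"]
            row[display] = quantity["value"]           # display name -> column, value -> cell
            row[f"{display} unit"] = quantity["unit"]  # unit goes to an extra column
        rows.append(row)
    return rows


# Made-up example bundle with a single blood pressure panel.
example_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {
            "resource": {
                "resourceType": "Observation",
                "id": "bp-1",
                "component": [
                    {
                        "code": {"coding": [{"display": "Systolic blood pressure"}]},
                        "valueQuantity": {"value": 120, "unit": "mmHg"},
                    },
                    {
                        "code": {"coding": [{"display": "Diastolic blood pressure"}]},
                        "valueQuantity": {"value": 80, "unit": "mmHg"},
                    },
                ],
            }
        }
    ],
}

rows = get_blood_panel_info(example_bundle)
print(rows)
```

A list of dicts like this can be passed directly to a DataFrame constructor, which is why a processing function of this shape fits the Bundle-to-DataFrame step.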
Fig. 7
Including secondary resources. ImagingStudy resources also contain a reference to the Patient resource they belong to, and using the _include parameter the corresponding Patient resources can be imported in the output Bundle. The attributes that should be added to the final DataFrame can be specified with the fhir_paths parameter, where it is also possible to specify which resource the attribute should come from, to ensure clarity. Usually, one row of the output DataFrame represents one entry of a Bundle. By specifying the merge_on parameter, the rows which have the same patient_id are merged, producing a DataFrame similar to the table below the figure
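The effect of merging rows that share a patient_id can be shown with a small stand-alone sketch. It is an assumption about the general idea, not the package's merge_on implementation: the helper merge_rows_on, the column names, and the sample values are hypothetical. Rows coming from ImagingStudy entries and from included Patient entries each fill only some columns, and the merge keeps the non-empty value per column.

```python
def merge_rows_on(rows, key):
    """Merge rows sharing the same `key` value, preferring non-None cells."""
    merged = {}
    for row in rows:
        target = merged.setdefault(row[key], {})
        for column, value in row.items():
            # Keep an existing value unless the new one carries information.
            if value is not None or column not in target:
                target[column] = value
    return list(merged.values())


# Hypothetical rows: one per Bundle entry, each filling a subset of columns.
bundle_rows = [
    {"patient_id": "p1", "study_uid": "1.2.3", "birth_date": None},
    {"patient_id": "p1", "study_uid": None, "birth_date": "1970-01-01"},
    {"patient_id": "p2", "study_uid": "4.5.6", "birth_date": None},
]

merged_rows = merge_rows_on(bundle_rows, "patient_id")
print(merged_rows)
```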
Fig. 8
An overview of the process to identify whether a text contains relevant sentences using the Miner class. First, if the document has a specific structure, all the information before a known keyword (e.g., “Findings”) is removed. Then, the sentences are identified using the SpaCy library. For each sentence, the input regular expression is matched against the text and if the sentiment of the sentence is not negative, the sentence is considered a match
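The three steps in this caption (drop the header, split into sentences, keep non-negated regex matches) can be approximated with the standard library alone. This sketch deliberately swaps in simpler techniques than the ones the caption names: a naive punctuation-based splitter stands in for SpaCy, and a small negation word list stands in for the sentiment check. The function name, the negation list, and the sample report are all assumptions made for illustration.

```python
import re

# Crude stand-in for the negative-sentiment check (assumption, not the Miner's logic).
NEGATION = re.compile(r"\b(no|not|without|negative for)\b", re.IGNORECASE)


def find_relevant_sentences(text, pattern, header_keyword=None):
    """Return sentences that match `pattern` and contain no negation cue."""
    # Step 1: discard everything up to and including a known section keyword.
    if header_keyword and header_keyword in text:
        text = text.split(header_keyword, 1)[1]
    # Step 2: naive sentence split on end-of-sentence punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Step 3: keep sentences that match the regex and are not negated.
    regex = re.compile(pattern, re.IGNORECASE)
    return [s for s in sentences if regex.search(s) and not NEGATION.search(s)]


report = (
    "Clinical history: suspected metastasis. "
    "Findings: No metastasis in the liver. Multiple metastases in both lungs."
)

matches = find_relevant_sentences(report, r"metastas[ei]s", header_keyword="Findings")
print(matches)
```

In this example the history sentence is removed with the header and the liver sentence is rejected as negated, so only the lung finding remains.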
Fig. 9
Example Miner Query. The Miner class is first initialized, then a DataFrame containing the text documents in a specified column is processed using the nlp_on_dataframe function. However, the data may not be stored as readable text on the FHIR server (e.g., it may be stored as HTML or in encoded form). For this purpose, a processing function to preprocess the text may be specified, in this case, decode_text
Fig. 10
Example DICOMDownloader Query. The DICOMDownloader class is initialized with the desired output format, and the data is downloaded to the desired output directory using the download_data_from_dataframe function
Fig. 11
An overview of the download of DICOM studies using the DICOMDownloader. A collection of StudyInstanceUID and SeriesInstanceUID is given as input to the DICOMDownloader (1), which uses them to communicate with a DICOMweb instance (2) and stores the results in a predefined folder (3). Additionally, two DataFrames are returned (3). The first one contains a list of all successfully downloaded series, while the second one has a list of all the failed series and the kind of error that was produced
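The bookkeeping described here, one table of successful series and one of failed series with the kind of error, can be sketched independently of any DICOMweb client. The following is a hedged illustration, not the DICOMDownloader's code: download_all, fake_fetch, the UIDs, and the output paths are hypothetical, and the network call is replaced by an injected function so the example runs offline.

```python
def download_all(series_list, fetch):
    """Call `fetch` for every (study UID, series UID) pair and record the outcome."""
    successful, failed = [], []
    for study_uid, series_uid in series_list:
        try:
            path = fetch(study_uid, series_uid)
        except Exception as error:  # record the kind of error and continue
            failed.append(
                {
                    "study_uid": study_uid,
                    "series_uid": series_uid,
                    "error": type(error).__name__,
                }
            )
        else:
            successful.append(
                {"study_uid": study_uid, "series_uid": series_uid, "path": path}
            )
    return successful, failed


def fake_fetch(study_uid, series_uid):
    """Stand-in for a DICOMweb request; fails for series UIDs ending in '.9'."""
    if series_uid.endswith(".9"):
        raise TimeoutError("simulated timeout")
    return f"out/{study_uid}/{series_uid}"


ok, bad = download_all([("1.2", "1.2.1"), ("1.2", "1.2.9")], fake_fetch)
print(ok)
print(bad)
```

Keeping failures in their own table lets a bulk download finish and be retried selectively instead of aborting on the first error.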
Fig. 12
Data collection using FHIR-PYrate for breast cancer patients. The process starts by collecting patients with specific ICD-10 codes (1). The results are then filtered (2), and the clinical (3) and imaging data (4) are retrieved. The resulting DataFrames can also be merged to obtain the complete cohort information. The code retrieval is simplified by omitting the URL of each system (i.e., LOINC instead of "http://loinc.org")
