Dataset for Machine Learning Assisted Citation Screening for Systematic Reviews

Published December 22, 2023 | Version 1
Dataset Open

  • 1. University of Geneva
  • 2. HES-SO University of Applied Sciences and Arts Western Switzerland
  • 3. University Hospital of Lausanne

Description

The work "Machine Learning Assisted Citation Screening for Systematic Reviews" explored automating citation screening with machine learning (ML), with the aim of accelerating the production of systematic reviews. In the manual citation screening process, two reviewers screen the retrieved studies against predefined inclusion criteria. A study that meets the inclusion criteria is included for further analysis; otherwise it is excluded. Because the manual process ends in one of these two outcomes, the work treated citation screening as a binary classification problem in which any ML classifier could be trained to separate the retrieved studies into two classes (include and exclude).

 

A physiotherapy citation screening dataset was used to test the automation approaches. It comprises the studies identified for citation screening in an update to the systematic review by Hilfiker et al. The dataset contains titles and abstracts (citations) from 31,279 studies (25,540 after deduplication) identified during the search phase of this systematic review. Two reviewers had already manually assessed each study for relevance and assigned it one of two mutually exclusive labels, include or exclude. The uploaded file contains the 25,540 deduplicated records, one per line. It is a tab-separated file, and each record is structured as shown below.

 

Columns (tab-separated, in this order): Title, PMID, Abstract, Class, MeSH terms (separated by a pipe)

Example record:

  • Title: Structured exercise improves physical functioning in women with stages I and II breast cancer: results of a randomized controlled trial.
  • PMID: 11157015
  • Abstract: Abstract PURPOSE: Self-directed and supervised exercise were compared with usual care in a clinical trial designed to evaluate the effect of structured exercise on physical functioning and other dimensions of health-related quality of life in women with stages I and II breast cancer. PATIENTS AND METHODS: One hundred twenty-three women with stages I and II breast cancer completed baseline evaluations of generic and disease- and site-specific health-related quality of life, aerobic capacity, and body weight. Participants were randomly allocated to one of three intervention groups: usual care (control group), self-directed exercise, or supervised exercise. Quality of life, aerobic capacity, and body weight measures were repeated at 26 weeks...
  • Class: include or exclude
  • MeSH terms: Clinical Trial | Comparative Study | Randomized Controlled Trial | Research Support, Non-U.S. Gov't | Antineoplastic Combined Chemotherapy Protocols | Breast Neoplasms | Breast Neoplasms | Breast Neoplasms | Chemotherapy, Adjuvant | Exercise | Female | Humans | Middle Aged | Neoplasm Staging | Quality of Life | Radiotherapy, Adjuvant
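The tab-separated layout described above can be read with Python's standard csv module. The following is a minimal sketch, not the authors' own loading code. It assumes the five columns appear in the stated order, that the Class column holds the literal strings "include" or "exclude", and that a header row may or may not be present. The names Citation and parse_citations are hypothetical and chosen only for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, List, NamedTuple


class Citation(NamedTuple):
    """One record of the corpus (hypothetical container, not part of the dataset)."""
    title: str
    pmid: str
    abstract: str
    label: int              # 1 = include, 0 = exclude
    mesh_terms: List[str]


def parse_citations(lines: Iterable[str]) -> Iterator[Citation]:
    """Yield one Citation per tab-separated line.

    Assumed column order: Title, PMID, Abstract, Class, MeSH terms.
    Rows whose Class field is neither "include" nor "exclude"
    (for example a header row) are skipped.
    """
    # QUOTE_NONE keeps quotation marks inside titles and abstracts literal.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Pad short rows so a missing trailing MeSH field does not raise.
        fields = [field.strip() for field in (row + [""] * 5)[:5]]
        title, pmid, abstract, cls, mesh = fields
        cls = cls.lower()
        if cls not in ("include", "exclude"):
            continue
        terms = [term.strip() for term in mesh.split("|") if term.strip()]
        yield Citation(title, pmid, abstract, 1 if cls == "include" else 0, terms)


# Tiny in-memory sample with placeholder values (not taken from the dataset).
sample = (
    "Title\tPMID\tAbstract\tClass\tMeSH terms\n"
    "An example title.\t00000000\tAbstract PURPOSE: placeholder text.\tinclude\t"
    "Exercise | Female | Humans\n"
    "Another title.\t00000001\tAbstract placeholder.\texclude\t\n"
)
records = list(parse_citations(io.StringIO(sample)))
print(len(records), records[0].label, records[0].mesh_terms)
```

To read the published file instead of the sample, pass an open file handle to parse_citations, for example open("citation_screening_deduplicated_corpus.csv", encoding="utf-8", newline=""). The UTF-8 encoding is an assumption, since the record does not state it.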

 

If you use this dataset in your research, please cite our papers.

Files

citation_screening_deduplicated_corpus.csv (53.9 MB)
md5:3d8c9589d8051b26d6f9dff7eaf1865d

Additional details

References

  • Hilfiker, Roger, et al. "Exercise and other non-pharmaceutical interventions for cancer-related fatigue in patients during or after cancer treatment: a systematic review incorporating an indirect-comparisons meta-analysis." British Journal of Sports Medicine 52.10 (2018): 651-658.
  • Dhrangadhariya, Anjani, et al. "Machine Learning Assisted Citation Screening for Systematic Reviews." MIE. 2020.