Dataset for Machine Learning Assisted Citation Screening for Systematic Reviews
Creators
Description
The work "Machine Learning Assisted Citation Screening for Systematic Reviews" explored the problem of citation screening automation using machine-learning (ML) with an aim to accelerate the process of generating systematic reviews. Manual process of citation screening involve two reviewers manually screening the searched studies using a predefined inclusion criteria. If the study passes the "inclusion" criteria, it is included for further analysis or is excluded. As apparant through manual screening process, the work considered citation screening as a binary classification problem whereby any ML classifier could be trained to separate the searched studies into these two classes (include and exclude).
A physiotherapy citation screening dataset was used to test automation approaches and the dataset includes the studies identified for citation screening in an update to the systematic review by Hilfiker et al. The dataset included titles and abstracts (citations) from 31,279 (deduplicated: 25,540) studies identified during the search phase of this SR. These studies were already manually assessed for relevance and labelled by two reviewers into two mutually exclusive labels. The uploaded file consists of 25,540 data samples, with each data sample separated by a new line. It is a tab separated file and the data in it is structured as shown below. This dataset was manually labelled into include and exclude by Hilfiker et al.
Title | PMID | Abstract | Class | MeSH terms (separated by a pipe) |
Structured exercise improves physical functioning in women with stages I and II breast cancer: results of a randomized controlled trial. | 11157015 | Abstract PURPOSE: Self-directed and supervised exercise were compared with usual care in a clinical trial designed to evaluate the effect of structured exercise on physical functioning and other dimensions of health-related quality of life in women with stages I and II breast cancer. PATIENTS AND METHODS: One hundred twenty-three women with stages I and II breast cancer completed baseline evaluations of generic and disease- and site-specific health-related quality of life, aerobic capacity, and body weight. Participants were randomly allocated to one of three intervention groups: usual care (control group), self-directed exercise, or supervised exercise. Quality of life, aerobic capacity, and body weight measures were repeated at 26 weeks... | include or exclude | Clinical Trial | Comparative Study | Randomized Controlled Trial | Research Support, Non-U.S. Gov't | Antineoplastic Combined Chemotherapy Protocols | Breast Neoplasms | Breast Neoplasms | Breast Neoplasms | Chemotherapy, Adjuvant | Exercise | Female | Humans | Middle Aged | Neoplasm Staging | Quality of Life | Radiotherapy, Adjuvant |
If you use this dataset in your research, please cite our papers.
Files
citation_screening_deduplicated_corpus.csv
Files
(53.9 MB)
Name | Size | Download all |
---|---|---|
md5:3d8c9589d8051b26d6f9dff7eaf1865d
|
53.9 MB | Preview Download |
Additional details
References
- Hilfiker, Roger, et al. "Exercise and other non-pharmaceutical interventions for cancer-related fatigue in patients during or after cancer treatment: a systematic review incorporating an indirect-comparisons meta-analysis." British journal of sports medicine 52.10 (2018): 651-658.
- Dhrangadhariya, Anjani, et al. "Machine Learning Assisted Citation Screening for Systematic Reviews." MIE. 2020.