Machine Learning in Medicine

doi:10.1161/CIRCULATIONAHA.115.001593

Review

Circulation. 2015 Nov 17;132(20):1920-30.

Rahul C Deo¹

Affiliation

¹ From Cardiovascular Research Institute, Department of Medicine and Institute for Human Genetics, University of California, San Francisco, and California Institute for Quantitative Biosciences, San Francisco. rahul.deo@ucsf.edu.

PMID: 26572668
PMCID: PMC5831252

Rahul C Deo. Circulation. 2015.

Abstract

Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome.

Keywords: artificial intelligence; computers; prognosis; risk factors; statistics.

Conflict of interest statement

Conflict of Interest Disclosures: None.

Figures

**Figure 1**
Machine learning overview. A. Matrix representation of the supervised and unsupervised learning problem. We are interested in developing a model for predicting myocardial infarction (MI). For training data, we have patients, each characterized by an outcome (positive or negative training examples), denoted by the circle in the right-hand column, as well as by values of predictive features, denoted by blue to red coloring of squares. We seek to build a model to predict outcome using some combination of features. Multiple types of functions can be used for mapping features to outcome (**B–D**). Machine learning algorithms are used to find optimal values of free parameters in the model in order to minimize training error as judged by the difference between predicted values from our model and actual values. In the unsupervised learning problem, we are ignoring the outcome column, and grouping together patients based on similarities in the values of their features. B. Decision trees map features to outcome. At each node or branch point, training examples are partitioned based on the value of a particular feature. Additional branches are introduced with the goal of completely separating positive and negative training examples. C. Neural networks predict outcome based on transformed representations of features. A hidden layer of nodes integrates the value of multiple input nodes (raw features) to derive transformed features. The output node then uses values of these transformed features in a model to predict outcome. D. The k-nearest neighbor algorithm assigns class based on the values of the most similar training examples. The distance between patients is computed based on comparing multidimensional vectors of feature values. In this case, where there are only two features, if we consider the outcome class of the three nearest neighbors, the unknown data instance would be assigned a “no MI” class.
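The k-nearest neighbor rule described in panel D can be sketched in a few lines. This is only an illustration, not code from the review: the function name `knn_predict`, the Euclidean distance metric, and the two-feature patient values are all assumptions made up for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority class among the k training examples nearest to query."""
    # Euclidean distance between feature vectors (an assumed choice of metric).
    ranked = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote over the labels of the k closest training examples.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical patients described by two features (values invented for illustration).
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.5), (7.0, 6.0), (6.5, 7.5)]
train_y = ["no MI", "no MI", "no MI", "MI", "MI", "MI"]

prediction = knn_predict(train_X, train_y, query=(2.5, 2.0), k=3)
print(prediction)
```

With these invented points, the three nearest neighbors of the query all carry the "no MI" label, so the unknown instance is assigned that class, mirroring the situation in the figure.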

**Figure 2**
Overview of the C-Path image processing pipeline and prognostic model building procedure. A. Basic image processing and feature construction. B. Building an epithelial-stromal classifier. The classifier takes as input a set of breast cancer microscopic images that have undergone basic image processing and feature construction and that have had a subset of superpixels hand-labeled by a pathologist as epithelium (red) or stroma (green). The superpixel labels and feature measurements are used as input to a supervised learning algorithm to build an epithelial-stromal classifier. The classifier is then applied to new images to classify superpixels as epithelium or stroma. C. Constructing higher-level contextual/relational features. After application of the epithelial-stromal classifier, all image objects are subclassified and colored on the basis of their tissue region and basic cellular morphologic properties. (Left panel) After the classification of each image object, a rich feature set is constructed. D. Learning an image-based model to predict survival. Processed images from patients alive at 5 years after surgery and from patients deceased at 5 years after surgery were used to construct an image-based prognostic model. After construction of the model, it was applied to a test set of breast cancer images (not used in model building) to classify patients as high or low risk of death by 5 years. From Beck et al, Sci Transl Med. 2011;3:108ra113. Reprinted with permission from AAAS.

**Figure 3**
Schematic of model development for breast cancer risk prediction. Shown are block diagrams that describe the development stages for the final ensemble prognostic model. Building a prognostic model involves derivation of relevant features, training submodels and making predictions, and combining predictions from each submodel. The model derived the attractor metagenes using gene expression data, combined them with the clinical information through Cox regression, gradient boosting machine, and k-nearest neighbor techniques, and eventually blended each submodel’s prediction. From Cheng et al, Sci Transl Med. 2013;5:181ra50. Reprinted with permission from AAAS.
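The final step in this schematic, combining the predictions of several submodels, can be illustrated with a weighted average. The caption does not say how the blending was actually done, so the averaging scheme, the function name `blend_predictions`, and the risk scores below are assumptions for illustration only.

```python
def blend_predictions(predictions_by_model, weights=None):
    """Combine per-patient risk scores from several submodels by weighted averaging."""
    names = list(predictions_by_model)
    if weights is None:
        # Assumed default: every submodel contributes equally.
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    n_patients = len(predictions_by_model[names[0]])
    blended = []
    for i in range(n_patients):
        score = sum(weights[name] * predictions_by_model[name][i] for name in names)
        blended.append(score / total_weight)
    return blended


# Hypothetical risk scores for two patients from three submodels (invented values).
submodel_scores = {
    "cox_regression": [0.2, 0.8],
    "gradient_boosting": [0.4, 0.6],
    "k_nearest_neighbor": [0.3, 0.7],
}

blended_scores = blend_predictions(submodel_scores)
print(blended_scores)
```

Passing a `weights` dictionary lets one submodel count for more than another; how such weights would be chosen is outside what the caption describes.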

**Figure 4**
Application of unsupervised learning to HFpEF. A. Phenotype heat map of HFpEF. Columns represent individual study participants; rows, individual features. B. Bayesian information criterion analysis for the identification of the optimal number of phenotypic clusters (pheno-groups). C. Survival free of cardiovascular (CV) hospitalization or death stratified by phenotypic cluster. Kaplan-Meier curves for the combined outcome of heart failure hospitalization, cardiovascular hospitalization, or death stratified by phenotypic cluster.

Cited by

Developing clinical prediction models: a step-by-step guide.
Efthimiou O, Seo M, Chalkou K, Debray T, Egger M, Salanti G. BMJ. 2024 Sep 3;386:e078276. doi: 10.1136/bmj-2023-078276. PMID: 39227063.
Identification of diagnostic biomarkers of rheumatoid arthritis based on machine learning-assisted comprehensive bioinformatics and its correlation with immune cells.
Mu KL, Ran F, Peng LQ, Zhou LL, Wu YT, Shao MH, Chen XG, Guo CM, Luo QM, Wang TJ, Liu YC, Liu G. Heliyon. 2024 Aug 5;10(15):e35511. doi: 10.1016/j.heliyon.2024.e35511. PMID: 39170142.
Alternative Splicing, Internal Promoter, Nonsense-Mediated Decay, or All Three: Explaining the Distribution of Truncation Variants in Titin.
Deo RC. Circ Cardiovasc Genet. 2016 Oct;9(5):419-425. doi: 10.1161/CIRCGENETICS.116.001513. PMID: 27625338.
Application of machine learning model to predict osteoporosis based on abdominal computed tomography images of the psoas muscle: a retrospective study.
Huang CB, Hu JS, Tan K, Zhang W, Xu TH, Yang L. BMC Geriatr. 2022 Oct 13;22(1):796. doi: 10.1186/s12877-022-03502-9. PMID: 36229793.
Machine learning outperforms traditional logistic regression and offers new possibilities for cardiovascular risk prediction: A study involving 143,043 Chinese patients with hypertension.
Xi Y, Wang H, Sun N. Front Cardiovasc Med. 2022 Nov 14;9:1025705. doi: 10.3389/fcvm.2022.1025705. PMID: 36451926.

References

1. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. New York, NY: Springer Science & Business Media; 2009.
2. Abu-Mostafa YS, Magdon-Ismail M, Lin H-T. Learning from Data. AMLbook.com; 2012.
3. Kannel WB, Doyle JT, McNamara PM, Quickenton P, Gordon T. Precursors of sudden coronary death. Factors related to the incidence of sudden death. Circulation. 1975;51:606–613.
4. Lip GYH, Nieuwlaat R, Pisters R, Lane DA, Crijns HJGM. Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor-based approach: the euro heart survey on atrial fibrillation. Chest. 2010;137:263–272.
5. O'Mahony C, Jichi F, Pavlou M, Monserrat L, Anastasakis A, Rapezzi C, Biagini E, Gimeno JR, Limongelli G, McKenna WJ, Omar RZ, Elliott PM. A novel clinical risk prediction model for sudden cardiac death in hypertrophic cardiomyopathy (HCM risk-SCD). Eur Heart J. 2014;35:2010–2020.
