Abstract
Background: The use of biomarkers derived from high-throughput, complex microarray or sequencing data has necessitated machine learning for their discovery and validation. Machine learning has remained a fundamental and indispensable tool because of its efficacy and efficiency in both extracting relevant biomarkers as features and classifying samples to validate the discovered biomarkers.
Objectives: This review aims to present the impact and ability of various machine learning methodologies and models to process the high-throughput, high-dimensionality data found in mass spectrometry, microarray, and DNA/RNA sequencing. Such data precluded biomarker discovery before the use of machine learning.
Methods: A broad body of literature on machine learning for biomarker discovery was reviewed, yielding 21 eligible machine learning algorithms/networks and 3 combinatory architectures spanning 17 fields of study. This literature was screened to investigate the usage and development of machine learning within the framework of biomarker discovery.
Results: Of the 93 papers collected, 62 biomarker studies were reviewed further across different subfields: 49 employed machine learning algorithms and 13 employed neural network-based models. Through the application, innovation, and creation of biomarker-related tools, machine learning allowed for the discovery, accumulation, validation, and interpretation of biomarkers across varied data formats, sources, and fields of study.
Conclusion: Machine learning methodologies are critical to analyzing the various types of data used for biomarker discovery, such as mass spectrometry, nucleotide and protein sequencing, and image (e.g., CT scan) data. Further studies that use more standardized evaluation techniques and cutting-edge machine learning architectures may lead to more accurate and specific results.
Keywords: Biomarker discovery, machine learning, neural networks, mass spectrometry, sequencing, microarray.