Authors:
Mazid Osseni (1); Prudencio Tossou (2); François Laviolette (1) and Jacques Corbeil (1, 3)
Affiliations:
(1) GRAAL, Institute Intelligence and Data, Department of Computer Science and Software Engineering, Université Laval, Quebec, QC, Canada
(2) Valence AI Discovery, Montréal, QC, Canada
(3) Department of Molecular Medicine, Université Laval, Quebec, QC, Canada
Keyword(s):
Multiclass Classification, Cancer, Multi-Omics Analysis, Transformer Model, Precision Medicine.
Abstract:
Motivation: Breakthroughs in high-throughput technologies and machine learning methods have enabled a shift towards multi-omics modelling as the preferred means of understanding the mechanisms underlying biological processes. Machine learning enables and improves complex disease prognosis in clinical settings. However, most multi-omic studies primarily use transcriptomics and epigenomics, because these data types are over-represented in databases and reached technical maturity earlier than other omics. For complex phenotypes and mechanisms, failing to leverage all available omics, despite their varying degrees of availability, can prevent an understanding of the underlying biological mechanisms and leads to less robust classifications and predictions.
Results: We propose MOT (Multi-Omic Transformer), a deep learning model based on the transformer architecture that discriminates complex phenotypes (here, cancer types) using five omics data types: transcriptomics (mRNA and miRNA), epigenomics (DNA methylation), copy number variations (CNVs), and proteomics. The model achieves an F1-score of 98.37% across 33 tumour types on a test set without missing omics views and an F1-score of 96.74% on a test set with missing omics views. It also identifies, for each phenotype, the omic type required for the best prediction, and could therefore guide clinical decision-making when acquiring data to confirm a diagnosis. MOT can integrate and analyze five or more omics data types, even when some omics views are missing, and can identify the omics data essential for the multiclass tumour classification task. The model also confirms the importance of each omic view: combined, the omics views allow better differentiation between most cancer types. Our study emphasizes the importance of multi-omic data for obtaining better multiclass cancer classification.
Availability and implementation: MOT source code is available at https://github.com/dizam92/multiomic predictions.
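The abstract does not describe how MOT handles missing omics views internally. One common approach in transformer models, assumed here purely for illustration and not presented as the authors' method, is to treat each omic view as a token and mask absent views out of the attention softmax so they receive zero weight. The function name `masked_attention_pool`, the single query vector, and the toy two-dimensional view embeddings below are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def masked_attention_pool(
    query: Vector, views: Dict[str, Optional[Vector]]
) -> Tuple[Dict[str, float], Vector]:
    """Hypothetical sketch: pool omic-view embeddings with scaled dot-product
    attention, skipping missing views (None).

    Skipping a view is equivalent to giving it a score of -inf before the
    softmax, so it receives exactly zero attention weight.
    """
    d = len(query)
    present = {name: vec for name, vec in views.items() if vec is not None}
    if not present:
        raise ValueError("at least one omic view must be present")

    # Scaled dot-product scores, computed for the available views only.
    scores = {
        name: sum(q * v for q, v in zip(query, vec)) / math.sqrt(d)
        for name, vec in present.items()
    }

    # Numerically stable softmax over the present views.
    max_score = max(scores.values())
    exps = {name: math.exp(s - max_score) for name, s in scores.items()}
    total = sum(exps.values())
    # Missing views get weight 0.0 because they have no entry in `exps`.
    weights = {name: exps.get(name, 0.0) / total for name in views}

    # Attention-weighted sum of the present view embeddings.
    pooled = [0.0] * d
    for name, vec in present.items():
        for i, x in enumerate(vec):
            pooled[i] += weights[name] * x
    return weights, pooled


# Toy example: five omic views, two of which are missing for this sample.
views = {
    "mRNA": [1.0, 0.0],
    "miRNA": [0.0, 1.0],
    "methylation": None,  # missing view
    "CNV": [1.0, 1.0],
    "proteomics": None,  # missing view
}
weights, pooled = masked_attention_pool([1.0, 0.0], views)
print(weights)
print(pooled)
```

In this sketch the pooled representation depends only on the views that are present, so the same function serves samples with all five views and samples with some views missing; a real model would learn the query and the per-view embeddings rather than fixing them by hand.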