Abstract
Background: The existence of cancer subtypes with different clinical significance reflects the high heterogeneity of cancer. Multi-omics integration methods have become increasingly mature. In practice, however, some samples are missing one or more omics data types.
Objective: This study aims to establish a deep model that can effectively integrate and represent partial multi-omics data in order to accurately identify cancer subtypes.
Methods: We propose a novel partial multi-omics learning model for cancer subtyping, MPGIL (Multichannel Partial Graph Integration Learning). MPGIL has two main components. First, multi-channel graph autoencoders based on high-order proximity obtain more lateral adjacency information between samples within each omics. To reduce the negative impact of missing samples, a weighted fusion layer replaces the concatenation layer to learn a consensus representation across the omics. Second, a classifier is introduced to ensure that the consensus representation is suitable for clustering. Finally, subtypes are identified by K-means.
Results: We compared MPGIL with other multi-omics integration methods on 16 datasets. The clinical and survival results show that MPGIL can effectively identify subtypes. Three ablation experiments were designed to highlight the importance of each component of MPGIL. In a case study of AML, the differentially expressed gene profiles among the identified subtypes reveal the high heterogeneity of the cancer.
Conclusion: MPGIL can effectively learn a consistent representation of partial multi-omics datasets and discover subtypes, and it performs better than the state-of-the-art methods it was compared with.
Keywords: Partial multi-omics data, high-order proximity, cancer data, multi-channel, classifier, graph autoencoder.
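The idea behind replacing concatenation with weighted fusion can be illustrated with a small sketch. This is not the MPGIL implementation: the abstract does not give the fusion formula, so the function name `masked_weighted_fusion`, the `present` mask, and the fixed per-omics `weights` (which a real model would presumably learn) are all assumptions made for illustration. The sketch averages each sample's per-omics embeddings using only the omics that were actually observed for that sample.

```python
import numpy as np

def masked_weighted_fusion(embeddings, present, weights=None):
    """Fuse per-omics embeddings into one consensus matrix (illustrative only).

    embeddings: list of V arrays, each of shape (n_samples, d)
    present:    (n_samples, V) boolean array, True where the sample has that omics
    weights:    optional (V,) non-negative per-omics weights (hypothetical;
                a trained model would learn these rather than fix them)
    """
    Z = np.stack(embeddings, axis=1)               # shape (n, V, d)
    mask = np.asarray(present, dtype=float)        # shape (n, V)
    n_omics = Z.shape[1]
    if weights is None:
        weights = np.ones(n_omics)
    w = mask * np.asarray(weights, dtype=float)    # zero weight for missing omics
    totals = w.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("every sample needs at least one observed omics")
    w = w / totals                                 # normalise weights per sample
    # Replace missing embeddings (possibly NaN) with zeros before summing.
    Z = np.where(mask[:, :, None] > 0, Z, 0.0)
    return np.einsum("nv,nvd->nd", w, Z)           # weighted sum over omics

# Two samples, two omics; the second sample lacks the second omics.
rna = np.array([[1.0, 2.0], [3.0, 4.0]])
meth = np.array([[3.0, 4.0], [np.nan, np.nan]])
present = np.array([[True, True], [True, False]])
consensus = masked_weighted_fusion([rna, meth], present)
```

Unlike concatenation, the fused vector keeps the same dimension regardless of how many omics a sample has, so a sample with a missing omics contributes a representation built only from its observed data rather than from placeholder values.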