mixOmics: An R package for 'omics feature selection and multiple data integration

PLoS Comput Biol. 2017 Nov 3;13(11):e1005752. doi: 10.1371/journal.pcbi.1005752. eCollection 2017 Nov.

Florian Rohart¹, Benoît Gautier¹, Amrit Singh²,³, Kim-Anh Lê Cao¹,⁴

Affiliations

¹ The University of Queensland Diamantina Institute, Translational Research Institute, Brisbane, Queensland, Australia.
² Prevention of Organ Failure (PROOF) Centre of Excellence, Vancouver, British Columbia, Canada.
³ Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada.
⁴ Melbourne Integrative Genomics and School of Mathematics and Statistics, University of Melbourne, Melbourne, Victoria, Australia.

PMID: 29099853
PMCID: PMC5687754
DOI: 10.1371/journal.pcbi.1005752

Florian Rohart et al. PLoS Comput Biol. 2017.


Abstract

The advent of high-throughput technologies has led to a wealth of publicly available 'omics data coming from different sources, such as transcriptomics, proteomics and metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have focused on identifying small subsets of molecules (a 'molecular signature') to explain or predict biological conditions, but mainly for a single type of 'omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a systems biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous 'omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple 'omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks on multivariate analyses of 'omics data sets available in the package.
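The abstract builds on Projection to Latent Structure (PLS) models. As a rough illustration only, and not the mixOmics code (which is written in R), the Python sketch below computes a first PLS component: unit-norm weight vectors a and b that maximise the covariance between the scores Xa and Yb, obtained as the leading left and right singular vectors of XᵀY after column centring. The function names, the power-iteration solver and the toy data are our own assumptions.

```python
import math


def center(M):
    """Column-centre a matrix stored as a list of rows."""
    n = len(M)
    means = [sum(row[j] for row in M) / n for j in range(len(M[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in M]


def cross_product(X, Y):
    """Return the p x q matrix X^T Y for X (n x p) and Y (n x q)."""
    n, p, q = len(X), len(X[0]), len(Y[0])
    return [[sum(X[i][j] * Y[i][k] for i in range(n)) for k in range(q)]
            for j in range(p)]


def normalise(v):
    """Scale a vector to unit Euclidean norm (fails on an all-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def first_pls_component(X, Y, n_iter=200):
    """Hypothetical helper: weights (a, b) maximising cov(Xa, Yb) with
    ||a|| = ||b|| = 1, plus the X-scores t = Xc a.

    a and b are the leading singular vectors of Xc^T Yc, found by power
    iteration (assumes the start vector is not orthogonal to the solution).
    """
    Xc, Yc = center(X), center(Y)
    M = cross_product(Xc, Yc)
    p, q = len(M), len(M[0])
    b = normalise([1.0] * q)
    for _ in range(n_iter):
        a = normalise([sum(M[j][k] * b[k] for k in range(q)) for j in range(p)])
        b = normalise([sum(M[j][k] * a[j] for j in range(p)) for k in range(q)])
    t = [sum(row[j] * a[j] for j in range(p)) for row in Xc]
    return a, b, t


# Toy data (our own): y increases with the first X variable only.
X = [[1, 0], [2, 1], [3, 0], [4, 1]]
Y = [[1], [2], [3], [4]]
a, b, t = first_pls_component(X, Y)
print([round(v, 3) for v in a])  # [0.981, 0.196]
```

Here XᵀY is the column vector (5, 1) after centring, so the weight on the first variable dominates. Further components, and the sparse variants that select features, need extra steps (deflation, penalisation) that this sketch deliberately leaves out.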


Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Fig 1. Overview of the mixOmics multivariate methods for single and integrative ‘omics supervised analyses.**
X denotes a predictor ‘omics data set and y a categorical outcome response (*e.g*. healthy vs. sick). Integrative analyses include N-integration with DIABLO (the same N samples are measured on different ‘omics platforms) and P-integration with MINT (the same P ‘omics predictors are measured in several independent studies). Sample plots depicted here use the mixOmics functions (from left to right) plotIndiv, plotArrow and plotIndiv in 3D; variable plots use the mixOmics functions network, cim, plotLoadings, plotVar and circosPlot. The graphical output functions are detailed in Supporting Information S1 Text.
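The difference between N-integration and P-integration is a difference in which dimension is shared. The Python sketch below makes that concrete with two hypothetical helpers of our own (not part of the mixOmics API): one checks that every block describes the same N samples, the other stacks studies that share the same P features and keeps a study label per sample. The dict-of-lists data representation is an assumption for illustration.

```python
def check_n_integration(blocks):
    """N-integration: all blocks must measure the same N samples (same row count).

    blocks: dict mapping a block name to its list of sample rows.
    Row order is assumed to match across blocks (not checked here).
    """
    sizes = {name: len(rows) for name, rows in blocks.items()}
    if len(set(sizes.values())) != 1:
        raise ValueError(f"blocks differ in number of samples: {sizes}")
    return next(iter(sizes.values()))


def stack_p_integration(studies):
    """P-integration: all studies must measure the same P features, in the same order.

    studies: dict mapping a study name to (feature_names, rows).
    Returns the stacked rows and one study label per row.
    """
    feature_lists = {tuple(features) for features, _ in studies.values()}
    if len(feature_lists) != 1:
        raise ValueError("studies do not share the same features")
    rows, study_labels = [], []
    for name, (_, study_rows) in studies.items():
        rows.extend(study_rows)
        study_labels.extend([name] * len(study_rows))
    return rows, study_labels


# DIABLO-style input: two 'omics blocks on the same 2 samples.
n = check_n_integration({"mRNA": [[1.0, 2.0], [3.0, 4.0]], "proteins": [[0.5], [0.7]]})

# MINT-style input: two studies measuring the same genes g1 and g2.
rows, labels = stack_p_integration({
    "study1": (["g1", "g2"], [[1, 2]]),
    "study2": (["g1", "g2"], [[3, 4], [5, 6]]),
})
```

The study labels kept by the second helper are what a per-study analysis, such as the Leave-One-Group-Out validation used for MINT, would later need.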

**Fig 2. Prediction area visualisation on the Small Round Blue Cell Tumors (SRBCT [35]) data, described in the Results section, with respect to the prediction distance.**
From left to right: ‘maximum distance’, ‘centroid distance’ and ‘Mahalanobis distance’. Sample prediction area plots from a PLS-DA model applied to a microarray data set with the expression levels of 2,308 genes on 63 samples. Samples are classified into four classes: Burkitt Lymphoma (BL), Ewing Sarcoma (EWS), Neuroblastoma (NB) and Rhabdomyosarcoma (RMS).
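The caption compares prediction distances. As a simplified Python sketch, assuming the PLS-DA model has already produced either predicted dummy-coded class values or component scores for a new sample, two of the rules could look like this: the maximum distance picks the class with the largest predicted dummy value, and the centroid distance picks the class whose training centroid in component space is nearest. The function names and toy scores are our own; the Mahalanobis rule (the same idea with a covariance-weighted metric) is omitted, and mixOmics may differ in details.

```python
import math


def max_distance(y_pred, classes):
    """'Maximum distance': pick the class with the largest predicted dummy value."""
    k = max(range(len(classes)), key=lambda i: y_pred[i])
    return classes[k]


def centroid_distance(train_scores, train_labels, new_score):
    """'Centroid distance': pick the class whose centroid in component space
    is closest (Euclidean) to the new sample's component scores."""
    groups = {}
    for score, label in zip(train_scores, train_labels):
        groups.setdefault(label, []).append(score)
    centroids = {
        label: [sum(coord) / len(points) for coord in zip(*points)]
        for label, points in groups.items()
    }
    return min(centroids, key=lambda label: math.dist(new_score, centroids[label]))


classes = ["BL", "EWS", "NB", "RMS"]
print(max_distance([0.10, 0.70, 0.15, 0.05], classes))  # EWS

# Hypothetical 2-component training scores for two of the classes.
train = [[-2.0, 0.1], [-1.8, -0.2], [1.9, 0.3], [2.1, -0.1]]
labels = ["BL", "BL", "RMS", "RMS"]
print(centroid_distance(train, labels, [1.5, 0.0]))  # RMS
```

Evaluating such a rule on a grid of points in component space is one way to draw prediction areas like those in the figure.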

**Fig 3. Illustration of a single ‘omics analysis with mixOmics.**
**A) Unsupervised preliminary analysis with PCA**, A1: PCA sample plot, A2: percentage of explained variance per component. **B) Supervised analysis with PLS-DA**, B1: PLS-DA sample plot with confidence ellipses, B2: classification performance per component (overall error rate and balanced error rate, BER) for three prediction distances using repeated stratified cross-validation (10×5-fold CV). **C) Supervised analysis and feature selection with sparse PLS-DA**, C1: sPLS-DA sample plot with confidence ellipses, C2: arrow plot representing each sample pointing towards its outcome category (see Supporting Information S1 Text for details), C3: Clustered Image Map (Euclidean distance, complete linkage) where samples are represented in rows and selected features in columns (10, 300 and 30 genes selected on components 1, 2 and 3, respectively), C4: ROC curve and AUC averaged using one-vs-all comparisons.
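Panel B2 reports the overall error rate and the balanced error rate (BER) under repeated stratified cross-validation. The Python sketch below shows both metrics, with the BER taken as the mean of the per-class error rates, and a simple stratified fold assignment; repeating the assignment with 10 different seeds would give 10×5-fold CV. Function names and toy labels are our own assumptions, not the mixOmics implementation.

```python
import random


def error_rates(y_true, y_pred):
    """Overall error rate, balanced error rate (BER) and per-class error rates."""
    classes = sorted(set(y_true))
    per_class = {}
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        per_class[c] = sum(y_pred[i] != c for i in idx) / len(idx)
    overall = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
    ber = sum(per_class.values()) / len(classes)  # every class weighs the same
    return overall, ber, per_class


def stratified_folds(labels, k, seed=0):
    """Assign each sample to one of k folds: shuffle each class, then deal
    it out round-robin so every fold gets a near-equal share of each class."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[i] = pos % k
    return folds


# Imbalanced toy example: 8 samples of class A, 2 of class B.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]
overall, ber, per_class = error_rates(y_true, y_pred)
print(overall, ber)  # 0.1 0.25
```

The example shows why the BER is useful with unbalanced classes: one mistake on the minority class costs only 10% overall but 25% in BER.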

**Fig 4. Illustration of N-integrative supervised analysis with DIABLO.**
A: sample plot per data set, B: sample scatterplot from plotDiablo displaying the first component in each data set (upper diagonal plot) and the Pearson correlation between each pair of these components (lower diagonal plot). C: Clustered Image Map (Euclidean distance, complete linkage) of the multi-omics signature. Samples are represented in rows, selected features on the first component in columns. D: Circos plot showing positive (negative) correlations (|r| > 0.7) between selected features as brown (black) links; feature names appear in the quadrants. E: Correlation Circle plot representing each type of selected features. F: relevance network visualisation of the selected features.
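The Circos plot and relevance network display links between features of different data sets whose correlation exceeds a cutoff. The Python sketch below illustrates the thresholding idea with plain Pearson correlations on raw feature values and |r| > 0.7; this is a simplification of our own, since the similarity that mixOmics uses for these plots may be computed differently (for example from the model components). Feature names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_links(block1, block2, cutoff=0.7):
    """Return (feature1, feature2, r) for cross-block pairs with |r| > cutoff.

    block1, block2: dicts mapping a feature name to its values over the
    same samples, in the same order.
    """
    links = []
    for f1, v1 in block1.items():
        for f2, v2 in block2.items():
            r = pearson(v1, v2)
            if abs(r) > cutoff:
                links.append((f1, f2, r))
    return links


genes = {"g1": [1, 2, 3, 4], "g2": [1, 0, 1, 0]}
proteins = {"p1": [2, 4, 6, 8], "p2": [4, 3, 2, 1]}
links = correlation_links(genes, proteins)
# g1-p1 is a positive link (r = 1), g1-p2 a negative link (r = -1);
# the g2 pairs (|r| about 0.45) fall below the cutoff.
```

Positive and negative links can then be coloured differently, as the brown and black links are in panel D.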

**Fig 5. Illustration of MINT analysis in mixOmics.**
A: Parameter tuning of a MINT sPLS-DA model with two components using Leave-One-Group-Out cross-validation and maximum distance: BER (y-axis) with respect to the number of selected features (x-axis). A filled diamond marks the optimal number of features to select on each component. B: Performance of the final MINT sPLS-DA model, including the selected features, based on BER and on the classification error rate per class. C: Global sample plot with confidence ellipses. D: Study-specific sample plot. E: Clustered Image Map (Euclidean distance, complete linkage); samples are represented in rows, selected features on the first component in columns. F: Loading plot of each feature selected on the first component in each study, with color indicating the class with the maximal mean expression value for each gene.
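Leave-One-Group-Out cross-validation treats each independent study as one fold: the model is trained on all other studies and evaluated on the held-out study, which mimics predicting samples from a new study. A minimal Python sketch of the split generation, with a hypothetical helper name and toy study labels of our own:

```python
def leave_one_group_out(study):
    """Yield (held_out_study, train_idx, test_idx); each study is held out once."""
    for s in sorted(set(study)):
        test_idx = [i for i, lab in enumerate(study) if lab == s]
        train_idx = [i for i, lab in enumerate(study) if lab != s]
        yield s, train_idx, test_idx


study = ["s1", "s1", "s2", "s3", "s2"]
for held_out, train_idx, test_idx in leave_one_group_out(study):
    print(held_out, train_idx, test_idx)
# s1 [2, 3, 4] [0, 1]
# s2 [0, 1, 3] [2, 4]
# s3 [0, 1, 2, 4] [3]
```

Unlike stratified k-fold CV, the number of folds equals the number of studies, and fold sizes follow the study sizes; a tuning loop would fit the model on each train split and average a metric such as the BER over the held-out studies.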


Cited by

The BNT162b2 mRNA vaccine demonstrates reduced age-associated TH1 support in vitro and in vivo.
Brook B, Checkervarty AK, Barman S, Sweitzer C, Bosco AN, Sherman AC, Baden LR, Morrocchi E, Sanchez-Schmitz G, Palma P, Nanishi E, O'Meara TR, McGrath ME, Frieman MB, Soni D, van Haren SD, Ozonoff A, Diray-Arce J, Steen H, Dowling DJ, Levy O. iScience. 2024 Sep 26;27(11):111055. doi: 10.1016/j.isci.2024.111055. eCollection 2024 Nov 15. PMID: 39569372.
Targeting AXL cellular networks in kidney fibrosis.
Grøndal SM, Blø M, Nilsson LIH, Rayford AJ, Jackson A, Gausdal G, Lorens JB. Front Immunol. 2024 Nov 4;15:1446672. doi: 10.3389/fimmu.2024.1446672. eCollection 2024. PMID: 39559366.
Comparative transcriptomic analyses of diploid and tetraploid citrus reveal how ploidy level influences salt stress tolerance.
Bonnin M, Soriano A, Favreau B, Lourkisti R, Miranda M, Ollitrault P, Oustric J, Berti L, Santini J, Morillon R. Front Plant Sci. 2024 Oct 30;15:1469115. doi: 10.3389/fpls.2024.1469115. eCollection 2024. PMID: 39544537.
Analysis of blood metabolite characteristics at birth in preterm infants with bronchopulmonary dysplasia: an observational cohort study.
Guo Y, Chen J, Zhang Z, Liu C, Li J, Liu Y. Front Pediatr. 2024 Oct 31;12:1474381. doi: 10.3389/fped.2024.1474381. eCollection 2024. PMID: 39544337.
Diagnosis and prognosis prediction of gastric cancer by high-performance serum lipidome fingerprints.
Cai ZR, Wang W, Chen D, Chen HJ, Hu Y, Luo XJ, Wang YT, Pan YQ, Mo HY, Luo SY, Liao K, Zeng ZL, Li SS, Guan XY, Fan XJ, Piao HL, Xu RH, Ju HQ. EMBO Mol Med. 2024 Dec;16(12):3089-3112. doi: 10.1038/s44321-024-00169-0. Epub 2024 Nov 14. PMID: 39543322.


References

1. Lê Cao KA, Rohart F, Gonzalez I, Déjean S, Gautier B, Bartolo F, et al. mixOmics: Omics Data Integration Project; 2017. Available from: https://CRAN.R-project.org/package=mixOmics.
2. Boulesteix AL, Strimmer K. Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Brief Bioinform. 2007;8(1):32–44. doi: 10.1093/bib/bbl016
3. Meng C, Zeleznik OA, Thallinger GG, Kuster B, Gholami AM, Culhane AC. Dimension reduction techniques for the integrative analysis of multi-omics data. Brief Bioinform. 2016;bbv108. doi: 10.1093/bib/bbv108
4. Labus JS, Van Horn JD, Gupta A, Alaverdyan M, Torgerson C, Ashe-McNalley C, et al. Multivariate morphological brain signatures predict patients with chronic abdominal pain from healthy control subjects. Pain. 2015;156(8):1545–1554. doi: 10.1097/j.pain.0000000000000196
5. Cook JA, Chandramouli GV, Anver MR, Sowers AL, Thetford A, Krausz KW, et al. Mass Spectrometry–Based Metabolomics Identifies Longitudinal Urinary Metabolite Profiles Predictive of Radiation-Induced Cancer. Cancer Res. 2016;76(6):1569–1577. doi: 10.1158/0008-5472.CAN-15-2416


Grants and funding

FR was supported, in part, by the Australian Cancer Research Foundation (ACRF) for the Diamantina Individualised Oncology Care Centre at The University of Queensland Diamantina Institute. KALC was supported, in part, by the National Health and Medical Research Council (NHMRC) Career Development fellowship (APP1087415). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

