Abstract
Complex microbial communities shape the dynamics of various environments, ranging from the mammalian gastrointestinal tract to the soil. Advances in DNA sequencing technologies and data analysis have drastically improved microbiome analyses over earlier methods, for example in taxonomic resolution and false discovery rate control. In this Review, we discuss best practices for performing a microbiome study, including experimental design, choice of molecular analysis technology, methods for data analysis and the integration of multiple omics data sets. We focus on recent findings suggesting that operational taxonomic unit-based analyses should be replaced with new methods based on exact sequence variants, on methods for integrating metagenomic and metabolomic data, and on issues surrounding compositional data analysis, where advances have been particularly rapid. We note that although some of these approaches are new, it is important to keep sight of the classic issues that arise during experimental design and that relate to research reproducibility. We describe how keeping these issues in mind allows researchers to obtain more insight from their microbiome data sets.
Acknowledgements
This review is informed by our work funded by the National Institutes of Health, National Science Foundation, Alfred P. Sloan Foundation, John Templeton Foundation and W. M. Keck Foundation, as well as that of hundreds of collaborators on the Human Microbiome Project, American Gut Project and Earth Microbiome Project.
Reviewer information
Nature Reviews Microbiology thanks J. Raes and other anonymous reviewers for their contributions to the peer review of this work.
Author information
Contributions
A.V. and B.C.T. researched the data for the article. A.G., T.K., D.M., J.N., J.G.S. and J.R.Z. substantially contributed to discussion of content. R.K., A.V., B.C.T., A.A., C.C., J.D., L.M., A.V.M., J.T.M., R.A.Q., L.R.T., A.T., Z.Z.X., Q.Z. and J.G.C. wrote the article. R.K., A.V., B.C.T., T.K., D.M., A.D.S. and P.C.D. reviewed and edited the manuscript before submission.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
EBI (http://www.ebi.ac.uk/)
Galaxy (https://usegalaxy.org/)
GitHub (https://github.com)
Jupyter Notebooks (http://jupyter.org)
QIIME 2 (https://qiime2.org)
Qiita (http://qiita.microbio.me)
R Markdown (https://rmarkdown.rstudio.com/)
Glossary
- Exact sequence variants: The exact DNA sequences recovered from marker gene sequencing reads, used as the units of analysis in place of operational taxonomic unit clusters.
- Operational taxonomic units (OTUs): A group of closely related individuals or sequences, often defined by a 97% sequence similarity threshold.
- Machine learning: The use of algorithms to learn from and make predictions about data.
- Metadata: Information about the data. In many studies, this is structured as a matrix with samples as rows and metadata categories (age, sex, longitude, season, disease state, average monthly rainfall, and so on) as columns.
- Alpha diversity: A measure of within-sample diversity.
- Effect size analysis: Quantification of the magnitude of the effect of a particular metadata category (for example, treatment group, sex or sequencing plate) on the data.
- Marker genes: Conserved genes (commonly 16S ribosomal RNA (rRNA), internal transcribed spacer (ITS) and 18S rRNA) that typically contain a highly variable region, which can be used for detailed identification, flanked by highly conserved regions that can serve as binding sites for PCR primers.
- Nested statistical tests: Statistical tests that address variables related to the main effect. For example, soil plot would be a nested factor for testing the effects of a fertilizer on the soil microbiota.
- Coprophagic: Involving the consumption of faeces. Many animal species eat faeces to more efficiently break down plant matter by digesting the material twice.
- Reads: Inferred sequences of base pairs in a single DNA fragment.
- Metatranscriptome: The total content of gene transcripts from a community of organisms.
- Humic substances: Substances produced by the biodegradation of organic matter; they are the main component of humus, the organic fraction of soil.
- Metagenomes: The collection of genetic material from a community of organisms, for example, the genetic material from all microorganisms in the human gut microbiome.
- Naive Bayesian classifier: A simple probabilistic classifier used in machine learning that applies Bayes’ theorem while assuming strong independence between the features.
- K-mers: All possible subsequences of length k contained in a read obtained through DNA sequencing.
- Beta diversity: A measure of similarity or dissimilarity between samples.
- Faith’s phylogenetic diversity: An alpha diversity metric that uses a phylogenetic tree to compute sample diversity.
- Shannon index: A commonly used index to characterize species diversity in a community; it reflects both the number of species and the evenness of their abundances.
- False discovery rates: The expected proportion of type I errors (false positives) among the rejected null hypotheses when performing multiple comparisons.
- Isometric log ratio transform (ilr): Converts a vector of proportions into a vector of log ratios using a tree as a reference. The computed log ratios consist of the difference of mean logarithms of species proportions between adjacent clades within the tree.
- Random forests regression: A machine learning technique that uses an ensemble of decision trees to perform regression, that is, to predict a continuous outcome.
- Family-wise error: The probability of making one or more type I errors (false discoveries) when performing multiple hypothesis tests.
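Three of the glossary terms above (k-mers, the Shannon index and false discovery rates) can be made concrete with a short Python sketch. It is an illustration only, not the procedure used in this Review: the function names are hypothetical, the Shannon index is computed with the natural logarithm (other bases are also in use), and the Benjamini-Hochberg adjustment is shown as one common way of controlling the false discovery rate.

```python
import math


def kmers(read, k):
    """Return every length-k substring (k-mer) of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over features with non-zero counts.

    Assumes at least one count is positive.
    """
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(kmers("ACGTAC", 3))
    print(shannon_index([10, 10, 10, 10]))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In practice, established packages are usually preferable to hand-written versions such as these, which are meant only to show what each term computes.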
About this article
Cite this article
Knight, R., Vrbanac, A., Taylor, B.C. et al. Best practices for analysing microbiomes. Nat Rev Microbiol 16, 410–422 (2018). https://doi.org/10.1038/s41579-018-0029-9
DOI: https://doi.org/10.1038/s41579-018-0029-9