Large-Scale Metagenome Assembly Reveals Novel Animal-Associated Microbial Genomes, Biosynthetic Gene Clusters, and Other Genetic Diversity

mSystems. 2020 Nov 3;5(6):e01045-20.
doi: 10.1128/mSystems.01045-20.

Nicholas D Youngblut et al. mSystems. 2020.

Abstract

Large-scale metagenome assemblies of human microbiomes have produced a vast catalogue of previously unseen microbial genomes; however, comparatively few microbial genomes derive from other vertebrates. Here, we generated 5,596 metagenome-assembled genomes (MAGs) from the gut metagenomes of 180 predominantly wild animal species representing 5 classes, in addition to 14 existing animal gut metagenome data sets. The MAGs comprised 1,522 species-level genome bins (SGBs), most of which were novel at the species, genus, or family level, and the majority were enriched in host versus environment metagenomes. Many traits distinguished SGBs enriched in host or environmental biomes, including the number of antimicrobial resistance genes. We identified 1,986 diverse biosynthetic gene clusters; only 23 clustered with any MIBiG database references. Gene-based assembly revealed tremendous gene diversity, much of it host or environment specific. Our MAG and gene data sets greatly expand the microbial genome repertoire and provide a broad view of microbial adaptations to the vertebrate gut.

IMPORTANCE Microbiome studies on a select few mammalian species (e.g., humans, mice, and cattle) have revealed a great deal of novel genomic diversity in the gut microbiome. However, little is known of the microbial diversity in the gut of other vertebrates. We studied the gut microbiomes of a large set of mostly wild animal species consisting of mammals, birds, reptiles, amphibians, and fish. Unfortunately, we found that existing reference databases commonly used for metagenomic analyses failed to capture the microbiome diversity among vertebrates. To increase database representation, we applied advanced metagenome assembly methods to our animal gut data and to many public gut metagenome data sets that had not been used to obtain microbial genomes. Our resulting genome and gene cluster collections comprised a great deal of novel taxonomic and genomic diversity, which we extensively characterized. Our findings substantially expand what is known of microbial genomic diversity in the vertebrate gut.

Keywords: animal microbiome; antimicrobial resistance; biosynthetic gene cluster; gut; metagenome assembly; novel diversity; vertebrate-microbe.

Figures

FIG 1
Large percentage of unmapped reads, even when using multiple comprehensive metagenome profiling databases. The dated host species phylogeny was obtained from http://timetree.org, with branches colored by host class. From inner to outer rings, the data mapped onto the tree are host diet, host captive/wild status, and mean number of metagenome reads mapped to various host-specific, nonmicrobial, and microbial databases. Note that captive/wild status sometimes differs among individuals of the same species. The databases are (i) representative of each publicly available genome from the host species (“vertebrata host genome”), (ii) all entries in the NCBI nucleotide (nt) database with taxonomy identifications matching host species (“vertebrata host nt”), (iii) the same as the previous category but with all Vertebrata sequences included, (iv) the Kraken2 “plant” database, (v) the Kraken2 “fungi” database, (vi) the Kraken2 “protozoa” database, and (vii) a custom bacterial and archaeal database created from the Genome Taxonomy Database, release 89 (“GTDB-r89”). Reads were mapped iteratively to each database in the order shown in the key (top to bottom), with only unmapped reads included in the next iteration. “Unclassified” reads did not map to any database; they were used along with reads mapping to GTDB-r89 for downstream analyses (“microbial + unclassified”).
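The iterative mapping scheme described in the FIG 1 legend can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function `iterative_assign` and the toy hit sets (database name mapped to the set of read IDs that would map to it) are hypothetical stand-ins for real aligner or classifier output.

```python
def iterative_assign(read_ids, databases):
    """Assign each read to the first database, in order, that it maps to.

    read_ids:  iterable of read identifiers.
    databases: dict of database name -> set of read IDs that map to it
               (hypothetical stand-in for real mapping results). Dict
               insertion order defines the mapping order.
    Returns (assignments, unclassified), where unclassified holds the
    reads that mapped to no database.
    """
    remaining = set(read_ids)
    assignments = {}
    for name, hits in databases.items():
        mapped = remaining & hits      # only still-unmapped reads are considered
        assignments[name] = mapped
        remaining -= mapped            # pass the rest on to the next database
    return assignments, remaining


# Toy example with three of the seven databases named in the legend.
reads = ["r1", "r2", "r3", "r4", "r5"]
dbs = {
    "vertebrata host genome": {"r1"},
    "plant": {"r1", "r2"},
    "GTDB-r89": {"r2", "r3"},
}
assigned, unclassified = iterative_assign(reads, dbs)

# "microbial + unclassified" set used for downstream analyses
downstream = assigned["GTDB-r89"] | unclassified
```

Because each read is removed once it maps, a read that could match several databases is counted only for the earliest one in the order.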
FIG 2
A phylogeny of all 1,522 SGBs. From innermost to outermost rings, the data mapped onto the phylogeny are GTDB phylum-level taxonomic classifications, class-level taxonomies for Actinobacteriota, class-level taxonomies for Firmicutes, class-level taxonomies for Proteobacteria, taxonomic novelty, significant enrichment in host gut or environmental metagenomes, and significant enrichment in mammals or other animals in our multispecies gut metagenome data set. The phylogeny was inferred from multiple conserved loci via PhyloPhlAn. Orange dots on the phylogeny denote bootstrap values in the range of 0.7 to 1. The phylogeny is rooted on the last common ancestor of Archaea and Bacteria.
FIG 3
(A) Summary of the number of samples per biome for our multienvironment metagenome data set selected from the MGnify database. (B) Number of SGBs found to be significantly enriched in relative abundances in host (positive log2-fold change [“l2fc”]) versus environmental (negative l2fc) metagenomes. Values shown are the number of MAGs significantly enriched (blue) in either biome or not found to be significant (red). (C) Host- and environment-enriched SGBs have distinct traits. Phenotypes predicted based on MAG gene content (via Traitar [26]) are summarized for the SGBs significantly enriched in host or environmental metagenomes (DESeq2 adjusted P value of <0.01) or neither biome (“Neither” in the x axis facet). Note the difference in the x axis scale. Asterisks denote phenotypes significantly more prevalent in SGBs of the particular biome than in a null model of 1,000 permutations in which biome labels were shuffled among SGBs. See Table S3A in the supplemental material for all DESeq2 results. ONPG, o-nitrophenyl-β-d-galactopyranoside.
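The label-shuffling null model mentioned in the FIG 3C legend can be illustrated with a short sketch. The legend states only that biome labels were shuffled among SGBs in 1,000 permutations; the test statistic used here (trait prevalence within the biome), the one-sided comparison, the +1 pseudocount, and the function name `permutation_p_value` are assumptions made for illustration.

```python
import random


def permutation_p_value(has_trait, biome_labels, biome, n_perm=1000, seed=0):
    """One-sided permutation test for trait prevalence within one biome.

    has_trait:    list of 0/1 flags, one per SGB (e.g., a predicted phenotype).
    biome_labels: list of biome labels, one per SGB, aligned with has_trait.
    biome:        the label whose trait prevalence is tested.
    Returns (observed_prevalence, p_value).
    """
    rng = random.Random(seed)

    def prevalence(labels):
        members = [t for t, b in zip(has_trait, labels) if b == biome]
        return sum(members) / len(members)

    observed = prevalence(biome_labels)
    shuffled = list(biome_labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # shuffle biome labels among SGBs
        if prevalence(shuffled) >= observed:
            exceed += 1
    # The +1 pseudocount (an assumption) keeps the p-value above zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data: 10 "host" SGBs (8 with the trait) and 10 "env" SGBs (1 with it).
traits = [1] * 8 + [0] * 2 + [0] * 9 + [1]
labels = ["host"] * 10 + ["env"] * 10
observed, p = permutation_p_value(traits, labels, "host")
```

Shuffling preserves the number of SGBs per biome, so the null distribution reflects only how traits would be spread if biome membership were unrelated to the phenotype.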
FIG 4
Phylogeny of all SGBs (n = 233) with ≥3 BGCs identified by AntiSMASH. From innermost to outermost rings, the data mapped onto the phylogeny are (i) GTDB phylum-level taxonomic classifications, (ii) taxonomic novelty, (iii) significant enrichment in host or environmental metagenomes, (iv) the prevalence of BGC families across the multispecies metagenome data set, and (v) the number of BGCs identified in the MAG. Prevalence is the maximum of any BGC family for that BGC type, and only BGC families with a prevalence of ≥25% are shown. The phylogeny is a pruned version of that shown in Fig. 2. Orange dots on the phylogeny denote bootstrap values in the range of 0.7 to 1. “NRPS,” “PKS,” and “RiPPs” stand for nonribosomal peptide synthetase, polyketide synthase, and ribosomally synthesized and posttranslationally modified peptides, respectively.
FIG 5
Summary of the 50% sequence identity clusters generated from the gene-based metagenome assembly of the combined data set. (A) Total number of gene clusters per phylum. For clarity, only phyla with ≥100 clusters are shown. Labels on each bar list the number of clusters (and percentage of the total). (B) Number of bacterial gene clusters per phylum and COG category. The “P” facet label refers to “poorly characterized.” (C) Number of archaeal gene clusters per class (all belonging to the Euryarchaeota) and COG category. (D) Number of viral gene clusters per COG category. (E) Number of clusters annotated as each CAZy family. For clarity, only phyla with ≥100 clusters are shown. Labels next to each bar denote the number of clusters. (F) Number of clusters per CAZy family, broken down by phylum. CAZy families and phyla are ordered by most to least clusters. For clarity, only CAZy families and phyla with ≥100 total clusters are shown.
FIG 6
Enrichment of gene clusters grouped by phylum and COG category (A), KEGG pathway (B), or CAZy family (C). Only groupings significantly enriched in abundance (DESeq2 adjusted P value of <1e−5) in either biome are shown. Only gene clusters observed in at least 25% of the metagenomes were included. For clarity, only KEGG pathways enriched in >7 phyla are shown, and only CAZy families enriched in >1 phylum are shown. Note that the axes are flipped in panel B relative to panels A and C. See Tables S5A to C in the supplemental material for all DESeq2 results. TCA, tricarboxylic acid.
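The prevalence cutoff in the FIG 6 legend (gene clusters observed in at least 25% of the metagenomes) can be sketched as a simple filter. This is an illustrative assumption rather than the authors' code: "observed" is taken to mean a non-zero count, and the names `prevalence`, `filter_by_prevalence`, and the toy count table are hypothetical.

```python
def prevalence(counts):
    """Fraction of metagenomes in which a gene cluster has a non-zero count."""
    return sum(1 for c in counts if c > 0) / len(counts)


def filter_by_prevalence(table, min_prevalence=0.25):
    """Keep gene clusters observed in at least min_prevalence of metagenomes.

    table: dict of cluster name -> list of per-metagenome counts.
    """
    return {
        name: counts
        for name, counts in table.items()
        if prevalence(counts) >= min_prevalence
    }


# Toy count table: three gene clusters across four metagenomes.
table = {
    "cluster_a": [5, 0, 0, 0],   # present in 1 of 4 metagenomes (25%)
    "cluster_b": [0, 0, 0, 0],   # never observed
    "cluster_c": [3, 1, 0, 2],   # present in 3 of 4 metagenomes (75%)
}
kept = filter_by_prevalence(table)
```

Filtering on prevalence before differential abundance testing removes clusters seen in too few samples to support a reliable enrichment estimate.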

References

    1. Nayfach S, Rodriguez-Mueller B, Garud N, Pollard KS. 2016. An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography. Genome Res 26:1612–1625. doi:10.1101/gr.201863.115.
    2. Thomas AM, Segata N. 2019. Multiple levels of the unknown in microbiome research. BMC Biol 17:48. doi:10.1186/s12915-019-0667-z.
    3. Wang W-L, Xu S-Y, Ren Z-G, Tao L, Jiang J-W, Zheng S-S. 2015. Application of metagenomics in the human gut microbiome. World J Gastroenterol 21:803–814. doi:10.3748/wjg.v21.i3.803.
    4. Zou Y, Xue W, Luo G, Deng Z, Qin P, Guo R, Sun H, Xia Y, Liang S, Dai Y, Wan D, Jiang R, Su L, Feng Q, Jie Z, Guo T, Xia Z, Liu C, Yu J, Lin Y, Tang S, Huo G, Xu X, Hou Y, Liu X, Wang J, Yang H, Kristiansen K, Li J, Jia H, Xiao L. 2019. 1,520 reference genomes from cultivated human gut bacteria enable functional microbiome analyses. Nat Biotechnol 37:179–185. doi:10.1038/s41587-018-0008-8.
    5. Forster SC, Kumar N, Anonye BO, Almeida A, Viciani E, Stares MD, Dunn M, Mkandawire TT, Zhu A, Shao Y, Pike LJ, Louie T, Browne HP, Mitchell AL, Neville BA, Finn RD, Lawley TD. 2019. A human gut bacterial genome and culture collection for improved metagenomic analyses. Nat Biotechnol 37:186–192. doi:10.1038/s41587-018-0009-7.