Cell sister journal: singleton genes in the human microbiome reference gene catalog
The Landscape of Genetic Content in the Gut and Oral Human Microbiome
Cell Host and Microbe [IF:15.753]
2019-08-14 Resource
DOI: https://doi.org/10.1016/j.chom.2019.07.008
First author: Braden T. Tierney1,2,3,4
Corresponding authors: Chirag J. Patel4,*, Aleksandar D. Kostic1,2,3,*
Other authors: Zhen Yang,1,2,3,5 Jacob M. Luber,1,2,3,4 Marc Beaudin,1,2,3,6 Marsha C. Wibowo,1,2,3 Christina Baek,7 Eleanor Mehlenbacher,8
Affiliations:
1 Section on Pathophysiology and Molecular Pharmacology, Joslin Diabetes Center, Boston, MA, USA
2 Section on Islet Cell and Regenerative Biology, Joslin Diabetes Center, Boston, MA, USA
3 Department of Microbiology and Immunobiology, Harvard Medical School, Boston, MA, USA
4 Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Mr. Gut Daily (热心肠日报)
https://www.mr-gut.cn/papers/read/1085985827
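Cell sister journal: half of the microbial genes in the human body are “unique”
Written by: Li Danyi   Reviewed by: Li Danyi   August 21
Original title: The Landscape of Genetic Content in the Gut and Oral Human Microbiome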
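A meta-analysis of 3,655 human metagenomic samples (1,473 oral, 2,182 gut) identified more than 45.66 million non-redundant microbial genes;
Half of these genes are individual-specific (observed in only one sample) and are termed “singletons”;
Singletons differ clearly from non-singletons in both taxonomy and function. They carry more genes of unknown function, may encode more niche-specific (oral vs. gut) functions, and take part in diverse biosynthesis and degradation pathways;
Singletons originate from rare strains, which can serve as an individual's “microbial fingerprint”.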
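A short sketch makes the singleton definition above concrete. It assumes a hypothetical input, `cluster_members`, that maps each 95%-identity gene cluster to the (sample, ORF) pairs assigned to it. Following the abstract's wording "unique to a single metagenomic sample", a cluster counts as a singleton when all of its ORFs come from one sample. This is an illustration only, not the authors' code.

```python
def find_singletons(cluster_members):
    """Split gene clusters into singletons and non-singletons.

    cluster_members: dict mapping a (hypothetical) cluster ID to a list of
    (sample_id, orf_id) pairs clustered together at 95% identity.
    A cluster is a singleton if every ORF in it comes from one sample.
    """
    singletons, non_singletons = set(), set()
    for cluster_id, members in cluster_members.items():
        samples = {sample_id for sample_id, _ in members}
        if len(samples) == 1:
            singletons.add(cluster_id)
        else:
            non_singletons.add(cluster_id)
    return singletons, non_singletons


if __name__ == "__main__":
    clusters = {
        "c1": [("s1", "orf1")],                  # seen only in s1
        "c2": [("s1", "orf2"), ("s2", "orf7")],  # shared by s1 and s2
        "c3": [("s3", "orf4"), ("s3", "orf5")],  # two ORFs, but one sample
    }
    single, shared = find_singletons(clusters)
    print(sorted(single), sorted(shared))        # ['c1', 'c3'] ['c2']
    print(len(single) / len(clusters))           # fraction of singletons
```

The paper reports this fraction at about 50% across all oral and gut genes.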
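Editor's comment: How many microbial genes does the human microbiome contain? A study published last week in Cell Host & Microbe analyzed more than 3,600 oral or gut metagenomes and identified over 45 million microbial genes. Half of all these genes were unique to a single sample. These singleton genes have diverse potential functions and come mainly from rare strains. The findings reveal the enormous genetic diversity of the human microbiome and lay the groundwork for further studies linking microbiome genes and functions to host health.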
Abstract
Despite substantial interest in the species diversity of the human microbiome and its role in disease, the scale of its genetic diversity, which is fundamental to deciphering human-microbe interactions, has not been quantified. Here, we conducted a cross-study meta-analysis of metagenomes from two human body niches, the mouth and gut, covering 3,655 samples from 13 studies. We found staggering genetic heterogeneity in the dataset, identifying a total of 45,666,334 non-redundant genes (23,961,508 oral and 22,254,436 gut) at the 95% identity level. Fifty percent of all genes were “singletons,” or unique to a single metagenomic sample. Singletons were enriched for different functions (compared with non-singletons) and arose from sub-population-specific microbial strains. Overall, these results provide potential bases for the unexplained heterogeneity observed in microbiome-derived human phenotypes. On the basis of these data, we built a resource, which can be accessed at https://microbial-genes.bio.
Main results
Table 1. Table of Definitions Used in the Paper
Figure 1. Meta-analysis of the Oral and Gut Microbiomes
(A and B) We aggregated publicly available oral and gut short read data and assembled it into contigs (in this example, each contig comes from a single sample). (C) Gene open-reading-frames (ORFs) are identified on assembled contigs. (D) ORFs are clustered at 95% identity to identify a non-redundant gene catalog. (E) Database content, description of backend, description of user interface (UI). (F–K) Downstream singleton analytical pipeline. In (F), we identify singletons and non-singletons in our dataset and in (G) compare their functional annotations. In (H), we then map genes to contigs, which we grouped into 3 categories: singleton-contigs (those consisting of only singletons), non-singleton contigs (those consisting of only non-singletons), and mixture contigs (those consisting of both singletons and non-singletons). In (I), we filter short contigs and bin the remainder according to the taxonomic classification of their gene content. We then attempted to identify the source of singletons as either (J) horizontal gene transfer (HGT) and/or (K) rare, singleton-rich microbial strains.
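Steps (D) and (H) can each be sketched in a few lines. The clustering below is an assumed CD-HIT-style greedy scheme. ORFs are processed longest first, and each ORF joins the first representative it matches at ≥95% identity; if none matches, it founds a new cluster. Identity here is only a crude proxy: longest-common-subsequence length divided by the shorter sequence's length, not a real alignment score. `classify_contig` then assigns a contig to one of the three categories in (H) based on the singleton status of its genes. All function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence (rolling-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def identity(a, b):
    """Crude identity: matched positions relative to the shorter sequence."""
    return lcs_length(a, b) / min(len(a), len(b))


def greedy_cluster(orfs, threshold=0.95):
    """Greedy, CD-HIT-style clustering; returns {orf_id: representative_id}."""
    representatives = []  # (rep_id, rep_seq), longest sequences first
    assignment = {}
    for orf_id, seq in sorted(orfs.items(), key=lambda kv: -len(kv[1])):
        for rep_id, rep_seq in representatives:
            if identity(seq, rep_seq) >= threshold:
                assignment[orf_id] = rep_id
                break
        else:  # no representative matched: this ORF starts a new cluster
            representatives.append((orf_id, seq))
            assignment[orf_id] = orf_id
    return assignment


def classify_contig(gene_clusters, singletons):
    """Label a contig by the singleton status of the gene clusters on it."""
    if not gene_clusters:
        raise ValueError("contig has no genes")
    flags = {cluster in singletons for cluster in gene_clusters}
    if flags == {True}:
        return "singleton"
    if flags == {False}:
        return "non-singleton"
    return "mixture"


if __name__ == "__main__":
    base = "ATGC" * 10                      # 40 nt toy ORF
    orfs = {
        "orfA": base,
        "orfB": base[:5] + "A" + base[6:],  # one substitution: 39/40 identity
        "orfC": "G" * 40,                   # unrelated sequence
    }
    print(greedy_cluster(orfs))  # {'orfA': 'orfA', 'orfB': 'orfA', 'orfC': 'orfC'}
    print(classify_contig(["orfA", "orfC"], singletons={"orfC"}))  # mixture
```

A real catalog of tens of millions of ORFs needs a dedicated tool with k-mer prefiltering. The quadratic comparisons here only show the logic.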
Figure 2. The Genetic Diversity of the Oral and Gut Microbiomes
(A) The overlap in genetic content (95% identity level) between the oral and gut microbiomes.
(B) Distribution of ORF cluster sizes at 95% identity in our oral (blue) and gut (red) gene catalogs.
(C) Iterative clustering of our amino acid gene catalogs.
(D) Distribution of gene cluster sizes for amino acid gene catalogs generated at the 50% identity level.
(E) Sorensen-Dice index measuring dissimilarity in gene content between all pairs of individuals.
(F) Sorensen-Dice dissimilarity of individuals in terms of MetaPhlAn2-derived species content.
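Panels (E) and (F) use the Sorensen-Dice index. Take two individuals with feature sets A and B, where the features are genes in (E) and MetaPhlAn2 species in (F). Their dissimilarity is 1 − 2|A ∩ B| / (|A| + |B|), which is 0 for identical sets and 1 for disjoint ones. The sketch below assumes each individual is represented as a Python set of feature IDs; the `profiles` input is hypothetical.

```python
from itertools import combinations


def sorensen_dice_dissimilarity(a, b):
    """1 - 2|A ∩ B| / (|A| + |B|); defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return 1.0 - 2.0 * len(a & b) / (len(a) + len(b))


def pairwise_dissimilarity(profiles):
    """Dissimilarity for every unordered pair of individuals in `profiles`."""
    return {
        (x, y): sorensen_dice_dissimilarity(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


if __name__ == "__main__":
    profiles = {
        "person1": {"g1", "g2", "g3"},
        "person2": {"g2", "g3", "g4", "g5"},
        "person3": {"g6"},
    }
    for pair, d in pairwise_dissimilarity(profiles).items():
        print(pair, round(d, 3))
```

The same function applies unchanged to per-person pathway sets, as in the pathway-content comparison of Figure 3C.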
Figure 3. The Known and Unknown Functional Diversity of the Oral and Gut Microbiomes
(A and B) Fractions of singletons (A) and non-singletons (B) functionally annotated in the oral and gut microbiomes. Genes labeled with pathway annotations were used in the Minpath analyses.
(C) Sorensen-Dice dissimilarity of individuals in terms of overall pathway content.
Figure 4. Enrichment of Functions in Gut and Oral Niches for Singletons and Non-Singletons
Here, we display the top 50 most enriched pathways for oral singletons (A), oral non-singletons (B), gut singletons (C), and gut non-singletons (D). Bars represent odds ratios from a Fisher’s Exact Test and include 95% confidence intervals. Blue bars are pathways enriched in both oral and gut non-singletons, red bars are pathways enriched in both oral and gut singletons, and the green bar is a pathway enriched in both oral singletons and gut non-singletons.
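Figure 4 ranks pathways by odds ratios from Fisher's exact test. The caption does not specify the 2×2 layout, so the sketch assumes one plausible table per pathway:

- a = singletons annotated to the pathway
- b = singletons not annotated to it
- c = non-singletons annotated to it
- d = non-singletons not annotated to it

The sketch computes the exact two-sided p-value from the hypergeometric distribution. As a simplification, it reports the sample odds ratio with a Woolf (log-scale) 95% interval. R's `fisher.test`, by contrast, reports a conditional maximum-likelihood estimate with an exact interval, so its numbers will differ somewhat.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the probabilities of every table with the same margins that is
    no more likely than the observed one (hypergeometric distribution).
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):
        # Number of ways to get x in the top-left cell, given the margins.
        return math.comb(row1, x) * math.comb(n - row1, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / math.comb(n, col1)


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Sample odds ratio with a Woolf (log-scale) 95% confidence interval.

    Adds 0.5 to every cell when any cell is zero (Haldane-Anscombe correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Toy counts for one pathway: 8 of 10 singletons vs 1 of 6 non-singletons.
    print(odds_ratio_ci(8, 2, 1, 5))      # odds ratio 20.0 plus its interval
    print(fisher_exact_two_sided(8, 2, 1, 5))
```

Working with integer table counts keeps the "no more likely than observed" comparison exact, so no floating-point tolerance is needed.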
Figure 5. Singleton Taxa as Sub-population-Specific, Rare Strains
(A) Counts of taxonomic annotations for singleton and non-singleton contigs in the oral and gut microbiomes.
(B) Number of metagenomes in which singleton contigs and non-singleton contigs are present, for different taxonomies. Each point represents a different taxonomic annotation.
(C) Examples of strain-specific “fingerprints.” Each pair of rows corresponds to singleton and non-singleton contigs containing at least two genes that were binned into the same taxonomic annotation. Columns are different metagenomic samples (each corresponding to a different individual). Green boxes correspond to singleton contigs. Red boxes correspond to non-singleton contigs.
Figure 6. Extrapolating the Gene Content of the Human Microbiome
(A and B) Extrapolation of the universe of genes using curves fit to our oral microbiome data (A) and gut microbiome data (B). Yellow dashed lines demarcate sampling required to observe certain percentages of new singletons per sample. Purple dashed line marks size of this study. Green dashed line is the asymptotic number of genes in the oral microbiome.
(C and D) Alternative, more conservative extrapolation methods for estimating total gene content in the oral/gut niches.
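Figure 6 extrapolates how many genes further sampling would uncover. The fitted curve families are not described here, so the sketch substitutes a simple, clearly labeled stand-in. First it builds a gene accumulation curve: the mean number of distinct genes after 1, 2, ..., n samples, averaged over random sample orders. It then fits a saturating Michaelis-Menten curve S(n) = S_max · n / (K + n) by linear regression of 1/S on 1/n (a Lineweaver-Burk transform). S_max plays the role of the asymptotic gene count marked in panel A. This linearization overweights the first few points, so treat the result as an illustration only. The `sample_genes` input is hypothetical.

```python
import random


def accumulation_curve(sample_genes, permutations=20, seed=0):
    """Mean number of distinct genes seen after adding 1..n samples.

    sample_genes: dict mapping a sample ID to the set of gene clusters it contains.
    Averages over random sample orders to smooth out ordering effects.
    """
    rng = random.Random(seed)
    ids = list(sample_genes)
    totals = [0.0] * len(ids)
    for _ in range(permutations):
        rng.shuffle(ids)
        seen = set()
        for i, sample_id in enumerate(ids):
            seen |= sample_genes[sample_id]
            totals[i] += len(seen)
    return [t / permutations for t in totals]


def fit_saturating(curve):
    """Fit S(n) = s_max * n / (k + n) via least squares on 1/S versus 1/n.

    Returns (s_max, k); s_max is the estimated asymptotic gene count.
    """
    xs = [1.0 / n for n in range(1, len(curve) + 1)]
    ys = [1.0 / s for s in curve]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # equals 1 / s_max
    if intercept <= 0:
        raise ValueError("no saturation detected under this model")
    s_max = 1.0 / intercept
    return s_max, slope * s_max          # slope equals k / s_max


if __name__ == "__main__":
    # Synthetic curve that follows the model exactly: s_max = 1000, k = 5.
    curve = [1000 * n / (5 + n) for n in range(1, 21)]
    print(fit_saturating(curve))         # approximately (1000.0, 5.0)
```

With real data, compare several curve families, as the "more conservative" alternatives in panels (C) and (D) suggest, rather than trusting a single asymptote.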
Summary
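This paper analyzed only public data yet was published in a Cell sister journal with an impact factor above 15. Its strengths are as follows:
Previous gene catalogs focused mostly on the gut, whereas this paper also used a large number of oral samples.
The mouth and gut, the two ends of the digestive tract, are both research priorities, and large amounts of public data have accumulated for both;
Singletons are routinely discarded in amplicon studies, and earlier work generally assumed they reflected contamination, low abundance, or similar artifacts.
The paper takes a fresh angle with sound logic, showing everyone something many had thought of but nobody had done;
The timing was right: the large projects and datasets that accumulated rapidly in recent years made this study possible.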
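There is already plenty of data; mining it more deeply to discover patterns is what counts.
Whether you sequenced the data yourself does not matter.
Many papers based on public data have appeared recently, especially in Cell and its sister journals. If you have good ideas, get started quickly: opportunity favors the prepared.
Below are several top papers based on mining public data, for reference:
If you spend tens of millions without landing a Cell paper while others publish one Cell paper after another just by mining public data, isn't that worth thinking about?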
Reference
Tierney, B. T. et al. The Landscape of Genetic Content in the Gut and Oral Human Microbiome. Cell Host & Microbe 26, 283-295.e8, doi:10.1016/j.chom.2019.07.008 (2019).
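Final notes
To encourage exchange among readers and help solve research problems quickly, we have set up a professional “Metagenome” discussion group, which more than 5,000 front-line researchers in China and abroad have already joined. To join the discussion and get expert answers, feel free to share this article on WeChat Moments and scan the QR code to add the editor-in-chief, who will bring you into the group. Be sure to include the note “Name-Institution-Research area-Title/Year”. PIs, please identify yourselves; separate groups for microbiology PIs in China and abroad are also available for collaboration and exchange. For technical questions, first read “How to Ask Questions Elegantly” to learn how to approach the problem. If it remains unsolved, discuss it in the group rather than in private chats, so the answers help your peers.
To learn research strategies and hands-on analysis for 16S amplicon and metagenomic studies, follow “Metagenome” (宏基因组).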