基于phyloseq的排序分析

Original 文涛聊科研微生信生物 2022-05-08

wentao

2019年5月16日

载入本次分析需要的包

phyloseq的排序函数为我们的排序分析进行了极大的简化，保证计算的同时兼顾可视化工作。

library("phyloseq"); packageVersion("phyloseq")## [1] '1.26.1'

data(GlobalPatterns)
library("ggplot2"); packageVersion("ggplot2")

## [1] '3.1.0'library("cluster"); packageVersion("cluster")## [1] '2.0.7.1'library("plyr"); packageVersion("plyr")## [1] '1.8.4'

设置主题准备数据

theme_set(theme_bw())

数据得到并进行过滤

count数量在一半样品中出现抵低于5的OTU

GP = GlobalPatterns
wh0 = genefilter_sample(GP, filterfun_sample(function(x) x > 5), A=0.5*nsamples(GP))
GP1 = prune_taxa(wh0, GP)

相对丰度标准化，这里使用10的六次方

GP1 = transform_sample_counts(GP1, function(x) 1E6 * x/sum(x))

仅仅保留主要的五个门类的OTU

phylum.sum = tapply(taxa_sums(GP1), tax_table(GP1)[, "Phylum"], sum, na.rm=TRUE)
top5phyla = names(sort(phylum.sum, TRUE))[1:5]
GP1 = prune_taxa((tax_table(GP1)[, "Phylum"] %in% top5phyla), GP1)

get_variable 函数返回mapping文件中的目标列

get_variable函数结合需要判断的变量，将分组分为两类，并添加到mapping文件中

human = get_variable(GP1, "SampleType") %in% c("Feces", "Mock", "Skin", "Tongue")
sample_data(GP1)$human <- factor(human)

phyloaeq封装的排序脚本使用

ordinate函数，我们只需要定义使用的排序方法：method，和使用的距离参数 distance,下面是基于brayZ距离的NMDS排序

下面展示loading矩阵信息

GP.ord <- ordinate(GP1, "NMDS", "bray")

## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1333468 
## Run 1 stress 0.1333468 
## ... Procrustes: rmse 4.32893e-06  max resid 7.783055e-06 
## ... Similar to previous best
## Run 2 stress 0.1637672 
## Run 3 stress 0.1844742 
## Run 4 stress 0.1529686 
## Run 5 stress 0.1498725 
## Run 6 stress 0.1469145 
## Run 7 stress 0.1505794 
## Run 8 stress 0.1891198 
## Run 9 stress 0.1452642 
## Run 10 stress 0.1488978 
## Run 11 stress 0.1469145 
## Run 12 stress 0.1685942 
## Run 13 stress 0.1333468 
## ... Procrustes: rmse 5.818375e-06  max resid 1.112148e-05 
## ... Similar to previous best
## Run 14 stress 0.1675647 
## Run 15 stress 0.171298 
## Run 16 stress 0.1385323 
## Run 17 stress 0.1710784 
## Run 18 stress 0.1385323 
## Run 19 stress 0.1498724 
## Run 20 stress 0.1385323 
## *** Solution reached

p1 = plot_ordination(GP1, GP.ord, type="taxa", color="Phylum", title="taxa")
print(p1)

### 可选分面loading展示，更加清晰直观

p1 + facet_wrap(~Phylum, 3)

### 对样品进行排序

type 选择samples 对样品进行排序，这也是我们使用更多的一种方法。

p2 = plot_ordination(GP1, GP.ord, type="samples", color="SampleType", shape="human") 
p2 + geom_polygon(aes(fill=SampleType)) + geom_point(size=5) + ggtitle("samples")

### biplot 选项将样品和loading一同展示

但是重叠很严重

p3 = plot_ordination(GP1, GP.ord, type="biplot", color="SampleType", shape="Phylum", title="biplot")
# Some stuff to modify the automatic shape scale
GP1.shape.names = get_taxa_unique(GP1, "Phylum")
GP1.shape <- 15:(15 + length(GP1.shape.names) - 1)
names(GP1.shape) <- GP1.shape.names
GP1.shape["samples"] <- 16
p3 + scale_shape_manual(values=GP1.shape)

### split 通过将两者分开分，面展示解决问题

p4 = plot_ordination(GP1, GP.ord, type="split", color="Phylum", shape="human", label="SampleType", title="split") 
p4

### 自定义颜色

gg_color_hue <- function(n){
    hues = seq(15, 375, length=n+1)
    hcl(h=hues, l=65, c=100)[1:n]
}
color.names <- levels(p4$data$Phylum)
p4cols <- gg_color_hue(length(color.names))
names(p4cols) <- color.names
p4cols["samples"] <- "black"
p4 + scale_color_manual(values=p4cols)

### 下面使用多种排序方法进行排序，并比对

使用llply函数

dist = "bray"
ord_meths = c("DCA", "CCA", "RDA", "DPCoA", "NMDS", "MDS", "PCoA")
plist = llply(as.list(ord_meths), function(i, physeq, dist){
        ordi = ordinate(physeq, method=i, distance=dist)
        plot_ordination(physeq, ordi, "samples", color="SampleType")
}, GP1, dist)

## Square root transformation
## Wisconsin double standardization
## Run 0 stress 0.1333468 
## Run 1 stress 0.1528906 
## Run 2 stress 0.1827625 
## Run 3 stress 0.1333468 
## ... New best solution
## ... Procrustes: rmse 3.155156e-06  max resid 7.816854e-06 
## ... Similar to previous best
## Run 4 stress 0.1468477 
## Run 5 stress 0.1333468 
## ... New best solution
## ... Procrustes: rmse 2.975489e-06  max resid 7.840301e-06 
## ... Similar to previous best
## Run 6 stress 0.1681228 
## Run 7 stress 0.16608 
## Run 8 stress 0.1521722 
## Run 9 stress 0.1385323 
## Run 10 stress 0.1570722 
## Run 11 stress 0.1455574 
## Run 12 stress 0.1510167 
## Run 13 stress 0.1653386 
## Run 14 stress 0.1658681 
## Run 15 stress 0.1385327 
## Run 16 stress 0.1507161 
## Run 17 stress 0.169588 
## Run 18 stress 0.1469145 
## Run 19 stress 0.180481 
## Run 20 stress 0.1518735 
## *** Solution reached

提取作图所需数据

names(plist) <- ord_meths
pdataframe = ldply(plist, function(x){
    df = x$data[, 1:2]
    colnames(df) = c("Axis_1", "Axis_2")
    return(cbind(df, x$data))
})
names(pdataframe)[1] = "method"

p = ggplot(pdataframe, aes(Axis_1, Axis_2, color=SampleType, shape=human, fill=SampleType))
p = p + geom_point(size=4) + geom_polygon()
p = p + facet_wrap(~method, scales="free")
p = p + scale_fill_brewer(type="qual", palette="Set1")
p = p + scale_colour_brewer(type="qual", palette="Set1")
p

### 选择其中一个展示

plist[[2]]

### 对展示图形进行修饰

p = plist[[2]] + scale_colour_brewer(type="qual", palette="Set1")
p = p + scale_fill_brewer(type="qual", palette="Set1")
p = p + geom_point(size=5) + geom_polygon(aes(fill=SampleType))
p

### 使用unifrac距离进行PCoa分析

ordu = ordinate(GP1, "PCoA", "unifrac", weighted=TRUE)
plot_ordination(GP1, ordu, color="SampleType", shape="human")

展示排序结果

p = plot_ordination(GP1, ordu, color="SampleType", shape="human")
p = p + geom_point(size=7, alpha=0.75)
p = p + scale_colour_brewer(type="qual", palette="Set1")
p + ggtitle("MDS/PCoA on weighted-UniFrac distance, GlobalPatterns")

当我们将排序做完之后

这里仅仅做了排序的分析，其实我们的排序需要进行检验。

震撼！上海开放Google、Meta等国际平台，中国数字化迈出历史性一步！

战争烈度再升级，特朗普加征10%对华关税

特朗普称计划对中国商品征收25%的关税，外交部回应

从地方目标看全国！今年经济增长还是有很高要求

二姐李思林早知道李玟有轻生的念头，居然没一点防备措施！

基于phyloseq的排序分析

wentao

2019年5月16日

载入本次分析需要的包

设置主题准备数据

数据得到并进行过滤

相对丰度标准化，这里使用10的六次方

仅仅保留主要的五个门类的OTU

get_variable 函数返回mapping文件中的目标列

phyloaeq封装的排序脚本使用

提取作图所需数据

展示排序结果

当我们将排序做完之后

欢迎加入微生信生物讨论群，扫描下方二维码添加小编微信，小编带你入伙啦，大牛如云，让交流变得简单。

历史目录

R语言分析技术

扩增子专题

基于phyloseq的微生物群落分析

代谢组专题

当科研遇见python

科学知识图谱

杂谈

您可能也对以下帖子感兴趣

震撼！上海开放Google、Meta等国际平台，中国数字化迈出历史性一步！

战争烈度再升级，特朗普加征10%对华关税

特朗普称计划对中国商品征收25%的关税，外交部回应

从地方目标看全国！今年经济增长还是有很高要求

二姐李思林早知道李玟有轻生的念头，居然没一点防备措施！

生成图片，分享到微信朋友圈

基于phyloseq的排序分析

wentao

2019年5月16日

载入本次分析需要的包

设置主题 准备数据

数据得到 并进行过滤

相对丰度标准化，这里使用10的六次方

仅仅保留主要的五个门类的OTU

get_variable 函数返回mapping文件中的目标列

phyloaeq封装的排序脚本使用

提取作图所需数据

展示排序结果

当我们将排序做完之后

欢迎加入微生信生物讨论群，扫描下方二维码添加小编微信，小编带你入伙啦，大牛如云，让交流变得简单。

历史目录

R语言分析技术

扩增子专题

基于phyloseq的微生物群落分析

代谢组专题

当科研遇见python

科学知识图谱

杂谈

您可能也对以下帖子感兴趣

设置主题准备数据

数据得到并进行过滤