phyloseq | 用 R 分析微生物组数据及可视化 (一)
phyloseq 包是一个集OTU 数据导入,存储,分析和图形可视化于一体的工具。它不但利用了 R 中许多经典的工具进行生态学和系统发育分析(例如:vegan,ade4,ape, picante),同时还结合 ggplot2 以轻松生成发表级别的可视化结果。phyloseq 使用的S4类将一个研究所有相关的测序数据及元数据存储为单个对象,从而更容易共享数据并重复结果。
GitHub 地址:https://github.com/joey711/phyloseq
安装
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("phyloseq", version = "3.8")
数据导入
一个 phyloseq 类,通常由以下几个部分组成:
otu_table
:一个数字矩阵matrix
,包含了 OTU 在每个样本中的丰度信息;sample_data
:一个data.frame
,包含了所有样本的表型信息,行名必须匹配otu_table
中的样本名;tax_table
:一个字符矩阵matrix
,包含了 OTU 的物种信息,行名必须匹配otu_table
中的 OTU 名。
在下面的例子中,我用 R 随机创造了一些模拟数据,要是有真实数据的话,也可以直接导入。
# Create a pretend OTU table that you read from a file, called otumat
otumat = matrix(sample(1:100, 100, replace = TRUE), nrow = 10, ncol = 10)
rownames(otumat) <- paste0("OTU", 1:nrow(otumat))
colnames(otumat) <- paste0("Sample", 1:ncol(otumat))
# Create a pretend taxonomy table
taxmat = matrix(sample(letters, 70, replace = TRUE), nrow = nrow(otumat), ncol = 7)
rownames(taxmat) <- rownames(otumat)
colnames(taxmat) <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species")
看一下我们导入的 OTU 丰度信息和 OTU 分类信息:
otumat
# Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8
# OTU1 96 50 36 35 59 80 83 63
# OTU2 52 67 39 39 37 57 20 15
# OTU3 94 18 15 11 14 75 1 12
# OTU4 27 88 98 100 59 27 30 30
# OTU5 26 66 93 85 41 30 100 41
# OTU6 17 16 97 86 18 25 94 31
# OTU7 63 19 16 43 89 25 17 63
# OTU8 31 92 22 14 58 1 45 2
# OTU9 100 33 19 77 43 1 14 69
# OTU10 13 35 80 43 34 45 24 47
# Sample9 Sample10
# OTU1 38 35
# OTU2 64 94
# OTU3 42 58
# OTU4 94 78
# OTU5 92 100
# OTU6 62 37
# OTU7 15 82
# OTU8 25 35
# OTU9 42 18
# OTU10 71 72
taxmat
接下来,我们需要将他们组合成一个 phyloseq 对象:
library("phyloseq")
OTU = otu_table(otumat, taxa_are_rows = TRUE)
TAX = tax_table(taxmat)
OTU
# OTU Table: [10 taxa and 10 samples]
# taxa are rows
# Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7 Sample8
# OTU1 96 50 36 35 59 80 83 63
# OTU2 52 67 39 39 37 57 20 15
# OTU3 94 18 15 11 14 75 1 12
# OTU4 27 88 98 100 59 27 30 30
# OTU5 26 66 93 85 41 30 100 41
# OTU6 17 16 97 86 18 25 94 31
# OTU7 63 19 16 43 89 25 17 63
# OTU8 31 92 22 14 58 1 45 2
# OTU9 100 33 19 77 43 1 14 69
# OTU10 13 35 80 43 34 45 24 47
# Sample9 Sample10
# OTU1 38 35
# OTU2 64 94
# OTU3 42 58
# OTU4 94 78
# OTU5 92 100
# OTU6 62 37
# OTU7 15 82
# OTU8 25 35
# OTU9 42 18
# OTU10 71 72
TAX
physeq = phyloseq(OTU, TAX)
physeq
# phyloseq-class experiment-level object
# otu_table() OTU Table: [ 10 taxa and 10 samples ]
# tax_table() Taxonomy Table: [ 10 taxa by 7 taxonomic ranks ]
plot_bar(physeq, fill = "Family")
下面导入样本的表型信息:
# Create random sample data
sampledata = sample_data(data.frame(
Location = sample(LETTERS[1:4], size=nsamples(physeq), replace=TRUE),
Depth = sample(50:1000, size=nsamples(physeq), replace=TRUE),
row.names=sample_names(physeq),
stringsAsFactors=FALSE
))
sampledata
# Location Depth
# Sample1 D 337
# Sample2 B 74
# Sample3 D 68
# Sample4 C 397
# Sample5 B 142
# Sample6 D 970
# Sample7 D 69
# Sample8 C 253
# Sample9 A 497
# Sample10 D 237
使用ape
包建立 OTU 系统发育树并导入 phyloseq 对象:
library("ape")
random_tree = rtree(ntaxa(physeq), rooted=TRUE, tip.label=taxa_names(physeq))
plot(random_tree)
现在,我们有了otu_table
,sample_data
,tax_table
,phy_tree
这四类数据,以下两种方法都可以将他们合并为一个 phyloseq 对象:1. 使用merge_phyloseq
函数在之前创建的physeq
对象中加入sample_data
和phy_tree
数据;2. 使用physeq
函数重新创建一个physeq
对象。
physeq1 = merge_phyloseq(physeq, sampledata, random_tree)
physeq1
# phyloseq-class experiment-level object
# otu_table() OTU Table: [ 10 taxa and 10 samples ]
# sample_data() Sample Data: [ 10 samples by 2 sample variables ]
# tax_table() Taxonomy Table: [ 10 taxa by 7 taxonomic ranks ]
# phy_tree() Phylogenetic Tree: [ 10 tips and 9 internal nodes ]
使用physeq
函数重新创建一个physeq
对象:
physeq2 = phyloseq(OTU, TAX, sampledata, random_tree)
physeq2
# phyloseq-class experiment-level object
# otu_table() OTU Table: [ 10 taxa and 10 samples ]
# sample_data() Sample Data: [ 10 samples by 2 sample variables ]
# tax_table() Taxonomy Table: [ 10 taxa by 7 taxonomic ranks ]
# phy_tree() Phylogenetic Tree: [ 10 tips and 9 internal nodes ]
看看用这两种方法创建的physeq
对象是否一样:
identical(physeq1, physeq2)
# [1] TRUE
尝试对这些数据进行一些可视化:
plot_tree(physeq1, color="Location", label.tips="taxa_names", ladderize="left", plot.margin=0.3)
plot_tree(physeq1, color="Depth", shape="Location", label.tips="taxa_names", ladderize="right", plot.margin=0.3)
再做一些热图:
plot_heatmap(physeq1)
plot_heatmap(physeq1, taxa.label="Phylum")
Reference
https://joey711.github.io/phyloseq/import-data.html
猜你喜欢
生信菜鸟团-专题学习目录(6)
生信菜鸟团-专题学习目录(7)
还有更多文章,请移步公众号阅读
▼ 如果你生信基本技能已经入门,需要提高自己,请关注下面的生信技能树,看我们是如何完善生信技能,成为一个生信全栈工程师。
▼ 如果你是初学者,请关注下面的生信菜鸟团,了解生信基础名词,概念,扎实的打好基础,争取早日入门。