cellphonedb 及其可视化
分享是一种态度
作者 | 周运来
男,
一个长大了才会遇到的帅哥,
稳健,潇洒,大方,靠谱。
一段生信缘,一棵技能树。
生信技能树核心成员,单细胞天地特约撰稿人,简书创作者,单细胞数据科学家。
相关阅读
细胞配受体通识以及常见细胞分泌信号通路
CellChat:细胞间相互作用分析利器
CellChat:细胞间相互作用在线内容
Cell-Cell Interaction Database|| 单细胞配受体库你还在文章的附录里找吗?
celltalker:单细胞转录组配受体分析
Network在单细胞转录组数据分析中的应用
正文
在单细胞数据分析过程中,上游的拆库定量,中游的分群注释,目前的门槛是很低的了。但是解析细胞类型异质性不应止于这些,还可以看细胞群之间的通讯。当然,这方面我们介绍过CellChat:细胞间相互作用分析利器。CellChat是以信号通路为单位来计算细胞间交流状态的,很多同学用cellphonedb来做基于配受体对的细胞间交流。今天,我们就来用经典的pbmc3k数据跑一下cellphonedb,并尝试可视化。
先放一张想象图:
来自文章:Single-cell transcriptomics reveals regulators underlying immune cell diversity and immune subtypes associated with prognosis in nasopharyngeal carcinoma
照例,请出我们的pbmc3k数据:
library(Seurat)
library(SeuratData)
pbmc3k
An object of class Seurat
13714 features across 2700 samples within 1 assay
Active assay: RNA (13714 features, 0 variable features)
head(pbmc3k@meta.data)
orig.ident nCount_RNA nFeature_RNA seurat_annotations
AAACATACAACCAC pbmc3k 2419 779 Memory CD4 T
AAACATTGAGCTAC pbmc3k 4903 1352 B
AAACATTGATCAGC pbmc3k 3147 1129 Memory CD4 T
AAACCGTGCTTCCG pbmc3k 2639 960 CD14+ Mono
AAACCGTGTATGCG pbmc3k 980 521 NK
AAACGCACTGGTAC pbmc3k 2163 781 Memory CD4 T
可以看出已经注释好了的,接下来我们写出跑cellphonedb的文件:表达谱和细胞分组文件。
write.table(as.matrix(pbmc3k@assays$RNA@data), 'cellphonedb_count.txt', sep='\t', quote=F)
meta_data <- cbind(rownames(pbmc3k@meta.data), pbmc3k@meta.data[,'seurat_annotations', drop=F])
meta_data <- as.matrix(meta_data)
meta_dat[is.na(meta_data)] = "Unkown" # 细胞类型中不能有NA
write.table(meta_data, 'cellphonedb_meta.txt', sep='\t', quote=F, row.names=F)
假设诸君已经安装好cellphonedb ,并加入环境变量了,我们可以这样跑:
cellphonedb method statistical_analysis cellphonedb_meta.txt cellphonedb_count.txt --counts-data=gene_name # 如果是直接写出基因名的加这个参数,转化为基因ID的话不用加。
#cellphonedb 自己的绘图
cellphonedb plot dot_plot
cellphonedb plot heatmap_plot cellphonedb_meta.txt
cellphonedb 结果文件需要挨个打开看看,不要只看名字。更多细节可以看官网的介绍,记住,官网对你是开放的。
https://www.cellphonedb.org/documentation
ree out/
out/
├── count_network.txt # 绘制网络图文件
├── deconvoluted.txt
├── heatmap_count.pdf
├── heatmap_log_count.pdf
├── interaction_count.txt
├── means.txt # 绘制点图文件
├── plot.pdf
├── pvalues.txt # 绘制点图文件
└── significant_means.txt
下面在R中对结果进行可视化演练,主要是网络图和点图。点图我们知道ggplot的同学都不陌生了,就是网络图一看就让人头大,好在之前你们家运来小哥哥抄完了网络数据的统计分析:R语言实战一书,算是知道些皮毛,可以在这里用上啦。
pbmc='D:\\SingleCell\\out\\' ## outs 文件放在这里了。
library(psych)
library(qgraph)
library(igraph)
netf<- "count_network.txt"
mynet <- read.delim(paste0(pbmc,"count_network.txt"), check.names = FALSE)
head(mynet)
SOURCE TARGET count
1 Memory CD4 T Memory CD4 T 3
2 Memory CD4 T B 10
3 Memory CD4 T CD14+ Mono 14
4 Memory CD4 T NK 15
5 Memory CD4 T CD8 T 6
6 Memory CD4 T Naive CD4 T 3
net<- graph_from_data_frame(mynet)
plot(net)
读入网络数据,构建网络,并绘制最简单的一张网络图:
为了给这个网络图的边点mapping上不同的属性我们引入一串颜色。
allcolour=c("#DC143C","#0000FF","#20B2AA","#FFA500","#9370DB",
"#98FB98","#F08080","#1E90FF","#7CFC00","#FFFF00",
"#808000","#FF00FF","#FA8072","#7B68EE","#9400D3",
"#800080","#A0522D","#D2B48C","#D2691E","#87CEEB",
"#40E0D0","#5F9EA0","#FF1493",
"#FFE4B5","#8A2BE2","#228B22","#E9967A","#4682B4",
"#32CD32","#F0E68C","#FFFFE0","#EE82EE","#FF6347",
"#6A5ACD","#9932CC","#8B008B","#8B4513","#DEB887")
karate_groups <- cluster_optimal(net)
coords <- layout_in_circle(net, order =
order(membership(karate_groups))) # 设置网络布局
E(net)$width <- E(net)$count/10 # 边点权重(粗细)
plot(net, edge.arrow.size=.1,
edge.curved=0,
vertex.color=allcolour,
vertex.frame.color="#555555",
vertex.label.color="black",
layout = coords,
vertex.label.cex=.7)
经过以上一番修饰后,我们得到的网络图如下:
但是边的颜色和点的颜色还是对应不上,我们来修改一番边的属性。
net2 <- net # 复制一份备用
for (i in 1: length(unique(mynet$SOURCE)) ){
E(net)[map(unique(mynet$SOURCE),function(x) {
get.edge.ids(net,vp = c(unique(mynet$SOURCE)[i],x))
})%>% unlist()]$color <- allcolour[i]
} # 这波操作谁有更好的解决方案?
plot(net, edge.arrow.size=.1,
edge.curved=0,
vertex.color=allcolour,
vertex.frame.color="#555555",
vertex.label.color="black",
layout = coords,
vertex.label.cex=.7)
至此,文章开头的第一张图就复现出来了。我们观察到由于箭头是双向的,所以两个细胞类型之间的线条会被后来画上去的覆盖掉(开头那张网络图也有这个问题吗?),这里我们把线条做成曲线:
plot(net, edge.arrow.size=.1,
edge.curved=0.2, # 只是调了这个参数
vertex.color=allcolour,
vertex.frame.color="#555555",
vertex.label.color="black",
layout = coords,
vertex.label.cex=.7)
这下清晰很多了。
接下来,我们来绘制第二组类型贝壳一样的网络图。
dev.off()
length(unique(mynet$SOURCE)) # 查看需要绘制多少张图,以方便布局
par(mfrow=c(2,5), mar=c(.3,.3,.3,.3))
for (i in 1: length(unique(mynet$SOURCE)) ){
net1<-net2
E(net1)[map(unique(mynet$SOURCE),function(x) {
get.edge.ids(net,vp = c(unique(mynet$SOURCE)[i],x))
})%>% unlist()]$color <- allcolour[i]
plot(net1, edge.arrow.size=.1,
edge.curved=0.4,
vertex.color=allcolour,
vertex.frame.color="#555555",
vertex.label.color="black",
layout = coords,
vertex.label.cex=1)
}
这样是不是清晰了许多呢?
但是,细胞间通讯的频数(count)并没有绘制在图上,略显不足,这就补上吧。
dev.off()
length(unique(mynet$SOURCE))
par(mfrow=c(2,5), mar=c(.3,.3,.3,.3))
for (i in 1: length(unique(mynet$SOURCE)) ){
net1<-net2
E(net1)$count <- ""
E(net1)[map(unique(mynet$SOURCE),function(x) {
get.edge.ids(net,vp = c(unique(mynet$SOURCE)[i],x))
})%>% unlist()]$count <- E(net2)[map(unique(mynet$SOURCE),function(x) {
get.edge.ids(net,vp = c(unique(mynet$SOURCE)[i],x))
})%>% unlist()]$count # 故技重施
E(net1)[map(unique(mynet$SOURCE),function(x) {
get.edge.ids(net,vp = c(unique(mynet$SOURCE)[i],x))
})%>% unlist()]$color <- allcolour[i]
plot(net1, edge.arrow.size=.1,
edge.curved=0.4,
edge.label = E(net1)$count, # 绘制边的权重
vertex.color=allcolour,
vertex.frame.color="#555555",
vertex.label.color="black",
layout = coords,
vertex.label.cex=1
)
}
需要再做一些调整就可以啦。
找两条边验证一下对应关系。
mynet %>% filter(SOURCE == 'B',TARGET == 'DC')
SOURCE TARGET count
1 B DC 18
mynet %>% filter(SOURCE == 'Memory CD4 T',TARGET == 'B')
SOURCE TARGET count
1 Memory CD4 T B 10
用ggplot2 绘制点图,关键是把两张表合并到一起。
mypvals <- read.delim(paste0(pbmc,"pvalues.txt"), check.names = FALSE)
mymeans <- read.delim(paste0(pbmc,"means.txt"), check.names = FALSE)
# 这些基因list很有意思啊,建议保存
chemokines <- grep("^CXC|CCL|CCR|CX3|XCL|XCR", mymeans$interacting_pair,value = T)
chemokines <- grep("^CXC|CCL|CCR|CX3|XCL|XCR", mymeans$interacting_pair,value = T)
th1 <- grep("IL2|IL12|IL18|IL27|IFNG|IL10|TNF$|TNF |LTA|LTB|STAT1|CCR5|CXCR3|IL12RB1|IFNGR1|TBX21|STAT4",
mymeans$interacting_pair,value = T)
th2 <- grep("IL4|IL5|IL25|IL10|IL13|AREG|STAT6|GATA3|IL4R",
mymeans$interacting_pair,value = T)
th17 <- grep("IL21|IL22|IL24|IL26|IL17A|IL17A|IL17F|IL17RA|IL10|RORC|RORA|STAT3|CCR4|CCR6|IL23RA|TGFB",
mymeans$interacting_pair,value = T)
treg <- grep("IL35|IL10|FOXP3|IL2RA|TGFB", mymeans$interacting_pair,value = T)
costimulatory <- grep("CD86|CD80|CD48|LILRB2|LILRB4|TNF|CD2|ICAM|SLAM|LT[AB]|NECTIN2|CD40|CD70|CD27|CD28|CD58|TSLP|PVR|CD44|CD55|CD[1-9]",
mymeans$interacting_pair,value = T)
coinhibitory <- grep("SIRP|CD47|ICOS|TIGIT|CTLA4|PDCD1|CD274|LAG3|HAVCR|VSIR",
mymeans$interacting_pair,value = T)
niche <- grep("CSF", mymeans$interacting_pair,value = T)
mymeans %>% dplyr::filter(interacting_pair %in% costimulatory)%>%
dplyr::select("interacting_pair",starts_with("NK"),ends_with("NK")) %>%
reshape2::melt() -> meansdf
colnames(meansdf)<- c("interacting_pair","CC","means")
mypvals %>% dplyr::filter(interacting_pair %in% costimulatory)%>%
dplyr::select("interacting_pair",starts_with("NK"),ends_with("NK"))%>%
reshape2::melt()-> pvalsdf
colnames(pvalsdf)<- c("interacting_pair","CC","pvals")
pvalsdf$joinlab<- paste0(pvalsdf$interacting_pair,"_",pvalsdf$CC)
meansdf$joinlab<- paste0(meansdf$interacting_pair,"_",meansdf$CC)
pldf <- merge(pvalsdf,meansdf,by = "joinlab")
summary((filter(pldf,means >1))$means)
pldf%>% filter(means >1) %>%
ggplot(aes(CC.x,interacting_pair.x) )+
geom_point(aes(color=means,size=-log10(pvals+0.0001)) ) +
scale_size_continuous(range = c(1,3))+
scale_color_gradient2(high="red",mid = "yellow",low ="darkblue",midpoint = 25 )+ theme_bw()+
theme(axis.text.x = element_text(angle = -45,hjust = -0.1,vjust = 0.8))
这部分写的略显臃肿,勉强可用吧。
番外篇:在不能用--counts-data=gene_name
时,我们可以如何转化基因ID呢。这里提供一种思路:用Seurat读两次数据,一次是基因SYMBOL,一次是ENTREZID(cellranger提供的便利)。方法是从10X的输出中直接读取带有ENTREZID的矩阵,如下从这个矩阵中匹配出目标矩阵。
读入数据的时候gene.column = 1
默认为2(SYMBOL)。
Read10X(
data.dir = NULL,
gene.column = 1,
unique.features = TRUE,
strip.suffix = FALSE
)
seurat2cellphonedb<- function(seosym,seo,groups,group1,celltype){
Idents(seo) <- groups
myseogroup <- subset(seo,idents =group1)
myseogroup1 <- subset(seosym,cells=colnames(myseogroup ))
count_raw <- myseogroup1@assays$RNA@counts
count_norm <- apply(count_raw, 2, function(x) (x/sum(x))*10000)
write.table(count_norm, paste0(group1,"_cellphonedb_count.txt"), sep='\t', quote=F)
meta_data <- cbind(rownames(myseogroup@meta.data),myseogroup@meta.data[,celltype, drop=F])
write.table(meta_data, paste0(group1,'_cellphonedb_meta.txt'), sep='\t', quote=F, row.names=F)
}
$cellphonedb method statistical_analysis cseocellphonedb_meta.txt C_cellphonedb_count.txt
$cellphonedb plot dot_plot
$cellphonedb plot heatmap_plot yourmeta.txt
References
[1]
细胞配受体通识以及常见细胞分泌信号通路: https://www.jianshu.com/p/df4721d29a91[2]
CellChat:细胞间相互作用分析利器: https://www.jianshu.com/p/da145cff3d41[3]
CellChat:细胞间相互作用在线内容: https://www.jianshu.com/p/e5ff8140f388[4]
Cell-Cell Interaction Database|| 单细胞配受体库你还在文章的附录里找吗?: https://www.jianshu.com/p/49613adce465[5]
celltalker:单细胞转录组配受体分析: https://www.jianshu.com/p/2c2f94b4f072[6]
Network在单细胞转录组数据分析中的应用: https://www.jianshu.com/p/23e46ad14a3e[7]
CellChat:细胞间相互作用分析利器: https://www.jianshu.com/p/da145cff3d41[8]
Single-cell transcriptomics reveals regulators underlying immune cell diversity and immune subtypes associated with prognosis in nasopharyngeal carcinoma:https://www.nature.com/articles/s41422-020-0374-x#Abs1 [9]
https://www.cellphonedb.org/documentation[10]
网络数据的统计分析:R语言实战: https://www.jianshu.com/nb/47898774[11]
https://www.cellphonedb.org/faq-and-troubleshooting[12]
https://github.com/Teichlab/cellphonedb/blob/master/cellphonedb/src/plotters/R/plot_heatmaps.R[13]
https://github.com/zktuong/ktplots/
如果你对单细胞转录组研究感兴趣,但又不知道如何入门,也许你可以关注一下下面的课程
看完记得顺手点个“在看”哦!
长按扫码可关注