单细胞分析十八般武艺12:metacell
单细胞测序技术的发展日新月异,新的分析工具也层出不穷。每个工具都有它的优势与不足,在没有权威工具和流程的单细胞生信江湖里,多掌握几种分析方法和工具,探索数据时常常会有意想不到的惊喜。
往期专题
单细胞转录组基础分析专题
metacell简介
单细胞分析最基础且非常关键的一步是细胞聚类,我们期望把相同类型(cell type)的细胞聚在一起,最好还能让不同状态(cell state)的细胞容易区分。我在项目分析过程中尝试过很多种方法的组合,例如分大群的时候用PCA+KNN+SNN,分亚群的时候会尝试NMF+最大loading值,或者使用主题模型的方法。最近有朋友介绍了一种全新的方法——metacell,不少高分文章用到了此方法,我自己实测效果也挺好。
Github主页及官方教程
https://github.com/tanaylab/metacell
https://tanaylab.github.io/metacell/articles/a-basic_pbmc8k.html
应用案例
Li et al., Cell 2018 (scRNA-seq of immune cells from human melanoma tumors)
Giladi et al., Nat Cell Biol 2018 (Mouse hematopoiesis scRNA-seq)
Sebe-pedros et al., NEE 2018 (whole organisms scRNA-seq)
Sebe-pedros et al., Cell 2018 (whole organism scRNA-seq)
Bornstein et al., Nature 2018 (thymic stroma scRNA-seq)
Cohen et al., Cell 2018 (lung scRNA-seq)
分析原理
metacell有一套自己的数据处理逻辑,它要求输入原始的UMI counts矩阵。数据输入后它会对细胞和基因进行过滤,然后选择用于计算细胞相似性的特征基因(类似于seurat的高变基因),之后基于这些特征基因的表达值构建平衡KNN图。它最核心的算法是将平衡KNN图重复抽样(默认一次抽取75%的细胞)聚类,然后将多次重复中都能聚在一起的细胞确定为metacell,最后会根据lfp值过滤离群值得到最终的metacell。metacell内部包含的cells被认为是同质化的,不同的metacell之间可能是不同cell type,也可能是用一个cell type不同的cell state,总而言之,就是metacell之间是异质的。
安装metacell
if (!require("BiocManager")) install.packages('BiocManager')
BiocManager::install("tanaylab/metacell")
metacell实践
metacell在质控方面相比seurat有所欠缺,最后得到的metacell注释起来也非常繁琐。我在metacell实践经验的基础上对原流程做了比较大调整,质控及特征基因的选择采用seurat的流程,生成metacell的核心的步骤原汁原味地保留,最后的metacell注释又回到seurat常用的cluster注释方法。
数据来源
第一步:创建seurat对象并质控
library(metacell)
library(Seurat)
library(tidyverse)
library(patchwork)
rm(list = ls())
###### 分析初始化
dir.create("metacell")
setwd("metacell")
# 设置存放数据的目录
if(!dir.exists("scdb")) dir.create("scdb")
scdb_init("scdb", force_reinit=T)
# 设置存放图形的目录
if(!dir.exists("figs")) dir.create("figs")
scfigs_init("figs")
###### 创建seurat对象
fn <- paste0("GSE139324/", c("GSM4138110", "GSM4138111", "GSM4138162", "GSM4138168"))
sn <- c('HNC01PBMC', 'HNC01TIL', 'PBMC1', 'Tonsil1')
scRNAlist <- list()
for(i in 1:length(fn)){
counts <- Read10X(data.dir = fn[i])
#不设置min.cells过滤基因会导致CellCycleScoring报错:
#Insufficient data values to produce 24 bins.
scRNAlist[[i]] <- CreateSeuratObject(counts, project=sn[i],
min.cells=2, min.features = 200)
#给细胞barcode加个前缀,防止合并后barcode重名
scRNAlist[[i]] <- RenameCells(scRNAlist[[i]], add.cell.id = sn[i])
#计算线粒体基因比例
if(T){
scRNAlist[[i]][["percent.mt"]] <- PercentageFeatureSet(scRNAlist[[i]], pattern = "^MT-")
}
#计算核糖体基因比例
if(T){
scRNAlist[[i]][["percent.rb"]] <- PercentageFeatureSet(scRNAlist[[i]], pattern = "^RP[SL]")
}
#计算红细胞基因比例
if(T){
HB.genes <- c("HBA1","HBA2","HBB","HBD","HBE1","HBG1","HBG2","HBM","HBQ1","HBZ")
HB.genes <- CaseMatch(HB.genes, rownames(scRNAlist[[i]]))
scRNAlist[[i]][["percent.HB"]]<-PercentageFeatureSet(scRNAlist[[i]], features=HB.genes)
}
}
scRNA <- merge(scRNAlist[[1]], scRNAlist[2:length(scRNAlist)])
###### 数据质控
minGene=500
maxGene=4000
maxUMI=15000
pctMT=10
pctHB=1
scRNA <- subset(scRNA, subset = nCount_RNA < maxUMI & nFeature_RNA > minGene &
nFeature_RNA < maxGene & percent.mt < pctMT & percent.HB < pctHB)
第二步:seurat转metacell
###### 构建metacell对象
## 提取高变基因
scRNA <- SCTransform(scRNA)
var.genes <- VariableFeatures(scRNA)
var.genes <- structure(rep(1:length(var.genes)), names=var.genes)
var.genes <- gset_new_gset(sets = var.genes, desc = "seurat variable genes")
scdb_add_gset("pbmc", var.genes)
## 提取counts矩阵
sce = as.SingleCellExperiment(scRNA)
mat = scm_import_sce_to_mat(sce)
scdb_add_mat("pbmc", mat)
###### 构建平衡KNN图
mcell_add_cgraph_from_mat_bknn(mat_id = "pbmc",
gset_id = "pbmc",
graph_id = "pbmc_k100",
K = 100,
dsamp = F) # 20,000 cells之内不必抽样
###### 生成metacell
#### 共聚类
mcell_coclust_from_graph_resamp(coc_id = "pbmc_n1000", graph_id = "pbmc_k100",
min_mc_size = 20, p_resamp = 0.75, n_resamp=1000)
#### 生成初级metacell
mcell_mc_from_coclust_balanced(coc_id = "pbmc_n1000", mat_id = "pbmc", mc_id = "pbmc",
K = 20, min_mc_size = 20, alpha = 2)
#### 修剪metacell
mcell_plot_outlier_heatmap(mc_id = "pbmc", mat_id = "pbmc", T_lfc = 3)
mcell_mc_split_filt(new_mc_id = "pbmc", mc_id = "pbmc", mat_id = "pbmc", T_lfc = 3, plot_mats = T)
#### metacel配色
mc_f <- scdb_mc("pbmc")
my.col <- c("darkgray","burlywood1","chocolate4","orange","red","purple","blue","darkgoldenrod3","cyan")
mc_f@colors <- colorRampPalette(my.col)(ncol(mc_f@mc_fp))
scdb_add_mc("pbmc", mc_f)
mc_f <- scdb_mc("pbmc")
#### Markers热图(热图显示的log2(lfp)值)
mcell_gset_from_mc_markers(gset_id = "pbmc_markers", mc_id = "pbmc")
# 单细胞水平
mcell_mc_plot_marks(mc_id = "pbmc", gset_id = "pbmc_markers", mat_id = "pbmc", plot_cells = T)
# metacell水平
mcell_mc_plot_marks(mc_id = "pbmc", gset_id = "pbmc_markers", mat_id = "pbmc", plot_cells = F)
#### 2D图展示Cells与MCs
#download.file("http://www.wisdom.weizmann.ac.il/~arnau/metacell_data/Amphimedon_adult/config.yaml","config.yaml")
#tgconfig::override_params("config.yaml","metacell") #使用这个就不用下面的2-3行代码设置参数
mcell_mc2d_force_knn(mc2d_id="pbmc", mc_id="pbmc", graph_id="pbmc_k100")
tgconfig::set_param("mcell_mc2d_height", 1000, "metacell")
tgconfig::set_param("mcell_mc2d_width", 1000, "metacell")
mcell_mc2d_plot(mc2d_id = "pbmc")
Marker基因lfp值热图,行是基因,列是metacell编号
metacell的细胞降维图
第三步:metacell结果导入seurat
#### metacell数据转为seurat对象
source("~/sc_script/function.R")
# 上一行代码导入的是不宜公开的内部函数
sco <- mc2sco(mat_id = "pbmc", mc_id = "pbmc", mc2d_id = "pbmc", min.features = 2)
sco <- NormalizeData(sco)
sco <- SCTransform(sco)
sco <- RunPCA(sco, verbose = F)
sco <- RunHarmony(sco, group.by.vars="orig.ident", assay.use="SCT")
sco <- FindNeighbors(sco, reduction = "harmony", dims = 1:30) %>% FindClusters()
sco <- RunUMAP(sco, reduction = "harmony", dims = 1:30)
#### SingleR鉴定细胞类型
load("~/database/SingleR_ref/ref_Hematopoietic.RData")
sco <- cell_identify2(sco, ref_Hematopoietic)
## 降维图对比
p1 <- DimPlot(sco)
p2 <- DimPlot(sco, reduction = "mcsc", group.by = "mc_id")
p <- p1|p2
ggsave("embedding_comparison.png", p, width = 16, height = 6.5)
## 批次效应对比
p1 <- DimPlot(sco, group.by = "orig.ident")
p2 <- DimPlot(sco, reduction = "mcsc", group.by = "orig.ident")
p <- p1|p2
ggsave("batches_comparison.png", p, width = 16, height = 6.5)
seurat与metacell降维结果对比
批次效应处理结果对比
## 分群效果对比
markers <- c('CD3D','CD3E','CD4','IL2RA','FOXP3','CTLA4','CD8A','CD8B','MS4A1','CD79A',
'GNLY','NKG7','PPBP','CLEC4C','FCER1A','CD14','FCGR3A','S100A8')
### 散点图
p <- FeaturePlot(sco, features = markers, ncol = 3)
ggsave("seurat_markers_featureplot.png", p, width = 12, height = 18)
p <- FeaturePlot(sco, reduction = "mcsc", features = markers, ncol = 3)
ggsave("metacell_markers_featureplot.png", p, width = 12, height = 18)
### 小提琴图
p <- VlnPlot(sco, features = markers, group.by = "mc_id", stack = T, flip = T)
ggsave("metacell_markers_vlnplot.png", p, width = 18, height = 12)
p <- VlnPlot(sco, features = markers, group.by = "seurat_clusters", stack = T, flip = T)
ggsave("seurat_markers_vlnplot.png", p, width = 18, height = 12)
Seurat处理结果展示
metacell处理结果展示
p1 <- plot_mc_sc(sco, "pbmc") + NoLegend()
p2 <- DimPlot(sco, reduction = "mcsc", group.by = "SingleR", label = T) + NoLegend()
p3 <- DimPlot(sco, reduction = "mcsc", group.by = "seurat_clusters", label = T) + NoLegend()
p = p1|p2
ggsave("metacell_celltype.png", p, width = 15, height = 6.5)
p = p1|p3
ggsave("metacell_cluster.png", p, width = 15, height = 6.5)
p1 <- DimPlot(sco, reduction = "umap", group.by = "SingleR", label = T) + NoLegend()
p2 <- DimPlot(sco, reduction = "umap", group.by = "seurat_clusters", label = T) + NoLegend()
p = p1|p2
ggsave("celltype_cluster_umap.png", p, width = 15, height = 6.5)
Celltype与metacell的对应关系
seurat_cluster与metacell的对应关系
seurat_cluster与SingleR鉴定结果的对应关系
如果您阅读此文有所疑惑,或有不同见解,亦或其他问题,欢迎添加我的微信探讨。我工作的公司2016年开始单细胞测序服务,至今已完成近万例样本的单细胞测序,服务质量经过了10X Genomics公司的权威认证。欢迎大家联系Kinesin洽谈单细胞测序及数据分析业务!