Seurat Official Single-Cell Tutorial, Part 3 (Fast Data Integration with RPCA)
When there is too much black, white becomes a sin.
1 Introduction
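The previous integration workflow used canonical correlation analysis ("CCA") to identify anchors. The authors have since optimized the approach and proposed reciprocal PCA ("RPCA"): when identifying anchors between any two datasets, RPCA projects each dataset into the other's PCA space and constrains the anchors by the same mutual neighborhood requirement. The commands for the two workflows are nearly identical, but the two methods suit different scenarios.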
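To make the reciprocal projection idea concrete, here is a minimal base-R sketch on simulated data. It is not Seurat's implementation: the toy batches, the hypothetical `simulate_batch()` and `knn()` helpers, and the choice of 10 PCs and k = 5 are assumptions for illustration, and Seurat additionally filters and scores anchors (its `k.filter` and `k.score` arguments). The sketch runs PCA on each dataset separately, projects each dataset into the other one's PCA space, and keeps only mutual nearest-neighbor pairs as anchors.

```r
# Minimal sketch of the reciprocal-PCA anchor idea (NOT Seurat's code).
# Two hypothetical batches (cells x genes) share 3 cell types but have
# different gene-wise batch offsets.
set.seed(42)
n_genes <- 50
centers <- matrix(rnorm(3 * n_genes, sd = 3), nrow = 3)  # shared cell-type profiles

simulate_batch <- function(n_cells) {
  type <- sample(1:3, n_cells, replace = TRUE)
  offset <- rnorm(n_genes)                                # batch effect for this batch
  x <- centers[type, ] + matrix(rnorm(n_cells * n_genes), n_cells)
  list(x = sweep(x, 2, offset, "+"), type = type)
}
A <- simulate_batch(100)
B <- simulate_batch(120)

# k nearest neighbors (row indices into `ref`) for every row of `query`
knn <- function(query, ref, k) {
  d2 <- outer(rowSums(query^2), rowSums(ref^2), "+") - 2 * query %*% t(ref)
  idx <- vapply(seq_len(nrow(query)), function(i) order(d2[i, ])[seq_len(k)], integer(k))
  matrix(idx, nrow = nrow(query), ncol = k, byrow = TRUE)
}

# 1. PCA on each dataset separately
n_pcs <- 10
pca_A <- prcomp(A$x, rank. = n_pcs)
pca_B <- prcomp(B$x, rank. = n_pcs)

# 2. Project each dataset into the OTHER dataset's PCA space
A_in_B <- predict(pca_B, A$x)
B_in_A <- predict(pca_A, B$x)

# 3. Neighbor search in each reciprocal space
k <- 5
nn_A_to_B <- knn(A_in_B, pca_B$x, k)  # for each A cell: nearest B cells (B's space)
nn_B_to_A <- knn(B_in_A, pca_A$x, k)  # for each B cell: nearest A cells (A's space)

# 4. Keep only mutual pairs as anchors
anchors <- do.call(rbind, lapply(seq_len(nrow(A$x)), function(i) {
  js <- nn_A_to_B[i, ]
  keep <- js[vapply(js, function(j) i %in% nn_B_to_A[j, ], logical(1))]
  if (length(keep) > 0) cbind(cell_A = i, cell_B = keep) else NULL
}))
nrow(anchors)                                                    # number of anchor pairs
mean(A$type[anchors[, "cell_A"]] == B$type[anchors[, "cell_B"]])  # fraction pairing the same type
```

Because the shared cell-type structure dominates the batch offsets in this toy setup, nearly all anchors should pair cells of the same simulated type.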
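CCA is well suited to identifying anchors when cell types are conserved but gene expression differs substantially across experiments. CCA-based integration therefore enables integrative analysis when experimental conditions or disease states introduce very strong expression shifts, or when integrating datasets across modalities and species. However, CCA-based integration may also lead to overcorrection, especially when a large proportion of cells do not overlap between datasets.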
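For comparison, here is a rough base-R sketch of a CCA-style joint embedding, assuming the "diagonal CCA" formulation described for Seurat v3: standardize genes within each dataset, take the SVD of the cell-by-cell cross-product matrix, and L2-normalize each cell's canonical coordinates. The simulated matrices with one shared gene program, the hypothetical `standardize_rows()` helper and `num_cc = 10` are assumptions, and Seurat's own code may differ in details.

```r
# Rough sketch of a CCA-style joint embedding (assumed "diagonal CCA";
# not a copy of Seurat's RunCCA()).
set.seed(1)
n_genes <- 200
program <- rnorm(n_genes)  # a gene program shared by both hypothetical datasets
X <- outer(program, rnorm(80)) + matrix(rnorm(n_genes * 80), n_genes)  # genes x 80 cells
Y <- outer(program, rnorm(60)) + matrix(rnorm(n_genes * 60), n_genes)  # genes x 60 cells

# Standardize each gene (row) within each dataset
standardize_rows <- function(m) t(scale(t(m)))
Xs <- standardize_rows(X)
Ys <- standardize_rows(Y)

# SVD of the 80 x 60 cell-by-cell cross-product:
# left vectors embed dataset-1 cells, right vectors embed dataset-2 cells
num_cc <- 10
s <- svd(crossprod(Xs, Ys), nu = num_cc, nv = num_cc)
cc_embedding <- rbind(s$u, s$v)  # 140 cells x 10 canonical vectors

# L2-normalize each cell's coordinates before any neighbor search
cc_embedding <- cc_embedding / sqrt(rowSums(cc_embedding^2))
dim(cc_embedding)
```

Both datasets end up in one shared low-dimensional space driven by variation they have in common, which is why CCA can align cells across very strong expression shifts and also why it can overcorrect.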
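RPCA-based integration runs significantly faster and also represents a more conservative approach, in which cells in different biological states are less likely to "align" after integration. We therefore recommend RPCA for integrative analysis when:
1. A substantial fraction of cells in one dataset have no matching type in the other dataset.
2. The datasets originate from the same platform (e.g., multiple lanes of 10x Genomics).
3. There are a large number of datasets or cells to integrate.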
Documentation link:
https://satijalab.org/seurat/articles/integration_rpca.html
2 Preprocessing before integration
library(Seurat)
library(SeuratData)
library(patchwork)
# load dataset (install it once beforehand with SeuratData::InstallData("ifnb"))
ifnb <- UpdateSeuratObject(LoadData("ifnb"))
# split the dataset into a list of two seurat objects (stim and CTRL)
ifnb.list <- SplitObject(ifnb, split.by = "stim")
# normalize and identify variable features for each dataset independently
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
x <- NormalizeData(x)
x <- FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})
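NormalizeData() uses the "LogNormalize" method by default: each cell's counts are divided by that cell's total count, multiplied by a scale factor (10,000 by default) and natural-log transformed with log1p. A minimal base-R equivalent on a made-up 3 x 3 count matrix; `log_normalize()` is a hypothetical helper, not a Seurat function:

```r
# Base-R sketch of the default "LogNormalize" method:
# counts / total counts per cell * scale.factor, then log1p.
log_normalize <- function(counts, scale.factor = 1e4) {
  # counts: genes x cells matrix of raw UMI counts
  log1p(sweep(counts, 2, colSums(counts), "/") * scale.factor)
}

counts <- matrix(c(0, 3, 1,
                   5, 0, 2,
                   5, 7, 7),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                 c("cell1", "cell2", "cell3")))
norm <- log_normalize(counts)  # every cell here has 10 counts in total
round(norm, 3)
```

Zero counts stay exactly 0 after log1p, so the sparsity of the count matrix is preserved.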
3 Integration
Note the reduction = "rpca" argument here:
# select features that are repeatedly variable across datasets for integration,
# then run PCA on each dataset using these features
features <- SelectIntegrationFeatures(object.list = ifnb.list)
ifnb.list <- lapply(X = ifnb.list, FUN = function(x) {
x <- ScaleData(x, features = features, verbose = FALSE)
x <- RunPCA(x, features = features, verbose = FALSE)
})
# reduction = "rpca"
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list,
anchor.features = features,
reduction = "rpca")
# this command creates an 'integrated' data assay
immune.combined <- IntegrateData(anchorset = immune.anchors)
# specify that we will perform downstream analysis on the corrected data; note that the
# original unmodified data still resides in the 'RNA' assay
DefaultAssay(immune.combined) <- "integrated"
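SelectIntegrationFeatures() is documented as ranking features by the number of datasets in which they are variable, breaking ties by their median variable-feature rank across datasets. The sketch below reproduces that ranking rule in base R; `select_integration_features()` and the gene lists are hypothetical, and Seurat's actual implementation may differ in details.

```r
# Sketch of the documented ranking rule (hypothetical helper, not Seurat's code):
# 1) number of datasets in which a gene is variable (more is better),
# 2) ties broken by median variable-feature rank (lower is better).
select_integration_features <- function(var_features_list, nfeatures = 2000) {
  all_genes <- unique(unlist(var_features_list))
  # rank of each gene within each dataset (NA if not variable there)
  ranks <- sapply(var_features_list, function(v) match(all_genes, v))
  ranks <- matrix(ranks, nrow = length(all_genes))  # keep a matrix even for 1 gene
  n_datasets <- rowSums(!is.na(ranks))
  median_rank <- apply(ranks, 1, median, na.rm = TRUE)
  ord <- order(-n_datasets, median_rank)
  head(all_genes[ord], nfeatures)
}

ctrl_hvg <- c("g1", "g2", "g3", "g4")  # hypothetical variable features, in rank order
stim_hvg <- c("g2", "g5", "g3", "g6")
select_integration_features(list(ctrl_hvg, stim_hvg), nfeatures = 4)
# g2 and g3 come first (variable in both datasets), then g1 and g5
```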
4 Standard downstream analysis
# Run the standard workflow for visualization and clustering
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap",
group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap",
group.by = "seurat_annotations",
label = TRUE,
repel = TRUE)
p1 + p2
As the plots show, the integration is slightly too weak.
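The ScaleData() call at the start of the downstream workflow centers and scales each feature across cells and, by default, caps values above `scale.max = 10`. A simplified base-R sketch of that behavior on simulated data (`scale_data()` is a hypothetical helper, not Seurat's code):

```r
# Simplified sketch of ScaleData() defaults (do.center = TRUE, do.scale = TRUE,
# scale.max = 10): z-score each gene across cells, then cap values above scale.max.
scale_data <- function(expr, scale.max = 10) {
  # expr: genes x cells matrix of normalized expression
  z <- t(scale(t(expr)))  # center and scale each gene (row)
  z[z > scale.max] <- scale.max
  z
}

set.seed(7)
expr <- matrix(runif(5 * 200), nrow = 5)  # hypothetical: 5 genes x 200 cells
expr[1, 1] <- 1000                         # one extreme outlier in gene 1
scaled <- scale_data(expr)
max(scaled)                                # the outlier is capped at 10
rowMeans(scaled[2:5, ])                    # uncapped genes stay centered at ~0
```

The cap limits the influence of features that are extreme in only a handful of cells on the subsequent PCA.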
5 Adjusting integration strength
Adjust it with the k.anchor parameter, which defaults to 5; here we try setting it to 20:
##########################################################
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list,
anchor.features = features,
reduction = "rpca",
k.anchor = 20)
immune.combined <- IntegrateData(anchorset = immune.anchors)
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:30)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)
# Visualization
p1 <- DimPlot(immune.combined, reduction = "umap",
group.by = "stim")
p2 <- DimPlot(immune.combined, reduction = "umap",
label = TRUE, repel = TRUE)
p1 + p2
This looks somewhat better than the previous result.
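Why does a larger k.anchor strengthen the alignment? Anchor candidates are mutual nearest-neighbor pairs, and any pair that is mutual with k = 5 is still mutual with k = 20, so the candidate set can only grow as k increases (Seurat then filters and scores the candidates, so the final anchor count need not grow in exactly the same way). A toy base-R sketch that counts mutual pairs between two simulated embeddings; `count_mutual_pairs()` and the data are hypothetical:

```r
# Count mutual k-nearest-neighbor pairs between two embeddings (hypothetical helper).
count_mutual_pairs <- function(a, b, k) {
  d2 <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * a %*% t(b)  # squared distances
  nn_ab <- t(apply(d2, 1, order))[, seq_len(k), drop = FALSE]    # for each a: k nearest b
  nn_ba <- t(apply(d2, 2, order))[, seq_len(k), drop = FALSE]    # for each b: k nearest a
  sum(vapply(seq_len(nrow(a)), function(i) {
    sum(vapply(nn_ab[i, ], function(j) i %in% nn_ba[j, ], logical(1)))
  }, integer(1)))
}

set.seed(3)
emb_ctrl <- matrix(rnorm(150 * 10), ncol = 10)              # 150 hypothetical "CTRL" cells
emb_stim <- matrix(rnorm(150 * 10, mean = 0.5), ncol = 10)  # 150 shifted "STIM" cells
n5  <- count_mutual_pairs(emb_ctrl, emb_stim, k = 5)
n20 <- count_mutual_pairs(emb_ctrl, emb_stim, k = 20)
c(k5 = n5, k20 = n20)
```

More anchors connect more cells across conditions, which pulls the datasets together more strongly; pushed too far, the same mechanism can contribute to the overcorrection discussed in the introduction.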
6 Integration with SCTransform normalization
Integration:
# SCTransform (method = "glmGamPoi" requires the glmGamPoi package to be installed)
ifnb.list <- SplitObject(ifnb, split.by = "stim")
ifnb.list <- lapply(X = ifnb.list, FUN = SCTransform, method = "glmGamPoi")
features <- SelectIntegrationFeatures(object.list = ifnb.list, nfeatures = 3000)
ifnb.list <- PrepSCTIntegration(object.list = ifnb.list, anchor.features = features)
ifnb.list <- lapply(X = ifnb.list, FUN = RunPCA, features = features)
# normalization.method = "SCT"
immune.anchors <- FindIntegrationAnchors(object.list = ifnb.list,
normalization.method = "SCT",
anchor.features = features,
dims = 1:30,
reduction = "rpca",
k.anchor = 20)
immune.combined.sct <- IntegrateData(anchorset = immune.anchors,
normalization.method = "SCT",
dims = 1:30)
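SCTransform replaces log-normalization with a regularized negative binomial regression and uses the Pearson residuals as normalized values. The base-R sketch below only illustrates the residual idea under strong simplifying assumptions: the expected count is gene share times cell sequencing depth, a single fixed `theta = 100` is used for all genes, and there is no per-gene model fitting, parameter regularization or residual clipping, all of which the real method performs. `pearson_residuals()` is a hypothetical helper:

```r
# Simplified negative binomial Pearson residuals (NOT the full SCTransform model).
pearson_residuals <- function(counts, theta = 100) {
  # counts: genes x cells matrix of raw UMI counts
  depth <- colSums(counts)                  # sequencing depth per cell
  gene_share <- rowSums(counts) / sum(counts)
  mu <- outer(gene_share, depth)            # expected counts under the simple model
  (counts - mu) / sqrt(mu + mu^2 / theta)   # NB variance: mu + mu^2 / theta
}

set.seed(11)
# 20 hypothetical genes x 50 cells; the first 10 genes are lowly, the last 10 highly expressed
counts <- matrix(rpois(20 * 50, lambda = rep(c(1, 5), each = 10)), nrow = 20)
res <- pearson_residuals(counts)
summary(as.vector(res))
```

Dividing by the model-based standard deviation is what lets residuals of lowly and highly expressed genes be compared on a similar scale.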
Standard downstream steps:
# normal steps
immune.combined.sct <- RunPCA(immune.combined.sct, verbose = FALSE)
immune.combined.sct <- RunUMAP(immune.combined.sct, reduction = "pca", dims = 1:30)
# Visualization
p1 <- DimPlot(immune.combined.sct, reduction = "umap",
group.by = "stim")
p2 <- DimPlot(immune.combined.sct, reduction = "umap",
group.by = "seurat_annotations",
label = TRUE,
repel = TRUE)
p1 + p2
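7 Conclusion
Choose an integration method according to the characteristics of your own dataset, or try several methods and compare their integration results before settling on the one to use for analysis.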
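One simple way to compare the results of different methods is a neighborhood mixing score: for each cell, the fraction of its k nearest neighbors in an embedding that come from the other condition. This is an illustrative metric implemented as a hypothetical `mixing_score()` helper, not part of Seurat; the simulated embeddings are also made up.

```r
# Neighborhood mixing score (hypothetical helper): average fraction of each
# cell's k nearest neighbors that belong to a different group.
mixing_score <- function(embedding, group, k = 10) {
  d <- as.matrix(dist(embedding))
  diag(d) <- Inf  # exclude each cell itself from its neighbors
  frac_other <- vapply(seq_len(nrow(embedding)), function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    mean(group[nn] != group[i])
  }, numeric(1))
  mean(frac_other)
}

set.seed(5)
emb <- matrix(rnorm(200 * 2), ncol = 2)
labels <- rep(c("CTRL", "STIM"), each = 100)
mixed <- mixing_score(emb, sample(labels))  # labels shuffled: well mixed, near 0.5
separated <- mixing_score(rbind(emb[1:100, ], emb[101:200, ] + 100), labels)  # far apart: 0
c(mixed = mixed, separated = separated)
```

On a Seurat object it could be applied to, e.g., `Embeddings(immune.combined, reduction = "umap")` with `immune.combined$stim` as the group labels, once per integration variant. More mixing is not automatically better, though: strong mixing of cells in genuinely different states can be a sign of the overcorrection mentioned in the introduction.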
So, did you learn something today?
That's all for today's post; stay tuned for the next one!