单细胞转录组基础分析二:数据质控与标准化
The following article is from 生信会客厅 Author Kinesin
本节及之后的三、四节主要介绍单细胞表达矩阵到细胞类型鉴定的分析步骤,流程图如下:
10X官网下载数据
10X官网演示数据:
https://support.10xgenomics.com/single-cell-gene-expression/datasets
官方演示数据集的页面如下,Chromium Connect是自动化系统,官方介绍操作造成的批次效应较小。Chromium Next GEM最新版的芯片,由老版的双十字微流控芯片改成了单十字微流控芯片。目前国内的10X试剂基本都是V3版本的了,因此也不要下载V1和V2试剂的数据了。为了保证笔记本电脑能运行,我们下载细胞数比较少的数据,下图红色箭头所示:
本教程的分析都是基于箭头标示的数据。点击之后需要提交个人信息,提交之后就进入数据下载界面 了。
下载红框标示的数据,Seurat分析的输入文件在这里,解压后如下图所示:
数据质控与标准化
library(Seurat)
library(tidyverse)
rm(list=ls())
dir.create("QC")
##创建seurat对象
scRNA.counts <- Read10X(data.dir = "filtered_feature_bc_matrix")
try({scRNA = CreateSeuratObject(scRNA.counts[['Gene Expression']])},silent=T)
if(exists('scRNA')){} else {scRNA = CreateSeuratObject(scRNA.counts)}
#table(scRNA@meta.data$orig.ident) #查看样本的细胞数量
##计算质控指标
#计算细胞中核糖体基因比例
scRNA[["percent.mt"]] <- PercentageFeatureSet(scRNA, pattern = "^MT-")
#计算红细胞比例
HB.genes <- c("HBA1","HBA2","HBB","HBD","HBE1","HBG1","HBG2","HBM","HBQ1","HBZ")
HB_m <- match(HB.genes, rownames(scRNA@assays$RNA))
HB.genes <- rownames(scRNA@assays$RNA)[HB_m]
HB.genes <- HB.genes[!is.na(HB.genes)]
scRNA[["percent.HB"]]<-PercentageFeatureSet(scRNA, features=HB.genes)
#head(scRNA@meta.data)
col.num <- length(levels(scRNA@active.ident))
violin <- VlnPlot(scRNA,
features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.HB"),
cols =rainbow(col.num),
pt.size = 0.01, #不需要显示点,可以设置pt.size = 0
ncol = 4) +
theme(axis.title.x=element_blank(), axis.text.x=element_blank(), axis.ticks.x=element_blank())
ggsave("QC/vlnplot_before_qc.pdf", plot = violin, width = 12, height = 6)
ggsave("QC/vlnplot_before_qc.png", plot = violin, width = 12, height = 6)
plot1 <- FeatureScatter(scRNA, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(scRNA, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot3 <- FeatureScatter(scRNA, feature1 = "nCount_RNA", feature2 = "percent.HB")
pearplot <- CombinePlots(plots = list(plot1, plot2, plot3), nrow=1, legend="none")
ggsave("QC/pearplot_before_qc.pdf", plot = pearplot, width = 12, height = 5)
ggsave("QC/pearplot_before_qc.png", plot = pearplot, width = 12, height = 5)
##设置质控标准
print(c("请输入允许基因数和核糖体比例,示例如下:", "minGene=500", "maxGene=4000", "pctMT=20"))
minGene=500
maxGene=4000
pctMT=15
##数据质控
scRNA <- subset(scRNA, subset = nFeature_RNA > minGene & nFeature_RNA < maxGene & percent.mt < pctMT)
col.num <- length(levels(scRNA@active.ident))
violin <-VlnPlot(scRNA,
features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.HB"),
cols =rainbow(col.num),
pt.size = 0.1,
ncol = 4) +
theme(axis.title.x=element_blank(), axis.text.x=element_blank(), axis.ticks.x=element_blank())
ggsave("QC/vlnplot_after_qc.pdf", plot = violin, width = 12, height = 6)
ggsave("QC/vlnplot_after_qc.png", plot = violin, width = 12, height = 6)
scRNA <- NormalizeData(scRNA, normalization.method = "LogNormalize", scale.factor = 10000)
##保存中间结果
saveRDS(scRNA, file="scRNA.rds")
获取帮助
本教程的目的在于把常用的单细胞分析流程串起来,适合有一定R语言基础的朋友参考。分析原理和代码我没有详细解释,官网有详细的教程和权威的解释,翻译好的中文教程也有多个版本,有兴趣的可以搜索一下。如果您学习本教程有一定困难,可以点击 “阅读原文” 找到作者联系方式,获取帮助。
如果你对单细胞转录组研究感兴趣,但又不知道如何入门,也许你可以关注一下下面的课程
生信爆款入门-第8期(线上直播4周,马拉松式陪伴,带你入门)你的生物信息入门课
数据挖掘学习班第6期(线上直播3周,马拉松式陪伴,带你入门) 医学生/医生首选技能提高课
生信技能树的2019年终总结 你的生物信息成长宝藏
看完记得顺手点个“在看”哦!
长按扫码可关注