其他
如何过滤线粒体基因表达过高的细胞(进阶版)
如何删除线粒体基因表达过高的细胞
Part1:加载包并读入Rdata文件
读入Rdata文件并查看Rdata文件保存的变量
查看保存变量名后,读入Rdata文件
Part2:对多样本中的每个样本查看质量并过滤
Part3:对多样本中的样本整体查看质量并过滤
Part4:可视化质控前后的细胞线粒体基因分布情况
如何删除线粒体基因表达过高的细胞
前面给大家介绍了 过滤线粒体基因表达过高的细胞 基础版。今天给大家分享下进一步优化的代码(文中示例数据可在基础版推文找到)。
因为损伤细胞和死细胞常表现出过大量的线粒体污染,因此需要过滤高线粒体基因表达的细胞。过滤原则为,移去线粒体基因表达比例过高的细胞,但是不能大量丢失样本细胞信息。
综上所述,考虑的过滤条件有两点:第一,过滤线粒体基因表达比例超过20%的细胞;第二,至少过滤5%的线粒体基因表达比例异常高的离群细胞。
Part1:加载包并读入Rdata文件
读入Rdata文件并查看Rdata文件保存的变量
load("H:/Scripts_scRNA_seq_2020/seurat_object.small.Rdata")
attach("H:/Scripts_scRNA_seq_2020/seurat_object.small.Rdata")
ls("file:H:/Scripts_scRNA_seq_2020/seurat_object.small.Rdata")
## [1] "object"
查看保存变量名后,读入Rdata文件
library(easypackages)
load_packages <- c('Seurat', 'MASS', 'dplyr', 'ggplot2')
libraries(load_packages)
load('H:/Scripts_scRNA_seq_2020/seurat_object.small.Rdata')
table(object@active.ident)
##
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 100 100 100 100 100 100 100 100 100 100 100 100 100 100 33 92
head(object@meta.data)
## nCount_RNA nFeature_RNA percent.mito percent.ribo cluster
## B-AAACGAAGTGAGTAGC 4000 1284 10.525000 18.700000 12
## B-AAACGCTGTAGTCACT 7268 2502 9.506122 11.019397 10
## B-AAACGCTGTATTTCCT 10096 3035 21.018225 5.645800 3
## B-AAAGTGAAGTAACCTC 1741 1022 4.075775 6.142365 6
## B-AAAGTGACAGCACAAG 3720 1276 2.607527 15.967742 12
## B-AAAGTGAGTCGAGTGA 27745 5055 4.080014 9.511624 16
## sample
## B-AAACGAAGTGAGTAGC B
## B-AAACGCTGTAGTCACT B
## B-AAACGCTGTATTTCCT B
## B-AAAGTGAAGTAACCTC B
## B-AAAGTGACAGCACAAG B
## B-AAAGTGAGTCGAGTGA B
seurat_data <- object
Part2:对多样本中的每个样本查看质量并过滤
## Part1: Quality control of each sample
# - must remove mito ratio highly than 20% cells;
# - must remove at least 5% outlier cells.
Filter_each_samples_high_mito_genes <-
function(seurat_object, type, max_quantile, max_mito_ratio){
seurat_list <- SplitObject(seurat_object, split.by = "sample")
names(seurat_list)
for (i in names(seurat_list)){
print(i)
## Step1: Fit normal distribution
metadata <- seurat_list[[i]]@meta.data
fit <- fitdistr(metadata[, type], densfun='normal')
mito_quantile <- pnorm(max_mito_ratio, mean=fit$estimate[1], sd=fit$estimate[2])
mito_ratio <- qnorm(max_quantile, mean=fit$estimate[1], sd=fit$estimate[2])
## Step2: Judge samples quality
if(mito_ratio <= max_mito_ratio){
print('Good sample!')
}else{
print('Check sample!')
print(paste0('Just keep ', round(100*mito_quantile,2), ' Cells.'))
}
}
## Step3: Set filtering threshold
final_ratio_ratio <- min(mito_ratio, max_mito_ratio)
return(final_ratio_ratio)
}
# Set threshold
m_raw_max <- Filter_each_samples_high_mito_genes(seurat_data, 'percent.mito', 0.95, 20)
## [1] "B"
## [1] "Check sample!"
## [1] "Just keep 89.2 Cells."
## [1] "A"
## [1] "Check sample!"
## [1] "Just keep 67.71 Cells."
print(m_raw_max)
## [1] 20
clean_data1 <- subset(seurat_data, percent.mito <= m_raw_max)
print(ncol(seurat_data))
## [1] 1525
print(ncol(clean_data1))
## [1] 1328
# Decreasing cell ratio is:
print(100*(1 - ncol(clean_data1)/ncol(seurat_data)))
## [1] 12.91803
Part3:对多样本中的样本整体查看质量并过滤
## Part2: Quality control as whole
# - must remove mito ratio highly than 20% cells;
# - must remove at least 5% outlier cells.
Filter_whole_samples_high_mito_genes <-
function(seurat_object, type, max_quantile, max_mito_ratio){
## Step1: Fit normal distribution
metadata <- seurat_object@meta.data
fit <- fitdistr(metadata[, type], densfun='normal')
mito_quantile <- pnorm(max_mito_ratio, mean=fit$estimate[1], sd=fit$estimate[2])
mito_ratio <- qnorm(max_quantile, mean=fit$estimate[1], sd=fit$estimate[2])
## Step2: Judge samples quality
if(mito_ratio <= max_mito_ratio){
print('Good sample!')
}else{
print('Check sample!')
print(paste0('Just keep ', round(100*mito_quantile,2), ' Cells.'))
}
## Step3: Set filtering threshold
final_ratio_ratio <- min(mito_ratio, max_mito_ratio)
return(final_ratio_ratio)
}
# Set threshold
m_raw_max <- Filter_whole_samples_high_mito_genes(seurat_data, 'percent.mito', 0.95, 20)
## [1] "Check sample!"
## [1] "Just keep 81.95 Cells."
print(m_raw_max)
## [1] 20
clean_data2 <- subset(seurat_data, percent.mito <= m_raw_max)
print(ncol(seurat_data))
## [1] 1525
print(ncol(clean_data2))
## [1] 1328
# Decreasing cell ratio is:
print(100*(1 - ncol(clean_data2)/ncol(seurat_data)))
## [1] 12.91803
Part4:可视化质控前后的细胞线粒体基因分布情况
鉴于两次过滤选择的线粒体基因表达比例阈值一致,得到的细胞也一致,所以只需要展示一组质控前后的线粒体基因分布图。
mito_distribution <- function(seurat_object){
metadata <- seurat_object@meta.data
metadata %>%
ggplot(aes(color=sample, x=percent.mito, fill=sample)) +
geom_density(alpha = 0.2) +
scale_x_continuous(breaks = seq(0, max(object$percent.mito), by=5)) +
theme_classic()
}
p1 <- mito_distribution(seurat_data)
p2 <- mito_distribution(clean_data1)
library(cowplot)
plot_grid(p1, p2, labels=c('original', 'clean data'))
RNA Velocity and Beyond 系列1—Introduction
如果你对单细胞转录组研究感兴趣,但又不知道如何入门,也许你可以关注一下下面的课程
看完记得顺手点个“在看”哦!
长按扫码可关注