genesorteR: fast and accurate identification of cluster marker genes

Original post by JunJunLab, 老俊俊的生信笔记, 2023-06-15

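1 Introduction

This post introduces the R package genesorteR. The paper's abstract describes it as follows:

Marker genes identified in single-cell experiments are expected to be highly specific to a particular cell type and highly expressed in that cell type. A gene detected by differential expression analysis does not necessarily satisfy both conditions, and such analysis is usually computationally expensive for large numbers of cells. Here we present genesorteR, an R package that ranks features in single-cell data so that the results agree well with what experimental biology considers a marker gene. We benchmarked the gene rankings on several datasets and show that, compared with other methods, genesorteR is clearly more accurate on large single-cell datasets. genesorteR is an order of magnitude faster than current implementations of differential expression analysis, can handle data containing millions of cells, and applies to both single-cell RNA-Seq and single-cell ATAC-Seq data.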

For the underlying method, see the paper, titled:

GENESORTER: FEATURE RANKING IN CLUSTERED SINGLE CELL DATA
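
To make the two requirements in the abstract concrete (specific to a cluster and frequently detected in it), here is a toy sketch in base R. It is not genesorteR's implementation: the matrix, the cluster labels, and the score itself (detection frequency in a cluster multiplied by the probability of the cluster given detection) are assumptions chosen for illustration. See the paper for the actual definition.

```r
# Toy illustration only: NOT genesorteR's code. All values are made up.
# Rows are genes, columns are cells.
expr <- matrix(c(5, 3, 4, 0, 0, 0,   # geneA: detected only in cluster c1
                 2, 1, 3, 2, 4, 1,   # geneB: detected everywhere
                 0, 0, 0, 0, 6, 0),  # geneC: detected in one cell of c2
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               paste0("cell", 1:6)))
clusters <- factor(c("c1", "c1", "c1", "c2", "c2", "c2"))

# Binarize: is the gene detected in the cell at all?
detected <- expr > 0

# P(gene detected | cluster): fraction of cells in each cluster with the gene on
p_gene_given_cluster <- sapply(levels(clusters), function(k) {
  rowMeans(detected[, clusters == k, drop = FALSE])
})

# P(cluster | gene detected) via Bayes' rule, using cluster sizes as the prior
prior <- as.numeric(table(clusters)) / length(clusters)
joint <- sweep(p_gene_given_cluster, 2, prior, `*`)
p_cluster_given_gene <- joint / rowSums(joint)

# High only when the gene is both common inside the cluster and rare outside it
toy_score <- p_gene_given_cluster * p_cluster_given_gene
print(round(toy_score, 3))
```

In this toy data geneA gets the highest score for c1, the ubiquitous geneB gets a middling score in both clusters, and geneC scores low in c2 because it is specific but rarely detected.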

2 Installation

GitHub repository:

https://github.com/mahmoudibrahim/genesorteR

#install devtools package from CRAN
install.packages("devtools")

#install genesorteR from the Github repository
devtools::install_github("mahmoudibrahim/genesorteR")

3 Usage

Load the built-in dataset:

library(genesorteR)

data(kidneyTabulaMuris) #three cell types from kidney (Tabula Muris data)

The object is a list with two elements: a dgCMatrix expression matrix (exp) and the cell type labels (cellType).

Rank the genes:
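
If you want to try the same workflow on your own data, an input of the same shape can be assembled by hand. The sketch below is a hedged example: the counts, gene names, and labels are invented, and it assumes that one label per column (cell) is all that is needed alongside the sparse matrix.

```r
library(Matrix)

set.seed(1)
# Hypothetical counts: 4 genes (rows) x 5 cells (columns)
counts <- matrix(rpois(20, lambda = 1), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:5)))
storage.mode(counts) <- "double"

toy <- list(
  exp      = Matrix(counts, sparse = TRUE),              # sparse dgCMatrix
  cellType = c("typeA", "typeA", "typeB", "typeB", "typeB")  # one label per cell
)

# The labels must line up with the columns of the matrix
stopifnot(ncol(toy$exp) == length(toy$cellType))
str(toy, max.level = 1)

# The analogous call would then be (not run here):
# sg <- sortGenes(toy$exp, toy$cellType)
```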

sg = sortGenes(kidneyTabulaMuris$exp, kidneyTabulaMuris$cellType)
# Warning message:
#   In sortGenes(kidneyTabulaMuris$exp, kidneyTabulaMuris$cellType) :
#   A Friendly Warning: Some genes were removed because they were zeros
#   in all cells after binarization.
#   You probably don't need to do anything but you might
#   want to look into this. Maybe you forgot to pre-filter
#   the genes? You can also use a different binarization method.
#   Excluded genes are available in the output under '$removed'.
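
The warning suggests pre-filtering genes. A minimal sketch of one such filter, dropping genes that are never detected, is shown below on a made-up sparse matrix. This is an assumption about what "pre-filter" could mean here: it only removes all-zero genes, so genes whose values all fall below the binarization cutoff may still be excluded by sortGenes.

```r
library(Matrix)

# Hypothetical 3 genes x 3 cells sparse matrix; g2 is zero in every cell
expr <- sparseMatrix(i = c(1, 1, 3), j = c(1, 2, 3), x = c(2, 5, 1),
                     dims = c(3, 3),
                     dimnames = list(c("g1", "g2", "g3"),
                                     c("c1", "c2", "c3")))

# Keep genes detected (value > 0) in at least one cell
keep <- Matrix::rowSums(expr > 0) >= 1
expr_filtered <- expr[keep, , drop = FALSE]

rownames(expr_filtered)
```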

Inspect each gene's specificity score in every cluster:

head(sg$specScore) #specificity scores for each gene in each cluster

# 6 x 3 sparse Matrix of class "dgCMatrix"
#               endothelial cell kidney collecting duct epithelial cell  leukocyte
# 0610005C13Rik       .                                     0.012820513 .
# 0610007C21Rik       0.22262881                            0.038690476 0.01904762
# 0610007L01Rik       0.04314064                            0.024291498 0.03157895
# 0610007N19Rik       .                                     0.041025641 0.01333333
# 0610007P08Rik       0.01475410                            0.002564103 0.01333333
# 0610007P14Rik       0.05828780                            0.001424501 .
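
A score matrix like this can be turned into a ranked gene list per cluster with base R. The sketch below rebuilds the six rows printed above as a stand-in for sg$specScore (dots become zeros) and picks the two highest-scoring genes per cluster; the helper logic is illustrative, not a genesorteR function.

```r
library(Matrix)

# Stand-in for sg$specScore, values copied from the head() output above
m <- rbind(
  c(0,          0.012820513, 0),
  c(0.22262881, 0.038690476, 0.01904762),
  c(0.04314064, 0.024291498, 0.03157895),
  c(0,          0.041025641, 0.01333333),
  c(0.01475410, 0.002564103, 0.01333333),
  c(0.05828780, 0.001424501, 0)
)
dimnames(m) <- list(
  c("0610005C13Rik", "0610007C21Rik", "0610007L01Rik",
    "0610007N19Rik", "0610007P08Rik", "0610007P14Rik"),
  c("endothelial cell", "kidney collecting duct epithelial cell", "leukocyte")
)
scores <- Matrix(m, sparse = TRUE)

# For each cluster (column), names of the two genes with the highest score
top2 <- apply(as.matrix(scores), 2, function(s) {
  names(sort(s, decreasing = TRUE))[1:2]
})
print(top2)
```

On a real dataset, converting the full sparse matrix with as.matrix() can use a lot of memory, so it may be preferable to process one column at a time.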

Select marker genes and plot them:

#define a small set of markers
mm = getMarkers(sg, quant = 0.99)

#cluster genes and make a heatmap
pp = plotMarkerHeat(sg$inputMat,
                    sg$inputClass,
                    mm$markers,
                    clusterGenes=TRUE, outs = TRUE)
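
To get a feel for how a quantile cutoff such as 0.99 shrinks a gene list, here is a simplified sketch on random scores. It is an assumption-laden stand-in, not the selection procedure getMarkers actually uses: it simply keeps genes whose best score across clusters lies above the chosen quantile.

```r
set.seed(42)

# Hypothetical scores: 1000 genes x 3 clusters, uniform random values
scores <- matrix(runif(3000), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000),
                                 c("c1", "c2", "c3")))

# Best score of each gene across clusters
best <- apply(scores, 1, max)

# Keep only genes above the 0.99 quantile of those best scores
cutoff <- quantile(best, probs = 0.99)
toy_markers <- names(best)[best > cutoff]

length(toy_markers)  # about 1% of the genes survive the cutoff
```

Lowering the quantile keeps more genes, raising it keeps fewer, which is the trade-off to keep in mind when adjusting quant.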

Inspect the gene cluster assigned to each marker gene:

pp$gene_class_info #gene clusters

1190002H23Rik 1810011O10Rik        Atp1b1        Atp5g1         Atp5o           B2m
            2             2             3             3             3             2
        Brp44        Brp44l          Bst2        Calcrl           Cd2         Cd200
            3             3             2             2             1             2
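
The printed result looks like a named vector mapping each gene to a gene-cluster number. Assuming that is the case, split() groups the gene names by cluster. The vector below is copied from the twelve entries shown above rather than computed.

```r
# Stand-in for pp$gene_class_info, copied from the output above
gene_class <- c("1190002H23Rik" = 2, "1810011O10Rik" = 2,
                Atp1b1 = 3, Atp5g1 = 3, Atp5o = 3, B2m = 2,
                Brp44 = 3, Brp44l = 3, Bst2 = 2, Calcrl = 2,
                Cd2 = 1, Cd200 = 2)

# List of gene names per gene cluster
genes_by_class <- split(names(gene_class), gene_class)
print(genes_by_class)

# Number of genes in each gene cluster
table(gene_class)
```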

4 Working with Seurat objects

For a Seurat object, simply extract the normalized data and the cluster identities:

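# if "seuratObject" is the Seurat object that contains your data,
# I think this should work (Seurat v3/v4 Assay objects expose the @data slot;
# Seurat v5 assays store layers instead, so this accessor may need adapting):
library(Seurat) # provides Idents()
gs = sortGenes(seuratObject@assays$RNA@data,
               Idents(seuratObject))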
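
Because the expression matrix and the cluster labels are passed separately, it is worth checking that the labels are in the same order as the matrix columns. The sketch below uses a made-up named factor in place of the Idents() result (which is a factor named by cell) and made-up column names in place of the matrix columns.

```r
# Hypothetical column names of the expression matrix
expr_cols <- c("cellA", "cellB", "cellC")

# Hypothetical cluster labels, named by cell but in a different order
labels <- factor(c(cellC = "t2", cellA = "t1", cellB = "t1"))

# Reorder the labels by cell name so they match the matrix columns
labels_aligned <- labels[expr_cols]

# Fail early if any cell is missing a label or the order is still off
stopifnot(!anyNA(labels_aligned),
          identical(names(labels_aligned), expr_cols))
print(labels_aligned)
```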
