查看原文
其他

scRepertoire||单细胞免疫组库分析:R语言应用(二)

周运来 单细胞天地 2022-08-10

分享是一种态度

作者 |  周运来

男,

一个长大了才会遇到的帅哥,

稳健,潇洒,大方,靠谱。

一段生信缘,一棵技能树,

一枚大型测序工厂的螺丝钉,

一个随机森林中提灯觅食的津门旅客。



克隆空间内稳态

通过检查克隆空间,我们可以有效地观察特定比例下克隆所占的相对空间。另一种思考方法是把整个免疫受体测序看作一个量杯(总量)。在这个杯子里,我们将填充不同粘度的液体-或不同数量的克隆比例。克隆空间内稳态是问以不同比例(或者是不同粘度的液体,类推)填充的克隆杯子的百分比是多少。比例切点在函数中cloneType变量下设置,可以进行调整,以下为基准:

cloneTypes + Rare = .0001 + Small = .001 + Medium = .01 + Large = .1 + Hyperexpanded = 1

?clonalHomeostasi
clonalHomeostasis(combined, cloneCall = "gene")+  scale_fill_npg()

克隆的比例

与上述克隆空间稳态一样,克隆比例作用是将克隆放入单独的箱(bin)中。关键的区别在于,clonalProportion()函数不会查看克隆与总数的相对比例,而是根据总数对克隆进行排序,并将它们放入箱子(bin)中。

split表示克隆型按拷贝数或出现频率的排序,即1:10为每个样本中前10个克隆型。默认的bin位于函数中的split变量之下,可以进行调整,但在基线上它们如下所示。

split + 10 + 100 + 1000 + 10000 + 30000 + 100000

clonalProportion(combined, cloneCall = "gene")   +  scale_fill_npg()

如果您对加载到scRepertoire中的样本之间的相似性度量感兴趣,那么使用clonalOverlap()可以帮助可视化。目前有两种方法可用于clonalOverlap()重叠系数和莫里西塔指数。前者是在较小的样本中以独特的克隆型的长度来衡量克隆型的重叠。森里西塔指数更为复杂,它是对一个种群中个体分散程度的一种生态测量,包含了种群的规模。

clonalOverlap(combined, cloneCall = "gene+nt", method = "morisita"

scRepertoire的另一个新特性是使用powerTCR R包改编的clonesizeDistribution()通过克隆大小分布对示例进行聚类。如果用这个函数来分析样本克隆大小分布的相似性,请阅读并引用相应的引文。在这个函数中,method是指层次聚类的方法。

clonesizeDistribution(combined, cloneCall = "gene+nt", method="ward.D2")

多样性分析

多样性也可以通过样本或其他变量来衡量。多样性是通过四个度量来计算的: 
1)Shannon,
2) inverse Simpson,
3) Chao1和4)based Coverage Estimator (ACE)。
前两者一般用于估计基线多样性,Chao/ACE指数用于估计样本的丰富度。
?clonalDiversity
clonalDiversity(combined, cloneCall = "gene", group = "samples")+  scale_fill_npg()

与表达谱数据整合

这里我就换一套数据来展示了,因为原教程的数据没有下载到,所以这里有用不知道从哪来的一套VDJ和RNA的数据来展示。需要说明的是,这里不是多个样本,而是一个样本了。也就是常说的,做了5‘转录组和VDJ 的项目。

csv1 <- read.csv("F:\\VDJ\\filtered_contig_annotations.csv", stringsAsFactors = F)

mycsv1 = list(ex=csv1)
?combineBCR
combined <- combineBCR(mycsv1, samples = c("PY"), ID = c("P"))
head(combined$PY_P)

                  barcode sample ID                         IGH              cdr3_aa1
1 PY_P_ACGCAGCCATTCCTCG-1     PY  P    IGHV1-18.IGHJ4.None.IGHM          CARDRKDVLDHW
2 PY_P_AGTAGTCGTAATAGCA-1     PY  P    IGHV4-61.IGHJ2.None.IGHM CARDKSCINGVCHRSWYFDLW
3 PY_P_TACGGATGTCAGCTAT-1     PY  P    IGHV5-51.IGHJ4.None.IGHM         CARRGNGEMATYW
4 PY_P_AACTCAGTCAAACCAC-1     PY  P IGHV2-5.IGHJ4.IGHD2-21.IGHM  CAHRLNTYCRGDCYASFDYW
5 PY_P_GTCTCGTCAAAGAATC-1     PY  P    IGHV3-23.IGHJ3.None.IGHM   CAKDLRGTRVTSYDAFDIW
6 PY_P_CGGAGTCAGCACCGCT-1     PY  P    IGHV4-39.IGHJ4.None.IGHM          CATEYSSSPALW
                                                         cdr3_nt1                 IGLC
1                            TGTGCGAGAGATAGGAAAGATGTCCTGGACCACTGG  IGLV2-8.IGLJ2.IGLC2
2 TGTGCGAGAGACAAATCTTGTATTAATGGTGTGTGCCACAGGTCTTGGTACTTCGATCTCTGG  IGLV2-8.IGLJ1.IGLC1
3                         TGTGCGCGACGTGGAAATGGAGAGATGGCTACTTATTGG IGLV2-14.IGLJ1.IGLC1
4    TGTGCACACCGCCTCAACACATATTGTCGTGGTGACTGCTACGCCTCCTTTGACTACTGG IGLV2-14.IGLJ2.IGLC2
5       TGTGCGAAAGATCTTCGGGGGACTAGGGTGACTTCTTATGATGCTTTTGATATCTGG IGLV2-14.IGLJ2.IGLC2
6                            TGTGCGACAGAATATAGTAGCTCGCCGGCCCTCTGG IGLV2-14.IGLJ2.IGLC2
        cdr3_aa2                                   cdr3_nt2
1    CNSYAGGNKVF          TGCAACTCATATGCAGGCGGCAACAAGGTATTC
2   CNSYVDNNSYVF       TGCAACTCCTATGTAGACAACAACAGTTATGTCTTC
3  CNSYTRSTTLYVF    TGCAATTCATATACACGCAGCACCACTCTTTATGTCTTC
4   CTSYTSSSTLLF       TGCACCTCATATACAAGCAGCAGCACTCTGCTGTTC
5  CTSYSSTYTLVLI    TGCACCTCATATTCAAGCACCTACACTCTCGTGCTGATC
6 CSSHATSSTAYVIF TGCAGCTCACACGCAACCAGCAGCACTGCCTACGTGATATTC
                                            CTgene
1     IGHV1-18.IGHJ4.None.IGHM_IGLV2-8.IGLJ2.IGLC2
2     IGHV4-61.IGHJ2.None.IGHM_IGLV2-8.IGLJ1.IGLC1
3    IGHV5-51.IGHJ4.None.IGHM_IGLV2-14.IGLJ1.IGLC1
4 IGHV2-5.IGHJ4.IGHD2-21.IGHM_IGLV2-14.IGLJ2.IGLC2
5    IGHV3-23.IGHJ3.None.IGHM_IGLV2-14.IGLJ2.IGLC2
6    IGHV4-39.IGHJ4.None.IGHM_IGLV2-14.IGLJ2.IGLC2
                                                                                                  CTnt
1                               TGTGCGAGAGATAGGAAAGATGTCCTGGACCACTGG_TGCAACTCATATGCAGGCGGCAACAAGGTATTC
2 TGTGCGAGAGACAAATCTTGTATTAATGGTGTGTGCCACAGGTCTTGGTACTTCGATCTCTGG_TGCAACTCCTATGTAGACAACAACAGTTATGTCTTC
3                      TGTGCGCGACGTGGAAATGGAGAGATGGCTACTTATTGG_TGCAATTCATATACACGCAGCACCACTCTTTATGTCTTC
4    TGTGCACACCGCCTCAACACATATTGTCGTGGTGACTGCTACGCCTCCTTTGACTACTGG_TGCACCTCATATACAAGCAGCAGCACTCTGCTGTTC
5    TGTGCGAAAGATCTTCGGGGGACTAGGGTGACTTCTTATGATGCTTTTGATATCTGG_TGCACCTCATATTCAAGCACCTACACTCTCGTGCTGATC
6                      TGTGCGACAGAATATAGTAGCTCGCCGGCCCTCTGG_TGCAGCTCACACGCAACCAGCAGCACTGCCTACGTGATATTC
                                CTaa                          CTstrict cellType
1           CARDRKDVLDHW_CNSYAGGNKVF    IGH.53_IGHV1-18_IGLC51_IGLV2-8        B
2 CARDKSCINGVCHRSWYFDLW_CNSYVDNNSYVF    IGH.98_IGHV4-61_IGLC99_IGLV2-8        B
3        CARRGNGEMATYW_CNSYTRSTTLYVF IGH.534_IGHV5-51_IGLC489_IGLV2-14        B
4  CAHRLNTYCRGDCYASFDYW_CTSYTSSSTLLF    IGH.12_IGHV2-5_IGLC13_IGLV2-14        B
5  CAKDLRGTRVTSYDAFDIW_CTSYSSTYTLVLI IGH.504_IGHV3-23_IGLC465_IGLV2-14        B
6        CATEYSSSPALW_CSSHATSSTAYVIF IGH.287_IGHV4-39_IGLC268_IGLV2-14        B

我们观察到Barcode被改变名称了,所以我们:

combined$PY_P <- stripBarcode(combined$PY_P , column = 1, connector = "_", num_connects = 3)

读入RNA数据并执行Seurat标准流程:

?Read10X
seo<- Read10X(data.dir = 'F:\\VDJ\\filtered_feature_bc_matrix')
colnames(seo)
srto <- CreateSeuratObject(counts = seo<article class="_2rhmJa"Gene Expression`)
srto ->pbmc
pbmc <- NormalizeData(object = pbmc)
pbmc <- FindVariableFeatures(object = pbmc)
pbmc <- ScaleData(object = pbmc)
pbmc <- RunPCA(object = pbmc)
pbmc <- FindNeighbors(object = pbmc)
pbmc <- FindClusters(object = pbmc)
pbmc <- RunTSNE(object = pbmc)
pbmc <- RunUMAP(object = pbmc,dims = 1:15)
pbmc -> seurat

seurat
An object of class Seurat 
33538 features across 7231 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 3 dimensional reductions calculated: pca, tsne, umap

DimPlot(seurat, label = T) + NoLegend() + scale_fill_npg()

接下来,我们可以获取clonotypic信息,并使用combineExpression()函数将其附加到Seurat对象中。重要的是,附件的主要需求是匹配contig细胞条形码和seurat或SCE对象的元数据行名中的条形码。如果这些不匹配,将失败。方便起见,我们建议您对Seurat对象行名称进行更改。

?combineExpression
seurat1 <- combineExpression(combined, seurat, cloneCall="gene")

作为核心函数,这个源代码我们是要背诵的了

combineExpression
function (df, sc, cloneCall = "gene+nt", groupBy = "none"
    cloneTypes = c(None = 0, Single = 1, Small = 5, Medium = 20, 
        Large = 100, Hyperexpanded = 500), filterNA = FALSE) 
{
    df <- checkList(df)
    cloneCall <- theCall(cloneCall)
    Con.df <- NULL
    meta <- grabMeta(sc)
    cell.names <- rownames(meta)
    if (groupBy == "none") {
        for (i in seq_along(df)) {
            data <- data.frame(df[[i]], stringsAsFactors = FALSE)
            data2 <- unique(data[, c("barcode", cloneCall)])
            data2 <- na.omit(data2[data2[, "barcode"] %in
                cell.names, ])
            data2 <- data2 %>% group_by(data2[, cloneCall]) %>% 
                summarise(Frequency = n())
            colnames(data2)[1] <- cloneCall
            data <- merge(data, data2, by = cloneCall, all = TRUE)
            Con.df <- rbind.data.frame(Con.df, data)
        }
    }
    else if (groupBy != "none") {
        data <- data.frame(bind_rows(df), stringsAsFactors = FALSE)
        data2 <- na.omit(unique(data[, c("barcode", cloneCall, 
            groupBy)]))
        data2 <- data2[data2[, "barcode"] %in% cell.names, 
            ]
        data2 <- as.data.frame(data2 %>% group_by(data2[, cloneCall], 
            data2[, groupBy]) %>% summarise(Frequency = n()))
        colnames(data2)[c(1, 2)] <- c(cloneCall, groupBy)
        x <- unique(data[, groupBy])
        for (i in seq_along(x)) {
            sub1 <- subset(data, data[, groupBy] == x[i])
            sub2 <- subset(data2, data2[, groupBy] == x[i])
            merge <- merge(sub1, sub2, by = cloneCall)
            Con.df <- rbind.data.frame(Con.df, merge)
        }
    }
    Con.df$cloneType <- NA
    for (x in seq_along(cloneTypes)) {
        names(cloneTypes)[x] <- paste0(names(cloneTypes[x]), 
            " (", cloneTypes[x - 1], " < X <= "
            cloneTypes[x], ")")
    }
    for (i in 2:length(cloneTypes)) {
        Con.df$cloneType <- ifelse(Con.df$Frequency > cloneTypes[i - 
            1] & Con.df$Frequency <= cloneTypes[i], names(cloneTypes[i]), 
            Con.df$cloneType)
    }
    PreMeta <- unique(Con.df[, c("barcode""CTgene"
        "CTnt""CTaa""CTstrict""Frequency"
        "cloneType")])
    rownames(PreMeta) <- PreMeta$barcode
    if (inherits(x = sc, what = "Seurat")) {
        sc <- AddMetaData(sc, PreMeta)
    }
    else {
        rownames <- rownames(colData(sc))
        colData(sc) <- cbind(colData(sc), PreMeta[rownames, ])[, 
            union(colnames(colData(sc)), colnames(PreMeta))]
        rownames(colData(sc)) <- rownames
    }
    if (filterNA == TRUE) {
        sc <- filteringNA(sc)
    }
    return(sc)
}
<bytecode: 0x00000224fac95a18>
<environment: namespace:scRepertoire>

head(seurat1@meta.data)
                      orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 seurat_clusters barcode
AAACCTGAGATCTGAA-1 SeuratProject       4386         1206               3               3    <NA>
AAACCTGAGGAACTGC-1 SeuratProject       6258         1723               4               4    <NA>
AAACCTGAGGAGTCTG-1 SeuratProject       4243         1404               1               1    <NA>
AAACCTGAGGCTCTTA-1 SeuratProject       5205         1622               4               4    <NA>
AAACCTGAGTACGTTC-1 SeuratProject       7098         2123              11              11    <NA>
AAACCTGGTATCACCA-1 SeuratProject       1095          604              10              10    <NA>
                   CTgene CTnt CTaa CTstrict Frequency cloneType highlight
AAACCTGAGATCTGAA-1   <NA> <NA> <NA>     <NA>        NA      <NA>      <NA>
AAACCTGAGGAACTGC-1   <NA> <NA> <NA>     <NA>        NA      <NA>      <NA>
AAACCTGAGGAGTCTG-1   <NA> <NA> <NA>     <NA>        NA      <NA>      <NA>
AAACCTGAGGCTCTTA-1   <NA> <NA> <NA>     <NA>        NA      <NA>      <NA>
AAACCTGAGTACGTTC-1   <NA> <NA> <NA>     <NA>        NA      <NA>      <NA>
AAACCTGGTATCACCA-1   <NA> <NA> <NA>     <NA>        NA      <NA>      <NA>

为什么那么多NA?我很是困惑,不会做错了吧?

table(rownames(seurat1@meta.data) %in% combined$PY_P$barcode)

FALSE  TRUE 
 6469   762 

head(seurat1@meta.data %>% filter(CTgene  != "<NA>"))
                      orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8
AAACCTGTCACTGGGC-1 SeuratProject       7539         2215               7
AAACCTGTCAGGTAAA-1 SeuratProject       1017          532              12
AAACCTGTCTCTTATG-1 SeuratProject       5974         1838               7
AAAGATGAGCGATTCT-1 SeuratProject       5352         1434               7
AAAGCAAAGAGGTTAT-1 SeuratProject       5878         1899               8
AAAGTAGCAATCCAAC-1 SeuratProject       5083         1463               7
                   seurat_clusters            barcode
AAACCTGTCACTGGGC-1               7 AAACCTGTCACTGGGC-1
AAACCTGTCAGGTAAA-1              12 AAACCTGTCAGGTAAA-1
AAACCTGTCTCTTATG-1               7 AAACCTGTCTCTTATG-1
AAAGATGAGCGATTCT-1               7 AAAGATGAGCGATTCT-1
AAAGCAAAGAGGTTAT-1               8 AAAGCAAAGAGGTTAT-1
AAAGTAGCAATCCAAC-1               7 AAAGTAGCAATCCAAC-1
                                                           CTgene
AAACCTGTCACTGGGC-1  IGHV4-59.IGHJ5.None.IGHM_IGKV1D-33.IGKJ4.IGKC
AAACCTGTCAGGTAAA-1                         NA_IGKV3-15.IGKJ2.IGKC
AAACCTGTCTCTTATG-1 IGHV7-4-1.IGHJ4.None.IGHM_IGLV3-10.IGLJ3.IGLC3
AAAGATGAGCGATTCT-1                   IGHV4-59.IGHJ4.None.IGHA1_NA
AAAGCAAAGAGGTTAT-1  IGHV3-74.IGHJ3.None.IGHM_IGLV2-23.IGLJ3.IGLC3
AAAGTAGCAATCCAAC-1 IGHV3-23.IGHJ4.None.IGHG2_IGLV1-44.IGLJ3.IGLC3
                                                                                                                   CTnt
AAACCTGTCACTGGGC-1                               TGTGCGAGAGGCGGGAACAGTGGCTTAGACCCCTGG_TGTCAACAGTATGATAATCTCCCGCTCACTTTC
AAACCTGTCAGGTAAA-1                                                              NA_TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
AAACCTGTCTCTTATG-1                TGTGCGAGAGGCGAACTAGAGAGGACGGTGACTACGGACTACTGG_TGTTACTCAACAGACAGCAGTGATAATCTGGGGGTGTTC
AAAGATGAGCGATTCT-1                         TGTGCGAGGGAAGGCGCGCAGTACCTGGTACTTGAGTACTGG_TGTCAGCAGTATAATAACTGGCCTTTCACCTTC
AAAGCAAAGAGGTTAT-1                            TGTGCAAGAGATCTTCCCGGTGCTTTTGATATCTGG_TGCTGCTCATATGCAGGTAGTAGCACCCGGGTGTTC
AAAGTAGCAATCCAAC-1 TGTGCGAAAGATTGGGTGGACAACAGTGAGTGGTCCTCATTCGCCGTTTTTGACTACTGG_TGTGCAACATGGGATGACAGCCTGAAAGGTTGGGTGTTT
                                                 CTaa
AAACCTGTCACTGGGC-1           CARGGNSGLDPW_CQQYDNLPLTF
AAACCTGTCAGGTAAA-1                    NA_CQQYDNWPPYTF
AAACCTGTCTCTTATG-1      CARGELERTVTTDYW_CYSTDSSDNLGVF
AAAGATGAGCGATTCT-1         CAREGAQYLVLEYW_CQQYNNWPFTF
AAAGCAAAGAGGTTAT-1          CARDLPGAFDIW_CCSYAGSSTRVF
AAAGTAGCAATCCAAC-1 CAKDWVDNSEWSSFAVFDYW_CATWDDSLKGWVF
                                         CTstrict Frequency
AAACCTGTCACTGGGC-1 IGH.1_IGHV4-59_IGLC1_IGKV1D-33         1
AAACCTGTCAGGTAAA-1           NA_NA_IGLC2_IGKV3-15        12
AAACCTGTCTCTTATG-1 IGH.2_IGHV7-4-1_IGLC3_IGLV3-10         1
AAAGATGAGCGATTCT-1  IGH.3_IGHV4-59_IGLC4_IGKV3-15         1
AAAGCAAAGAGGTTAT-1  IGH.4_IGHV3-74_IGLC5_IGLV2-23         1
AAAGTAGCAATCCAAC-1  IGH.5_IGHV3-23_IGLC6_IGLV1-44         1
                              cloneType  highlight
AAACCTGTCACTGGGC-1  Single (0 < X <= 1)       <NA>
AAACCTGTCAGGTAAA-1 Medium (5 < X <= 20) Clonotype2
AAACCTGTCTCTTATG-1  Single (0 < X <= 1)       <NA>
AAAGATGAGCGATTCT-1  Single (0 < X <= 1)       <NA>
AAAGCAAAGAGGTTAT-1  Single (0 < X <= 1)       <NA>
AAAGTAGCAATCCAAC-1  Single (0 < X <= 1)       <NA>

所以很可能是我们的数据中没有那么多B细胞,很多匹配不上啊。

slot(seurat1, "meta.data")$cloneType <- factor(slot(seurat1, "meta.data")$cloneType
                                              levels = c("Hyperexpanded (100 < X <= 500)""Large (20 < X <= 100)"
                                                         "Medium (5 < X <= 20)""Small (1 < X <= 5)"

                                                                                                                  "Single (0 < X <= 1)", NA))
DimPlot(seurat1, group.by = "cloneType") +
    scale_color_manual(values = colorblind_vector(5), na.value="grey")

为了验证我们的猜想,我们做一个差异分析看看特征基因不就知道了,主要是7,9 群,如果他们果然是B细胞那就很好了。或者做一个B细胞的marker 点图。


library(Seurat)
library(SeuratData)
library(ggplot2)

data("pbmc3k.final")
modify_vlnplot<- function(obj,
                          feature,
                          pt.size = 0,
                          plot.margin = unit(c(-0.75, 0, -0.75, 0), "cm"),
                          ...) {
    p<- VlnPlot(obj, features = feature, pt.size = pt.size, ... )  +
        xlab("") + ylab(feature) + ggtitle("") +
        theme(legend.position = "none",
              axis.text.x = element_blank(),
              axis.text.y = element_blank(),
              axis.ticks.x = element_blank(),
              axis.ticks.y = element_line(),
              axis.title.y = element_text(size = rel(1), angle = 0, vjust = 0.5),
              plot.margin = plot.margin )
    return(p)
}

## main function
StackedVlnPlot<- function(obj, features,
                          pt.size = 0,
                          plot.margin = unit(c(-0.75, 0, -0.75, 0), "cm"),
                          ...) {

    plot_list<- purrr::map(features, function(x) modify_vlnplot(obj = obj,feature = x, ...))
    plot_list[[length(plot_list)]]<- plot_list[[length(plot_list)]] +
        theme(axis.text.x=element_text(), axis.ticks.x = element_line())

    p<- patchwork::wrap_plots(plotlist = plot_list, ncol = 1)
    return(p)
}
my36colors <-c('#E5D2DD''#53A85F''#F1BB72''#F3B1A0''#D6E7A3''#57C3F3''#476D87',
               '#E95C59''#E59CC4''#AB3282''#23452F''#BD956A''#8C549C''#585658',
               '#9FA3A8''#E0D4CA''#5F3D69''#C5DEBA''#58A4C3''#E4C755''#F7F398',
               '#AA9A59''#E63863''#E39A35''#C1E6F3''#6778AE''#91D0BE''#B53E2B',
               '#712820''#DCC1DD''#CCE0F5',  '#CCC9E6''#625D9E''#68A180''#3A6963',
               '#968175'
)

classmk <-c("IL7R""CCR7",     "IL7R""S100A4","CD14""LYZ","MS4A1","CD8A","FCGR3A""MS4A7",    "GNLY",
            "NKG7","FCER1A""CST3","PPBP")
StackedVlnPlot(seurat1, c(classmk), pt.size=0, cols=my36colors)

我们知道,MS4A1是B细胞的一个明显的marker,果不其然,这俩群是B细胞没错了。然后我们再看一下差异分析的结果:

seurat1@misc$mk <- FindAllMarkers(seurat1,only.pos = T)
 seurat1@misc$mk %>% filter(cluster == 9) %>% top_n(20,avg_logFC ) -> topgene
topgene
                   p_val avg_logFC pct.1 pct.2     p_val_adj cluster      gene
TCL1A       0.000000e+00  2.670484 0.947 0.014  0.000000e+00       9     TCL1A
CD79A1      0.000000e+00  2.245267 0.986 0.093  0.000000e+00       9     CD79A
IGHM1       0.000000e+00  2.178957 0.961 0.069  0.000000e+00       9      IGHM
FCER21      0.000000e+00  2.031561 0.870 0.033  0.000000e+00       9     FCER2
MS4A11      0.000000e+00  1.987625 0.975 0.084  0.000000e+00       9     MS4A1
LINC009261  0.000000e+00  1.871576 0.863 0.060  0.000000e+00       9 LINC00926
PLPP51      0.000000e+00  1.708546 0.803 0.093  0.000000e+00       9     PLPP5
IGHD1       0.000000e+00  1.635674 0.722 0.027  0.000000e+00       9      IGHD
CD79B1     1.925974e-299  1.784911 0.937 0.151 6.459332e-295       9     CD79B
HLA-DQA11  5.403560e-293  1.778771 0.901 0.139 1.812246e-288       9  HLA-DQA1
HLA-DQB11  8.293830e-278  1.872821 0.975 0.189 2.781585e-273       9  HLA-DQB1
HLA-DRA3   3.962852e-230  1.945424 0.996 0.279 1.329061e-225       9   HLA-DRA
HLA-DQA23  2.739902e-219  1.611850 0.954 0.231 9.189083e-215       9  HLA-DQA2
HLA-DRB53  8.598615e-212  1.690653 0.986 0.272 2.883803e-207       9  HLA-DRB5
HLA-DRB13  4.137877e-208  1.765784 1.000 0.305 1.387761e-203       9  HLA-DRB1
HLA-DPB12  2.529860e-181  1.673158 0.993 0.369 8.484643e-177       9  HLA-DPB1
CD742      1.774900e-165  2.103404 1.000 0.847 5.952659e-161       9      CD74
IGHV3-21   1.128949e-111  1.904073 0.144 0.005 3.786270e-107       9  IGHV3-21
IGHV3-30    3.301247e-62  1.627276 0.113 0.007  1.107172e-57       9  IGHV3-30
IGKV3-201   1.622764e-42  1.774558 0.120 0.013  5.442425e-38       9  IGKV3-20

seurat1@misc$mk %>% filter(cluster == 7) %>% top_n(20,avg_logFC ) -> topgene
topgene
clu2featur(seurat1,idents = 7 ,features  = topgene$gene[1:8])

差异分析的结果,也证实了我们的猜想:主要有cluster 7,9 是B细胞

现在我们可以通过首先将克隆型作为一个因素排序来查看克隆型容器的分布,这可以防止颜色按字母顺序排列。接下来,我们使用Seurat中的DimPlot()函数调用和scale_color_manual附加图层。

seurat1 <- highlightClonotypes(seurat1, cloneCall= "aa", sequence = c("CAKDRAEWLPQYYYGMDVW_CQQYDNLPPLFTF",                                                                      "NA_CQQYDNWPPYTF"))
DimPlot(seurat1, group.by = "highlight")

由于我们的克隆型实在太少,所以看的不是很清楚:

 table(table(seurat1$CTgene))

  1   2   3   4   5  12  14 
646  30   4   2   2   1   1

我们还可以使用occupiedscRepertoire()函数并选择x.axis来显示细胞对象的元数据中的集群或其他变量,从而查看分配到特定频率范围的集群的单元数。

occupiedscRepertoire(seurat1, x.axis = "seurat_clusters")

最后,在修改了所有元数据之后,我们可以使用alluvialClonotypes()函数查看跨多个类别的clonotypes。要理解这种绘图方法的基本概念,我强烈推荐阅读这篇文章this post(https://cran.r-project.org/web/packages/ggalluvial/vignettes/ggalluvial.html)本质上我们能够使用绘图来检查分类变量的流向。因为这个函数将生成一个图形,其中每个克隆型按所谓的分层进行排列,这将花费一些时间,具体时间取决于细胞总数的大小。为了加快速度,我们实际上将在使用alluvialClonotypes()之前对seurat对象进行子集化。

alluvialClonotypes(seurat1, cloneCall = "gene"
                   y.axes = c("Frequency""cloneType"), 
                   color = "seurat_clusters"

image

对于希望在Seurat对象中使用元数据来执行scRepertoire提供的分析的用户来说,还有一个选项是使用expression2List()函数,该函数将获取元数据并按亚群将数据输出为列表。也就是人们常说的,细胞类型特异的克隆型分析。


combined2 <- expression2List(seurat1, group = "seurat_clusters")
length(combined2) #now listed by cluster

clonalDiversity(combined2, cloneCall = "nt")

clonalHomeostasis(combined2, cloneCall = "nt")

clonalOverlap(combined2, cloneCall="aa", method="overlap")

这是对scRepertoire从最初处理和可视化到附加到Seurat对象中的mRNA表达值的能力的一般概述。如果您有任何问题、意见或建议,请访问github:https://ncborcherding.github.io/vignettes/vignette.html




如果你对单细胞转录组研究感兴趣,但又不知道如何入门,也许你可以关注一下下面的课程




看完记得顺手点个“在看”哦!


生物 | 单细胞 | 转录组丨资料每天都精彩

长按扫码可关注

您可能也对以下帖子感兴趣

文章有问题?点此查看未经处理的缓存