查看原文
其他

单细胞免疫组库数据分析||Seurat整合单细胞转录组与VDJ数据

周运来 单细胞天地 2022-06-07

分享是一种态度

作者 |  周运来

男,

一个长大了才会遇到的帅哥,

稳健,潇洒,大方,靠谱。

一段生信缘,一棵技能树,

一枚大型测序工厂的螺丝钉,

一个随机森林中提灯觅食的津门旅客。




在做10X单细胞免疫组库分析的是往往是做一部分BCR、TCR做一部分5‘转录组,那么怎样才能把两者结合到一起呢?

今天我们尝试用我们的趁手工具做一下整合分析。

首先是下载数据,我们从10X官方的dataset中下载数据:https://support.10xgenomics.com/single-cell-vdj/datasets/3.1.0/vdj_v1_hs_pbmc3

在下载页面有关于这个样本的基本介绍,如这个数据集根据单细胞V(D)J试剂试剂盒使用指南和细胞表面蛋白特征条形码技术(CG000186),从标记的细胞中扩增出cDNA,生成5’基因表达、细胞表面蛋白、TCR富集和Ig富集文库。这里是不是应该把本文的题目改成:Seurat整合单细胞转录组与VDJ数据与膜蛋白数据啊。

读入数据:

1rm(list =ls())
2library(Seurat)
3data <-   Read10X(data.dir = "F:\\VDJ\\filtered_feature_bc_matrix")
410X data contains more than one type and is being returned as a list containing matrices of each type.

上来就报错:

1seurat_object = CreateSeuratObject(counts = data)
2Error in CreateAssayObject(counts = counts, min.cells = min.cells, min.features = min.features) : 
3  No cell names (colnames) names present in the input matrix
4

原来我们的feature中不仅有基因表达还有膜蛋白的信息,分开来读好了:

1seurat_object = CreateSeuratObject(counts = data$`Gene Expression`)
2seurat_object[['Protein']] = CreateAssayObject(counts = data$`Antibody Capture`)

我们熟悉的seurat对象又回来了:

1seurat_object
2
3An object of class Seurat 
433555 features across 7231 samples within 2 assays 
5Active assay: RNA (33538 features)
6 1 other assay present: Protein

注意这里是两个数据, RNA ,Protein:

当我们需要分析哪一部分的时候,用DefaultAssay(seurat_object)来指定,这里我们只分析RNA的数据啊,茂密的森林伸出两条路,我们也看一眼膜蛋白的数据吧:

1 seurat_object@assays$Protein
2Assay data with 17 features for 7231 cells
3First 10 features:
4 CD3-UCHT1-TotalC, CD19-HIB19-TotalC, CD45RA-HI100-TotalC, CD4-RPA-T4-TotalC, CD8a-RPA-T8-TotalC, CD14-M5E2-TotalC, CD16-3G8-TotalC, CD56-QA17A16-TotalC, CD25-BC96-TotalC,
5CD45RO-UCHL1-TotalC 

接着我们读入BCR的数据:

1bcr <- read.csv("F:\\VDJ\\filtered_contig_annotations.csv")
2bcr[1:4,]
3 barcode is_cell                   contig_id high_confidence length chain    v_gene d_gene j_gene c_gene full_length productive             cdr3
41 AAACCTGTCACTGGGC-1    True AAACCTGTCACTGGGC-1_contig_1            True    556   IGK IGKV1D-33   None  IGKJ4   IGKC        True       True      CQQYDNLPLTF
52 AAACCTGTCACTGGGC-1    True AAACCTGTCACTGGGC-1_contig_2            True    516   IGH  IGHV4-59   None  IGHJ5   IGHM        True       True     CARGGNSGLDPW
63 AAACCTGTCACTGGGC-1    True AAACCTGTCACTGGGC-1_contig_3            True    569   IGK IGKV2D-30   None  IGKJ1   IGKC        True      False CWGLLLHARYTLAWTF
74 AAACCTGTCAGGTAAA-1    True AAACCTGTCAGGTAAA-1_contig_1            True    548   IGK  IGKV1-12   None  IGKJ4   IGKC        True       True      CQQANSFPLTF
8                                           cdr3_nt reads umis raw_clonotype_id       raw_consensus_id
91                TGTCAACAGTATGATAATCTCCCGCTCACTTTC  7410   61       clonotype7 clonotype7_consensus_1
102             TGTGCGAGAGGCGGGAACAGTGGCTTAGACCCCTGG  1458   18       clonotype7 clonotype7_consensus_2
113 TGTTGGGGTTTATTACTGCATGCAAGGTACACACTGGCCTGGACGTTC  2045   17       clonotype7                   None
124                TGTCAACAGGCTAACAGTTTCCCGCTCACTTTC   140    1       clonotype2 clonotype2_consensus_2

为了保持barcode一致,处理一下barcode的命名:

1bcr$barcode <- gsub("-1""", bcr$barcode)
2bcr <- bcr[!duplicated(bcr$barcode), ]
3names(bcr)[names(bcr) == "raw_clonotype_id"] <- "clonotype_id"
4table(bcr$v_gene)
5
6    IGHV1-18      IGHV1-2     IGHV1-24      IGHV1-3     IGHV1-45     IGHV1-46     IGHV1-58     IGHV1-69    IGHV1-69D      IGHV1-8 IGHV1/OR15-1 IGHV1/OR21-1     IGHV2-26      IGHV2-5     IGHV2-70 
7          30           48            5           14            1           18            2            1           26            8            1            1            9           28            8 
8    IGHV3-11     IGHV3-13     IGHV3-15     IGHV3-16     IGHV3-21     IGHV3-23     IGHV3-30     IGHV3-33     IGHV3-43     IGHV3-48     IGHV3-49     IGHV3-53     IGHV3-64    IGHV3-64D     IGHV3-66 
9          20            7           29            1           43           95           40           50           10           25           10           13            5            4            2 
10     IGHV3-7     IGHV3-72     IGHV3-73     IGHV3-74   IGHV4-30-2     IGHV4-31     IGHV4-34     IGHV4-39      IGHV4-4     IGHV4-59     IGHV4-61   IGHV5-10-1     IGHV5-51      IGHV6-1    IGHV7-4-1 
11          15            6            7           25            9            4           30           55           22           76            9            5           21            4            8 
12    IGKV1-12     IGKV1-16     IGKV1-17     IGKV1-27     IGKV1-33     IGKV1-37     IGKV1-39      IGKV1-5      IGKV1-6      IGKV1-8      IGKV1-9    IGKV1D-13    IGKV1D-33    IGKV1D-37    IGKV1D-39 
13          29           10            8            9            9            2            6           42            3           16           11            1           15            6           65 
14   IGKV1D-43     IGKV1D-8     IGKV2-24     IGKV2-28     IGKV2-30    IGKV2D-26    IGKV2D-28    IGKV2D-29    IGKV2D-30    IGKV2D-40     IGKV3-11     IGKV3-15     IGKV3-20      IGKV3-7    IGKV3D-11 
15           5            4            6           26           14            5            1            9            1            2           39           58           89            2            1 
16   IGKV3D-15    IGKV3D-20      IGKV4-1      IGKV5-2     IGKV6-21    IGKV6D-21     IGLV1-40     IGLV1-44     IGLV1-47     IGLV1-51    IGLV10-54     IGLV2-11     IGLV2-14     IGLV2-18     IGLV2-23 
17           9            2           60            8            2            1           31           37           26           24            3           24           60            1           20 
18     IGLV2-8      IGLV3-1     IGLV3-10     IGLV3-12     IGLV3-16     IGLV3-19     IGLV3-21     IGLV3-25     IGLV3-27      IGLV3-9      IGLV4-3     IGLV4-60     IGLV4-69     IGLV5-37     IGLV5-39 
19          20           35           15            1            2           15           27           16            4            4            2            5           15            7            1 
20    IGLV5-45     IGLV5-48     IGLV5-52     IGLV6-57     IGLV7-43     IGLV7-46     IGLV8-61     IGLV9-49         None 
21           9            1            2           14            5           10           15            9          173 
22>

读入克隆型信息:

1clono <- read.csv("F:\\VDJ\\vdj_v1_hs_pbmc3_b_clonotypes.csv")
2head(clono)
3
4 clonotype_id frequency  proportion                            cdr3s_aa                                                                                cdr3s_nt
51   clonotype1         7 0.009067358                    IGK:CQQYDNWPPYTF                                                IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
62   clonotype2         3 0.003886010    IGK:CQQANSFPLTF;IGK:CQQYDNWPPYTF          IGK:TGTCAACAGGCTAACAGTTTCCCGCTCACTTTC;IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
73   clonotype5         2 0.002590674                     IGL:CQARDSSTVVF                                                   IGL:TGTCAGGCGCGGGACAGCAGCACTGTGGTATTC
84   clonotype6         2 0.002590674 IGH:CARPGTTGTTGLKNW;IGK:CQQYNNWPLTF IGH:TGTGCGAGACCCGGTACAACTGGAACGACGGGTTTAAAAAACTGG;IGK:TGTCAGCAGTATAATAACTGGCCTCTCACCTTC
95   clonotype4         2 0.002590674 IGH:CAKSFFTGTGQFHYW;IGK:CQQYSTYSQTF IGH:TGTGCGAAATCATTTTTTACCGGGACAGGACAGTTTCACTATTGG;IGK:TGCCAACAGTATAGTACTTATTCTCAAACGTTC
106   clonotype3         2 0.002590674                    IGL:CQSYDSSLSGVF                                                IGL:TGCCAGTCCTATGACAGCAGCCTGAGTGGGGTGTTC

整合到seurat之中:

1# Slap the AA sequences onto our original table by clonotype_id.
2bcr <- merge(bcr, clono)
3
4# Reorder so barcodes are first column and set them as rownames.
5rownames(bcr) <- bcr[,2]
6#tcr[,1] <- NULL
7
8# Add to the Seurat object's metadata.
9clono_seurat <- AddMetaData(object=seurat_object, metadata=bcr)
10
11head(clono_seurat@meta.data)
12                    orig.ident nCount_RNA nFeature_RNA nCount_Protein nFeature_Protein clonotype_id barcode is_cell contig_id high_confidence length chain v_gene d_gene j_gene c_gene full_length
13AAACCTGAGATCTGAA SeuratProject       4386         1206           5092               17         <NA>    <NA>    <NA>      <NA>            <NA>     NA  <NA>   <NA>   <NA>   <NA>   <NA>        <NA>
14AAACCTGAGGAACTGC SeuratProject       6258         1723           6480               17         <NA>    <NA>    <NA>      <NA>            <NA>     NA  <NA>   <NA>   <NA>   <NA>   <NA>        <NA>
15AAACCTGAGGAGTCTG SeuratProject       4243         1404           3830               17         <NA>    <NA>    <NA>      <NA>            <NA>     NA  <NA>   <NA>   <NA>   <NA>   <NA>        <NA>
16AAACCTGAGGCTCTTA SeuratProject       5205         1622           3589               17         <NA>    <NA>    <NA>      <NA>            <NA>     NA  <NA>   <NA>   <NA>   <NA>   <NA>        <NA>
17AAACCTGAGTACGTTC SeuratProject       7098         2123           5250               17         <NA>    <NA>    <NA>      <NA>            <NA>     NA  <NA>   <NA>   <NA>   <NA>   <NA>        <NA>
18AAACCTGGTATCACCA SeuratProject       1095          604           3573               17         <NA>    <NA>    <NA>      <NA>            <NA>     NA  <NA>   <NA>   <NA>   <NA>   <NA>        <NA>
19                 productive cdr3 cdr3_nt reads umis raw_consensus_id frequency proportion cdr3s_aa cdr3s_nt
20AAACCTGAGATCTGAA       <NA> <NA>    <NA>    NA   NA             <NA>        NA         NA     <NA>     <NA>
21AAACCTGAGGAACTGC       <NA> <NA>    <NA>    NA   NA             <NA>        NA         NA     <NA>     <NA>
22AAACCTGAGGAGTCTG       <NA> <NA>    <NA>    NA   NA             <NA>        NA         NA     <NA>     <NA>
23AAACCTGAGGCTCTTA       <NA> <NA>    <NA>    NA   NA             <NA>        NA         NA     <NA>     <NA>
24AAACCTGAGTACGTTC       <NA> <NA>    <NA>    NA   NA             <NA>        NA         NA     <NA>     <NA>
25AAACCTGGTATCACCA       <NA> <NA>    <NA>    NA   NA             <NA>        NA         NA     <NA>     <NA>

为什么这么多NA?因为转录组中的细胞很多,而免疫细胞(BCR)只有一部分。

1 table(rownames(clono_seurat@meta.data)  %in% rownames(bcr))
2
3FALSE  TRUE 
4 6469   762 
5> clono_seurat<- subset(clono_seurat,cells =  rownames(bcr))
6> head(clono_seurat@meta.data)
7                    orig.ident nCount_RNA nFeature_RNA nCount_Protein nFeature_Protein clonotype_id          barcode is_cell                   contig_id high_confidence length chain   v_gene d_gene
8CTTTGCGCAAGGACTG SeuratProject       1266          642            696               17   clonotype1 CTTTGCGCAAGGACTG    True CTTTGCGCAAGGACTG-1_contig_1            True    479   IGK IGKV3-15   None
9CAGCTAACACCGATAT SeuratProject        935          484            545               17   clonotype1 CAGCTAACACCGATAT    True CAGCTAACACCGATAT-1_contig_1            True    549   IGK IGKV3-15   None
10TATTACCGTCGCTTCT SeuratProject       1352          594            422               17   clonotype1 TATTACCGTCGCTTCT    True TATTACCGTCGCTTCT-1_contig_1            True    565   IGK IGKV3-15   None
11CTGAAGTGTGAAGGCT SeuratProject       1259          626            450               17   clonotype1 CTGAAGTGTGAAGGCT    True CTGAAGTGTGAAGGCT-1_contig_1            True    554   IGK IGKV3-15   None
12CCATGTCAGACCGGAT SeuratProject       1088          555            741               17   clonotype1 CCATGTCAGACCGGAT    True CCATGTCAGACCGGAT-1_contig_1            True    556   IGK IGKV3-15   None
13CGTTCTGAGCTCAACT SeuratProject        808          461           4896               17   clonotype1 CGTTCTGAGCTCAACT    True CGTTCTGAGCTCAACT-1_contig_1            True    565   IGK IGKV3-15   None
14                 j_gene c_gene full_length productive         cdr3                              cdr3_nt reads umis       raw_consensus_id frequency  proportion         cdr3s_aa
15CTTTGCGCAAGGACTG  IGKJ2   IGKC        True       True CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   197    2 clonotype1_consensus_1         7 0.009067358 IGK:CQQYDNWPPYTF
16CAGCTAACACCGATAT  IGKJ2   IGKC        True       True CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   274    2 clonotype1_consensus_1         7 0.009067358 IGK:CQQYDNWPPYTF
17TATTACCGTCGCTTCT  IGKJ2   IGKC        True       True CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   299    2 clonotype1_consensus_1         7 0.009067358 IGK:CQQYDNWPPYTF
18CTGAAGTGTGAAGGCT  IGKJ2   IGKC        True       True CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   250    3 clonotype1_consensus_1         7 0.009067358 IGK:CQQYDNWPPYTF
19CCATGTCAGACCGGAT  IGKJ2   IGKC        True       True CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   324    2 clonotype1_consensus_1         7 0.009067358 IGK:CQQYDNWPPYTF
20CGTTCTGAGCTCAACT  IGKJ2   IGKC        True       True CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   187    3 clonotype1_consensus_1         7 0.009067358 IGK:CQQYDNWPPYTF
21                                                 cdr3s_nt
22CTTTGCGCAAGGACTG IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
23CAGCTAACACCGATAT IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
24TATTACCGTCGCTTCT IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
25CTGAAGTGTGAAGGCT IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
26CCATGTCAGACCGGAT IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
27CGTTCTGAGCTCAACT IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT

seurat标准过程:

1library(tidyverse)
2clono_seurat %>% NormalizeData() %>% FindVariableFeatures() %>% 
3 ScaleData(assay="RNA")  %>% 
4 RunPCA(npcs = 50,assay="RNA")%>% 
5 FindNeighbors(assay="RNA")%>%
6 FindClusters(assay="RNA")%>%
7 RunUMAP(1:30) -> clono_seurat
8
9head(clono_seurat@meta.data)
10
11                  orig.ident nCount_RNA nFeature_RNA nCount_Protein nFeature_Protein clonotype_id          barcode is_cell
12CTTTGCGCAAGGACTG SeuratProject       1266          642            696               17   clonotype1 CTTTGCGCAAGGACTG    True
13CAGCTAACACCGATAT SeuratProject        935          484            545               17   clonotype1 CAGCTAACACCGATAT    True
14TATTACCGTCGCTTCT SeuratProject       1352          594            422               17   clonotype1 TATTACCGTCGCTTCT    True
15CTGAAGTGTGAAGGCT SeuratProject       1259          626            450               17   clonotype1 CTGAAGTGTGAAGGCT    True
16CCATGTCAGACCGGAT SeuratProject       1088          555            741               17   clonotype1 CCATGTCAGACCGGAT    True
17CGTTCTGAGCTCAACT SeuratProject        808          461           4896               17   clonotype1 CGTTCTGAGCTCAACT    True
18                                  contig_id high_confidence length chain   v_gene d_gene j_gene c_gene full_length productive
19CTTTGCGCAAGGACTG CTTTGCGCAAGGACTG-1_contig_1            True    479   IGK IGKV3-15   None  IGKJ2   IGKC        True       True
20CAGCTAACACCGATAT CAGCTAACACCGATAT-1_contig_1            True    549   IGK IGKV3-15   None  IGKJ2   IGKC        True       True
21TATTACCGTCGCTTCT TATTACCGTCGCTTCT-1_contig_1            True    565   IGK IGKV3-15   None  IGKJ2   IGKC        True       True
22CTGAAGTGTGAAGGCT CTGAAGTGTGAAGGCT-1_contig_1            True    554   IGK IGKV3-15   None  IGKJ2   IGKC        True       True
23CCATGTCAGACCGGAT CCATGTCAGACCGGAT-1_contig_1            True    556   IGK IGKV3-15   None  IGKJ2   IGKC        True       True
24CGTTCTGAGCTCAACT CGTTCTGAGCTCAACT-1_contig_1            True    565   IGK IGKV3-15   None  IGKJ2   IGKC        True       True
25                        cdr3                              cdr3_nt reads umis       raw_consensus_id frequency  proportion
26CTTTGCGCAAGGACTG CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   197    2 clonotype1_consensus_1         7 0.009067358
27CAGCTAACACCGATAT CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   274    2 clonotype1_consensus_1         7 0.009067358
28TATTACCGTCGCTTCT CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   299    2 clonotype1_consensus_1         7 0.009067358
29CTGAAGTGTGAAGGCT CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   250    3 clonotype1_consensus_1         7 0.009067358
30CCATGTCAGACCGGAT CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   324    2 clonotype1_consensus_1         7 0.009067358
31CGTTCTGAGCTCAACT CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   187    3 clonotype1_consensus_1         7 0.009067358
32                        cdr3s_aa                                 cdr3s_nt RNA_snn_res.0.8 seurat_clusters
33CTTTGCGCAAGGACTG IGK:CQQYDNWPPYTF IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT               5               5
34CAGCTAACACCGATAT IGK:CQQYDNWPPYTF IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT               5               5
35TATTACCGTCGCTTCT IGK:CQQYDNWPPYTF IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT               5               5
36CTGAAGTGTGAAGGCT IGK:CQQYDNWPPYTF IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT               5               5
37CCATGTCAGACCGGAT IGK:CQQYDNWPPYTF IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT               5               5
38CGTTCTGAGCTCAACT IGK:CQQYDNWPPYTF IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT               5               5
1p1 <- DimPlot(clono_seurat)
2p2 <-DimPlot(clono_seurat,group.by = "chain")
3p3 <-FeaturePlot(clono_seurat,features="frequency")
4
5library(patchwork)
6p1+p2+p3
1library(sankeywheel)
2sankeywheel(
3  from = clono_seurat@meta.data$chain, to = clono_seurat@meta.data$v_gene,
4  weight = clono_seurat@meta.data$nCount_Protein,#type = "sankey",
5  title = "sk",
6  subtitle = "chain &  v_gene",
7  seriesName = "", width = "100%", height = "600px"
8)

计算不同链的差异基因:

1DT::datatable(FindMarkers(clono_seurat,ident.1 = 'IGK',group.by = "chain"))
2mk<-FindMarkers(clono_seurat,ident.1 = 'IGK',group.by = "chain")
1DotPlot(clono_seurat,features=rownames(mk),group.by = "chain") + RotatedAxis()
1RidgePlot(clono_seurat,features=rownames(mk),group.by = "chain")

References

[1] Tutorial: Integrating VDJ sequencing data with Seurathttps://links.jianshu.com/go?to=https%3A%2F%2Fwww.biostars.org%2Fp%2F384640%2F
[2] Tutorial: Associating VDJ clonotyping data with scRNA-seq in Seurathttps://links.jianshu.com/go?to=https%3A%2F%2Fwww.biostars.org%2Fp%2F383217%2F


往期回顾

BatchBench比较scRNA批次矫正方法

特殊物种cellranger基因组质量评估

使用igblast进行免疫组库分析

使用MiXCR进行免疫组库分析

scRNA-seq聚类分析(一)

10X Cell Ranger ATAC 算法概述

转录组公益课(含单细胞)(名额有限,先到先得)

sc-ATAC-seq细胞类型注释策略

阿兹海默症和小胶质细胞:人和小鼠脑细胞核转录组测序

10X Genomics单细胞免疫组库VDJ分析必知必会






如果你对单细胞转录组研究感兴趣,但又不知道如何入门,也许你可以关注一下下面的课程




看完记得顺手点个“在看”哦!


生物 | 单细胞 | 转录组丨资料每天都精彩

长按扫码可关注



您可能也对以下帖子感兴趣

文章有问题?点此查看未经处理的缓存