scRepertoire||单细胞免疫组库分析:R语言应用(二)
分享是一种态度
作者 | 周运来
男,
一个长大了才会遇到的帅哥,
稳健,潇洒,大方,靠谱。
一段生信缘,一棵技能树,
一枚大型测序工厂的螺丝钉,
一个随机森林中提灯觅食的津门旅客。
克隆空间内稳态
通过检查克隆空间,我们可以有效地观察特定比例下克隆所占的相对空间。另一种思考方法是把整个免疫受体测序看作一个量杯(总量)。在这个杯子里,我们将填充不同粘度的液体-或不同数量的克隆比例。克隆空间内稳态是问以不同比例(或者是不同粘度的液体,类推)填充的克隆杯子的百分比是多少。比例切点在函数中cloneType变量下设置,可以进行调整,以下为基准:
cloneTypes + Rare = .0001 + Small = .001 + Medium = .01 + Large = .1 + Hyperexpanded = 1
?clonalHomeostasi
clonalHomeostasis(combined, cloneCall = "gene")+ scale_fill_npg()
克隆的比例
与上述克隆空间稳态一样,克隆比例作用是将克隆放入单独的箱(bin)中。关键的区别在于,clonalProportion()
函数不会查看克隆与总数的相对比例,而是根据总数对克隆进行排序,并将它们放入箱子(bin)中。
split表示克隆型按拷贝数或出现频率的排序,即1:10为每个样本中前10个克隆型。默认的bin位于函数中的split变量之下,可以进行调整,但在基线上它们如下所示。
split + 10 + 100 + 1000 + 10000 + 30000 + 100000
clonalProportion(combined, cloneCall = "gene") + scale_fill_npg()
如果您对加载到scRepertoire中的样本之间的相似性度量感兴趣,那么使用clonalOverlap()
可以帮助可视化。目前有两种方法可用于clonalOverlap()
重叠系数和莫里西塔指数。前者是在较小的样本中以独特的克隆型的长度来衡量克隆型的重叠。森里西塔指数更为复杂,它是对一个种群中个体分散程度的一种生态测量,包含了种群的规模。
clonalOverlap(combined, cloneCall = "gene+nt", method = "morisita")
scRepertoire的另一个新特性是使用powerTCR R包改编的clonesizeDistribution()
通过克隆大小分布对示例进行聚类。如果用这个函数来分析样本克隆大小分布的相似性,请阅读并引用相应的引文。在这个函数中,method是指层次聚类的方法。
clonesizeDistribution(combined, cloneCall = "gene+nt", method="ward.D2")
多样性分析
2) inverse Simpson,
3) Chao1和4)based Coverage Estimator (ACE)。
?clonalDiversity
clonalDiversity(combined, cloneCall = "gene", group = "samples")+ scale_fill_npg()
与表达谱数据整合
这里我就换一套数据来展示了,因为原教程的数据没有下载到,所以这里有用不知道从哪来的一套VDJ和RNA的数据来展示。需要说明的是,这里不是多个样本,而是一个样本了。也就是常说的,做了5‘转录组和VDJ 的项目。
csv1 <- read.csv("F:\\VDJ\\filtered_contig_annotations.csv", stringsAsFactors = F)
mycsv1 = list(ex=csv1)
?combineBCR
combined <- combineBCR(mycsv1, samples = c("PY"), ID = c("P"))
head(combined$PY_P)
barcode sample ID IGH cdr3_aa1
1 PY_P_ACGCAGCCATTCCTCG-1 PY P IGHV1-18.IGHJ4.None.IGHM CARDRKDVLDHW
2 PY_P_AGTAGTCGTAATAGCA-1 PY P IGHV4-61.IGHJ2.None.IGHM CARDKSCINGVCHRSWYFDLW
3 PY_P_TACGGATGTCAGCTAT-1 PY P IGHV5-51.IGHJ4.None.IGHM CARRGNGEMATYW
4 PY_P_AACTCAGTCAAACCAC-1 PY P IGHV2-5.IGHJ4.IGHD2-21.IGHM CAHRLNTYCRGDCYASFDYW
5 PY_P_GTCTCGTCAAAGAATC-1 PY P IGHV3-23.IGHJ3.None.IGHM CAKDLRGTRVTSYDAFDIW
6 PY_P_CGGAGTCAGCACCGCT-1 PY P IGHV4-39.IGHJ4.None.IGHM CATEYSSSPALW
cdr3_nt1 IGLC
1 TGTGCGAGAGATAGGAAAGATGTCCTGGACCACTGG IGLV2-8.IGLJ2.IGLC2
2 TGTGCGAGAGACAAATCTTGTATTAATGGTGTGTGCCACAGGTCTTGGTACTTCGATCTCTGG IGLV2-8.IGLJ1.IGLC1
3 TGTGCGCGACGTGGAAATGGAGAGATGGCTACTTATTGG IGLV2-14.IGLJ1.IGLC1
4 TGTGCACACCGCCTCAACACATATTGTCGTGGTGACTGCTACGCCTCCTTTGACTACTGG IGLV2-14.IGLJ2.IGLC2
5 TGTGCGAAAGATCTTCGGGGGACTAGGGTGACTTCTTATGATGCTTTTGATATCTGG IGLV2-14.IGLJ2.IGLC2
6 TGTGCGACAGAATATAGTAGCTCGCCGGCCCTCTGG IGLV2-14.IGLJ2.IGLC2
cdr3_aa2 cdr3_nt2
1 CNSYAGGNKVF TGCAACTCATATGCAGGCGGCAACAAGGTATTC
2 CNSYVDNNSYVF TGCAACTCCTATGTAGACAACAACAGTTATGTCTTC
3 CNSYTRSTTLYVF TGCAATTCATATACACGCAGCACCACTCTTTATGTCTTC
4 CTSYTSSSTLLF TGCACCTCATATACAAGCAGCAGCACTCTGCTGTTC
5 CTSYSSTYTLVLI TGCACCTCATATTCAAGCACCTACACTCTCGTGCTGATC
6 CSSHATSSTAYVIF TGCAGCTCACACGCAACCAGCAGCACTGCCTACGTGATATTC
CTgene
1 IGHV1-18.IGHJ4.None.IGHM_IGLV2-8.IGLJ2.IGLC2
2 IGHV4-61.IGHJ2.None.IGHM_IGLV2-8.IGLJ1.IGLC1
3 IGHV5-51.IGHJ4.None.IGHM_IGLV2-14.IGLJ1.IGLC1
4 IGHV2-5.IGHJ4.IGHD2-21.IGHM_IGLV2-14.IGLJ2.IGLC2
5 IGHV3-23.IGHJ3.None.IGHM_IGLV2-14.IGLJ2.IGLC2
6 IGHV4-39.IGHJ4.None.IGHM_IGLV2-14.IGLJ2.IGLC2
CTnt
1 TGTGCGAGAGATAGGAAAGATGTCCTGGACCACTGG_TGCAACTCATATGCAGGCGGCAACAAGGTATTC
2 TGTGCGAGAGACAAATCTTGTATTAATGGTGTGTGCCACAGGTCTTGGTACTTCGATCTCTGG_TGCAACTCCTATGTAGACAACAACAGTTATGTCTTC
3 TGTGCGCGACGTGGAAATGGAGAGATGGCTACTTATTGG_TGCAATTCATATACACGCAGCACCACTCTTTATGTCTTC
4 TGTGCACACCGCCTCAACACATATTGTCGTGGTGACTGCTACGCCTCCTTTGACTACTGG_TGCACCTCATATACAAGCAGCAGCACTCTGCTGTTC
5 TGTGCGAAAGATCTTCGGGGGACTAGGGTGACTTCTTATGATGCTTTTGATATCTGG_TGCACCTCATATTCAAGCACCTACACTCTCGTGCTGATC
6 TGTGCGACAGAATATAGTAGCTCGCCGGCCCTCTGG_TGCAGCTCACACGCAACCAGCAGCACTGCCTACGTGATATTC
CTaa CTstrict cellType
1 CARDRKDVLDHW_CNSYAGGNKVF IGH.53_IGHV1-18_IGLC51_IGLV2-8 B
2 CARDKSCINGVCHRSWYFDLW_CNSYVDNNSYVF IGH.98_IGHV4-61_IGLC99_IGLV2-8 B
3 CARRGNGEMATYW_CNSYTRSTTLYVF IGH.534_IGHV5-51_IGLC489_IGLV2-14 B
4 CAHRLNTYCRGDCYASFDYW_CTSYTSSSTLLF IGH.12_IGHV2-5_IGLC13_IGLV2-14 B
5 CAKDLRGTRVTSYDAFDIW_CTSYSSTYTLVLI IGH.504_IGHV3-23_IGLC465_IGLV2-14 B
6 CATEYSSSPALW_CSSHATSSTAYVIF IGH.287_IGHV4-39_IGLC268_IGLV2-14 B
我们观察到Barcode被改变名称了,所以我们:
combined$PY_P <- stripBarcode(combined$PY_P , column = 1, connector = "_", num_connects = 3)
读入RNA数据并执行Seurat标准流程:
?Read10X
seo<- Read10X(data.dir = 'F:\\VDJ\\filtered_feature_bc_matrix')
colnames(seo)
srto <- CreateSeuratObject(counts = seo<article class="_2rhmJa"Gene Expression`)
srto ->pbmc
pbmc <- NormalizeData(object = pbmc)
pbmc <- FindVariableFeatures(object = pbmc)
pbmc <- ScaleData(object = pbmc)
pbmc <- RunPCA(object = pbmc)
pbmc <- FindNeighbors(object = pbmc)
pbmc <- FindClusters(object = pbmc)
pbmc <- RunTSNE(object = pbmc)
pbmc <- RunUMAP(object = pbmc,dims = 1:15)
pbmc -> seurat
seurat
An object of class Seurat
33538 features across 7231 samples within 1 assay
Active assay: RNA (33538 features, 2000 variable features)
3 dimensional reductions calculated: pca, tsne, umap
DimPlot(seurat, label = T) + NoLegend() + scale_fill_npg()
接下来,我们可以获取clonotypic信息,并使用combineExpression()
函数将其附加到Seurat对象中。重要的是,附件的主要需求是匹配contig细胞条形码和seurat或SCE对象的元数据行名中的条形码。如果这些不匹配,将失败。方便起见,我们建议您对Seurat对象行名称进行更改。
?combineExpression
seurat1 <- combineExpression(combined, seurat, cloneCall="gene")
作为核心函数,这个源代码我们是要背诵的了
combineExpression
function (df, sc, cloneCall = "gene+nt", groupBy = "none",
cloneTypes = c(None = 0, Single = 1, Small = 5, Medium = 20,
Large = 100, Hyperexpanded = 500), filterNA = FALSE)
{
df <- checkList(df)
cloneCall <- theCall(cloneCall)
Con.df <- NULL
meta <- grabMeta(sc)
cell.names <- rownames(meta)
if (groupBy == "none") {
for (i in seq_along(df)) {
data <- data.frame(df[[i]], stringsAsFactors = FALSE)
data2 <- unique(data[, c("barcode", cloneCall)])
data2 <- na.omit(data2[data2[, "barcode"] %in%
cell.names, ])
data2 <- data2 %>% group_by(data2[, cloneCall]) %>%
summarise(Frequency = n())
colnames(data2)[1] <- cloneCall
data <- merge(data, data2, by = cloneCall, all = TRUE)
Con.df <- rbind.data.frame(Con.df, data)
}
}
else if (groupBy != "none") {
data <- data.frame(bind_rows(df), stringsAsFactors = FALSE)
data2 <- na.omit(unique(data[, c("barcode", cloneCall,
groupBy)]))
data2 <- data2[data2[, "barcode"] %in% cell.names,
]
data2 <- as.data.frame(data2 %>% group_by(data2[, cloneCall],
data2[, groupBy]) %>% summarise(Frequency = n()))
colnames(data2)[c(1, 2)] <- c(cloneCall, groupBy)
x <- unique(data[, groupBy])
for (i in seq_along(x)) {
sub1 <- subset(data, data[, groupBy] == x[i])
sub2 <- subset(data2, data2[, groupBy] == x[i])
merge <- merge(sub1, sub2, by = cloneCall)
Con.df <- rbind.data.frame(Con.df, merge)
}
}
Con.df$cloneType <- NA
for (x in seq_along(cloneTypes)) {
names(cloneTypes)[x] <- paste0(names(cloneTypes[x]),
" (", cloneTypes[x - 1], " < X <= ",
cloneTypes[x], ")")
}
for (i in 2:length(cloneTypes)) {
Con.df$cloneType <- ifelse(Con.df$Frequency > cloneTypes[i -
1] & Con.df$Frequency <= cloneTypes[i], names(cloneTypes[i]),
Con.df$cloneType)
}
PreMeta <- unique(Con.df[, c("barcode", "CTgene",
"CTnt", "CTaa", "CTstrict", "Frequency",
"cloneType")])
rownames(PreMeta) <- PreMeta$barcode
if (inherits(x = sc, what = "Seurat")) {
sc <- AddMetaData(sc, PreMeta)
}
else {
rownames <- rownames(colData(sc))
colData(sc) <- cbind(colData(sc), PreMeta[rownames, ])[,
union(colnames(colData(sc)), colnames(PreMeta))]
rownames(colData(sc)) <- rownames
}
if (filterNA == TRUE) {
sc <- filteringNA(sc)
}
return(sc)
}
<bytecode: 0x00000224fac95a18>
<environment: namespace:scRepertoire>
head(seurat1@meta.data)
orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 seurat_clusters barcode
AAACCTGAGATCTGAA-1 SeuratProject 4386 1206 3 3 <NA>
AAACCTGAGGAACTGC-1 SeuratProject 6258 1723 4 4 <NA>
AAACCTGAGGAGTCTG-1 SeuratProject 4243 1404 1 1 <NA>
AAACCTGAGGCTCTTA-1 SeuratProject 5205 1622 4 4 <NA>
AAACCTGAGTACGTTC-1 SeuratProject 7098 2123 11 11 <NA>
AAACCTGGTATCACCA-1 SeuratProject 1095 604 10 10 <NA>
CTgene CTnt CTaa CTstrict Frequency cloneType highlight
AAACCTGAGATCTGAA-1 <NA> <NA> <NA> <NA> NA <NA> <NA>
AAACCTGAGGAACTGC-1 <NA> <NA> <NA> <NA> NA <NA> <NA>
AAACCTGAGGAGTCTG-1 <NA> <NA> <NA> <NA> NA <NA> <NA>
AAACCTGAGGCTCTTA-1 <NA> <NA> <NA> <NA> NA <NA> <NA>
AAACCTGAGTACGTTC-1 <NA> <NA> <NA> <NA> NA <NA> <NA>
AAACCTGGTATCACCA-1 <NA> <NA> <NA> <NA> NA <NA> <NA>
为什么那么多NA?我很是困惑,不会做错了吧?
table(rownames(seurat1@meta.data) %in% combined$PY_P$barcode)
FALSE TRUE
6469 762
head(seurat1@meta.data %>% filter(CTgene != "<NA>"))
orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8
AAACCTGTCACTGGGC-1 SeuratProject 7539 2215 7
AAACCTGTCAGGTAAA-1 SeuratProject 1017 532 12
AAACCTGTCTCTTATG-1 SeuratProject 5974 1838 7
AAAGATGAGCGATTCT-1 SeuratProject 5352 1434 7
AAAGCAAAGAGGTTAT-1 SeuratProject 5878 1899 8
AAAGTAGCAATCCAAC-1 SeuratProject 5083 1463 7
seurat_clusters barcode
AAACCTGTCACTGGGC-1 7 AAACCTGTCACTGGGC-1
AAACCTGTCAGGTAAA-1 12 AAACCTGTCAGGTAAA-1
AAACCTGTCTCTTATG-1 7 AAACCTGTCTCTTATG-1
AAAGATGAGCGATTCT-1 7 AAAGATGAGCGATTCT-1
AAAGCAAAGAGGTTAT-1 8 AAAGCAAAGAGGTTAT-1
AAAGTAGCAATCCAAC-1 7 AAAGTAGCAATCCAAC-1
CTgene
AAACCTGTCACTGGGC-1 IGHV4-59.IGHJ5.None.IGHM_IGKV1D-33.IGKJ4.IGKC
AAACCTGTCAGGTAAA-1 NA_IGKV3-15.IGKJ2.IGKC
AAACCTGTCTCTTATG-1 IGHV7-4-1.IGHJ4.None.IGHM_IGLV3-10.IGLJ3.IGLC3
AAAGATGAGCGATTCT-1 IGHV4-59.IGHJ4.None.IGHA1_NA
AAAGCAAAGAGGTTAT-1 IGHV3-74.IGHJ3.None.IGHM_IGLV2-23.IGLJ3.IGLC3
AAAGTAGCAATCCAAC-1 IGHV3-23.IGHJ4.None.IGHG2_IGLV1-44.IGLJ3.IGLC3
CTnt
AAACCTGTCACTGGGC-1 TGTGCGAGAGGCGGGAACAGTGGCTTAGACCCCTGG_TGTCAACAGTATGATAATCTCCCGCTCACTTTC
AAACCTGTCAGGTAAA-1 NA_TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
AAACCTGTCTCTTATG-1 TGTGCGAGAGGCGAACTAGAGAGGACGGTGACTACGGACTACTGG_TGTTACTCAACAGACAGCAGTGATAATCTGGGGGTGTTC
AAAGATGAGCGATTCT-1 TGTGCGAGGGAAGGCGCGCAGTACCTGGTACTTGAGTACTGG_TGTCAGCAGTATAATAACTGGCCTTTCACCTTC
AAAGCAAAGAGGTTAT-1 TGTGCAAGAGATCTTCCCGGTGCTTTTGATATCTGG_TGCTGCTCATATGCAGGTAGTAGCACCCGGGTGTTC
AAAGTAGCAATCCAAC-1 TGTGCGAAAGATTGGGTGGACAACAGTGAGTGGTCCTCATTCGCCGTTTTTGACTACTGG_TGTGCAACATGGGATGACAGCCTGAAAGGTTGGGTGTTT
CTaa
AAACCTGTCACTGGGC-1 CARGGNSGLDPW_CQQYDNLPLTF
AAACCTGTCAGGTAAA-1 NA_CQQYDNWPPYTF
AAACCTGTCTCTTATG-1 CARGELERTVTTDYW_CYSTDSSDNLGVF
AAAGATGAGCGATTCT-1 CAREGAQYLVLEYW_CQQYNNWPFTF
AAAGCAAAGAGGTTAT-1 CARDLPGAFDIW_CCSYAGSSTRVF
AAAGTAGCAATCCAAC-1 CAKDWVDNSEWSSFAVFDYW_CATWDDSLKGWVF
CTstrict Frequency
AAACCTGTCACTGGGC-1 IGH.1_IGHV4-59_IGLC1_IGKV1D-33 1
AAACCTGTCAGGTAAA-1 NA_NA_IGLC2_IGKV3-15 12
AAACCTGTCTCTTATG-1 IGH.2_IGHV7-4-1_IGLC3_IGLV3-10 1
AAAGATGAGCGATTCT-1 IGH.3_IGHV4-59_IGLC4_IGKV3-15 1
AAAGCAAAGAGGTTAT-1 IGH.4_IGHV3-74_IGLC5_IGLV2-23 1
AAAGTAGCAATCCAAC-1 IGH.5_IGHV3-23_IGLC6_IGLV1-44 1
cloneType highlight
AAACCTGTCACTGGGC-1 Single (0 < X <= 1) <NA>
AAACCTGTCAGGTAAA-1 Medium (5 < X <= 20) Clonotype2
AAACCTGTCTCTTATG-1 Single (0 < X <= 1) <NA>
AAAGATGAGCGATTCT-1 Single (0 < X <= 1) <NA>
AAAGCAAAGAGGTTAT-1 Single (0 < X <= 1) <NA>
AAAGTAGCAATCCAAC-1 Single (0 < X <= 1) <NA>
所以很可能是我们的数据中没有那么多B细胞,很多匹配不上啊。
slot(seurat1, "meta.data")$cloneType <- factor(slot(seurat1, "meta.data")$cloneType,
levels = c("Hyperexpanded (100 < X <= 500)", "Large (20 < X <= 100)",
"Medium (5 < X <= 20)", "Small (1 < X <= 5)",
"Single (0 < X <= 1)", NA))
DimPlot(seurat1, group.by = "cloneType") +
scale_color_manual(values = colorblind_vector(5), na.value="grey")
为了验证我们的猜想,我们做一个差异分析看看特征基因不就知道了,主要是7,9 群,如果他们果然是B细胞那就很好了。或者做一个B细胞的marker 点图。
library(Seurat)
library(SeuratData)
library(ggplot2)
data("pbmc3k.final")
modify_vlnplot<- function(obj,
feature,
pt.size = 0,
plot.margin = unit(c(-0.75, 0, -0.75, 0), "cm"),
...) {
p<- VlnPlot(obj, features = feature, pt.size = pt.size, ... ) +
xlab("") + ylab(feature) + ggtitle("") +
theme(legend.position = "none",
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.x = element_blank(),
axis.ticks.y = element_line(),
axis.title.y = element_text(size = rel(1), angle = 0, vjust = 0.5),
plot.margin = plot.margin )
return(p)
}
## main function
StackedVlnPlot<- function(obj, features,
pt.size = 0,
plot.margin = unit(c(-0.75, 0, -0.75, 0), "cm"),
...) {
plot_list<- purrr::map(features, function(x) modify_vlnplot(obj = obj,feature = x, ...))
plot_list[[length(plot_list)]]<- plot_list[[length(plot_list)]] +
theme(axis.text.x=element_text(), axis.ticks.x = element_line())
p<- patchwork::wrap_plots(plotlist = plot_list, ncol = 1)
return(p)
}
my36colors <-c('#E5D2DD', '#53A85F', '#F1BB72', '#F3B1A0', '#D6E7A3', '#57C3F3', '#476D87',
'#E95C59', '#E59CC4', '#AB3282', '#23452F', '#BD956A', '#8C549C', '#585658',
'#9FA3A8', '#E0D4CA', '#5F3D69', '#C5DEBA', '#58A4C3', '#E4C755', '#F7F398',
'#AA9A59', '#E63863', '#E39A35', '#C1E6F3', '#6778AE', '#91D0BE', '#B53E2B',
'#712820', '#DCC1DD', '#CCE0F5', '#CCC9E6', '#625D9E', '#68A180', '#3A6963',
'#968175'
)
classmk <-c("IL7R", "CCR7", "IL7R", "S100A4","CD14", "LYZ","MS4A1","CD8A","FCGR3A", "MS4A7", "GNLY",
"NKG7","FCER1A", "CST3","PPBP")
StackedVlnPlot(seurat1, c(classmk), pt.size=0, cols=my36colors)
我们知道,MS4A1是B细胞的一个明显的marker,果不其然,这俩群是B细胞没错了。然后我们再看一下差异分析的结果:
seurat1@misc$mk <- FindAllMarkers(seurat1,only.pos = T)
seurat1@misc$mk %>% filter(cluster == 9) %>% top_n(20,avg_logFC ) -> topgene
topgene
p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene
TCL1A 0.000000e+00 2.670484 0.947 0.014 0.000000e+00 9 TCL1A
CD79A1 0.000000e+00 2.245267 0.986 0.093 0.000000e+00 9 CD79A
IGHM1 0.000000e+00 2.178957 0.961 0.069 0.000000e+00 9 IGHM
FCER21 0.000000e+00 2.031561 0.870 0.033 0.000000e+00 9 FCER2
MS4A11 0.000000e+00 1.987625 0.975 0.084 0.000000e+00 9 MS4A1
LINC009261 0.000000e+00 1.871576 0.863 0.060 0.000000e+00 9 LINC00926
PLPP51 0.000000e+00 1.708546 0.803 0.093 0.000000e+00 9 PLPP5
IGHD1 0.000000e+00 1.635674 0.722 0.027 0.000000e+00 9 IGHD
CD79B1 1.925974e-299 1.784911 0.937 0.151 6.459332e-295 9 CD79B
HLA-DQA11 5.403560e-293 1.778771 0.901 0.139 1.812246e-288 9 HLA-DQA1
HLA-DQB11 8.293830e-278 1.872821 0.975 0.189 2.781585e-273 9 HLA-DQB1
HLA-DRA3 3.962852e-230 1.945424 0.996 0.279 1.329061e-225 9 HLA-DRA
HLA-DQA23 2.739902e-219 1.611850 0.954 0.231 9.189083e-215 9 HLA-DQA2
HLA-DRB53 8.598615e-212 1.690653 0.986 0.272 2.883803e-207 9 HLA-DRB5
HLA-DRB13 4.137877e-208 1.765784 1.000 0.305 1.387761e-203 9 HLA-DRB1
HLA-DPB12 2.529860e-181 1.673158 0.993 0.369 8.484643e-177 9 HLA-DPB1
CD742 1.774900e-165 2.103404 1.000 0.847 5.952659e-161 9 CD74
IGHV3-21 1.128949e-111 1.904073 0.144 0.005 3.786270e-107 9 IGHV3-21
IGHV3-30 3.301247e-62 1.627276 0.113 0.007 1.107172e-57 9 IGHV3-30
IGKV3-201 1.622764e-42 1.774558 0.120 0.013 5.442425e-38 9 IGKV3-20
seurat1@misc$mk %>% filter(cluster == 7) %>% top_n(20,avg_logFC ) -> topgene
topgene
clu2featur(seurat1,idents = 7 ,features = topgene$gene[1:8])
差异分析的结果,也证实了我们的猜想:主要有cluster 7,9 是B细胞
现在我们可以通过首先将克隆型作为一个因素排序来查看克隆型容器的分布,这可以防止颜色按字母顺序排列。接下来,我们使用Seurat中的DimPlot()
函数调用和scale_color_manual附加图层。
seurat1 <- highlightClonotypes(seurat1, cloneCall= "aa", sequence = c("CAKDRAEWLPQYYYGMDVW_CQQYDNLPPLFTF", "NA_CQQYDNWPPYTF"))
DimPlot(seurat1, group.by = "highlight")
由于我们的克隆型实在太少,所以看的不是很清楚:
table(table(seurat1$CTgene))
1 2 3 4 5 12 14
646 30 4 2 2 1 1
我们还可以使用occupiedscRepertoire()
函数并选择x.axis来显示细胞对象的元数据中的集群或其他变量,从而查看分配到特定频率范围的集群的单元数。
occupiedscRepertoire(seurat1, x.axis = "seurat_clusters")
最后,在修改了所有元数据之后,我们可以使用alluvialClonotypes()
函数查看跨多个类别的clonotypes。要理解这种绘图方法的基本概念,我强烈推荐阅读这篇文章this post(https://cran.r-project.org/web/packages/ggalluvial/vignettes/ggalluvial.html)本质上我们能够使用绘图来检查分类变量的流向。因为这个函数将生成一个图形,其中每个克隆型按所谓的分层进行排列,这将花费一些时间,具体时间取决于细胞总数的大小。为了加快速度,我们实际上将在使用alluvialClonotypes()
之前对seurat对象进行子集化。
alluvialClonotypes(seurat1, cloneCall = "gene",
y.axes = c("Frequency", "cloneType"),
color = "seurat_clusters")
对于希望在Seurat对象中使用元数据来执行scRepertoire提供的分析的用户来说,还有一个选项是使用expression2List()
函数,该函数将获取元数据并按亚群将数据输出为列表。也就是人们常说的,细胞类型特异的克隆型分析。
combined2 <- expression2List(seurat1, group = "seurat_clusters")
length(combined2) #now listed by cluster
clonalDiversity(combined2, cloneCall = "nt")
clonalHomeostasis(combined2, cloneCall = "nt")
clonalOverlap(combined2, cloneCall="aa", method="overlap")
这是对scRepertoire从最初处理和可视化到附加到Seurat对象中的mRNA表达值的能力的一般概述。如果您有任何问题、意见或建议,请访问github:https://ncborcherding.github.io/vignettes/vignette.html
如果你对单细胞转录组研究感兴趣,但又不知道如何入门,也许你可以关注一下下面的课程
生信爆款入门-第8期(线上直播4周,马拉松式陪伴,带你入门)你的生物信息入门课
数据挖掘学习班第6期(线上直播3周,马拉松式陪伴,带你入门) 医学生/医生首选技能提高课
看完记得顺手点个“在看”哦!
长按扫码可关注