SPOTlight || 用NMF解卷积空间表达数据
作者 | 周运来
男,
一个长大了才会遇到的帅哥,
稳健,潇洒,大方,靠谱。
一段生信缘,一棵技能树。
生信技能树核心成员,单细胞天地特约撰稿人,简书创作者,单细胞数据科学家。
Giotto|| 空间表达数据分析工具箱
Seurat 新版教程:分析空间转录组数据(上)
Seurat 新版教程:分析空间转录组数据(下)
scanpy教程:空间转录组数据分析
10X Visium:空间转录组样本制备到数据分析
空间信息在空间转录组中的运用
定量免疫浸润在单细胞研究中的应用
SPOTlight:Seeded NMF regression to Deconvolute Spatial Transcriptomics Spots with Single-Cell Transcriptome DSTG: Deconvoluting Spatial Transcriptomics Data through Graph-based Artificial Intelligence Charting Tissue Expression Anatomy by Spatial Transcriptome Decomposition Advances in mixed cell deconvolution enable quantification of cell types in spatially-resolved gene expression data Colour deconvolution: stain unmixing in histological imaging Deep learning–based cell composition analysis from tissue expression profiles Updating immune cell deconvolution for the spatial genomics era Stability-driven nonnegative matrix factorization to interpret spatial gene expression and build local gene networks
空间解析基因表达谱是理解组织组织和功能的关键。然而,目前空间转录组分析技术(Spatial Transcriptomics,ST)尚未达到单细胞分辨率,往往需要结合单细胞RNA测序(scRNA-seq)信息来反褶积(或解卷积,Deconvolute )空间数据集。SPOTlight利用这两种数据类型的优势,能够将ST与scRNA-seq数据集成,从而推断出复杂组织中细胞类型和状态的位置。SPOTlight基于一个种子的非负矩阵因子分解回归(Seeded NMF regression ),使用细胞类型标记基因和非负最小二乘(NNLS)初始化,随后去卷积ST捕获位置(spot)。在作者的文章中,在示例数据人类胰腺癌中,成功地将患者切片划分为健康和癌区,并进一步精细绘制正常和肿瘤细胞状态。SPOTlight 流程如下:
非负矩阵分解在基因表达数据中有着广泛的应用,究其原因,是因为表达数据本身是一个非负矩阵。非负矩阵分解,顾名思义就是,将非负的大矩阵分解成两个非负的小矩阵。也可以用来
现在我们跟着SPOTlight在GitHub上面的教程,来走一遍从单细胞到空间的映射,更多详细信息当然是要看他们的文章了。
首先加载我们的R包和单细胞数据。
# install.packages("devtools")
# devtools::install_github("https://github.com/MarcElosua/SPOTlight")
library(SPOTlight)
library(Seurat)
library(dplyr)
# devtools::install_github('satijalab/seurat-data')
library(SeuratData)
sppath <- "D:\\SPOTlight-master\\sample_data\\"
# This file loads single cell experiment objects
cortex_sc <- readRDS(paste0(sppath,"/allen_cortex_dwn.rds"))
看看数据都经过了哪些处理:
cortex_sc@commands
$FindVariableFeatures.RNA
Command: FindVariableFeatures(object = se_obj, nfeatures = hvg)
Time: 2020-05-28 01:37:19
assay : RNA
selection.method : vst
loess.span : 0.3
clip.max : auto
num.bin : 20
binning.method : equal_width
nfeatures : 34617
mean.cutoff : 0.1 8
dispersion.cutoff : 1 Inf
verbose : TRUE
执行Seurat标准分析
#cortex_sc <- SCTransform(cortex_sc, verbose = FALSE)
cortex_sc %>% NormalizeData()%>%
FindVariableFeatures()%>%
ScaleData() %>%
Seurat::RunPCA( verbose = FALSE) %>%
Seurat::RunUMAP( dims = 1:30, verbose = FALSE)%>%
Seurat::FindNeighbors( dims = 1:30, verbose = FALSE)%>%
Seurat::FindClusters(verbose = FALSE) -> cortex_sc
cortex_sc
An object of class Seurat
34617 features across 1404 samples within 1 assay
Active assay: RNA (34617 features, 2000 variable features)
2 dimensional reductions calculated: pca, umap
军中无以为乐,画个umap吧。
ggplot(DimPlot(cortex_sc,label = T)$data,aes(UMAP_1 , UMAP_2,fill=ident)) +
geom_point(shape=21,colour="black",stroke=0.25,alpha=0.8) +
DimPlot(cortex_sc,label = T)$theme +
theme_bw()
载入空间数据
# InstallData("stxBrain")
anterior <- LoadData("stxBrain", type = "anterior1")
anterior
An object of class Seurat
31053 features across 2696 samples within 1 assay
Active assay: Spatial (31053 features, 0 variable features)
也执行标准分析,我们还是推荐使用SCTransform
的,只是为了快,我们对而求其次了。
# anterior <- Seurat::SCTransform(anterior, assay = "Spatial", verbose = FALSE)
anterior %>% NormalizeData()%>%
FindVariableFeatures()%>%
ScaleData() %>%
Seurat::RunPCA( verbose = FALSE) %>%
Seurat::RunUMAP( dims = 1:30, verbose = FALSE)%>%
Seurat::FindNeighbors( dims = 1:30, verbose = FALSE)%>%
Seurat::FindClusters(verbose = FALSE) -> anterior
Seurat::DimPlot(cortex_sc, group.by = "subclass")
p1 <- Seurat::SpatialDimPlot(anterior,label = T) + NoLegend()
p2 <- ggplot(DimPlot(anterior)$data,aes(UMAP_1 , UMAP_2,fill=ident)) +
geom_point(shape=21,colour="black",stroke=0.25,alpha=0.8) +
DimPlot(anterior,label = T)$theme +
theme_bw()
p1 + p2
计算每个群的差异基因,当然是用定义好的数据了。
有这么多metadata信息啊。
colnames(cortex_sc@meta.data)
[1] "orig.ident" "nCount_RNA" "nFeature_RNA"
[4] "sample_id" "sample_type" "organism"
[7] "donor" "sex" "age_days"
[10] "eye_condition" "genotype" "driver_lines"
[13] "reporter_lines" "brain_hemisphere" "brain_region"
[16] "brain_subregion" "injection_label_direction" "injection_primary"
[19] "injection_secondary" "injection_tract" "injection_material"
[22] "injection_exclusion_criterion" "facs_date" "facs_container"
[25] "facs_sort_criteria" "rna_amplification_set" "library_prep_set"
[28] "library_prep_avg_size_bp" "seq_name" "seq_tube"
[31] "seq_batch" "total_reads" "percent_exon_reads"
[34] "percent_intron_reads" "percent_intergenic_reads" "percent_rrna_reads"
[37] "percent_mt_exon_reads" "percent_reads_unique" "percent_synth_reads"
[40] "percent_ecoli_reads" "percent_aligned_reads_total" "complexity_cg"
[43] "genes_detected_cpm_criterion" "genes_detected_fpkm_criterion" "tdt_cpm"
[46] "gfp_cpm" "class" "subclass"
[49] "cluster" "confusion_score" "cluster_correlation"
[52] "core_intermediate_call" "RNA_snn_res.0.8" "seurat_clusters"
set.seed(123)
#### Extract the top marker genes from each cluster ####
Seurat::Idents(object = cortex_sc) <- cortex_sc@meta.data$subclass
# 计算一次差异基因不容易,我们保存在Seurat里。
cluster_markers_all <- cortex_sc@misc$cluster_markers_all <- Seurat::FindAllMarkers(object = cortex_sc,
assay = "RNA",
slot = "data",
verbose = TRUE,
only.pos = TRUE,
logfc.threshold = 1,
min.pct = 0.9)
核心步骤,执行spotlight_deconvolution
用单细胞数据对空间数据进行解析。用的非负矩阵分析方法:Seeded NMF regression ,也打包在这个函数之中了。对原理当然不能放过,看函数帮助文档和源码吧。spotlight_deconvolution函数接受scRNAseq数据和空间转录组学数据计数矩阵的Seurat对象,并返回反卷积结果。
?spotlight_deconvolution
spotlight_ls <- spotlight_deconvolution(se_sc = cortex_sc,
counts_spatial = anterior@assays$Spatial@counts,
clust_vr = "subclass",
cluster_markers = cluster_markers_all,
cl_n = 50,
hvg = 3000,
ntop = NULL,
transf = "uv",
method = "nsNMF",
min_cont = 0.09)
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
[1] "Preparing Gene set"
[1] "Normalizing count matrix"
[1] "Seeding initial matrices"
[1] "Training...
"[1] "Time to train NMF model was 20.36mins"
[1] "Deconvoluting spots"
|==========================================================================================================| 100%```
我们来看结果都有什么。
str(spotlight_ls,max.level = 3)
List of 2
$ :List of 2
..$ :Formal class 'NMFfit' [package "NMF"] with 12 slots
..$ : chr [1:1061] "Astro" "Astro" "Astro" "Astro" ...
$ : num [1:2696, 1:24] 0.178 0.531 0.172 0 0.24 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:24] "Astro" "CR" "Endo" "L2.3.IT" ...
其实主要的结果是每个spot可能的细胞类型的占比矩阵,提取解析结果可视化
decon_mtrx <- spotlight_ls[[2]]
dim(decon_mtrx)
[1] 2696 24
quantile(apply(decon_mtrx,1,sum)) # 每个spot中的细胞类型占比之和均为1的某个邻域值。
0% 25% 50% 75% 100%
1.000068 1.008327 1.012321 1.016593 1.078064
decon_mtrx[1:4,1:4]
Astro CR Endo L2.3.IT
[1,] 0.1784328 0 0 0.0000000
[2,] 0.5310432 0 0 0.0000000
[3,] 0.1720674 0 0 0.2691672
[4,] 0.0000000 0 0 0.0000000
cell_types_all <- colnames(decon_mtrx)[which(colnames(decon_mtrx) != "res_ss")]
anterior@meta.data <- cbind(anterior@meta.data, decon_mtrx)
SPOTlight::spatial_scatterpie(se_obj = anterior,
cell_types_all = cell_types_all,
img_path = paste0(sppath,"/spatial/tissue_lowres_image.png"),
pie_scale = 0.4)
展示某一细胞的空间位置。
?spatial_scatterpie
SPOTlight::spatial_scatterpie(se_obj = anterior,
cell_types_all = cell_types_all,
img_path = paste0(sppath,"/spatial/tissue_lowres_image.png"),
cell_types_interest = "L6b",
pie_scale = 0.5)
可以指定多个细胞类型
SPOTlight::spatial_scatterpie(se_obj = anterior,
cell_types_all = cell_types_all,
img_path = paste0(sppath,"/spatial/tissue_lowres_image.png"),
cell_types_interest = c('Endo',"L6b",'L2.3.IT'),
pie_scale = 0.5)
SpatialFeaturePlot(anterior,
features = "L6b",
pt.size.factor = 1,
alpha = c(0, 1)) +
ggplot2::scale_fill_gradientn(
colours = heat.colors(10, rev = TRUE),
limits = c(0, 1))
我们知道反卷积矩阵是行为spot标签,列为目的细胞类型的矩阵(每个点中发现的细胞类型),那么基于此我们可以构建一个细胞类型间的网络图,表示空间相互作用。细胞类型之间的链接越强,我们在同一个点中发现它们的频率就越高。这两个种细胞类型更大的可能出现在同一个spot中。
?get_spatial_interaction_graph
dim(decon_mtrx[, cell_types_all])
graph_ntw <- get_spatial_interaction_graph(decon_mtrx = decon_mtrx[, cell_types_all])
library(igraph)
library(RColorBrewer)
deg <- degree(graph_ntw, mode="all")
# Get color palette for difusion
edge_importance <- E(graph_ntw)$importance
# Select a continuous palette
qual_col_pals <- brewer.pal.info[brewer.pal.info$category == 'seq',]
# Create a color palette
getPalette <- colorRampPalette(brewer.pal(9, "YlOrRd"))
# Get how many values we need
grad_edge <- seq(0, max(edge_importance), 0.1)
# Generate extended gradient palette dataframe
graph_col_df <- data.frame(value = as.character(grad_edge),
color = getPalette(length(grad_edge)),
stringsAsFactors = FALSE)
# Assign color to each edge
color_edge <- data.frame(value = as.character(round(edge_importance, 1)), stringsAsFactors = FALSE) %>%
dplyr::left_join(graph_col_df, by = "value") %>%
dplyr::pull(color)
# Open a pdf file
plot(graph_ntw,
# Size of the edge
edge.width = edge_importance,
edge.color = color_edge,
vertex.size = deg,
vertex.color = "red",
vertex.frame.color = "white",
vertex.label.color = "black",
layout = layout.circle)
Oligo,Astro,L5.PT
SPOTlight::spatial_scatterpie(se_obj = anterior,
cell_types_all = cell_types_all,
img_path = paste0(sppath,"/spatial/tissue_lowres_image.png"),
cell_types_interest = c('Oligo',"Astro",'L5.PT'),
pie_scale = 0.5)
我们可以查看cell类型特定主题配置( topic profiles)并评估其独特性, 如果两个细胞具有相似的topic profiles,它们更可能混淆。
nmf_mod_ls <- spotlight_ls[[1]]
nmf_mod <- nmf_mod_ls[[1]]
?coef
h <- NMF::coef(nmf_mod)
rownames(h) <- paste("Topic", 1:nrow(h), sep = "_")
topic_profile_plts <- dot_plot_profiles_fun(h = h,
train_cell_clust = nmf_mod_ls[[2]])
topic_profile_plts[[2]] + theme(axis.text.x = element_text(angle = 90),
axis.text = element_text(size = 12))
思考题:
解卷积可以在算法上解析出一个spot有哪几种细胞类型,但是解析出来的细胞类型的数量(Nct)与每个spot中可能的细胞数(Ns)的关系如何对应起来呢?举个例子,某种技术一个spot中理论上的细胞数(Ns)1~10 ,但是用解卷积可能算出来的细胞数((Nct)大于10,如是本文的23。剩下的13个作何解释?
动动手:
查一查还有哪些解卷积的方法还没有用到空间转录组中?
References
[1]
https://github.com/MarcElosua/SPOTlight[2]
https://bioconductor.org/packages/release/bioc/vignettes/SpatialDecon/inst/doc/SpatialDecon_vignette.html[3]
https://rna-seqblog.com/tag/deconvolution/[4]
https://www.youtube.com/watch?v=rDxznlLH688[5]
https://github.com/BayraktarLab/cell2location
如果你对单细胞转录组研究感兴趣,但又不知道如何入门,也许你可以关注一下下面的课程
看完记得顺手点个“在看”哦!
长按扫码可关注