Seurat v3 Tutorial: MDS as a Custom Dimensional Reduction
Seurat - Dimensional Reduction Vignette
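Single-cell transcriptome data are sparse and high-dimensional. For that reason Seurat ships several dimensional reduction methods, chiefly PCA, t-SNE, and UMAP, although many more methods exist.
So how do we apply a different dimensional reduction method to our data? This post walks through multidimensional scaling (MDS) on a Seurat object. MDS asks that the distances between samples in the original space be preserved in the low-dimensional space. We use it to work out the general recipe for adding a reduction and to get better acquainted with Seurat's data structures.
Still sorting out PCA, t-SNE, and UMAP, and now wondering what MDS means? See the author's earlier notes:
Numerical Ecology Notes || Unconstrained Ordination | NMDS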
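How Seurat v3 stores dimensional reductions
In Seurat v3.0, storing and interacting with dimensional reduction information has been generalized and formalized into the DimReduc object. Each dimensional reduction is stored as a DimReduc object, as one element of a named list in the object@reductions slot. A reduction is accessed by passing its name to the [[ operator. For example, after running principal component analysis with RunPCA, object[['pca']] contains the PCA results. Users can add further, custom dimensional reductions by adding new elements to the list. Each stored dimensional reduction contains the following slots: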
cell.embeddings: stores the coordinates of each cell in low-dimensional space.
feature.loadings: stores the weight of each feature along each dimension of the embedding.
feature.loadings.projected: Seurat typically calculates the dimensional reduction on a subset of genes (for example, high-variance genes) and then projects that structure onto the entire dataset (all genes). The results of that projection (calculated with ProjectDim) are stored in this slot. The cell embeddings remain unchanged after projection, but there are now feature loadings for all features.
stdev: the standard deviations of each dimension. Most often used with PCA (storing the square roots of the eigenvalues of the covariance matrix), and useful when looking at the drop-off in the amount of variance explained by each successive dimension.
key: sets the column names for the cell.embeddings and feature.loadings matrices. For example, for PCA the column names are PC_1, PC_2, etc., so the key is "PC_".
jackstraw: stores the results of the jackstraw procedure run using this dimensional reduction technique. Currently supported only for PCA.
misc: bonus slot to store any other information you might want.
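The stdev description above can be checked outside Seurat. The following is a minimal base-R sketch on simulated data (the matrix `x` and all variable names are made up for illustration). It uses `prcomp` rather than Seurat's own PCA routine, so it illustrates the relationship between standard deviations and covariance eigenvalues, not Seurat's implementation.

```r
# Simulated data: 40 observations (rows) by 5 variables (columns).
set.seed(1)
x <- matrix(rnorm(40 * 5), nrow = 40, ncol = 5)

# PCA via base R, plus a direct eigen-decomposition of the covariance matrix.
pca <- prcomp(x, center = TRUE, scale. = FALSE)
eigenvalues <- eigen(cov(x), symmetric = TRUE)$values

# The per-dimension standard deviations are the square roots of the eigenvalues.
stdev_matches <- isTRUE(all.equal(pca$sdev, sqrt(eigenvalues)))
print(stdev_matches)

# Share of variance explained by each dimension: the quantity whose drop-off
# is inspected when deciding how many dimensions to keep.
variance_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(variance_explained, 3))
```

Plotting `variance_explained` against the dimension index gives the usual elbow-style view of the drop-off.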
To access these slots, Seurat provides the Embeddings, Loadings, and Stdev functions:
library(Seurat)
pbmc_small[["pca"]]

A dimensional reduction object with key PC_
 Number of dimensions: 19
 Projected dimensional reduction calculated: TRUE
 Jackstraw run: TRUE
 Computed using assay: RNA
Let's inspect each slot with the corresponding accessor function:
> head(Embeddings(pbmc_small, reduction = "pca")[, 1:5]) # PCA coordinates of each cell
                      PC_1       PC_2       PC_3      PC_4       PC_5
ATGCCAGAACGACT -0.77403708 -0.8996461 -0.2493078 0.5585948  0.4650838
CATGGCCTGTGCAT -0.02602702 -0.3466795  0.6651668 0.4182900  0.5853204
GAACCTGATGAACC -0.45650250  0.1795811  1.3175907 2.0137210 -0.4818851
TGACTGGATTCTCA -0.81163243 -1.3795340 -1.0019320 0.1390503 -1.5982232
AGTCAGACTGCACA -0.77403708 -0.8996461 -0.2493078 0.5585948  0.4650838
TCTGATACACGTGT -0.77403708 -0.8996461 -0.2493078 0.5585948  0.4650838
> head(Loadings(pbmc_small, reduction = "pca")[, 1:5]) # loading of each gene on each principal component
              PC_1        PC_2        PC_3        PC_4         PC_5
PPBP    0.33832535  0.04095778  0.02926261  0.03111034 -0.090420744
IGLL5  -0.03504289  0.05815335 -0.29906272  0.54744454  0.214603428
VDAC3   0.11990482 -0.10994433 -0.02386025  0.06015126 -0.809207588
CD1C   -0.04690284  0.19835522 -0.35090617 -0.51112169 -0.130306281
AKR1C3 -0.03894635 -0.42880452  0.08845847 -0.27274386  0.087791646
PF4     0.34392057  0.02474860 -0.02519515 -0.01231411 -0.006725932
> head(Stdev(pbmc_small, reduction = "pca")) # standard deviations
[1] 2.7868782 1.6145733 1.3162945 1.1241143 1.0347596 0.9876531
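Seurat provides RunPCA (pca) and RunTSNE (tsne), which cover dimensional reduction techniques commonly applied to scRNA-seq data. When you use these functions, the slots that apply to the method are filled in automatically.
Seurat also lets users add the results of a custom dimensional reduction computed separately (for example, multidimensional scaling (MDS) or zero-inflated factor analysis). All you need is a matrix containing the coordinates of each cell in low-dimensional space, as shown below.
Storing a custom dimensional reduction calculation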
Classical (Metric) Multidimensional Scaling
Classical multidimensional scaling (MDS) of a data matrix. Also known as principal coordinates analysis (Gower, 1966).
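To make the definition concrete, here is a minimal base-R sketch of the textbook classical MDS recipe: double-center the squared distance matrix, then take the leading eigenvectors scaled by the square roots of their eigenvalues. The function name `classical_mds` and the simulated matrix are hypothetical and for illustration only; the sketch is checked against base R's `cmdscale`, which is what the rest of this post uses.

```r
# Hypothetical helper: classical (metric) MDS written out step by step.
classical_mds <- function(d, k = 2) {
  d2 <- as.matrix(d)^2                         # squared pairwise distances
  n <- nrow(d2)
  centering <- diag(n) - matrix(1 / n, n, n)   # centering matrix J = I - 11'/n
  b <- -0.5 * centering %*% d2 %*% centering   # double-centered matrix B
  eig <- eigen(b, symmetric = TRUE)
  # Coordinates: leading eigenvectors scaled by the square roots of their eigenvalues.
  coords <- eig$vectors[, seq_len(k), drop = FALSE] %*%
    diag(sqrt(eig$values[seq_len(k)]), nrow = k)
  rownames(coords) <- rownames(d2)
  coords
}

# Simulated data: 20 "cells" measured on 3 features.
set.seed(7)
x <- matrix(rnorm(20 * 3), nrow = 20,
            dimnames = list(paste0("cell", 1:20), NULL))
d <- dist(x)

# The manual result agrees with cmdscale up to the sign of each axis.
manual <- classical_mds(d, k = 2)
builtin <- cmdscale(d, k = 2)
same_up_to_sign <- isTRUE(all.equal(abs(manual), abs(builtin),
                                    check.attributes = FALSE))

# With as many dimensions as the data has, the Euclidean distances are
# reproduced exactly, which is the sense in which MDS preserves distances.
full <- classical_mds(d, k = 3)
distances_preserved <- isTRUE(all.equal(as.vector(dist(full)), as.vector(d)))
print(c(same_up_to_sign, distances_preserved))
```

With fewer dimensions than the data has, the distances are only approximated, and the discarded eigenvalues indicate how much is lost.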
MDS is not part of the Seurat package, but it is easy to run in R. If you want to run MDS and store the output in a Seurat object:
# Before running MDS, we first calculate a distance matrix between all pairs of cells. Here we
# use a simple euclidean distance metric on all genes, using scale.data as input
d <- dist(t(GetAssayData(pbmc_small, slot = "scale.data")))
# Run the MDS procedure, k determines the number of dimensions
mds <- cmdscale(d = d, k = 2)

head(mds)
                     [,1]       [,2]
ATGCCAGAACGACT 0.77403708 -0.8996461
CATGGCCTGTGCAT 0.02602702 -0.3466795
GAACCTGATGAACC 0.45650250  0.1795811
TGACTGGATTCTCA 0.81163243 -1.3795340
AGTCAGACTGCACA 0.77403708 -0.8996461
TCTGATACACGTGT 0.77403708 -0.8996461
# cmdscale returns the cell embeddings, we first label the columns to ensure downstream
# consistency
colnames(mds) <- paste0("MDS_", 1:2)
# We will now store this as a custom dimensional reduction called 'mds'
pbmc_small[["mds"]] <- CreateDimReducObject(embeddings = mds, key = "MDS_", assay = DefaultAssay(pbmc_small))

pbmc_small
An object of class Seurat
230 features across 80 samples within 1 assay
Active assay: RNA (230 features)
 3 dimensional reductions calculated: pca, tsne, mds
The object now contains an mds reduction. We can visualize it the same way as pca, tsne, or umap:
# We can now use this as you would any other dimensional reduction in all downstream functions
DimPlot(pbmc_small, reduction = "mds", pt.size = 0.5)
pbmc_small <- ProjectDim(pbmc_small, reduction = "mds")
MDS_ 1
Positive: HLA-DPB1, HLA-DQA1, S100A9, S100A8, GNLY, RP11-290F20.3, CD1C, AKR1C3, IGLL5, VDAC3
 PARVB, RUFY1, PGRMC1, MYL9, TREML1, CA2, TUBB1, PPBP, PF4, SDPR
Negative: SDPR, PF4, PPBP, TUBB1, CA2, TREML1, MYL9, PGRMC1, RUFY1, PARVB
 VDAC3, IGLL5, AKR1C3, CD1C, RP11-290F20.3, GNLY, S100A8, S100A9, HLA-DQA1, HLA-DPB1
MDS_ 2
Positive: HLA-DPB1, HLA-DQA1, S100A8, S100A9, CD1C, RP11-290F20.3, PARVB, IGLL5, MYL9, SDPR
 PPBP, CA2, RUFY1, TREML1, PF4, TUBB1, PGRMC1, VDAC3, AKR1C3, GNLY
Negative: GNLY, AKR1C3, VDAC3, PGRMC1, TUBB1, PF4, TREML1, RUFY1, CA2, PPBP
 SDPR, MYL9, IGLL5, PARVB, RP11-290F20.3, CD1C, S100A9, S100A8, HLA-DQA1, HLA-DPB1
Warning message:
In print.DimReduc(x = redeuc, dims = dims.print, nfeatures = nfeatures.print, :
  Only 2 dimensions have been computed.
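A custom reduction stored from an embedding matrix alone has no feature loadings, which is why ProjectDim is run before gene-level views are requested. As a rough base-R sketch of the idea, and assuming the projection amounts to multiplying the scaled genes-by-cells matrix by the cell embeddings (my understanding of ProjectDim, not something this post confirms), the loadings can be derived as follows. The simulated matrix and all names are hypothetical.

```r
# Simulated genes x cells matrix (Seurat's assay orientation).
set.seed(3)
n_cells <- 50
genes <- paste0("gene", 1:6)
expr <- matrix(rnorm(6 * n_cells), nrow = 6,
               dimnames = list(genes, paste0("cell", seq_len(n_cells))))
# Give gene1 a much larger spread so that it dominates the first dimension.
expr["gene1", ] <- expr["gene1", ] * 10

# Center each gene; a simplified stand-in for scale.data (no variance scaling).
centered <- expr - rowMeans(expr)

# Cell embeddings from classical MDS on cell-to-cell Euclidean distances.
embeddings <- cmdscale(dist(t(centered)), k = 2)
colnames(embeddings) <- paste0("MDS_", 1:2)

# Assumed projection step: feature loadings = data matrix %*% cell embeddings.
projected <- centered %*% embeddings

# Genes ranked by absolute loading on the first dimension.
ranked <- sort(abs(projected[, "MDS_1"]), decreasing = TRUE)
top_gene <- names(ranked)[1]
print(top_gene)
```

The sign of a loading tells which end of the axis a gene is associated with, which is how the Positive and Negative lists in the ProjectDim printout should be read.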
# Display the results as a heatmap
DimHeatmap(pbmc_small, reduction = "mds", dims = 1, cells = 500, projected = TRUE, balanced = TRUE)
VlnPlot(pbmc_small, features = "MDS_1")
See how the MDS_1 dimension correlates with the PC_1 dimension:
# See how the first MDS dimension is correlated with the first PC dimension
FeatureScatter(pbmc_small, feature1 = "MDS_1", feature2 = "PC_1")
FeatureScatter(pbmc_small, feature1 = "MDS_1", feature2 = "tSNE_1")
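A strong MDS_1 versus PC_1 relationship is expected on theoretical grounds: classical MDS on Euclidean distances recovers the principal component scores of the same matrix, up to the sign of each axis. The base-R sketch below shows this on simulated data (all names are made up for illustration). In a real Seurat object the match need not be exact, since PCA and the distance matrix may be computed on different gene sets or with different scaling.

```r
# Simulated cells x genes matrix.
set.seed(42)
x <- matrix(rnorm(30 * 6), nrow = 30,
            dimnames = list(paste0("cell", 1:30), paste0("gene", 1:6)))

# Classical MDS on Euclidean distances between cells.
mds <- cmdscale(dist(x), k = 2)

# PCA scores of the same (centered, unscaled) matrix.
pcs <- prcomp(x, center = TRUE, scale. = FALSE)$x[, 1:2]

# Correlation between the first MDS axis and the first principal component.
r <- cor(mds[, 1], pcs[, 1])
print(abs(r))

# The coordinates themselves agree up to the sign of each axis.
coords_match <- isTRUE(all.equal(abs(mds), abs(pcs), check.attributes = FALSE))
print(coords_match)
```

Because the sign of each axis is arbitrary, the correlation can come out as either +1 or -1, so a mirrored scatter plot still indicates agreement.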
References
[1] Numerical Ecology Notes || Unconstrained Ordination | NMDS: https://www.jianshu.com/p/39021ec7d1dd
[2] Dimensional Reduction Vignette: https://links.jianshu.com/go?to=https%3A%2F%2Fsatijalab.org%2Fseurat%2Fv3.0%2Fdim_reduction_vignette.html