Seurat新版教程:Guided Clustering Tutorial-(上)
分享是一种态度
序
Seurat - Guided Clustering Tutorial
(https://satijalab.org/seurat/v3.0/pbmc3k_tutorial.html)
Seurat,一个单细胞数据分析工具箱
十大函数
支撑这个鱼骨架的是是下面的十个函数,细心的读者也许已经发现,大师已经插上了小红旗。在Seurat v2到v3的过程中,其实是有函数名变化的,当然最主要的我认为是参数中gene到features的变化,这也看出Seurat强烈的求生欲——既然单细胞不止做转录组那我也就不能单纯地叫做gene了,所有表征单细胞的features均可以用我Seurat来分析了。另外,相对于features而言gene只不过是一个实例而已。所以在升级Seurat的时候一个关键的地方就是函数名以及参数的更改。至于新的功能和算法其实并不多,如果用不到Seurat v3的新功能(如UMAP降维)其实不升级到v3做单细胞转录组是完全没问题的。
据不完全统计Seurat包大约有130多个函数,我们有必要问号一遍吗?不必要,当你有需求的时候去查就好了,但是人类很多时候并不知道自己需要的是什么,所以我建议还是把他的函数说明手册拿出来浏览一遍,至少把目录看一遍,大概知道他能做什么。
1> library(Seurat)
2> packageVersion("Seurat")
3[ ] ‘3.0.1’
4
5pbmc.counts <- Read10X(data.dir = "D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19")
6pbmc <- CreateSeuratObject(counts = pbmc.counts)
7pbmc <- NormalizeData(object = pbmc)
8pbmc <- FindVariableFeatures(object = pbmc)
9pbmc <- ScaleData(object = pbmc)
10pbmc <- RunPCA(object = pbmc)
11pbmc <- FindNeighbors(object = pbmc)
12pbmc <- FindClusters(object = pbmc)
13pbmc <- RunTSNE(object = pbmc)
数据读入
1library(Seurat)
2packageVersion("Seurat")
3[1] ‘3.0.1’
4# https://satijalab.org/seurat/mca.html
5library(dplyr)
6library(ggsci)#我想改变一下配色
学习一个R包当然是把他的标准流程走一遍了,下载它的官网数据放在我的电脑上。
1# Load the PBMC dataset
2list.files("D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19")
3
4[1] "barcodes.tsv" "genes.tsv" "matrix.mtx"
5?Read10X
6## Not run:
7# For output from CellRanger < 3.0
8data_dir <- 'path/to/data/directory'
9list.files(data_dir) # Should show barcodes.tsv, genes.tsv, and matrix.mtx
10expression_matrix <- Read10X(data.dir = data_dir)
11seurat_object = CreateSeuratObject(counts = expression_matrix)
12
13# For output from CellRanger >= 3.0 with multiple data types
14data_dir <- 'path/to/data/directory'
15list.files(data_dir) # Should show barcodes.tsv.gz, features.tsv.gz, and matrix.mtx.gz
16data <- Read10X(data.dir = data_dir)
17seurat_object = CreateSeuratObject(counts = data$`Gene Expression`)
18seurat_object[['Protein']] = CreateAssayObject(counts = data$`Antibody Capture`)
也就是说Seurat可以直接用Read10X函数读取cellrangerV2 和V3的数据了,省去了不少麻烦是不是?
创建Seurat对象。
1pbmc.data <- Read10X(data.dir = "D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19")
2# Initialize the Seurat object with the raw (non-normalized data).
3?CreateSeuratObject
4pbmc <- CreateSeuratObject(counts = pbmc.data, project = "pbmc3k", min.cells = 3, min.features = 200)
5> pbmc
6
7An object of class Seurat
813714 features across 2700 samples within 1 assay
9Active assay: RNA (13714 features
创建完其实是有两个S4类数据集的,一个是Seurat对象一个是bgCMatrix,要知道每个对象中存储的是什么数据,可以通过@或者\ $来提取查看:
线粒体细胞和红细胞比例。
1# The [[ operator can add columns to object metadata. This is a great place to stash QC stats
2pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")
3?PercentageFeatureSet
4HB.genes_total <- c("HBA1","HBA2","HBB","HBD","HBE1","HBG1","HBG2","HBM","HBQ1","HBZ") # 人类血液常见红细胞基因
5HB_m <- match(HB.genes_total,rownames(pbmc@assays$RNA))
6
7HB.genes <- rownames(pbmc@assays$RNA)[HB_m]
8HB.genes <- HB.genes[!is.na(HB.genes)]
9pbmc[["percent.HB"]]<-PercentageFeatureSet(pbmc,features=HB.genes)
10
11# write nGene_nUMI_mito_HB
12head(pbmc@meta.data)[,c(2,3,4,5)]
13
14 nCount_RNA nFeature_RNA percent.mt percent.HB
15AAACATACAACCAC 2419 779 3.0177759 0
16AAACATTGAGCTAC 4903 1352 3.7935958 0
17AAACATTGATCAGC 3147 1129 0.8897363 0
18AAACCGTGCTTCCG 2639 960 1.7430845 0
19AAACCGTGTATGCG 980 521 1.2244898 0
20AAACGCACTGGTAC 2163 781 1.6643551 0
Feature、count、线粒体基因、红细胞基因占比可视化。
1# Visualize QC metrics as a violin plot
2VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.HB"), ncol = 4)#+scale_color_npg() 不起作用
3#p1_aaas = p1 + scale_color_aaas()
4 #p2_aaas = p2 + scale_fill_aaas()
注意到没有,之前v2的nUMI和nGene分别改为了nCount_RNA、nFeature_RNA。
这几个指标之间的相关性。
1# FeatureScatter is typically used to visualize feature-feature relationships, but can be used
2# for anything calculated by the object, i.e. columns in object metadata, PC scores etc.
3?FeatureScatter
4plot1 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt")+scale_color_npg()
5plot2 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")+scale_color_npg()
6plot3 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.HB")+scale_color_npg()
7CombinePlots(plots = list(plot1, plot2,plot3),legend="none")
8?CombinePlots
CombinePlots函数是不是很好用呢,亲测ggplot2绘制的图即使不是Seurat绘图函数绘制出来的图也是可以用这个函数来组合的。在简单的探索之后,我们来做一下实质性的QC和均一化工作。
均一化与标准化
1pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5) #
2
3pbmc <- NormalizeData(pbmc, normalization.method = "LogNormalize", scale.factor = 10000)
4
5Performing log-normalization
60% 10 20 30 40 50 60 70 80 90 100%
7[----|----|----|----|----|----|----|----|----|----|
8**************************************************|
9
注意这里直接用了subset函数,而不是之前的SubsetData。
特征选择:高变基因
1#Identification of highly variable features (feature selection)
2pbmc <- FindVariableFeatures(pbmc, selection.method = "vst", nfeatures = 2000)
3Calculating gene variances
40% 10 20 30 40 50 60 70 80 90 100%
5[----|----|----|----|----|----|----|----|----|----|
6**************************************************|
7Calculating feature variances of standardized and clipped values
80% 10 20 30 40 50 60 70 80 90 100%
9[----|----|----|----|----|----|----|----|----|----|
10**************************************************|
11
12# Identify the 10 most highly variable genes
13top10 <- head(VariableFeatures(pbmc), 10)
14# plot variable features with and without labels
15plot1 <- VariableFeaturePlot(pbmc)
16plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
17CombinePlots(plots = list(plot1, plot2),legend="bottom")
标准化
1# Scaling the data
2all.genes <- rownames(pbmc)
3pbmc <- ScaleData(pbmc, features = all.genes)
4Centering and scaling data matrix
5 |====================================================================================================================================| 100%
6>
特征提取:PCA降维
1
2pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))
3PC_ 1
4Positive: CST3, TYROBP, LST1, AIF1, FTL, FTH1, LYZ, FCN1, S100A9, TYMP
5 FCER1G, CFD, LGALS1, S100A8, CTSS, LGALS2, SERPINA1, IFITM3, SPI1, CFP
6 PSAP, IFI30, SAT1, COTL1, S100A11, NPC2, GRN, LGALS3, GSTP1, PYCARD
7Negative: MALAT1, LTB, IL32, IL7R, CD2, B2M, ACAP1, CD27, STK17A, CTSW
8 CD247, GIMAP5, AQP3, CCL5, SELL, TRAF3IP3, GZMA, MAL, CST7, ITM2A
9 MYC, GIMAP7, HOPX, BEX2, LDLRAP1, GZMK, ETS1, ZAP70, TNFAIP8, RIC3
10PC_ 2
11Positive: CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DQB1, HLA-DRA, LINC00926, CD79B, HLA-DRB1, CD74
12 HLA-DMA, HLA-DPB1, HLA-DQA2, CD37, HLA-DRB5, HLA-DMB, HLA-DPA1, FCRLA, HVCN1, LTB
13 BLNK, P2RX5, IGLL5, IRF8, SWAP70, ARHGAP24, FCGR2B, SMIM14, PPP1R14A, C16orf74
14Negative: NKG7, PRF1, CST7, GZMB, GZMA, FGFBP2, CTSW, GNLY, B2M, SPON2
15 CCL4, GZMH, FCGR3A, CCL5, CD247, XCL2, CLIC3, AKR1C3, SRGN, HOPX
16 TTC38, APMAP, CTSC, S100A4, IGFBP7, ANXA1, ID2, IL32, XCL1, RHOC
17PC_ 3
18Positive: HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1, HLA-DPA1, CD74, MS4A1, HLA-DRB1, HLA-DRA
19 HLA-DRB5, HLA-DQA2, TCL1A, LINC00926, HLA-DMB, HLA-DMA, CD37, HVCN1, FCRLA, IRF8
20 PLAC8, BLNK, MALAT1, SMIM14, PLD4, LAT2, IGLL5, P2RX5, SWAP70, FCGR2B
21Negative: PPBP, PF4, SDPR, SPARC, GNG11, NRGN, GP9, RGS18, TUBB1, CLU
22 HIST1H2AC, AP001189.4, ITGA2B, CD9, TMEM40, PTCRA, CA2, ACRBP, MMD, TREML1
23 NGFRAP1, F13A1, SEPT5, RUFY1, TSC22D1, MPP1, CMTM5, RP11-367G6.3, MYL9, GP1BA
24PC_ 4
25Positive: HLA-DQA1, CD79B, CD79A, MS4A1, HLA-DQB1, CD74, HLA-DPB1, HIST1H2AC, PF4, TCL1A
26 SDPR, HLA-DPA1, HLA-DRB1, HLA-DQA2, HLA-DRA, PPBP, LINC00926, GNG11, HLA-DRB5, SPARC
27 GP9, AP001189.4, CA2, PTCRA, CD9, NRGN, RGS18, GZMB, CLU, TUBB1
28Negative: VIM, IL7R, S100A6, IL32, S100A8, S100A4, GIMAP7, S100A10, S100A9, MAL
29 AQP3, CD2, CD14, FYB, LGALS2, GIMAP4, ANXA1, CD27, FCN1, RBP7
30 LYZ, S100A11, GIMAP5, MS4A6A, S100A12, FOLR3, TRABD2A, AIF1, IL8, IFI6
31PC_ 5
32Positive: GZMB, NKG7, S100A8, FGFBP2, GNLY, CCL4, CST7, PRF1, GZMA, SPON2
33 GZMH, S100A9, LGALS2, CCL3, CTSW, XCL2, CD14, CLIC3, S100A12, CCL5
34 RBP7, MS4A6A, GSTP1, FOLR3, IGFBP7, TYROBP, TTC38, AKR1C3, XCL1, HOPX
35Negative: LTB, IL7R, CKB, VIM, MS4A7, AQP3, CYTIP, RP11-290F20.3, SIGLEC10, HMOX1
36 PTGES3, LILRB2, MAL, CD27, HN1, CD2, GDI2, ANXA5, CORO1B, TUBA1B
37 FAM110A, ATP1A1, TRADD, PPA1, CCDC109B, ABRACL, CTD-2006K23.1, WARS, VMO1, FYB
让我们看一下PCA都有哪些结果。
每个细胞在PC轴上的坐标
1head(pbmc@reductions$pca@cell.embeddings)
2 PC_1 PC_2 PC_3 PC_4 PC_5 PC_6 PC_7 PC_8 PC_9 PC_10 PC_11
3AAACATACAACCAC -4.7296855 -0.5184265 -0.7623220 -2.3156790 -0.07160006 0.1317334 1.6301191 -1.1830836 1.2492647 1.9031544 -0.6730318
4AAACATTGAGCTAC -0.5174029 4.5918957 5.9091921 6.9118856 -1.96243034 2.7866168 1.5096526 -0.3590951 -0.8247207 0.5923264 0.5552679
5AAACATTGATCAGC -3.1891063 -3.4695154 -0.8313710 -2.0019985 -5.10442765 2.1216390 0.3480312 3.7184168 -1.0047037 1.1045846 2.1435963
6AAACCGTGCTTCCG 12.7933021 0.1007166 0.6310221 -0.3687338 0.21838204 -2.8403815 0.8120182 0.1296084 -0.6425828 -3.4380616 1.9194359
7AAACCGTGTATGCG -3.1288078 -6.3481412 1.2507776 3.0191026 7.84739502 -1.3017150 -2.3968660 -0.4419815 -2.9410710 -1.1924961 3.5800224
8AAACGCACTGGTAC -3.1088963 0.9262125 -0.6482331 -2.3244378 -2.00526763 1.4851268 0.2660897 -0.4260322 -0.2045553 -1.5314854 1.3273948
9 PC_12 PC_13 PC_14 PC_15 PC_16 PC_17 PC_18 PC_19 PC_20 PC_21 PC_22
10AAACATACAACCAC 3.6826462 0.8619755 -0.9450621 0.44368696 -0.09859108 -1.278487 1.3471705 0.46616434 -3.32957838 0.6047326 -1.28799604
11AAACATTGAGCTAC 0.7339369 -2.2206068 2.8265252 0.05269715 -3.02766520 1.475587 2.4833998 0.94279874 -1.31018534 2.8866347 3.14488549
12AAACATTGATCAGC 2.4368587 0.4965275 0.2784826 1.02114017 -0.82800163 1.149522 -0.6589667 1.97244531 0.46454913 -1.1989118 -4.00419302
13AAACCGTGCTTCCG 0.5003049 0.6798730 -2.1597502 0.12053353 -1.27353626 -0.175932 2.6215382 -0.23445861 0.01131591 1.0903436 0.03797187
14AAACCGTGTATGCG -0.6615918 -0.8076382 -0.9672315 0.74307188 2.00702431 2.270533 -2.7461609 0.06917022 -0.06584539 -0.7148572 2.84307560
15AAACGCACTGGTAC -1.4131061 -0.7284717 1.0847490 0.69347069 1.50328633 2.899008 -1.7733431 -2.17298340 -0.93092390 1.4390144 -1.37939596
16 PC_23 PC_24 PC_25 PC_26 PC_27 PC_28 PC_29 PC_30 PC_31 PC_32 PC_33
17AAACATACAACCAC -0.7179464 -1.4319395 0.83214783 -0.1679234 -0.7161114 1.84440151 -3.05781318 -1.6074727 -2.1517497 0.8687982 -0.41061522
18AAACATTGAGCTAC -1.8457533 -1.3159421 -0.08628477 1.0921087 1.8536483 -0.85897429 -1.54691012 -2.0578432 -0.3853023 1.2088837 0.38597220
19AAACATTGATCAGC 0.8930977 0.1537439 2.16359773 0.3598690 -2.7626285 -2.11701966 -2.29029854 -0.6499149 -1.8554231 -2.4971457 0.33549022
20AAACCGTGCTTCCG 0.8687202 -2.4761498 0.67236429 -0.5985153 -0.1156671 -1.43199736 -1.55234852 -1.8298758 1.2738518 -0.6358930 -0.53630017
21AAACCGTGTATGCG -0.3910890 -0.4052759 -1.84733460 -0.5151174 0.5780494 -0.99341983 0.06359586 -2.1070341 -3.1590629 -0.5086174 0.02705711
22AAACGCACTGGTAC -1.5354609 -0.1891822 -1.39375039 -0.2486056 0.6756537 0.06547818 -1.61450533 1.2957103 -2.4891537 -0.3063821 -1.12912898
23 PC_34 PC_35 PC_36 PC_37 PC_38 PC_39 PC_40 PC_41 PC_42 PC_43 PC_44
24AAACATACAACCAC -1.2894803 -1.3787980 0.6540711 0.5840483 -2.3962943 -0.08226692 -0.7486769 -2.1045346 0.21276227 -1.5445694 -0.09575044
25AAACATTGAGCTAC -1.1530093 0.3536307 0.4779224 -0.8652969 0.7381044 -0.38738971 0.7657008 -1.2475574 -0.03955515 1.9097925 -2.12198023
26AAACATTGATCAGC 1.4541669 1.3478909 -0.2352528 -3.1292462 0.2888443 0.10283533 -5.2083218 -0.8155861 -0.54565799 0.4010832 -1.21380173
27AAACCGTGCTTCCG 0.8386651 1.2911685 1.2571691 -0.5025198 1.3793049 2.06967998 -2.2411401 -1.9964884 -1.13684041 1.9122370 0.66098397
28AAACCGTGTATGCG -1.6797643 -1.1789580 -1.8789147 1.3673211 1.0768095 1.59383195 -1.5163564 0.9922202 1.59552318 -0.7380092 -0.19908409
29AAACGCACTGGTAC -0.1862429 -3.8669912 0.4143763 0.9287853 1.3974358 2.55985354 -3.5595247 1.6952044 1.62933600 1.5996360 -0.16145240
30 PC_45 PC_46 PC_47 PC_48 PC_49 PC_50
31AAACATACAACCAC -1.5215216 1.2413559 -1.6210075 -0.4655044 1.2091308 -1.8230275
32AAACATTGAGCTAC -0.3750793 0.3240155 -0.3722397 1.3962012 -0.2224967 -0.9573797
33AAACATTGATCAGC 0.7932551 2.8606879 -1.1490237 2.1686602 -1.0267464 -1.7466499
34AAACCGTGCTTCCG -1.4980962 -2.3318838 -0.1933483 -0.7501199 0.3104423 -1.3381283
35AAACCGTGTATGCG 0.6442588 -1.1071806 0.7092294 2.4249639 1.1466671 -1.3304418
36AAACGCACTGGTAC 1.6184979 2.1078466 -2.2183316 -2.3865239 3.8680262 2.6023484
每个基因对每个PC轴的贡献度(loading值)
1head(pbmc@reductions$pca@feature.loadings)
2 PC_1 PC_2 PC_3 PC_4 PC_5 PC_6 PC_7 PC_8 PC_9 PC_10
3PPBP 0.010990202 0.01148426 -0.15176092 0.10403737 0.003299077 0.005342462 0.021318113 -0.008489070 -0.044269116 0.001400899
4LYZ 0.116231706 0.01472515 -0.01280613 -0.04414540 0.049906881 0.065502380 -0.013871113 0.006693022 0.002358130 0.016020633
5S100A9 0.115414362 0.01895146 -0.02368853 -0.05787777 0.085382309 0.045792197 -0.003264718 0.063778055 -0.017075914 0.029052363
6IGLL5 -0.007987473 0.05454239 0.04901533 0.06694722 0.004603231 0.006223648 0.015035064 0.045562328 -0.014507167 0.023561387
7GNLY -0.015238762 -0.13375626 0.04101340 0.06912322 0.104558611 -0.001310612 -0.082875350 0.015749055 0.035273464 0.041223421
8FTL 0.118292572 0.01871142 -0.00984755 -0.01555269 0.038743505 -0.046662904 0.009732039 0.024969600 -0.003831722 0.019832614
9 PC_11 PC_12 PC_13 PC_14 PC_15 PC_16 PC_17 PC_18 PC_19 PC_20
10PPBP 0.044863274 0.03016378 0.047134363 0.029237266 -0.0007364288 -0.0203226407 -0.039660967 -0.015647566 0.005726442 0.0094676092
11LYZ -0.006755543 -0.01058222 0.010219384 -0.003357793 -0.0047166530 0.0021186397 -0.005522048 -0.014290198 -0.001907910 -0.0102060486
12S100A9 -0.011657440 -0.01258808 -0.013741596 0.004428299 -0.0055191933 0.0008913828 -0.002733916 -0.002231082 0.010547068 -0.0008216471
13IGLL5 0.015524704 0.01130811 -0.007237628 -0.012830423 0.0041744575 0.0335608964 -0.021776281 -0.010757306 0.022308563 0.0032278043
14GNLY 0.048009355 0.06699642 0.010345031 -0.011239154 -0.0098440849 0.0203211722 0.020958052 0.007021271 0.002015113 -0.0159543322
15FTL -0.001943726 0.01027191 -0.008383042 -0.003565683 0.0114076594 -0.0001738238 0.002335748 0.005825073 -0.001105492 -0.0088204939
16 PC_21 PC_22 PC_23 PC_24 PC_25 PC_26 PC_27 PC_28 PC_29 PC_30
17PPBP 0.0029964810 -0.018582366 -0.0151428472 -0.011897098 0.0151225125 -0.001076671 0.0053611374 -0.000841606 0.0006853714 0.005872398
18LYZ 0.0001300122 -0.004764486 -0.0002154396 0.002548603 -0.0004441571 0.003031771 0.0113930218 -0.001396761 -0.0103970973 -0.008782445
19S100A9 0.0009857512 -0.001117424 -0.0008416341 -0.001567696 -0.0048855904 -0.001991376 -0.0054592020 -0.002967175 0.0059151260 0.011957343
20IGLL5 -0.0351949335 -0.009983501 0.0241007907 -0.034169906 -0.0407579048 -0.019900995 -0.0208135294 -0.020108972 -0.0162282086 -0.002920229
21GNLY -0.0054222188 0.009659472 0.0022483863 0.000215567 0.0048624598 0.014696711 0.0005396567 0.015794414 0.0092743862 -0.012203335
22FTL 0.0047372372 -0.005878347 -0.0023076679 -0.015293995 0.0054536363 -0.001035276 0.0149072047 0.005597638 0.0063626914 0.009166101
23 PC_31 PC_32 PC_33 PC_34 PC_35 PC_36 PC_37 PC_38 PC_39 PC_40
24PPBP 0.013086886 0.0008607900 0.012128703 -0.0009461223 -1.922878e-02 0.004541394 0.011542981 0.011429708 -1.217387e-05 -0.007032663
25LYZ 0.001540172 -0.0003404526 0.001920834 -0.0010591244 2.964347e-05 0.005862608 0.008998625 -0.009836389 -5.602974e-03 -0.009227851
26S100A9 0.009530345 -0.0151782583 0.003747623 -0.0064760799 2.406278e-03 -0.002071706 0.006369340 -0.016186976 -1.076186e-02 -0.002632939
27IGLL5 0.007157640 -0.0004105958 -0.014691475 0.0436436450 3.019391e-03 -0.009648881 -0.014490356 0.012896714 3.714680e-03 -0.038642695
28GNLY -0.018267660 -0.0065138955 0.007815716 -0.0048936380 9.212739e-04 -0.018786034 0.002183408 -0.012800079 -1.574655e-02 0.001831823
29FTL 0.008757715 -0.0110614625 -0.011354104 0.0097377813 5.344130e-03 0.010270291 -0.002307617 -0.007765635 1.552409e-03 0.003765866
30 PC_41 PC_42 PC_43 PC_44 PC_45 PC_46 PC_47 PC_48 PC_49 PC_50
31PPBP -0.0008612626 -0.011266674 0.015925649 0.012209582 0.0132620332 -0.0168464149 -0.002926878 0.0166900128 -3.071535e-05 -0.006571050
32LYZ -0.0139353433 0.015835626 0.008829158 0.009923354 -0.0005053747 0.0005763965 -0.003656557 -0.0060199063 2.377456e-03 -0.002022636
33S100A9 0.0047445877 0.011376942 0.001033356 -0.015291229 0.0001170734 -0.0011615944 -0.001255734 -0.0018189781 1.341860e-02 -0.004627139
34IGLL5 -0.0231096888 -0.001424071 0.009561528 -0.026310314 0.0240009713 -0.0026671000 -0.001782701 -0.0188853751 1.604000e-03 -0.026632961
35GNLY 0.0012187715 -0.016853940 0.002993127 0.023204504 -0.0110674164 -0.0240057148 -0.021747836 -0.0006874352 9.393104e-03 -0.002300090
36FTL 0.0025105942 0.017913309 -0.002983271 -0.004176980 -0.0131060686 0.0001666315 0.006107927 -0.0076046483 7.912781e-03 0.008658740
···
head(pbmc@reductionspca@assay.used) [1] "RNA" head(pbmc@reductionspca@stdev)
[1] 7.098420 4.495493 3.872592 3.748859 3.171755 2.545292
head(pbmc@reductions$pca@key)
[1] "PC_"
···
对Loading值一番研究。
1 # Get the feature loadings for a given DimReduc
2 t(Loadings(object = pbmc[["pca"]])[1:5,1:5])
3 PPBP LYZ S100A9 IGLL5 GNLY
4PC_1 0.010990202 0.11623171 0.11541436 -0.007987473 -0.01523876
5PC_2 0.011484260 0.01472515 0.01895146 0.054542385 -0.13375626
6PC_3 -0.151760919 -0.01280613 -0.02368853 0.049015330 0.04101340
7PC_4 0.104037367 -0.04414540 -0.05787777 0.066947217 0.06912322
8PC_5 0.003299077 0.04990688 0.08538231 0.004603231 0.10455861
9 # Get the feature loadings for a specified DimReduc in a Seurat object
10 t(Loadings(object = pbmc, reduction = "pca") [1:5,1:5])
11 PPBP LYZ S100A9 IGLL5 GNLY
12PC_1 0.010990202 0.11623171 0.11541436 -0.007987473 -0.01523876
13PC_2 0.011484260 0.01472515 0.01895146 0.054542385 -0.13375626
14PC_3 -0.151760919 -0.01280613 -0.02368853 0.049015330 0.04101340
15PC_4 0.104037367 -0.04414540 -0.05787777 0.066947217 0.06912322
16PC_5 0.003299077 0.04990688 0.08538231 0.004603231 0.10455861
17 # Set the feature loadings for a given DimReduc
18 new.loadings <- Loadings(object = pbmc[["pca"]])
19 new.loadings <- new.loadings + 0.01
20 Loadings(object = pbmc[["pca"]]) <- new.loadings
21 VizDimLoadings(pbmc)
1# Examine and visualize PCA results a few different ways
2print(pbmc[["pca"]], dims = 1:5, nfeatures = 5)
3PC_ 1
4Positive: CST3, TYROBP, LST1, AIF1, FTL
5Negative: MALAT1, LTB, IL32, IL7R, CD2
6PC_ 2
7Positive: CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DQB1
8Negative: NKG7, PRF1, CST7, GZMB, GZMA
9PC_ 3
10Positive: HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1
11Negative: PPBP, PF4, SDPR, SPARC, GNG11
12PC_ 4
13Positive: HLA-DQA1, CD79B, CD79A, MS4A1, HLA-DQB1
14Negative: VIM, IL7R, S100A6, IL32, S100A8
15PC_ 5
16Positive: GZMB, NKG7, S100A8, FGFBP2, GNLY
17Negative: LTB, IL7R, CKB, VIM, MS4A7
1DimPlot(pbmc, reduction = "pca")
1
2DimHeatmap(pbmc, dims = 1:15, cells = 500, balanced = TRUE)
选择多少个维度进行下一阶段的分析呢?基于PAC有以下几种方法可以探索。
1# Determine the ‘dimensionality’ of the dataset
2
3# NOTE: This process can take a long time for big datasets, comment out for expediency. More
4# approximate techniques such as those implemented in ElbowPlot() can be used to reduce
5# computation time
6pbmc <- JackStraw(pbmc, num.replicate = 100)
7pbmc <- ScoreJackStraw(pbmc, dims = 1:20)
8plot1<-JackStrawPlot(pbmc, dims = 1:15)
9plot2<-ElbowPlot(pbmc)
10CombinePlots(plots = list(plot1, plot2),legend="bottom")
可以看出大概在PC为十的时候,每个轴是有区分意义的。
每个轴显著相关的基因,这个也可以作为后面分析选择基因的一个参考。
1#Returns a set of genes, based on the JackStraw analysis, that have statistically significant associations with a set of PCs.
2?PCASigGenes
3head(PCASigGenes(pbmc,pcs.use=2,pval.cut = 0.7))
4[1] "PPBP" "LYZ" "S100A9" "IGLL5" "GNLY" "FTL"
如果你对单细胞转录组研究感兴趣,但又不知道如何入门,也许你可以关注一下下面的课程
看完记得顺手点个“在看”哦!
长按扫码可关注