Monocle2 踩坑教程(2)
分享是一种态度
回顾
差异分析
差异基因表达分析是RNA-Seq实验中的一项常见任务。Monocle可以帮助你找到不同细胞群间差异表达的基因,并评估这些变化的统计显著性。这些比较要求您有一种方法将细胞收集到两个或更多组中。这些组由每个CellDataSet的表现型数据表中的列定义。Monocle将评估不同细胞群中每个基因表达水平的显著性。
1marker_genes <- row.names(subset(fData(HSMM),
2 gene_short_name %in% c("MEF2C", "MEF2D", "MYF5",
3 "ANPEP", "PDGFRA","MYOG",
4 "TPM1", "TPM2", "MYH2",
5 "MYH3", "NCAM1", "TNNT1",
6 "TNNT2", "TNNC1", "CDK1",
7 "CDK2", "CCNB1", "CCNB2",
8 "CCND1", "CCNA1", "ID1")))
9
10diff_test_res <- differentialGeneTest(HSMM[marker_genes,],
11 fullModelFormulaStr = "~percent.mt")
12> head(diff_test_res)
13 status family pval qval gene_short_name num_cells_expressed use_for_ordering
14MEF2D OK negbinomial.size 0.10544931 0.4590520 MEF2D 1 FALSE
15CCNB1 OK negbinomial.size 0.45277970 0.7043240 CCNB1 1 FALSE
16MEF2C OK negbinomial.size 0.28733226 0.5028315 MEF2C 2 FALSE
17TPM2 OK negbinomial.size 0.94348541 0.9696949 TPM2 0 FALSE
18CDK1 OK negbinomial.size 0.12768261 0.4590520 CDK1 0 FALSE
19CCND1 OK negbinomial.size 0.04021331 0.4590520 CCND1 0 FALSE
20
21MYOG_ID1 <- HSMM[row.names(subset(fData(HSMM),
22 gene_short_name %in% c("CCND1", "CCNB2"))),]
23plot_genes_jitter(MYOG_ID1, grouping = "seurat_clusters", ncol= 2)
选择对区分细胞类型有意义的基因(Finding Genes that Distinguish Cell Type or State )
1
2to_be_tested <- row.names(subset(fData(HSMM),
3 gene_short_name %in% c("CDC20", "NCAM1", "ANPEP")))
4cds_subset <- HSMM[to_be_tested,]
5
6diff_test_res <- differentialGeneTest(cds_subset,
7 fullModelFormulaStr = "~CellType")
8diff_test_res[,c("gene_short_name", "pval", "qval")]
9
10plot_genes_jitter(cds_subset,
11 grouping = "CellType",
12 color_by = "CellType",
13 nrow= 1,
14 ncol = NULL,
15 plot_trend = TRUE)
1full_model_fits <-
2 fitModel(cds_subset, modelFormulaStr = "~CellType")
3reduced_model_fits <- fitModel(cds_subset, modelFormulaStr = "~1")
4diff_test_res <- compareModels(full_model_fits, reduced_model_fits)
5diff_test_res
6
7 status family pval qval
8CDC20 OK negbinomial.size 0.007822883 0.02346865
9NCAM1 OK negbinomial.size 0.791131484 0.88906005
10ANPEP OK negbinomial.size 0.889060052 0.88906005
Monocle中的差异分析过程非常灵活:您在测试中使用的模型公式可以包含pData表中作为列存在的任何项,包括Monocle在其他分析步骤中添加的列。例如,如果您使用集群细胞,您可以使用集群作为模型公式来测试cluster之间不同的基因。
Finding Genes that Change as a Function of Pseudotime
Monocle的主要工作是将细胞按照生物过程(如细胞分化)的顺序排列,而不事先知道要查看哪些基因。一旦这样做了,你就可以分析细胞,找到随着细胞进步而变化的基因。例如,你可以发现当细胞“成熟”时,基因显著上调。
1to_be_tested <- row.names(subset(fData(HSMM),
2 gene_short_name %in% c("MYH3", "MEF2C", "CCNB2", "TNNT1")))
3cds_subset <- HSMM[to_be_tested,]
4
5diff_test_res <- differentialGeneTest(cds_subset,
6 fullModelFormulaStr = "~sm.ns(Pseudotime)")
7
8diff_test_res[,c("gene_short_name", "pval", "qval")]
9
10 gene_short_name pval qval
11MEF2C MEF2C 2.996841e-02 3.995789e-02
12CCNB2 CCNB2 5.328721e-03 1.065744e-02
13MYH3 MYH3 6.371156e-01 6.371156e-01
14TNNT1 TNNT1 1.634704e-12 6.538818e-12
15
16plot_genes_in_pseudotime(cds_subset, color_by = "seurat_clusters")
Clustering Genes by Pseudotemporal Expression Pattern
在研究时间序列基因表达研究时,一个常见的问题是:“哪些基因遵循相似的动力学趋势?”Monocle可以通过将具有相似趋势的基因分组来帮助你回答这个问题,这样你就可以分析这些基因组,看看它们有什么共同之处。Monocle提供了一种方便的方法来可视化所有伪时间相关的基因。函数plot_pseudotime_heatmap接受一个CellDataSet对象(通常只包含重要基因的子集),并生成与plot_genes_in_pseudotime类似的平滑表达曲线.然后,它将这些基因聚类并使用pheatmap包绘制它们。这允许您可视化基因模块,这些模块在伪时间内共同变化。
注意下面的num_clusters 指的是基因可以聚成几个类,而不是细胞。
1diff_test_res <- differentialGeneTest(HSMM[marker_genes,],
2 fullModelFormulaStr = "~sm.ns(Pseudotime)")
3sig_gene_names <- row.names(subset(diff_test_res, qval < 0.1))
4plot_pseudotime_heatmap(HSMM[sig_gene_names,],
5 num_clusters = 6,
6 cores = 1,
7 show_rownames = T)
8
Multi-Factorial Differential Expression Analysis
Monocle可以在多个因素存在的情况下进行差异分析,这可以帮助你减去一些因素来看到其他因素的影响。在下面的简单例子中,Monocle测试了三个基因在成肌细胞和成纤维细胞之间的差异表达,同时减去percent.mt的影响。为此,我们必须同时指定完整模型和简化模型。完整的模型同时捕捉细胞类型和percent.mt的影响。
1to_be_tested <-
2 row.names(subset(fData(HSMM),
3 gene_short_name %in% c("TPM1", "MYH3", "CCNB2", "GAPDH")))
4
5cds_subset <- HSMM[to_be_tested,]
6
7diff_test_res <- differentialGeneTest(cds_subset,
8 fullModelFormulaStr = "~CellType + percent.mt",
9 reducedModelFormulaStr = "~percent.mt")
10diff_test_res[,c("gene_short_name", "pval", "qval")]
11
12 gene_short_name pval qval
13GAPDH GAPDH 0.07990737 0.1598147
14CCNB2 CCNB2 0.04148172 0.1598147
15TPM1 TPM1 0.90861287 0.9086129
16MYH3 MYH3 0.77085745 0.9086129
17
18plot_genes_jitter(cds_subset,
19 grouping = "seurat_clusters", color_by = "CellType", plot_trend = TRUE) +
20 facet_wrap( ~ feature_label, scales= "free_y")
Analyzing Branches in Single-Cell Trajectories
通常,单细胞的轨迹包括分支。分支的产生是因为细胞执行替代基因表达程序。在发育过程中,当细胞做出命运的选择时,分支就会出现在轨迹中:一个发育谱系沿着一条路径前进,而另一个谱系产生第二条路径。Monocle包含用于分析这些分支事件的广泛功能。
芭芭拉·特雷特琳和她的同事们在史蒂夫·奎克的实验室里进行了一项实验,他们从正在发育的老鼠肺中获取细胞。他们在细胞发育的早期捕获细胞,之后当肺包含两种主要类型的上皮细胞(AT1和AT2),以及即将决定成为AT1或AT2的细胞时。Monocle可以将这个过程重构为一个分支轨迹,让你可以非常详细地分析决策点。下图显示了使用Monocle的一些数据重建的一般方法。有一个单独的分支,标记为“1”。当细胞从树的左上方通过树枝的早期发育阶段通过时,哪些基因发生了变化?哪些基因在分支间有差异表达?为了回答这个问题,Monocle为您提供了一个特殊的统计测试:分支表达式分析建模,或BEAM。
1lung <- load_lung()
2plot_cell_trajectory(lung, color_by = "Time")
3
4# 图如下
5
6BEAM_res <- BEAM(lung, branch_point = 1, cores = 1)
7BEAM_res <- BEAM_res[order(BEAM_res$qval),]
8BEAM_res <- BEAM_res[,c("gene_short_name", "pval", "qval")]
使用一种特殊类型的热图,您可以可视化所有明显依赖于分支的基因的变化。这张热图同时显示了两种血统的变化。它还要求您选择要检查的分支点。列是伪时间中的点,行是基因,伪时间的开始在热图的中间。当你从热图的中间读到右边的时候,你正在跟随一个伪时间谱系。当你读到左边时,另一个。这些基因是分层聚类的,因此您可以可视化具有类似的依赖于序列的基因模块.
1plot_genes_branched_heatmap(lung[row.names(subset(BEAM_res,
2 qval < 1e-4)),],
3 branch_point = 1,
4 num_clusters = 4,
5 cores = 1,
6 use_gene_short_name = T,
7 show_rownames = T)
8
1lung_genes <- row.names(subset(fData(lung),
2 gene_short_name %in% c("Ccnd2", "Sftpb", "Pdpn")))
3plot_genes_branched_pseudotime(lung[lung_genes,],
4 branch_point = 1,
5 color_by = "Time",
6 ncol = 1)
1> str(HSMM)
2Formal class 'CellDataSet' [package "monocle"] with 19 slots
3 ..@ reducedDimS : num [1:2, 1:2638] -2.613 -0.652 -1.784 0.745 -2.376 ...
4 .. ..- attr(*, "dimnames")=List of 2
5 .. .. ..$ : NULL
6 .. .. ..$ : chr [1:2638] "AAACATACAACCAC" "AAACATTGAGCTAC" "AAACATTGATCAGC" "AAACCGTGCTTCCG" ...
7 ..@ reducedDimW : num [1:281, 1:2] -0.0384 0.0812 -0.0465 0.0681 0.0655 ...
8 ..@ reducedDimA : num [1:2, 1:2638] 13.57 29.4 -2.74 -25.16 -9.3 ...
9 ..@ reducedDimK : num [1:2, 1:126] -2.305 -0.692 -1.711 0.538 -1.935 ...
10 .. ..- attr(*, "dimnames")=List of 2
11 .. .. ..$ : NULL
12 .. .. ..$ : chr [1:126] "Y_1" "Y_2" "Y_3" "Y_4" ...
13 ..@ minSpanningTree :List of 10
14 .. ..$ :List of 1
15 .. .. ..$ Y_1: 'igraph.vs' Named int [1:2] 47 101
16 .. .. .. ..- attr(*, "names")= chr [1:2] "Y_47" "Y_101"
17 .. .. .. ..- attr(*, "env")=<weakref>
18 .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
19 .. ..$ :List of 1
20 .. .. ..$ Y_2: 'igraph.vs' Named int [1:2] 62 90
21 .. .. .. ..- attr(*, "names")= chr [1:2] "Y_62" "Y_90"
22 .. .. .. ..- attr(*, "env")=<weakref>
23 .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
24 .. ..$ :List of 1
25 .. .. ..$ Y_3: 'igraph.vs' Named int [1:2] 90 126
26 .. .. .. ..- attr(*, "names")= chr [1:2] "Y_90" "Y_126"
27 .. .. .. ..- attr(*, "env")=<weakref>
28 .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
29 .. ..$ :List of 1
30 .. .. ..$ Y_4: 'igraph.vs' Named int 106
31 .. .. .. ..- attr(*, "names")= chr "Y_106"
32 .. .. .. ..- attr(*, "env")=<weakref>
33 .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
34 .. ..$ :List of 1
35 .. .. ..$ Y_5: 'igraph.vs' Named int [1:2] 36 65
36 .. .. .. ..- attr(*, "names")= chr [1:2] "Y_36" "Y_65"
37 .. .. .. ..- attr(*, "env")=<weakref>
38 .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
39 .. ..$ :List of 1
40 .. .. ..$ Y_6: 'igraph.vs' Named int [1:2] 39 53
41 .. .. .. ..- attr(*, "names")= chr [1:2] "Y_39" "Y_53"
42 .. .. .. ..- attr(*, "env")=<weakref>
43 .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
44 .. ..$ :List of 1
45 .. .. ..$ Y_7: 'igraph.vs' Named int [1:2] 34 98
46 .. .. .. ..- attr(*, "names")= chr [1:2] "Y_34" "Y_98"
47 .. .. .. ..- attr(*, "env")=<weakref>
48 .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
49 .. ..$ :List of 1
50 .. .. ..$ Y_8: 'igraph.vs' Named int [1:2] 49 126
51 .. .. .. ..- attr(*, "names")= chr [1:2] "Y_49" "Y_126"
52 .. .. .. ..- attr(*, "env")=<weakref>
53 .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
54 .. ..$ :List of 1
55 .. .. ..$ Y_9: 'igraph.vs' Named int [1:2] 42 93
56 .. .. .. ..- attr(*, "names")= chr [1:2] "Y_42" "Y_93"
57 .. .. .. ..- attr(*, "env")=<weakref>
58 .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
59 .. ..$ :List of 1
60 .. .. ..$ Y_10: 'igraph.vs' Named int [1:2] 11 25
61 .. .. .. ..- attr(*, "names")= chr [1:2] "Y_11" "Y_25"
62 .. .. .. ..- attr(*, "env")=<weakref>
63 .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
64 .. ..- attr(*, "class")= chr "igraph"
65 ..@ cellPairwiseDistances: num [1:126, 1:126] 0 1.37 1.41 10.62 5.52 ...
66 .. ..- attr(*, "dimnames")=List of 2
67 .. .. ..$ : chr [1:126] "Y_1" "Y_2" "Y_3" "Y_4" ...
68 .. .. ..$ : chr [1:126] "Y_1" "Y_2" "Y_3" "Y_4" ...
69 ..@ expressionFamily :Formal class 'vglmff' [package "VGAM"] with 22 slots
70 .. .. ..@ blurb : chr [1:6] "Negative-binomial distribution with size known\n\n" "Links: " "loglink(mu)" "\n" ...
71 .. .. ..@ constraints : expression({ constraints <- cm.zero.VGAM(constraints, x = x, NULL, M = M, predictors.names = predictors.names, M1 = 2) })
72 .. .. ..@ deviance :function ()
73 .. .. ..@ fini : expression({ })
74 .. .. ..@ first : expression({ })
75 .. .. ..@ infos :function (...)
76 .. .. ..@ initialize : expression({ M1 <- 1 temp5 <- w.y.check(w = w, y = y, Is.nonnegative.y = TRUE, Is.integer.y = TRUE, ncol.w.max | __truncated__
77 .. .. ..@ last : expression({ misc$link <- rep_len("loglink", NOS) names(misc$link) <- mynames1 misc$earg <- vector("list", M) | __truncated__
78 .. .. ..@ linkfun :function ()
79 .. .. ..@ linkinv :function (eta, extra = NULL)
80 .. .. ..@ loglikelihood :function (mu, y, w, residuals = FALSE, eta, extra = NULL, summation = TRUE)
81 .. .. ..@ middle : expression({ })
82 .. .. ..@ middle2 : expression({ })
83 .. .. ..@ summary.dispersion: logi(0)
84 .. .. ..@ vfamily : chr "negbinomial.size"
85 .. .. ..@ validparams :function (eta, y, extra = NULL)
86 .. .. ..@ validfitted :function ()
87 .. .. ..@ simslot :function (object, nsim)
88 .. .. ..@ hadof :function ()
89 .. .. ..@ charfun :function ()
90 .. .. ..@ deriv : expression({ eta <- cbind(eta) M <- ncol(eta) kmat <- matrix(Inf, n, M, byrow = TRUE) newemu <- list(theta = | __truncated__
91 .. .. ..@ weight : expression({ ned2l.dmunb2 <- 1/mu - 1/(mu + kmat) wz <- ned2l.dmunb2 * dmu.deta^2 c(w) * wz })
92 ..@ lowerDetectionLimit : num 1
93 ..@ dispFitInfo :<environment: 0x000000023e5e8110>
94 ..@ dim_reduce_type : chr "DDRTree"
95 ..@ auxOrderingData :<environment: 0x000000002c5323c8>
96 ..@ auxClusteringData :<environment: 0x00000000240fc8a0>
97 ..@ experimentData :Formal class 'MIAME' [package "Biobase"] with 13 slots
98 .. .. ..@ name : chr ""
99 .. .. ..@ lab : chr ""
100 .. .. ..@ contact : chr ""
101 .. .. ..@ title : chr ""
102 .. .. ..@ abstract : chr ""
103 .. .. ..@ url : chr ""
104 .. .. ..@ pubMedIds : chr ""
105 .. .. ..@ samples : list()
106 .. .. ..@ hybridizations : list()
107 .. .. ..@ normControls : list()
108 .. .. ..@ preprocessing : list()
109 .. .. ..@ other : list()
110 .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
111 .. .. .. .. ..@ .Data:List of 2
112 .. .. .. .. .. ..$ : int [1:3] 1 0 0
113 .. .. .. .. .. ..$ : int [1:3] 1 1 0
114 ..@ assayData :<environment: 0x000000005575c7a0>
115 ..@ phenoData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
116 .. .. ..@ varMetadata :'data.frame': 17 obs. of 1 variable:
117 .. .. .. ..$ labelDescription: chr [1:17] NA NA NA NA ...
118 .. .. ..@ data :'data.frame': 2638 obs. of 17 variables:
119 .. .. .. ..$ orig.ident : Factor w/ 1 level "pbmc3k": 1 1 1 1 1 1 1 1 1 1 ...
120 .. .. .. ..$ nCount_RNA : num [1:2638] 2419 4903 3147 2639 980 ...
121 .. .. .. ..$ nFeature_RNA : int [1:2638] 779 1352 1129 960 521 781 782 790 532 550 ...
122 .. .. .. ..$ percent.mt : num [1:2638] 3.02 3.79 0.89 1.74 1.22 ...
123 .. .. .. ..$ RNA_snn_res.0.5 : Factor w/ 9 levels "0","1","2","3",..: 2 4 2 3 7 2 5 5 1 6 ...
124 .. .. .. ..$ seurat_clusters : Factor w/ 9 levels "0","1","2","3",..: 2 4 2 3 7 2 5 5 1 6 ...
125 .. .. .. ..$ Size_Factor : num [1:2638] 1.108 2.245 1.441 1.208 0.449 ...
126 .. .. .. ..$ num_genes_expressed : int [1:2638] 126 181 149 153 34 103 104 112 64 51 ...
127 .. .. .. ..$ Cluster : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
128 .. .. .. ..$ peaks : logi [1:2638] FALSE FALSE FALSE FALSE FALSE FALSE ...
129 .. .. .. ..$ halo : logi [1:2638] FALSE FALSE FALSE FALSE FALSE FALSE ...
130 .. .. .. ..$ delta : num [1:2638] 0.401 0.442 1.031 0.205 0.812 ...
131 .. .. .. ..$ rho : num [1:2638] 36 33.3 44.8 47.7 31.7 ...
132 .. .. .. ..$ nearest_higher_density_neighbor: logi [1:2638] NA NA NA NA NA NA ...
133 .. .. .. ..$ Pseudotime : num [1:2638] 12.277 11.115 13.023 0.723 16.159 ...
134 .. .. .. ..$ State : Factor w/ 5 levels "1","2","3","4",..: 5 2 5 1 5 4 5 5 5 1 ...
135 .. .. .. ..$ CellType : Factor w/ 3 levels "CDC20","GABPB2",..: 3 3 3 3 3 3 3 3 3 3 ...
136 .. .. ..@ dimLabels : chr [1:2] "sampleNames" "sampleColumns"
137 .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
138 .. .. .. .. ..@ .Data:List of 1
139 .. .. .. .. .. ..$ : int [1:3] 1 1 0
140 ..@ featureData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
141 .. .. ..@ varMetadata :'data.frame': 3 obs. of 1 variable:
142 .. .. .. ..$ labelDescription: chr [1:3] NA NA NA
143 .. .. ..@ data :'data.frame': 13714 obs. of 3 variables:
144 .. .. .. ..$ gene_short_name : Factor w/ 13714 levels "7SK.2","A1BG",..: 527 715 9394 9395 5971 7345 5750 8158 9747 4917 ...
145 .. .. .. ..$ num_cells_expressed: int [1:13714] 0 0 0 0 0 1 0 0 0 6 ...
146 .. .. .. ..$ use_for_ordering : logi [1:13714] FALSE FALSE FALSE FALSE FALSE FALSE ...
147 .. .. ..@ dimLabels : chr [1:2] "featureNames" "featureColumns"
148 .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
149 .. .. .. .. ..@ .Data:List of 1
150 .. .. .. .. .. ..$ : int [1:3] 1 1 0
151 ..@ annotation : chr(0)
152 ..@ protocolData :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
153 .. .. ..@ varMetadata :'data.frame': 0 obs. of 1 variable:
154 .. .. .. ..$ labelDescription: chr(0)
155 .. .. ..@ data :'data.frame': 2638 obs. of 0 variables
156 .. .. ..@ dimLabels : chr [1:2] "sampleNames" "sampleColumns"
157 .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
158 .. .. .. .. ..@ .Data:List of 1
159 .. .. .. .. .. ..$ : int [1:3] 1 1 0
160 ..@ .__classVersion__ :Formal class 'Versions' [package "Biobase"] with 1 slot
161 .. .. ..@ .Data:List of 5
162 .. .. .. ..$ : int [1:3] 3 5 2
163 .. .. .. ..$ : int [1:3] 2 42 0
164 .. .. .. ..$ : int [1:3] 1 3 0
165 .. .. .. ..$ : int [1:3] 1 0 0
166 .. .. .. ..$ : int [1:3] 1 2 0
sc-RAN-seq 数据分析||Seurat新版教程:整合分析
Seurat新版教程:Guided Clustering Tutorial-(下)
如果你对单细胞转录组研究感兴趣,但又不知道如何入门,也许你可以关注一下下面的课程
看完记得顺手点个“在看”哦!
长按扫码可关注