Monocle2 踩坑教程（2）

Original 周运来单细胞天地 2022-06-07

分享是一种态度

回顾

差异分析

差异基因表达分析是RNA-Seq实验中的一项常见任务。Monocle可以帮助你找到不同细胞群间差异表达的基因，并评估这些变化的统计显著性。这些比较要求您有一种方法将细胞收集到两个或更多组中。这些组由每个CellDataSet的表现型数据表中的列定义。Monocle将评估不同细胞群中每个基因表达水平的显著性。

 1marker_genes <- row.names(subset(fData(HSMM),
 2                                 gene_short_name %in% c("MEF2C", "MEF2D", "MYF5",
 3                                                        "ANPEP", "PDGFRA","MYOG",
 4                                                        "TPM1",  "TPM2",  "MYH2",
 5                                                        "MYH3",  "NCAM1", "TNNT1",
 6                                                        "TNNT2", "TNNC1", "CDK1",
 7                                                        "CDK2",  "CCNB1", "CCNB2",
 8                                                        "CCND1", "CCNA1", "ID1")))
 9
10diff_test_res <- differentialGeneTest(HSMM[marker_genes,],
11                                      fullModelFormulaStr = "~percent.mt")
12> head(diff_test_res)
13      status           family       pval      qval gene_short_name num_cells_expressed use_for_ordering
14MEF2D     OK negbinomial.size 0.10544931 0.4590520           MEF2D                   1            FALSE
15CCNB1     OK negbinomial.size 0.45277970 0.7043240           CCNB1                   1            FALSE
16MEF2C     OK negbinomial.size 0.28733226 0.5028315           MEF2C                   2            FALSE
17TPM2      OK negbinomial.size 0.94348541 0.9696949            TPM2                   0            FALSE
18CDK1      OK negbinomial.size 0.12768261 0.4590520            CDK1                   0            FALSE
19CCND1     OK negbinomial.size 0.04021331 0.4590520           CCND1                   0            FALSE
20
21MYOG_ID1 <- HSMM[row.names(subset(fData(HSMM),
22                                      gene_short_name %in% c("CCND1", "CCNB2"))),]
23plot_genes_jitter(MYOG_ID1, grouping = "seurat_clusters", ncol= 2)

选择对区分细胞类型有意义的基因（Finding Genes that Distinguish Cell Type or State ）

 1#Finding Genes that Distinguish Cell Type or State 
 2to_be_tested <- row.names(subset(fData(HSMM),
 3                                 gene_short_name %in% c("CDC20", "NCAM1", "ANPEP")))
 4cds_subset <- HSMM[to_be_tested,]
 5
 6diff_test_res <- differentialGeneTest(cds_subset,
 7                                      fullModelFormulaStr = "~CellType")
 8diff_test_res[,c("gene_short_name", "pval", "qval")]
 9
10plot_genes_jitter(cds_subset,
11                  grouping = "CellType",
12                  color_by = "CellType",
13                  nrow= 1,
14                  ncol = NULL,
15                  plot_trend = TRUE)

 1full_model_fits <-
 2  fitModel(cds_subset,  modelFormulaStr = "~CellType")
 3reduced_model_fits <- fitModel(cds_subset, modelFormulaStr = "~1")
 4diff_test_res <- compareModels(full_model_fits, reduced_model_fits)
 5diff_test_res
 6
 7      status           family        pval       qval
 8CDC20     OK negbinomial.size 0.007822883 0.02346865
 9NCAM1     OK negbinomial.size 0.791131484 0.88906005
10ANPEP     OK negbinomial.size 0.889060052 0.88906005

Monocle中的差异分析过程非常灵活:您在测试中使用的模型公式可以包含pData表中作为列存在的任何项，包括Monocle在其他分析步骤中添加的列。例如，如果您使用集群细胞，您可以使用集群作为模型公式来测试cluster之间不同的基因。

Finding Genes that Change as a Function of Pseudotime

Monocle的主要工作是将细胞按照生物过程(如细胞分化)的顺序排列，而不事先知道要查看哪些基因。一旦这样做了，你就可以分析细胞，找到随着细胞进步而变化的基因。例如，你可以发现当细胞“成熟”时，基因显著上调。

 1to_be_tested <- row.names(subset(fData(HSMM),
 2                                 gene_short_name %in% c("MYH3", "MEF2C", "CCNB2", "TNNT1")))
 3cds_subset <- HSMM[to_be_tested,]
 4
 5diff_test_res <- differentialGeneTest(cds_subset,
 6                                      fullModelFormulaStr = "~sm.ns(Pseudotime)")
 7
 8diff_test_res[,c("gene_short_name", "pval", "qval")]
 9
10      gene_short_name         pval         qval
11MEF2C           MEF2C 2.996841e-02 3.995789e-02
12CCNB2           CCNB2 5.328721e-03 1.065744e-02
13MYH3             MYH3 6.371156e-01 6.371156e-01
14TNNT1           TNNT1 1.634704e-12 6.538818e-12
15
16plot_genes_in_pseudotime(cds_subset, color_by = "seurat_clusters")

Clustering Genes by Pseudotemporal Expression Pattern

在研究时间序列基因表达研究时，一个常见的问题是:“哪些基因遵循相似的动力学趋势?”Monocle可以通过将具有相似趋势的基因分组来帮助你回答这个问题，这样你就可以分析这些基因组，看看它们有什么共同之处。Monocle提供了一种方便的方法来可视化所有伪时间相关的基因。函数plot_pseudotime_heatmap接受一个CellDataSet对象(通常只包含重要基因的子集)，并生成与plot_genes_in_pseudotime类似的平滑表达曲线.然后，它将这些基因聚类并使用pheatmap包绘制它们。这允许您可视化基因模块，这些模块在伪时间内共同变化。

注意下面的num_clusters 指的是基因可以聚成几个类，而不是细胞。

1diff_test_res <- differentialGeneTest(HSMM[marker_genes,],
2                                      fullModelFormulaStr = "~sm.ns(Pseudotime)")
3sig_gene_names <- row.names(subset(diff_test_res, qval < 0.1))
4plot_pseudotime_heatmap(HSMM[sig_gene_names,],
5                        num_clusters = 6,
6                        cores = 1,
7                        show_rownames = T)
8

Multi-Factorial Differential Expression Analysis

Monocle可以在多个因素存在的情况下进行差异分析，这可以帮助你减去一些因素来看到其他因素的影响。在下面的简单例子中，Monocle测试了三个基因在成肌细胞和成纤维细胞之间的差异表达，同时减去percent.mt的影响。为此，我们必须同时指定完整模型和简化模型。完整的模型同时捕捉细胞类型和percent.mt的影响。

 1to_be_tested <-
 2  row.names(subset(fData(HSMM),
 3                   gene_short_name %in% c("TPM1", "MYH3", "CCNB2", "GAPDH")))
 4
 5cds_subset <- HSMM[to_be_tested,]
 6
 7diff_test_res <- differentialGeneTest(cds_subset,
 8                                      fullModelFormulaStr = "~CellType + percent.mt",
 9                                      reducedModelFormulaStr = "~percent.mt")
10diff_test_res[,c("gene_short_name", "pval", "qval")]
11
12      gene_short_name       pval      qval
13GAPDH           GAPDH 0.07990737 0.1598147
14CCNB2           CCNB2 0.04148172 0.1598147
15TPM1             TPM1 0.90861287 0.9086129
16MYH3             MYH3 0.77085745 0.9086129
17
18plot_genes_jitter(cds_subset,
19                  grouping = "seurat_clusters", color_by = "CellType", plot_trend = TRUE) +
20  facet_wrap( ~ feature_label, scales= "free_y")

Analyzing Branches in Single-Cell Trajectories

通常，单细胞的轨迹包括分支。分支的产生是因为细胞执行替代基因表达程序。在发育过程中，当细胞做出命运的选择时，分支就会出现在轨迹中:一个发育谱系沿着一条路径前进，而另一个谱系产生第二条路径。Monocle包含用于分析这些分支事件的广泛功能。

芭芭拉·特雷特琳和她的同事们在史蒂夫·奎克的实验室里进行了一项实验，他们从正在发育的老鼠肺中获取细胞。他们在细胞发育的早期捕获细胞，之后当肺包含两种主要类型的上皮细胞(AT1和AT2)，以及即将决定成为AT1或AT2的细胞时。Monocle可以将这个过程重构为一个分支轨迹，让你可以非常详细地分析决策点。下图显示了使用Monocle的一些数据重建的一般方法。有一个单独的分支，标记为“1”。当细胞从树的左上方通过树枝的早期发育阶段通过时，哪些基因发生了变化?哪些基因在分支间有差异表达?为了回答这个问题，Monocle为您提供了一个特殊的统计测试:分支表达式分析建模，或BEAM。

1lung <- load_lung()
2plot_cell_trajectory(lung, color_by = "Time")
3
4# 图如下
5
6BEAM_res <- BEAM(lung, branch_point = 1, cores = 1)
7BEAM_res <- BEAM_res[order(BEAM_res$qval),]
8BEAM_res <- BEAM_res[,c("gene_short_name", "pval", "qval")]

使用一种特殊类型的热图，您可以可视化所有明显依赖于分支的基因的变化。这张热图同时显示了两种血统的变化。它还要求您选择要检查的分支点。列是伪时间中的点，行是基因，伪时间的开始在热图的中间。当你从热图的中间读到右边的时候，你正在跟随一个伪时间谱系。当你读到左边时，另一个。这些基因是分层聚类的，因此您可以可视化具有类似的依赖于序列的基因模块.

1plot_genes_branched_heatmap(lung[row.names(subset(BEAM_res,
2                                                  qval < 1e-4)),],
3                            branch_point = 1,
4                            num_clusters = 4,
5                            cores = 1,
6                            use_gene_short_name = T,
7                            show_rownames = T)
8

1lung_genes <- row.names(subset(fData(lung),
2                               gene_short_name %in% c("Ccnd2", "Sftpb", "Pdpn")))
3plot_genes_branched_pseudotime(lung[lung_genes,],
4                               branch_point = 1,
5                               color_by = "Time",
6                               ncol = 1)

  1> str(HSMM)
  2Formal class 'CellDataSet' [package "monocle"] with 19 slots
  3  ..@ reducedDimS          : num [1:2, 1:2638] -2.613 -0.652 -1.784 0.745 -2.376 ...
  4  .. ..- attr(*, "dimnames")=List of 2
  5  .. .. ..$ : NULL
  6  .. .. ..$ : chr [1:2638] "AAACATACAACCAC" "AAACATTGAGCTAC" "AAACATTGATCAGC" "AAACCGTGCTTCCG" ...
  7  ..@ reducedDimW          : num [1:281, 1:2] -0.0384 0.0812 -0.0465 0.0681 0.0655 ...
  8  ..@ reducedDimA          : num [1:2, 1:2638] 13.57 29.4 -2.74 -25.16 -9.3 ...
  9  ..@ reducedDimK          : num [1:2, 1:126] -2.305 -0.692 -1.711 0.538 -1.935 ...
 10  .. ..- attr(*, "dimnames")=List of 2
 11  .. .. ..$ : NULL
 12  .. .. ..$ : chr [1:126] "Y_1" "Y_2" "Y_3" "Y_4" ...
 13  ..@ minSpanningTree      :List of 10
 14  .. ..$ :List of 1
 15  .. .. ..$ Y_1: 'igraph.vs' Named int [1:2] 47 101
 16  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_47" "Y_101"
 17  .. .. .. ..- attr(*, "env")=<weakref> 
 18  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
 19  .. ..$ :List of 1
 20  .. .. ..$ Y_2: 'igraph.vs' Named int [1:2] 62 90
 21  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_62" "Y_90"
 22  .. .. .. ..- attr(*, "env")=<weakref> 
 23  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
 24  .. ..$ :List of 1
 25  .. .. ..$ Y_3: 'igraph.vs' Named int [1:2] 90 126
 26  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_90" "Y_126"
 27  .. .. .. ..- attr(*, "env")=<weakref> 
 28  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
 29  .. ..$ :List of 1
 30  .. .. ..$ Y_4: 'igraph.vs' Named int 106
 31  .. .. .. ..- attr(*, "names")= chr "Y_106"
 32  .. .. .. ..- attr(*, "env")=<weakref> 
 33  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
 34  .. ..$ :List of 1
 35  .. .. ..$ Y_5: 'igraph.vs' Named int [1:2] 36 65
 36  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_36" "Y_65"
 37  .. .. .. ..- attr(*, "env")=<weakref> 
 38  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
 39  .. ..$ :List of 1
 40  .. .. ..$ Y_6: 'igraph.vs' Named int [1:2] 39 53
 41  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_39" "Y_53"
 42  .. .. .. ..- attr(*, "env")=<weakref> 
 43  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
 44  .. ..$ :List of 1
 45  .. .. ..$ Y_7: 'igraph.vs' Named int [1:2] 34 98
 46  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_34" "Y_98"
 47  .. .. .. ..- attr(*, "env")=<weakref> 
 48  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
 49  .. ..$ :List of 1
 50  .. .. ..$ Y_8: 'igraph.vs' Named int [1:2] 49 126
 51  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_49" "Y_126"
 52  .. .. .. ..- attr(*, "env")=<weakref> 
 53  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
 54  .. ..$ :List of 1
 55  .. .. ..$ Y_9: 'igraph.vs' Named int [1:2] 42 93
 56  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_42" "Y_93"
 57  .. .. .. ..- attr(*, "env")=<weakref> 
 58  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
 59  .. ..$ :List of 1
 60  .. .. ..$ Y_10: 'igraph.vs' Named int [1:2] 11 25
 61  .. .. .. ..- attr(*, "names")= chr [1:2] "Y_11" "Y_25"
 62  .. .. .. ..- attr(*, "env")=<weakref> 
 63  .. .. .. ..- attr(*, "graph")= chr "484d230c-94de-11e9-9f7f-07844984f7db"
 64  .. ..- attr(*, "class")= chr "igraph"
 65  ..@ cellPairwiseDistances: num [1:126, 1:126] 0 1.37 1.41 10.62 5.52 ...
 66  .. ..- attr(*, "dimnames")=List of 2
 67  .. .. ..$ : chr [1:126] "Y_1" "Y_2" "Y_3" "Y_4" ...
 68  .. .. ..$ : chr [1:126] "Y_1" "Y_2" "Y_3" "Y_4" ...
 69  ..@ expressionFamily     :Formal class 'vglmff' [package "VGAM"] with 22 slots
 70  .. .. ..@ blurb             : chr [1:6] "Negative-binomial distribution with size known\n\n" "Links:    " "loglink(mu)" "\n" ...
 71  .. .. ..@ constraints       :  expression({  constraints <- cm.zero.VGAM(constraints, x = x, NULL, M = M, predictors.names = predictors.names, M1 = 2) })
 72  .. .. ..@ deviance          :function ()  
 73  .. .. ..@ fini              :  expression({ })
 74  .. .. ..@ first             :  expression({ })
 75  .. .. ..@ infos             :function (...)  
 76  .. .. ..@ initialize        :  expression({  M1 <- 1  temp5 <- w.y.check(w = w, y = y, Is.nonnegative.y = TRUE, Is.integer.y = TRUE, ncol.w.max | __truncated__
 77  .. .. ..@ last              :  expression({  misc$link <- rep_len("loglink", NOS)  names(misc$link) <- mynames1  misc$earg <- vector("list", M) | __truncated__
 78  .. .. ..@ linkfun           :function ()  
 79  .. .. ..@ linkinv           :function (eta, extra = NULL)  
 80  .. .. ..@ loglikelihood     :function (mu, y, w, residuals = FALSE, eta, extra = NULL, summation = TRUE)  
 81  .. .. ..@ middle            :  expression({ })
 82  .. .. ..@ middle2           :  expression({ })
 83  .. .. ..@ summary.dispersion: logi(0) 
 84  .. .. ..@ vfamily           : chr "negbinomial.size"
 85  .. .. ..@ validparams       :function (eta, y, extra = NULL)  
 86  .. .. ..@ validfitted       :function ()  
 87  .. .. ..@ simslot           :function (object, nsim)  
 88  .. .. ..@ hadof             :function ()  
 89  .. .. ..@ charfun           :function ()  
 90  .. .. ..@ deriv             :  expression({  eta <- cbind(eta)  M <- ncol(eta)  kmat <- matrix(Inf, n, M, byrow = TRUE)  newemu <- list(theta = | __truncated__
 91  .. .. ..@ weight            :  expression({  ned2l.dmunb2 <- 1/mu - 1/(mu + kmat)  wz <- ned2l.dmunb2 * dmu.deta^2  c(w) * wz })
 92  ..@ lowerDetectionLimit  : num 1
 93  ..@ dispFitInfo          :<environment: 0x000000023e5e8110> 
 94  ..@ dim_reduce_type      : chr "DDRTree"
 95  ..@ auxOrderingData      :<environment: 0x000000002c5323c8> 
 96  ..@ auxClusteringData    :<environment: 0x00000000240fc8a0> 
 97  ..@ experimentData       :Formal class 'MIAME' [package "Biobase"] with 13 slots
 98  .. .. ..@ name             : chr ""
 99  .. .. ..@ lab              : chr ""
100  .. .. ..@ contact          : chr ""
101  .. .. ..@ title            : chr ""
102  .. .. ..@ abstract         : chr ""
103  .. .. ..@ url              : chr ""
104  .. .. ..@ pubMedIds        : chr ""
105  .. .. ..@ samples          : list()
106  .. .. ..@ hybridizations   : list()
107  .. .. ..@ normControls     : list()
108  .. .. ..@ preprocessing    : list()
109  .. .. ..@ other            : list()
110  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
111  .. .. .. .. ..@ .Data:List of 2
112  .. .. .. .. .. ..$ : int [1:3] 1 0 0
113  .. .. .. .. .. ..$ : int [1:3] 1 1 0
114  ..@ assayData            :<environment: 0x000000005575c7a0> 
115  ..@ phenoData            :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
116  .. .. ..@ varMetadata      :'data.frame': 17 obs. of  1 variable:
117  .. .. .. ..$ labelDescription: chr [1:17] NA NA NA NA ...
118  .. .. ..@ data             :'data.frame': 2638 obs. of  17 variables:
119  .. .. .. ..$ orig.ident                     : Factor w/ 1 level "pbmc3k": 1 1 1 1 1 1 1 1 1 1 ...
120  .. .. .. ..$ nCount_RNA                     : num [1:2638] 2419 4903 3147 2639 980 ...
121  .. .. .. ..$ nFeature_RNA                   : int [1:2638] 779 1352 1129 960 521 781 782 790 532 550 ...
122  .. .. .. ..$ percent.mt                     : num [1:2638] 3.02 3.79 0.89 1.74 1.22 ...
123  .. .. .. ..$ RNA_snn_res.0.5                : Factor w/ 9 levels "0","1","2","3",..: 2 4 2 3 7 2 5 5 1 6 ...
124  .. .. .. ..$ seurat_clusters                : Factor w/ 9 levels "0","1","2","3",..: 2 4 2 3 7 2 5 5 1 6 ...
125  .. .. .. ..$ Size_Factor                    : num [1:2638] 1.108 2.245 1.441 1.208 0.449 ...
126  .. .. .. ..$ num_genes_expressed            : int [1:2638] 126 181 149 153 34 103 104 112 64 51 ...
127  .. .. .. ..$ Cluster                        : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
128  .. .. .. ..$ peaks                          : logi [1:2638] FALSE FALSE FALSE FALSE FALSE FALSE ...
129  .. .. .. ..$ halo                           : logi [1:2638] FALSE FALSE FALSE FALSE FALSE FALSE ...
130  .. .. .. ..$ delta                          : num [1:2638] 0.401 0.442 1.031 0.205 0.812 ...
131  .. .. .. ..$ rho                            : num [1:2638] 36 33.3 44.8 47.7 31.7 ...
132  .. .. .. ..$ nearest_higher_density_neighbor: logi [1:2638] NA NA NA NA NA NA ...
133  .. .. .. ..$ Pseudotime                     : num [1:2638] 12.277 11.115 13.023 0.723 16.159 ...
134  .. .. .. ..$ State                          : Factor w/ 5 levels "1","2","3","4",..: 5 2 5 1 5 4 5 5 5 1 ...
135  .. .. .. ..$ CellType                       : Factor w/ 3 levels "CDC20","GABPB2",..: 3 3 3 3 3 3 3 3 3 3 ...
136  .. .. ..@ dimLabels        : chr [1:2] "sampleNames" "sampleColumns"
137  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
138  .. .. .. .. ..@ .Data:List of 1
139  .. .. .. .. .. ..$ : int [1:3] 1 1 0
140  ..@ featureData          :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
141  .. .. ..@ varMetadata      :'data.frame': 3 obs. of  1 variable:
142  .. .. .. ..$ labelDescription: chr [1:3] NA NA NA
143  .. .. ..@ data             :'data.frame': 13714 obs. of  3 variables:
144  .. .. .. ..$ gene_short_name    : Factor w/ 13714 levels "7SK.2","A1BG",..: 527 715 9394 9395 5971 7345 5750 8158 9747 4917 ...
145  .. .. .. ..$ num_cells_expressed: int [1:13714] 0 0 0 0 0 1 0 0 0 6 ...
146  .. .. .. ..$ use_for_ordering   : logi [1:13714] FALSE FALSE FALSE FALSE FALSE FALSE ...
147  .. .. ..@ dimLabels        : chr [1:2] "featureNames" "featureColumns"
148  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
149  .. .. .. .. ..@ .Data:List of 1
150  .. .. .. .. .. ..$ : int [1:3] 1 1 0
151  ..@ annotation           : chr(0) 
152  ..@ protocolData         :Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
153  .. .. ..@ varMetadata      :'data.frame': 0 obs. of  1 variable:
154  .. .. .. ..$ labelDescription: chr(0) 
155  .. .. ..@ data             :'data.frame': 2638 obs. of  0 variables
156  .. .. ..@ dimLabels        : chr [1:2] "sampleNames" "sampleColumns"
157  .. .. ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slot
158  .. .. .. .. ..@ .Data:List of 1
159  .. .. .. .. .. ..$ : int [1:3] 1 1 0
160  ..@ .__classVersion__    :Formal class 'Versions' [package "Biobase"] with 1 slot
161  .. .. ..@ .Data:List of 5
162  .. .. .. ..$ : int [1:3] 3 5 2
163  .. .. .. ..$ : int [1:3] 2 42 0
164  .. .. .. ..$ : int [1:3] 1 3 0
165  .. .. .. ..$ : int [1:3] 1 0 0
166  .. .. .. ..$ : int [1:3] 1 2 0