富集分析事后丸：过滤数据

Original Y叔叔 YuLabSMU 2022-09-20

有了事后丸之后，腰不酸了，背不疼了，腿啊也不抽筋了，走路也有劲儿了，我有个梦想，希望每个生信小白做分析之后，都能吃上一颗事后丸。

The cnetplot is a really attractive feature. I wonder if it is possible to subset the enrichment object that goes into the cnetplot function?

Case scenario: Run an enrichment analysis with pvalueCutoff = 1, to see all results. Plotting them all would be infeasible. How to subset the enrichment object to, say, first ten most significant terms, and then plot it with cnetplot?

github上这个问题问得挺好，其实啊，你用clusterProfiler系列包做富集分析的话，根本就不用卡p值、q值这些，因为你完全可以吃事后丸！拿到全部结果之后，想怎么搞就怎么搞。

基本上data.frame取子集那些方法，包括[, [[, $等操作符，都被我重新定义，也就是说，对于富集分析的结果，你完全可以像data.frame一样的操作。

这里用DOSE包做为例子，富集的结果存在x对象中：

> require(DOSE)
Loading required package: DOSE

DOSE v3.9.1  For help: https://guangchuangyu.github.io/DOSE

If you use DOSE in published research, please cite:
Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an R/Bioconductor package for Disease Ontology Semantic and Enrichment analysis. Bioinformatics 2015, 31(4):608-609

> data(geneList)
> de = names(geneList)[1:100]
> x = enrichDO(de, qvalueCutoff=1, pvalueCutoff=1)

我们可以用dim看看有多少结果，可以用head和tail偷窥一下结果：

> dim(x)
[1] 383   9
> head(x, 2)
                       ID            Description GeneRatio  BgRatio
DOID:0060071 DOID:0060071 pre-malignant neoplasm      5/77  22/8007
DOID:5295       DOID:5295     intestinal disease      9/77 157/8007
                   pvalue     p.adjust       qvalue
DOID:0060071 1.671524e-06 0.0006401937 0.0004609887
DOID:5295    1.759049e-05 0.0027885022 0.0020079362
                                                  geneID Count
DOID:0060071                    6280/6278/10232/332/4321     5
DOID:5295    4312/6279/3627/10563/4283/890/366/4902/3620     9

然后我们就可以像操作data.frame一样来操作它，取个子集，但最后你会发现，输出的是data.frame的对象y，也就是你实际上无法用enrichplot去对y作图：

> y = x[x$pvalue < 0.05, ]
> dim(y)
[1] 121   9
> tail(y, 2)
                 ID                    Description GeneRatio  BgRatio
DOID:7474 DOID:7474 malignant pleural mesothelioma      2/77  36/8007
DOID:8692 DOID:8692               myeloid leukemia      4/77 142/8007
              pvalue  p.adjust    qvalue             geneID Count
DOID:7474 0.04659310 0.1487096 0.1070824          10232/332     2
DOID:8692 0.04748286 0.1502970 0.1082254 820/10232/332/3620     4
> class(y)
[1] "data.frame"

解决方法来了

于是我给[方法加了一个参数，叫asis，让输出保持为富集分析结果的对象，默认是FALSE，我们显式设为TRUE即可。这功能要求DOSE >= 1.9.1。

> y = x[x$pvalue < 0.05, asis=T]
> class(y)
[1] "enrichResult"
attr(,"package")
[1] "DOSE"
> y
#
# over-representation test
#
#...@organism    Homo sapiens
#...@ontology    DO
#...@keytype     ENTREZID
#...@gene        chr [1:100] "4312" "8318" "10874" "55143" "55388" "991" "6280" "2305" ...
#...pvalues adjusted by 'BH' with cutoff <1
#...121 enriched terms found
'data.frame':   121 obs. of  9 variables:
 $ ID         : chr  "DOID:0060071" "DOID:5295" "DOID:8719" "DOID:3007" ...
 $ Description: chr  "pre-malignant neoplasm" "intestinal disease" "in situ carcinoma" "breast ductal carcinoma" ...
 $ GeneRatio  : chr  "5/77" "9/77" "4/77" "4/77" ...
 $ BgRatio    : chr  "22/8007" "157/8007" "18/8007" "29/8007" ...
 $ pvalue     : num  1.67e-06 1.76e-05 2.18e-05 1.56e-04 2.08e-04 ...
 $ p.adjust   : num  0.00064 0.00279 0.00279 0.0136 0.0136 ...
 $ qvalue     : num  0.000461 0.002008 0.002008 0.009796 0.009796 ...
 $ geneID     : chr  "6280/6278/10232/332/4321" "4312/6279/3627/10563/4283/890/366/4902/3620" "6280/6278/10232/332" "6280/6279/4751/6286" ...
 $ Count      : int  5 9 4 4 13 6 13 5 5 6 ...
#...Citation
  Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an
  R/Bioconductor package for Disease Ontology Semantic and Enrichment
  analysis. Bioinformatics 2015, 31(4):608-609

于是新的y，就还是富集分析生成的对象，这样就可以用enrichplot来画图。目前只在开发版支持，超几何分布和GSEA都全线支持。以后就有事后丸吃了！

这里演示用的虽然只是pvalue，我知道很多人只会follow例子，所以要特别指出，任何column，你都可以拿来滤，比如geneID你可以滤出只有某些基因参与的通路出来，比如ID或Description你可以滤出你想要的通路来，而Count你可以限制参与的基因的数目等等，滤完之后，你就可以画出相应的图了，这简直给了你玩坏它的可能性。

PS1: 微软越来越硬了，一则消息炸开了锅：
Unlimited free private Git repositories
从此github晋升网盘，我一直在用gitlab当网盘，现在可以用github了。
还记得《制作meme的通用方式，来了解一下》一文中的调侃吗？虽然目前还是gitlab给得大方些。这次github可是要死掐gitlab了，毕竟同性社交嘛，社交的地方，还是要看人气。其实gitlab的云存储，用的也是微软家的，以我个人的经验，gitlab当机的概率要高于github。
100年后看，苹果 vs 微软，微软才是伟大的公司，后乔布斯时候的苹果只知道赚钱。100年后再看，Steve Jobs vs Bill Gates，Gates才是伟大的人，因为Gates身体力行在做慈善。捐钱容易，身体力行在推动一些东西，就难得了。

PS2: 月底要去南丹麦大学，有没有丹麦的小伙伴要面基？

PS3: 2019年立个flag，我要写一本书，扫码围观：

往期精彩

一把短刀，怎么就让他连捅18人？！

向杨大市长道歉

向不容妄议的杨市长道歉

以色列搞大了：伊朗说要直接出兵参战

黄晓菁，这位杭州泰隆银行女员工自爆视频火了，带给我们那些思考？

富集分析事后丸：过滤数据

解决方法来了

您可能也对以下帖子感兴趣

一把短刀，怎么就让他连捅18人？！

向杨大市长道歉

向不容妄议的杨市长道歉

以色列搞大了：伊朗说要直接出兵参战

黄晓菁，这位杭州泰隆银行女员工自爆视频火了，带给我们那些思考？

生成图片，分享到微信朋友圈

富集分析事后丸：过滤数据

解决方法来了

您可能也对以下帖子感兴趣