enrichGO出不来结果?没结果也是正确的结果
Dear Dr. Guangchuang Yu, I write to you regarding a doubt concerning the enrichGO function from Clustalprofiler package. I have been used this package before, but now I’m using the same R script and I have an error message.
This is the command I use:
go.bp <- enrichGO(gene = gene.df$ENSEMBL, universe = universe.ENSEMBLID, keytype = ‘ENSEMBL’, OrgDb = org.Ce.eg.db, ont = ‘BP’, pAdjustMethod = ‘BH’, pvalueCutoff = 0.01, qvalueCutoff = 0.05, readable=T)
and the error is the following one:
No gene set have size > 10 …
→ return NULL…My input list is attached to this email (101 genes in total). When I use this list in a web resource such as GOrilla it gives to me the proper GO terms.
Thank you very much in advance. Best regards,
María
最近clusterProfiler用户的问题,这个问题还蛮普遍。这个我在《why clusterProfiler fails》中也有谈到,并不是能出结果就是好的。没有结果也是一种结果。
这里首先他用了universe = universe.ENSEMBLID
,而报错信息又是No gene set have size > 10 ...
,可以看出他来当背景的基因集应该比较小,而enrichGO
默认只[10,500]区间的基因集,太小和太大的基因集,通常意义不大,而且也容易给出很小的p值。当然这个参数是可以调的,你可以用全集,或者自己设置大小。
他在信中并没有给出universe.ENSEMBLID
,而只是给了gene.df
给我,如果没有设定universe
的情况下,是可以跑出结果的:
> go.bp <- enrichGO(gene = gene.df$ENSEMBL,
keytype = 'ENSEMBL',
OrgDb = org.Ce.eg.db,
ont = 'BP',
pAdjustMethod = 'BH',
pvalueCutoff = 0.01,
qvalueCutoff = 0.05,
readable=T)
>
> go.bp## over-representation test##...@organism Caenorhabditis elegans#...@ontology BP#...@keytype ENSEMBL#...@gene chr [1:101] "WBGene00013475" "WBGene00002198" "WBGene00004395" ...#...pvalues adjusted by 'BH' with cutoff <0.01#...23 enriched terms found'data.frame': 23 obs. of 9 variables:
$ ID : chr "GO:0018193" "GO:0008360" "GO:0018105" "GO:0018209" ...
$ Description: chr "peptidyl-amino acid modification" "regulation of cell shape" "peptidyl-serine phosphorylation" "peptidyl-serine modification" ...
$ GeneRatio : chr "85/101" "55/101" "56/101" "56/101" ...
$ BgRatio : chr "414/11158" "101/11158" "151/11158" "152/11158" ...
$ pvalue : num 3.21e-108 5.28e-92 1.57e-81 2.48e-81 1.60e-77 ...
$ p.adjust : num 6.87e-106 5.65e-90 1.12e-79 1.33e-79 6.83e-76 ...
$ qvalue : num 6.35e-106 5.23e-90 1.04e-79 1.23e-79 6.32e-76 ...
$ geneID : chr "Y69E1A.3/kin-14/rol-3/C35E7.10/kin-5/Y116A8C.24/F59A3.8/F26E4.5/spe-8/C25A8.5/T21G5.1/F57B9.8/T25B9.5/W01B6.5/k"| __truncated__ "ZK507.3/K09C6.8/C09D4.3/R13H9.5/C56C10.6/Y65B4A.9/D2045.5/C39H7.1/B0218.5/ttbk-2/T05A7.6/M04F3.3/Y39G8C.2/K09E4"| __truncated__ "Y39G8B.5/ZK507.3/K09C6.8/C09D4.3/R13H9.5/C56C10.6/Y65B4A.9/D2045.5/C39H7.1/B0218.5/ttbk-2/T05A7.6/M04F3.3/Y39G8"| __truncated__ "Y39G8B.5/ZK507.3/K09C6.8/C09D4.3/R13H9.5/C56C10.6/Y65B4A.9/D2045.5/C39H7.1/B0218.5/ttbk-2/T05A7.6/M04F3.3/Y39G8"| __truncated__ ...
$ Count : int 85 55 56 56 55 56 55 55 24 29 ...#...Citation
Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
clusterProfiler: an R package for comparing biological themes among
gene clusters. OMICS: A Journal of Integrative Biology 2012, 16(5):284-287
这里他还提到了用GOrilla跑的话,有结果出来,而clusterProfiler没结果,这一方面是前面谈到的[10, 500]区间的问题,另一方面,也是《why clusterProfiler fails》里讲到的,因为clusterProfiler给出更保守、更可靠的结果。
我打开GOrilla主页,看了下文档,http://cbl-gorilla.cs.technion.ac.il/help.html,果然在P-value threshold
这一块里写得很清楚:
Parameters
P-value threshold - Only GO terms with a p-value better than this threshold are reported.
Note that this p-value does not include the multiple hypothesis correction on the number of tested GO terms. To correct for this the p-value should be multiplied by the number of GO terms used as reported in the results page.
GOrilla并没有multiple testing correction, 而clusterProfiler用了p value adjustment和qvalue来控制multiple comparison。所以啊,clusterProfiler报出来的显著性要保守和可靠得多,出来的结果必须要比GOrilla少。