Using meshes for MeSH Enrichment Analysis

Y叔 biobabble 2020-02-05

MeSH (Medical Subject Headings) is the controlled vocabulary that the NLM (U.S. National Library of Medicine) uses to manually index articles for MEDLINE/PubMed. It is a comprehensive life science vocabulary. MeSH has 19 categories, and MeSH.db contains 16 of them:

Abbreviation  Category
A             Anatomy
B             Organisms
C             Diseases
D             Chemicals and Drugs
E             Analytical, Diagnostic and Therapeutic Techniques and Equipment
F             Psychiatry and Psychology
G             Phenomena and Processes
H             Disciplines and Occupations
I             Anthropology, Education, Sociology and Social Phenomena
J             Technology and Food and Beverages
K             Humanities
L             Information Science
M             Persons
N             Health Care
V             Publication Type
Z             Geographical Locations


MeSH terms are associated with Entrez Gene IDs by three methods: gendoo, gene2pubmed and RBBH (Reciprocal BLAST Best Hit).

Method       How Entrez Gene IDs are mapped to MeSH IDs
Gendoo       Text mining
gene2pubmed  Manual curation by NCBI teams
RBBH         Sequence homology with BLASTP search (E-value < 10^-50)


meshes supports enrichment analysis (over-representation analysis and gene set enrichment analysis) of a gene list or a whole expression profile using MeSH annotation. The gendoo, gene2pubmed and RBBH data sources are all supported. Users can select the category of interest to test; all 16 categories are supported. The analysis supports more than 70 species, listed under the MeSHDb BiocView.

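library(meshes)
# geneList is an example dataset shipped with the DOSE package
data(geneList, package = "DOSE")
# use the first 100 gene names (Entrez Gene IDs) as the gene list of interest
de <- names(geneList)[1:100]
# over-representation analysis of category C (Diseases) with gendoo annotation
x <- enrichMeSH(de, MeSHDb = "MeSH.Hsa.eg.db", database = 'gendoo', category = 'C')
head(x)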

##              ID              Description GeneRatio   BgRatio       pvalue
## D043171 D043171  Chromosomal Instability     16/96 198/16528 2.794765e-14
## D000782 D000782               Aneuploidy     17/96 320/16528 3.866830e-12
## D042822 D042822      Genomic Instability     16/96 312/16528 3.007419e-11
## D012595 D012595    Scleroderma, Systemic     11/96 279/16528 6.449334e-07
## D009303 D009303 Nasopharyngeal Neoplasms     11/96 314/16528 2.049315e-06
## D019698 D019698     Hepatitis C, Chronic     11/96 317/16528 2.246856e-06
##             p.adjust       qvalue
## D043171 2.434241e-11 1.794534e-11
## D000782 1.684004e-09 1.241456e-09
## D042822 8.731539e-09 6.436931e-09
## D012595 1.404343e-04 1.035288e-04
## D009303 3.261686e-04 2.404530e-04
## D019698 3.261686e-04 2.404530e-04
##                                                                                      geneID
## D043171    4312/991/2305/1062/4605/10403/7153/55355/4751/4085/81620/332/7272/9212/1111/6790
## D000782 4312/55143/991/1062/7153/4751/79019/55839/890/983/4085/332/7272/9212/8208/1111/6790
## D042822     55143/991/1062/4605/7153/1381/9787/4751/10635/890/4085/81620/332/9212/1111/6790
## D012595                              4312/6280/1062/4605/7153/3627/4283/6362/7850/3002/4321
## D009303                                4312/7153/3627/6241/983/4085/5918/332/3002/4321/6790
## D019698                               4312/3627/10563/6373/4283/983/6362/7850/332/3002/3620
##         Count
## D043171    16
## D000782    17
## D042822    16
## D012595    11
## D009303    11
## D019698    11

The over-representation analysis above uses gendoo as the data source and tests category C (Diseases).
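To see where the pvalue column comes from, here is a minimal sketch that recomputes the first row (D043171) with base R. It assumes that enrichMeSH uses a one-sided hypergeometric test, which is the usual choice for over-representation analysis; the variable names k, n, M and N are ours, not part of the meshes API.

```r
# Counts read from the first row of the result (D043171).
k <- 16      # input genes annotated to the term (GeneRatio numerator)
n <- 96      # input genes that have any annotation (GeneRatio denominator)
M <- 198     # background genes annotated to the term (BgRatio numerator)
N <- 16528   # all annotated background genes (BgRatio denominator)

# Probability of seeing k or more annotated genes when drawing n genes
# without replacement from N genes, M of which carry the annotation.
# Assumption: this mirrors the test behind the pvalue column.
p <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
print(p)
```

If the assumption holds, the printed value should be of the same order as the pvalue reported for D043171. The p.adjust and qvalue columns are multiple-testing corrections over all tested terms, so they cannot be reproduced from a single row.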

The following example uses gene2pubmed as the data source and tests category G (Phenomena and Processes) using GSEA.

y <- gseMeSH(geneList, MeSHDb = "MeSH.Hsa.eg.db", database = 'gene2pubmed', category = "G")

## [1] "preparing geneSet collections..."
## [1] "GSEA analysis..."
## [1] "leading edge analysis..."
## [1] "done..."
head(y)

##              ID                  Description setSize enrichmentScore
## D009929 D009929                   Organ Size     449      -0.3458797
## D059647 D059647 Gene-Environment Interaction     455      -0.3551242
## D009043 D009043               Motor Activity     398      -0.3391521
## D050156 D050156                 Adipogenesis     368      -0.3618413
## D004041 D004041                 Dietary Fats     314      -0.3427588
## D006339 D006339                   Heart Rate     312      -0.3695689
##               NES      pvalue   p.adjust    qvalues rank
## D009929 -1.524164 0.001248439 0.03715088 0.02756207 2309
## D059647 -1.564984 0.001251564 0.03715088 0.02756207 2237
## D009043 -1.483672 0.001256281 0.03715088 0.02756207 1757
## D050156 -1.577000 0.001256281 0.03715088 0.02756207 2207
## D004041 -1.473730 0.001269036 0.03715088 0.02756207 1684
## D006339 -1.588315 0.001270648 0.03715088 0.02756207 2405
##                           leading_edge
## D009929 tags=27%, list=18%, signal=22%
## D059647 tags=26%, list=18%, signal=22%
## D009043 tags=21%, list=14%, signal=18%
## D050156 tags=26%, list=18%, signal=22%
## D004041 tags=21%, list=13%, signal=19%
## D006339 tags=29%, list=19%, signal=24%
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          core_enrichment
## D009929                     154/9846/3315/6716/9732/5139/7337/5530/4086/6532/1499/7157/627/2252/22891/2908/8654/4088/22846/4057/860/268/2735/2104/23522/5480/51131/3082/10253/831/604/1028/182/7173/5624/8743/23047/596/9905/1548/2272/22829/948/27303/4314/196/6019/595/5021/7248/4212/2488/54820/5334/6403/2246/4803/866/5919/79789/1907/7048/1831/4060/2247/5468/8076/5793/3485/1733/3952/126/3778/79068/79633/6653/5244/4313/3625/10468/9201/1501/6720/2273/2099/3480/5764/6387/1471/1462/4016/2690/8817/8821/5125/1191/5350/2162/5744/23541/185/367/4982/25802/4128/150/3479/10451/9370/125/4857/1308/2167/652/57502/4137/8614/5241
## D059647 9497/118/8859/6532/23405/7424/2295/7157/8631/627/2774/22891/2908/4088/51151/11132/1387/860/268/7366/2104/4153/29119/3791/1543/3643/22841/1129/5624/3240/3174/3350/5590/55304/55213/1548/2169/196/8204/8863/5021/23284/9162/11005/4256/3426/84159/5334/629/1793/4208/4322/7048/6817/553/56172/3953/22795/2638/210/5243/5468/1393/1012/27136/51314/4023/5172/4319/4214/3952/5577/126/7832/79068/4313/2944/9369/3075/6720/7494/2099/857/57161/9223/4306/79750/4035/4915/10443/5744/5654/100126791/3551/2487/1746/185/2952/6935/4128/4059/4582/27324/9358/64084/7166/6505/9370/3708/3117/80129/125/5105/2018/2167/652/4137/1524/5241
## D009043                                                                                                                                                                                                       23621/3082/1291/2915/1543/7466/3240/3350/55304/181/2169/27306/80169/9627/196/8678/8863/23284/81627/4692/5799/2259/3087/1278/1277/3953/4747/2247/6414/210/4744/5468/89795/4023/8522/3485/3952/79068/8864/4313/2944/2273/2099/3480/8528/4908/56892/3339/57161/4741/4306/6571/79750/4915/5744/2487/58503/347/6863/2952/5327/367/4982/4128/4059/3572/150/7060/9358/7166/3479/9254/5348/4129/9370/3708/1311/5105/4137/1408/5241
## D050156                                                                                                            5595/8609/9563/27332/1499/79738/4837/7157/79960/5729/408/2908/4088/6500/8038/4057/6649/5564/860/8648/10365/10253/54884/4602/7474/6776/79875/596/25956/8644/80781/79923/1490/50486/7840/84162/6041/4692/2246/4208/11075/63924/5919/284119/2308/9411/54795/5950/79365/2247/5468/50507/6469/8553/4023/594/7350/81029/3952/79068/5733/4313/10468/10628/6720/11213/55893/290/6678/63895/4035/633/23414/8639/2162/165/3551/10788/185/3357/367/4982/3667/1634/4128/23024/3479/6424/9370/2167/652/8839/54829/2625/79689/10974
## D004041                                                                                                                                                                                                                                                                                      3554/4925/22841/7466/2181/3350/201134/181/2169/948/55911/324/4018/3426/3087/6785/2308/1581/56172/3953/1384/5950/2166/60481/5468/5166/50507/1012/27136/4023/7056/4214/9365/7350/3952/3778/79068/8864/2944/6720/5159/3991/2203/2819/9223/4035/32/213/165/347/2152/185/3487/5327/3667/54898/150/64084/3479/9370/5105/5174/2018/5346/7021/79689
## D006339                                                                                                                                                                            4985/7139/8929/3784/3375/154/1760/9781/5139/118/2702/6532/6416/2869/270/7157/627/2908/7138/5563/3643/1129/7779/947/2034/4179/64388/1621/4881/8863/5021/844/4212/11030/5797/6403/4803/84059/79789/5176/3953/5243/5468/1012/2868/5793/4023/7056/3952/5577/126/2946/3778/477/5733/4313/2944/9201/3075/9499/2273/2099/1471/857/775/4306/4487/213/5350/5744/23245/2152/2697/2791/185/6863/2952/5327/80206/9607/3572/150/3479/2006/55259/9370/125/652/55351
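All six enrichmentScore values above are negative, which means these MeSH terms are concentrated towards the bottom of the ranked gene list. The toy sketch below shows how a weighted running-sum statistic of the kind commonly used in GSEA produces such a score. The gene names, statistics and gene set are hypothetical, and this is an illustration of the idea rather than the code gseMeSH runs.

```r
# Hypothetical ranked gene list, sorted in decreasing order like geneList.
stats <- c(g1 = 3.0, g2 = 2.0, g3 = 1.0, g4 = -1.0, g5 = -2.0, g6 = -3.0)
# Hypothetical gene set whose members sit at the bottom of the ranking.
gene_set <- c("g5", "g6")

in_set <- names(stats) %in% gene_set

# Walk down the list: step up at set members (weighted by |statistic|),
# step down at every other gene.
hit  <- cumsum(ifelse(in_set, abs(stats), 0)) / sum(abs(stats[in_set]))
miss <- cumsum(!in_set) / sum(!in_set)
running <- hit - miss

# The enrichment score is the running sum's largest deviation from zero.
es <- running[which.max(abs(running))]
print(es)
```

Here the running sum falls through the four non-members before it reaches the set, so the largest deviation is negative. A set sitting at the top of the ranking would give a positive score instead.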

Users can use the visualization methods implemented in DOSE (i.e. barplot, dotplot, cnetplot, enrichMap, upsetplot and gseaplot) to help interpret the enrichment results.

gseaplot(y, y[1,1], title=y[1,2])
