R and Network Pharmacology: Batch Processing ETCM Data
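One website comes up again and again in network pharmacology work: ETCM.
ETCM[1] is short for The Encyclopedia of Traditional Chinese Medicine.
“ETCM contains comprehensive, standardized information on commonly used Chinese herbs, formulas, and their ingredients. To support research into the functions and mechanisms of traditional Chinese medicine, ETCM provides predicted target genes for ingredients, herbs, and formulas. ETCM also offers systematic analysis functions that let users explore the relationships among herbs, formulas, ingredients, gene targets, and related pathways or diseases, or build networks from them. ETCM is free for academic use, and its data can be exported easily.”
For example, after querying the targets of a single herbal ingredient, I can download a table like the following:
Suppose I have queried the targets of 10 ingredients. I then end up with 10 tables in CSV format.
In each table, the cell at row 27, column 2 holds the target information we need, but each target is followed by a number in parentheses that we do not need.
My goal is to extract the targets from all of the tables and put them together.
This is a very common task in network pharmacology. Below I show how to do it in R.
Try a single file first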
rm(list = ls())
library(stringr)
df <- read.csv("../000files/etcm/tableExport (1).csv", header = FALSE) # read the file
df1 <- df[27, 2] # pick the cell that holds the targets
df2 <- gsub(" ", "", df1) # remove spaces
df3 <- str_split(df2, ",", simplify = TRUE) # split on commas
df4 <- str_extract(df3, "^.*(?=\\()") # keep the text before "(" (a zero-width lookahead assertion)
df4
## [1] "ARG2" "ASL" "ASS1" "AZIN2" "CKM" "linB" "NOS1" "NOS2"
## [9] "NOS3" "SLC7A1" "SLC7A3" "SLC7A4"
#write.csv(df4, file = "test.csv")
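To see what each step does without an ETCM export at hand, here is a minimal sketch in base R on a made-up cell. The sample string and its numbers are invented for illustration; only the "symbol followed by a number in parentheses" shape is taken from the regular expression above.

```r
# Invented sample cell; a real ETCM export may format the numbers differently.
cell <- "ARG2(0.92), ASL(0.88), ASS1(0.75)"

no_space <- gsub(" ", "", cell, fixed = TRUE)            # remove spaces
parts <- strsplit(no_space, ",", fixed = TRUE)[[1]]      # split on commas

# "^.*(?=\\()" matches from the start of the string up to, but not
# including, an opening parenthesis. Lookaheads need perl = TRUE in base R.
hits <- regmatches(parts, regexpr("^.*(?=\\()", parts, perl = TRUE))
hits
```

Unlike `str_extract()`, which returns `NA` for an element with no match, `regmatches()` simply drops such elements, so no `na.omit()` step is needed in this variant.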
That worked! Next comes the batch version.
Batch processing the ETCM data
# list all CSV files
allfiles <- list.files("../000files/etcm/", pattern = "\\.csv$", full.names = TRUE)
allfiles
## [1] "../000files/etcm/tableExport (1).csv"
## [2] "../000files/etcm/tableExport (10).csv"
## [3] "../000files/etcm/tableExport (2).csv"
## [4] "../000files/etcm/tableExport (3).csv"
## [5] "../000files/etcm/tableExport (4).csv"
## [6] "../000files/etcm/tableExport (6).csv"
## [7] "../000files/etcm/tableExport (7).csv"
## [8] "../000files/etcm/tableExport (8).csv"
## [9] "../000files/etcm/tableExport (9).csv"
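One detail worth knowing about `list.files()`: its `pattern` argument is a regular expression, not a shell wildcard, so an unescaped dot matches any character. The short sketch below shows the difference; apart from the first one, the file names are invented for illustration.

```r
# Hypothetical file names, used only to compare the two patterns.
files <- c("tableExport (1).csv", "mycsv_notes.txt", "targets.csv.bak")

loose  <- grepl(".csv", files)     # "." matches any character
strict <- grepl("\\.csv$", files)  # literal dot, anchored at the end

loose   # TRUE TRUE TRUE
strict  # TRUE FALSE FALSE
```

In a folder that holds only ETCM exports both patterns pick up the same files, so this matters mainly when other files share the directory.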
# load packages
library(tidyverse)
## -- Attaching packages ----------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.6 v dplyr 1.0.8
## v tidyr 1.2.0 v forcats 0.5.1
## v readr 2.0.1
## -- Conflicts -------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
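df <- lapply(allfiles, read.csv, header = FALSE) %>% # read all files
  reduce(., rbind) %>% # stack them into one data frame
  filter(., V1 == "Candidate Target Genes") %>% # keep the target rows
  select(., 2) %>% # keep the second column
  apply(., 1, str_replace_all, pattern = " ", replacement = "") %>% # remove spaces
  str_split(., ",", simplify = TRUE) %>% # split on commas
  str_extract(., "^.*(?=\\()") # extract the gene symbol before "("
df4 <- na.omit(df) # drop entries with no match (NA)
df5 <- unique(df4) # remove duplicates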
df5
## [1] "ARG2" "MIF" "ADH1B" "ACO2" "GLO1" "ANXA5"
## [7] "AHR" "ASL" "SHBG" "ARF1" "AKR1B1" "ATP8A1"
## [13] "AKR1C3" "ASS1" "GSTP1" "ANG" "DGKD" "ALDH2"
## [19] "AZIN2" "HPGDS" "APRT" "DGKG" "AURKB" "AR"
## [25] "CKM" "ISYNA1" "BC1747" "F2" "CBR1" "linB"
## [31] "ITPR1" "BHMT" "NR5A1" "NOS1" "NAGA" "C8G"
## [37] "ntpK" "DHFRL1" "NOS2" "PAEP" "CPB1" "PISD"
## [43] "DNMT1" "NOS3" "PAPSS1" "CS" "PLA2G1B" "SLC7A1"
## [49] "PLA2G2E" "CTDSP1" "PRKCA" "SLC7A3" "PPARD" "GNMT"
## [55] "PROCR" "SLC7A4" "TGFBR2" "HGS" "PSAP" "TRDMT1"
## [61] "HS3ST3A1" "PTDSS1" "IDH1" "PTDSS2" "IDH2" "PTEN"
## [67] "IL4I1" "RHO" "ITEVIR" "SCARB1" "ITPA" "SMPD3"
## [73] "LSM6" "SMPD4" "MDH2" "tesA" "ME2" "PDE5A"
## [79] "PLEKHA1" "RNASE1" "RNASE3" "SRC" "TM0857" "TM1436"
## [85] "TNFSF13B" "UCK2"
OK, another job done!
From here, further analysis is straightforward.
References
[1] ETCM website: http://www.tcmip.cn/ETCM/index.php/Home/
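As one possible next step, the sketch below keeps track of which file each target came from, which is handy when building ingredient-target pairs. It is a base R sketch under stated assumptions, not the method used above: `extract_targets()` and `write_demo()` are hypothetical helpers, the demo files and their contents (including the "Ingredient Name" row label) are invented, and only the "Candidate Target Genes" label comes from the pipeline in this post.

```r
# Hypothetical helper: return the gene symbols found in one ETCM-style export.
extract_targets <- function(path) {
  tab <- read.csv(path, header = FALSE, stringsAsFactors = FALSE)
  # Locate the row by its label instead of a fixed row number.
  row <- which(tab[[1]] == "Candidate Target Genes")
  if (length(row) == 0) return(character(0))
  parts <- trimws(strsplit(tab[row[1], 2], ",", fixed = TRUE)[[1]])
  parts <- parts[grepl("(", parts, fixed = TRUE)]  # keep "SYMBOL(number)" entries
  unique(sub("\\(.*$", "", parts))                 # strip "(number)"
}

# Build two tiny demo files in a temporary folder (invented content).
demo_dir <- tempfile("etcm_demo")
dir.create(demo_dir)
write_demo <- function(name, genes) {
  tab <- data.frame(V1 = c("Ingredient Name", "Candidate Target Genes"),
                    V2 = c(name, genes), stringsAsFactors = FALSE)
  write.table(tab, file.path(demo_dir, paste0(name, ".csv")), sep = ",",
              row.names = FALSE, col.names = FALSE)
}
write_demo("ingredientA", "ARG2(0.9), ASL(0.8)")
write_demo("ingredientB", "ASL(0.7), NOS1(0.6)")

# One row per (file, target) pair.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
pairs <- do.call(rbind, lapply(files, function(f) {
  genes <- extract_targets(f)
  data.frame(file = rep(basename(f), length(genes)), target = genes,
             stringsAsFactors = FALSE)
}))
pairs
all_targets <- unique(pairs$target)  # pooled, de-duplicated targets
all_targets
```

Looking the row up by its label rather than by position is a deliberate choice here: it keeps working if an export has a different number of rows above the target row.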