一个公共数据集最多可以被挖掘多少次呢?
是我太年轻
学员群有咨询 Agilent-038314 CBC Homo sapiens lncRNA + mRNA microarray V2.0 这个表达量芯片的数据处理问题,当然了,主要是芯片的探针ID对应基因名字的问题。 链接是;https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GPL18109
因为大家还是初学者,所以我就想着先打击一下,说这样的芯片比较难,肯定是很少有人挖掘它,因为它仅仅是提供了探针的碱基序列,有一个费时费力的流程去拿到该芯片的注释信息:
比如我们可以看到《LncRNA and mRNA integration network reconstruction reveals novel key regulators in esophageal squamous-cell carcinoma》 这个2019的文献,链接是: https://doi.org/10.1016/j.ygeno.2018.01.003
就对这个芯片做了非常复杂的处理:
Microarray data contained 71,584 probes. After applying the criteria for re-annotation, 39,068 of the probes were retained, among which 20,323 were corresponding to mRNAs and 18,745 to lncRNA.
最后这些探针还需要去冗余,得到:These probes were mapped to 25,018 unique genes, including 13,490 protein coding genes (PCGs) and 11,528 lncRNAs.
而且这些数据,文章都整理好了,都在附件:
现在你还在发愁,这样的芯片,如何做ID转化吗?
让我吃惊的是
出于职业习惯,我去看了看这个数据集页面: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53625
发现它居然链接到了4个文献:
Li J, Chen Z, Tian L, Zhou C et al. LncRNA profile study reveals a three-lncRNA signature associated with the survival of patients with oesophageal squamous cell carcinoma. Gut 2014 Nov;63(11):1700-10. PMID: 24522499 Li W, Zhang L, Guo B, Deng J et al. Exosomal FMR1-AS1 facilitates maintaining cancer stem-like cell dynamic equilibrium via TLR7/NFκB/c-Myc signaling in female esophageal carcinoma. Mol Cancer 2019 Feb 8;18(1):22. PMID: 30736860 Li Y, Lu Z, Che Y, Wang J et al. Immune signature profiling identified predictive and prognostic factors for esophageal squamous cell carcinoma. Oncoimmunology 2017;6(11):e1356147. PMID: 29147607 Shi X, Chen Z, Hu X, Luo M et al. AJUBA promotes the migration and invasion of esophageal squamous cell carcinoma cells through upregulation of MMP10 and MMP13 expression. Oncotarget 2016 Jun 14;7(24):36407-36418. PMID: 27172796 Liu J, Wang Y, Chu Y, Xu R et al. Identification of a TLR-Induced Four-lncRNA Signature as a Novel Prognostic Biomarker in Esophageal Carcinoma. Front Cell Dev Biol 2020;8:649. PMID: 32850794
我又好奇去谷歌搜了一下这个数据集:
粗略看了看,起码几十篇数据挖掘文献, 就针对这一个数据集,变着花样各种挖,活脱脱的一个数据挖掘发展史!