Methylation-Based Subtyping of BRCA (Reverse-Fee Literature Reading, 2019 Week 11), Plus a Free Paper Idea

Original 生信技能树 2022-06-06

Origin of This Column

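Two years ago, knowing how hard the research path can be, I set up a literature reading group and invited followers to join, starting with myself, to learn and share. If you are interested, the links below give the details: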

Reverse-fee literature reading community (2018-01-07)

Reverse-fee literature reading community (2018-06-09)

Reverse-fee literature reading community (second-year announcement) (2019-01-26)

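About 50 people joined, and those who stuck with it have accumulated more than 200 literature reading notes. The public-account editors cannot keep up, so most of the good notes never reach our followers. My own notes can take a fast track, though, so I am now starting a serialized release: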

The HRDetect gene set for predicting BRCA gene functional deficiency

The ins and outs of TNBC subtyping research

BRCA subtyping with PAM50

This update, "Methylation-Based Subtyping of BRCA", is the share for week 11 of 2019.

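The paper was published in Breast Cancer Res. 2016. Using 450K methylation arrays on samples from 188 breast cancer patients, unsupervised clustering split the tumors into 7 groups. The authors also compared their results with data from the TCGA project, and did so at the multi-omics level.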

Background

Subtyping breast cancer by IHC or by molecular expression is already well established. The subtypes include:

Two ER-positive subtypes,
with relatively low (luminal A) and high (luminal B) expression of proliferation-related genes,
one subtype of ERBB2-amplified tumors [human epidermal growth factor receptor 2 (HER2)-enriched],
TNBC (or basal-like), with low expression of ER, progesterone receptor (PR), and HER2,
and a normal-like subtype.

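Methylation-based subtyping, however, has been studied much less. The authors therefore subtyped methylation data from 188 breast cancer patients and independently validated the result on 450K data from 669 samples in the TCGA project.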

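If you follow our 生信菜鸟团 account, you will know that yesterday we published a post on plotting differential methylation heatmaps with genes ordered by chromosomal position. That dataset also contains methylation data for more than a hundred people and could go through the same workflow as this paper, and there you have another paper!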

Data

The paper uses the authors' own data plus some public datasets, as follows:

DNA methylation data for the discovery cohort and the cohort of normal cell types are available in the NCBI GEO [16] under accession numbers [GEO:GSE75067] and [GEO:GSE74877], respectively.
DNA methylation data from subpopulations of human blood cells generated by Reinius et al. [15] were downloaded from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) [16] accession number [GEO:GSE35069].
Microarray gene expression data were available for 158 of the tumors in the discovery cohort as part of accession number [GEO:GSE25307], which encompasses 577 breast tumors [22].

Analysis Workflow

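Of the 450K methylation sites, the authors keep only the roughly 285K sites whose mean signal in the normal cohort is above 0.7 or below 0.3. The analysis then yields 2108 differentially methylated sites, including:

1016 sites with a signal below 0.3 in normal and above 0.7 in tumor,
covering 515 genes in total, significantly enriched in homeobox genes, developmental proteins, and cell fate commitment.
1092 sites with a signal above 0.7 in normal and below 0.3 in tumor,
covering 416 genes in total, significantly enriched in glycoproteins, keratinization, and epithelial cell differentiation.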
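
The selection rule above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' code: the function names, the probe IDs, and the toy beta values are hypothetical, and comparing cohort means on the tumor side is a simplification, since the post does not say how the tumor threshold was applied.

```python
from statistics import mean

def select_informative_probes(normal_betas, low=0.3, high=0.7):
    """Keep probes whose mean beta in normal samples is < low or > high."""
    kept = {}
    for probe, values in normal_betas.items():
        normal_mean = mean(values)
        if normal_mean < low or normal_mean > high:
            kept[probe] = normal_mean
    return kept

def call_differential_probes(normal_betas, tumor_betas, low=0.3, high=0.7):
    """Split probes into tumor-hypermethylated and tumor-hypomethylated lists.

    Assumption: the tumor cohort mean is compared against the same cutoffs.
    """
    hyper, hypo = [], []
    for probe, normal_mean in select_informative_probes(normal_betas, low, high).items():
        if probe not in tumor_betas:
            continue
        tumor_mean = mean(tumor_betas[probe])
        if normal_mean < low and tumor_mean > high:
            hyper.append(probe)   # unmethylated in normal, methylated in tumor
        elif normal_mean > high and tumor_mean < low:
            hypo.append(probe)    # methylated in normal, unmethylated in tumor
    return hyper, hypo

# Toy data with hypothetical probe IDs (probe -> beta values per sample).
normal = {
    "cg_a": [0.10, 0.15, 0.20],
    "cg_b": [0.85, 0.90, 0.80],
    "cg_c": [0.45, 0.55, 0.50],  # intermediate in normal, filtered out
    "cg_d": [0.05, 0.10, 0.10],
}
tumor = {
    "cg_a": [0.80, 0.75, 0.90],
    "cg_b": [0.10, 0.20, 0.15],
    "cg_c": [0.90, 0.95, 0.85],
    "cg_d": [0.10, 0.20, 0.15],  # stays low in tumor, not differential
}
hyper, hypo = call_differential_probes(normal, tumor)
print("hypermethylated in tumor:", hyper)
print("hypomethylated in tumor:", hypo)
```

On real data the same two-step logic would run over a probe-by-sample beta matrix. The cutoffs 0.3 and 0.7 are the ones quoted in the post.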

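These sites behave the same way in the TCGA data. Note that the authors' approach to differential analysis is somewhat creative, which of course carries some risk at submission time!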

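Using the signal matrix of these 2108 differentially methylated sites, the 188 breast cancer patients were clustered without supervision. After weighing the candidate solutions, the authors settled on 7 clusters.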
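
To make the clustering step concrete, here is a small sketch of unsupervised clustering on a sample-by-probe beta matrix. The post does not name the clustering algorithm, so average-linkage agglomerative clustering with Euclidean distance is used here purely as a stand-in. The function name, sample IDs, and values are hypothetical.

```python
from itertools import combinations
from math import dist

def average_linkage_clusters(samples, k):
    """Merge the two closest clusters (average linkage) until k remain.

    samples: dict of sample ID -> list of beta values (same probe order).
    Returns a sorted list of k sorted lists of sample IDs.
    """
    clusters = [[sample_id] for sample_id in samples]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pair_dists = [
                dist(samples[a], samples[b])
                for a in clusters[i]
                for b in clusters[j]
            ]
            avg = sum(pair_dists) / len(pair_dists)
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(cluster) for cluster in clusters)

# Toy matrix: 6 hypothetical tumors measured on 3 probes.
tumors = {
    "T1": [0.90, 0.80, 0.10],
    "T2": [0.85, 0.90, 0.15],
    "T3": [0.10, 0.20, 0.90],
    "T4": [0.15, 0.10, 0.80],
    "T5": [0.50, 0.50, 0.50],
    "T6": [0.55, 0.45, 0.50],
}
groups = average_linkage_clusters(tumors, k=3)
print(groups)
```

In the paper the number of clusters was fixed at 7 after comparing solutions. In the toy example k is set to 3 only because the data were built with three groups.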

The 7 clusters were then validated in the 669 TCGA samples with 450K data.
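
The post does not say how the discovery clusters were carried over to TCGA. One common way to do this kind of validation is nearest-centroid assignment, sketched below as an assumption rather than as the paper's method. All names, labels, and values are hypothetical.

```python
from math import dist

def cluster_centroids(samples, labels):
    """Mean beta profile per cluster label in the discovery cohort."""
    grouped = {}
    for sample_id, profile in samples.items():
        grouped.setdefault(labels[sample_id], []).append(profile)
    return {
        label: [sum(column) / len(column) for column in zip(*profiles)]
        for label, profiles in grouped.items()
    }

def assign_to_nearest(centroids, profile):
    """Return the label of the centroid closest (Euclidean) to the profile."""
    return min(centroids, key=lambda label: dist(centroids[label], profile))

# Toy discovery cohort with two hypothetical cluster labels.
discovery = {
    "D1": [0.90, 0.10],
    "D2": [0.80, 0.20],
    "D3": [0.10, 0.90],
    "D4": [0.20, 0.80],
}
labels = {"D1": "epi1", "D2": "epi1", "D3": "epi2", "D4": "epi2"}
centroids = cluster_centroids(discovery, labels)

# Toy validation samples standing in for an independent cohort.
validation = {"V1": [0.70, 0.30], "V2": [0.30, 0.60]}
assigned = {sid: assign_to_nearest(centroids, p) for sid, p in validation.items()}
print(assigned)
```

Once every validation sample has a label, the clusters can be compared between cohorts, for example by checking whether the same clinical and molecular features track with the same labels.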

Finally, of course, the authors explore the clinical value of the 7 clusters and the biological meaning of each one.

Other Studies on Methylation-Based Classification

As early as 2015 there was a methylation subtyping paper, A DNA methylation-based definition of biologically distinct breast cancer subtypes, in which DNA methylation was analyzed using Infinium 450K arrays in 40 tumors and 17 normal breast samples. The data are available under GSE52865, so the study is very easy to reproduce.

Of course, the largest methylation dataset is the one TCGA published in Nature (04 October 2012): Illumina Infinium DNA methylation arrays were used to assay 802 breast tumours. Data from HumanMethylation27 (HM27) and HumanMethylation450 (HM450) arrays

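Usually we do not need the raw array data; it is enough to make good use of the biological meaning of the methylation signal matrix. If you do want to download the raw TCGA Breast Cancer 450K Methylation Data, you can follow the workflow at https://github.com/gwaygenomics/brca_lowstage_DMGRs.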

At this point you need to work through a TCGA course to learn how to download the data and run the downstream analysis.

https://www.bilibili.com/video/av49363776

You can also simply integrate the methylation signal matrix with the expression matrix for data mining. For example: DNA methylation data and RNA-Seq data of breast tumors and normal tissues in the database of The Cancer Genome Atlas (TCGA) were integrated with information of DNA motifs in seven databases. The paper is: Identification of epigenetic modulators in human breast cancer by integrated analysis of DNA methylation and RNA-Seq data
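
As a rough idea of what integrating the two matrices can look like, the sketch below correlates per-gene methylation with expression across matched samples and ranks genes by how negative the correlation is. It covers only this one step and is not the cited paper's pipeline, which also uses DNA motif information. The gene names, values, and function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns None when either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)

def rank_by_anticorrelation(methylation, expression):
    """Return (gene, r) pairs sorted from most negative to most positive r.

    Both inputs map gene -> values over the same samples in the same order.
    """
    results = []
    for gene in methylation.keys() & expression.keys():
        r = pearson(methylation[gene], expression[gene])
        if r is not None:
            results.append((gene, r))
    return sorted(results, key=lambda item: item[1])

# Toy matched data for 5 samples (hypothetical genes).
methylation = {
    "GENE_A": [0.1, 0.3, 0.5, 0.7, 0.9],
    "GENE_B": [0.2, 0.4, 0.6, 0.8, 0.5],
    "GENE_C": [0.5, 0.5, 0.5, 0.5, 0.5],  # constant, correlation undefined
}
expression = {
    "GENE_A": [9.0, 7.0, 5.0, 3.0, 1.0],
    "GENE_B": [2.0, 4.1, 5.9, 8.0, 5.2],
    "GENE_C": [3.0, 4.0, 5.0, 6.0, 7.0],
}
ranked = rank_by_anticorrelation(methylation, expression)
for gene, r in ranked:
    print(gene, round(r, 3))
```

Genes at the top of the list, with strongly negative correlation, are the usual first candidates for methylation-driven silencing, though correlation alone does not establish regulation.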