TCGA MAF Somatic Mutation Data: The Official Complete Release
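The TCGA project ran for many years, and many tools for calling somatic variants appeared during that time. After the project ended (April 2018), the TCGA team therefore made a comprehensive, systematic effort to consolidate the final set of somatic mutation calls. The work is described in the paper "Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines" (Cell Systems, March 2018, DOI: 10.1016/j.cels.2018.03.002).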
The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers.
The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration from a number of institutes and demonstrates how team science drives extremely large genomics projects.
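The abstract mentions an ensemble of seven mutation callers combined with scoring and artifact filtering. The seven are MuTect, Pindel, Radia, VarScan2, SomaticSniper, MuSE and Indelocator, as listed in the software table. MC3's actual scoring and filtering rules are more involved than anything shown here.

As a rough illustration of the ensemble idea only, the sketch below does two things:

- It counts how many callers report each variant.
- It keeps the variants reported by at least `min_callers` of them.

The function name `consensus_calls`, the (chromosome, position, ref, alt) variant key, and the default threshold of 2 are assumptions for illustration. They are not the paper's published rules.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def consensus_calls(
    calls_by_caller: Dict[str, Iterable[Variant]], min_callers: int = 2
) -> Dict[Variant, Set[str]]:
    """Return each variant reported by at least `min_callers` distinct callers,
    mapped to the set of callers that reported it.

    A plain vote count for illustration; not the MC3 scoring/filtering scheme.
    """
    support: Dict[Variant, Set[str]] = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        for variant in variants:
            support[variant].add(caller)  # a set ignores duplicate calls per caller
    return {v: c for v, c in support.items() if len(c) >= min_callers}


if __name__ == "__main__":
    # Toy input: caller names come from the paper's software list; variants are made up.
    toy_calls = {
        "MuTect": [("1", 1000, "C", "T"), ("2", 2000, "G", "A")],
        "MuSE": [("1", 1000, "C", "T")],
        "VarScan2": [("1", 1000, "C", "T"), ("3", 3000, "A", "G")],
    }
    for variant, callers in sorted(consensus_calls(toy_calls).items()):
        print(variant, sorted(callers))
    # prints: ('1', 1000, 'C', 'T') ['MuSE', 'MuTect', 'VarScan2']
```

A simple vote like this is easy to reason about, but it treats every caller as equally reliable. Weighting callers or applying artifact filters, as MC3 does, addresses that limitation.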
Software used
Category | Resource | URL |
---|---|---|
Deposited Data | MC3 Files | https://gdc.cancer.gov/about-data/publications/mc3-2017 |
Software and Algorithms | MuTect | https://github.com/broadinstitute/mutect |
 | Pindel | https://github.com/genome/pindel |
 | Radia | https://github.com/aradenbaugh/radia |
 | VarScan2 | http://dkoboldt.github.io/varscan/ |
 | SomaticSniper | https://github.com/genome/somatic-sniper |
 | MuSE | https://github.com/danielfan/MuSE |
 | Indelocator | http://archive.broadinstitute.org/cancer/cga/indelocator |
 | Maf2Vcf | https://github.com/covingto/vcf2maf/ |
Data downloads for the paper
GDC Manifests
Open-Access Data - Download Manifest (6 Files)
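Before working with downloaded files, it is worth checking them against the manifest. The sketch below assumes the usual GDC manifest layout: a tab-separated file with a header row that includes `id`, `filename` and `md5` columns.

It looks for each file in two places:

- a subdirectory named after its `id`, which is the layout `gdc-client` typically produces;
- directly in the download directory.

The helper names `md5sum` and `check_manifest` are hypothetical.

```python
import csv
import hashlib
import os
from typing import Dict, Optional


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(manifest_path: str, download_dir: str) -> Dict[str, str]:
    """Map each manifest filename to 'ok', 'missing', or 'md5 mismatch'.

    Assumes a tab-separated manifest with 'id', 'filename' and 'md5' columns.
    """
    status: Dict[str, str] = {}
    with open(manifest_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            name = row["filename"]
            candidates = [
                os.path.join(download_dir, row["id"], name),  # <id>/<filename> layout
                os.path.join(download_dir, name),  # flat layout
            ]
            path: Optional[str] = next((p for p in candidates if os.path.exists(p)), None)
            if path is None:
                status[name] = "missing"
            elif md5sum(path) != row["md5"].strip().lower():
                status[name] = "md5 mismatch"
            else:
                status[name] = "ok"
    return status
```

Reading in fixed-size chunks keeps memory flat even for multi-gigabyte files.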
MAF Files
MC3 Public MAF - mc3.v0.2.8.PUBLIC.maf.gz
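The following is a minimal sketch for a first look at the public MAF. It rests on several assumptions about the file:

- It is gzip-compressed and tab-separated.
- Any `#` comment lines come before the header.
- The header follows the standard MAF column names (`Hugo_Symbol`, `Tumor_Sample_Barcode`).
- A `FILTER` column is present, where `PASS` marks unflagged calls.

Check the actual header before relying on any of this. The helper names `iter_maf` and `genes_by_sample_count` are hypothetical.

```python
import csv
import gzip
from collections import Counter
from typing import Dict, Iterator


def iter_maf(path: str) -> Iterator[Dict[str, str]]:
    """Yield rows of a gzip-compressed MAF as dicts keyed by header column name.

    Lines starting with '#' (e.g. a '#version' line) are skipped before parsing.
    """
    with gzip.open(path, "rt", newline="") as handle:
        lines = (line for line in handle if not line.startswith("#"))
        # QUOTE_NONE: treat quote characters as literal text, not CSV quoting.
        yield from csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)


def genes_by_sample_count(path: str, pass_only: bool = True) -> Counter:
    """Count, per gene, the distinct tumor samples carrying at least one mutation.

    If pass_only is True, rows whose FILTER value is not 'PASS' are skipped;
    if the file has no FILTER column at all, every row is kept.
    """
    pairs = set()
    for row in iter_maf(path):
        if pass_only and row.get("FILTER", "PASS") != "PASS":
            continue
        pairs.add((row["Hugo_Symbol"], row["Tumor_Sample_Barcode"]))
    return Counter(gene for gene, _ in pairs)
```

For example, `genes_by_sample_count("mc3.v0.2.8.PUBLIC.maf.gz").most_common(10)` would list the ten genes mutated in the most tumor samples. The function counts distinct (gene, sample) pairs rather than rows. As a result, a sample with several mutations in the same gene is counted only once for that gene.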
Reference Files
Reference Data - ref_data_for_oxog_all_permissions.tar.gz
Reference Data - 2 - ref_data_for_oxog.tar
Target Region BED File - gencode.v19.basic.exome.bed
gaf_20111020Plusbroad_wex_1.1_hg19.bed
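Target region BED files like the ones above are typically used to restrict or annotate variants according to whether they fall inside the captured exome. The sketch below assumes a plain BED file whose first three whitespace-separated columns are chromosome, start and end. The helper names `load_bed` and `in_targets` are hypothetical.

Two points matter when comparing the files:

- BED coordinates are 0-based and half-open, while MAF `Start_Position` is 1-based, so the position is shifted by one before lookup.
- Chromosome names must follow the same convention in both files (for example `17` versus `chr17`).

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

# Per-chromosome list of sorted, non-overlapping (start, end) intervals, 0-based half-open.
Intervals = Dict[str, List[Tuple[int, int]]]


def load_bed(path: str) -> Intervals:
    """Read chrom/start/end from a BED file and merge overlapping intervals."""
    raw: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    with open(path) as handle:
        for line in handle:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.split()[:3]
            raw[chrom].append((int(start), int(end)))
    merged: Intervals = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        out: List[Tuple[int, int]] = []
        for start, end in intervals:
            if out and start <= out[-1][1]:  # overlapping or touching: extend
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_targets(targets: Intervals, chrom: str, pos_1based: int) -> bool:
    """True if a 1-based position (as in MAF Start_Position) lies in a target interval."""
    intervals = targets.get(chrom)
    if not intervals:
        return False
    zero_based = pos_1based - 1
    # Index of the last interval whose start <= zero_based.
    i = bisect.bisect_right(intervals, (zero_based, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= zero_based < intervals[i][1]
```

Overlapping intervals are merged once at load time. After that, each lookup needs only a single binary search with `bisect`, instead of a scan over every interval on the chromosome.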
Other similar projects
TCGA is the best-known cancer genomics project, but several other excellent projects also deserve attention:
Project | Method | Sample Count (Approx.) |
---|---|---|
TCGA MC3 | exome | 10,000 |
GENIE | 44-gene panel | 19,000 |
ICGC PCAWG | whole genome | 2,800 |
100,000 Genomes Project | whole genome | projected: 100,000 |
CCLE | exome | 950 |
TARGET | exome | 700 |
Foundation Medicine | 306-gene panel | 18,000 |
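Final notes
Because this post contains many links, you may need to click "Read the original" to follow them.
This resource overview is fairly brief, so it did not qualify for my 28-part TCGA tutorial series; treat it as casual reading.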
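The 28 TCGA tutorials - Retrieving TCGA data with the R package cgdsr (cBioPortal)
The 28 TCGA tutorials - Retrieving TCGA data with the R package RTCGA (offline bundled version)
The 28 TCGA tutorials - Retrieving TCGA data with the R package RTCGAToolbox (FireBrowse portal)
The 28 TCGA tutorials - Bulk downloading all TCGA data (UCSC Xena)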