The complete TCGA pan-cancer data collection (use it for all your future TCGA data mining)

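Alongside these 28 papers on integrative mining of the TCGA database, the team released a carefully curated, complete set of TCGA data for download. It is in fact the product of the TCGA pan-cancer project, and all of the omics data have already been organized, for example: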

  • gene and protein expression

  • copy number

  • DNA methylation

  • somatic mutation

Downloading all the files

The link is https://gdc.cancer.gov/about-data/publications/pancanatlas and the files are:

  • RNA (Final) - EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv

  • RPPA (Final) - TCGA-RPPA-pancan-clean.txt

  • DNA Methylation (450K Only) - jhu-usc.edu_PANCAN_HumanMethylation450.betaValue_whitelisted.tsv

  • DNA Methylation (Merged 27K+450K Only) - jhu-usc.edu_PANCAN_merged_HumanMethylation27_HumanMethylation450.betaValue_whitelisted.tsv

  • miRNA (Batch Effects Normalized miRNA data)

      • Sample List - PanCanAtlas_miRNA_sample_information_list.txt

      • Protocol Platform - pancanMiRs_EBadjOnProtocolPlatformWithoutRepsWithUnCorrectMiRs_08_04_16.csv

  • Copy Number - broad.mit.edu_PANCAN_Genome_Wide_SNP_6_whitelisted.seg

  • ABSOLUTE-annotated MAF - TCGA_consolidated.abs_mafs_truncated.fixed.txt.gz

  • ABSOLUTE-annotated seg file - TCGA_mastercalls.abs_segtabs.fixed.txt

  • ABSOLUTE purity/ploidy file - TCGA_mastercalls.abs_tables_JSedit.fixed.txt

  • Mutations - mc3.v0.2.8.PUBLIC.maf.gz

  • TCGA-Clinical Data Resource (CDR) Outcome* - TCGA-CDR-SupplementalTableS1.xlsx

    The clinical information has also been re-validated.

      • A curated resource of the clinical annotations for TCGA data that provides recommendations for the use of clinical endpoints

      • It is strongly recommended that this file be used first for clinical elements and survival outcome data; for more details, please see the TCGA-CDR paper.

  • Clinical with Follow-up - clinical_PANCAN_patient_with_followup.tsv

  • Merged Sample Quality Annotations - merged_sample_quality_annotations.tsv

  • PARADIGM Pathway Inference Matrix - merge_merged_reals.tar.gz
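Most of the files listed above identify samples or patients by TCGA barcode, so a small barcode parser is often the first helper needed when joining them. The sketch below is illustrative only and is not part of the PanCanAtlas release; the function name `parse_barcode` is hypothetical. It assumes the standard TCGA barcode layout, in which the first three fields form the patient ID and the first two digits of the fourth field give the sample type (01 to 09 tumor, 10 to 19 normal, 20 to 29 control).

```python
from typing import Dict, Union


def parse_barcode(barcode: str) -> Dict[str, Union[str, int]]:
    """Split a TCGA barcode into patient ID and sample type (hypothetical helper)."""
    parts = barcode.strip().split("-")
    # A sample-level barcode needs at least: TCGA, site, participant, sample.
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")

    patient_id = "-".join(parts[:3])   # e.g. TCGA-OR-A5J1
    sample_code = int(parts[3][:2])    # e.g. "01A" -> 1

    if sample_code < 10:
        kind = "tumor"
    elif sample_code < 20:
        kind = "normal"
    else:
        kind = "control"

    return {"patient_id": patient_id, "sample_code": sample_code, "kind": kind}


if __name__ == "__main__":
    print(parse_barcode("TCGA-OR-A5J1-01A-11R-A29S-07"))
```

The patient ID returned here is the natural key for matching a sample against patient-level tables such as the clinical files, although the exact column names in each file should be checked before joining.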

RNA-seq data

Here is an RNA-seq expression matrix that has already been batch-corrected and normalized:

File: EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv
Contains batch normalized RNASeqV2 mRNA data.
20531 genes (rows) x 11069 samples (columns). ~1.6 GB file size.

File: EB++GeneExpAnnotation.tsv
Contains annotations about exactly which samples were adjusted and which weren't
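Because the expression matrix is about 1.6 GB, it can be easier to stream it and keep only the genes of interest than to load it whole. The following is a minimal sketch using only the Python standard library. It assumes (this is not confirmed by the text above) that the first column holds gene identifiers of the form `SYMBOL|EntrezID`, that the header row holds sample barcodes, and that missing values are written as `NA`. The function name `read_selected_genes` is hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, TextIO, Tuple


def read_selected_genes(
    handle: TextIO, wanted: Iterable[str]
) -> Tuple[List[str], Dict[str, List[Optional[float]]]]:
    """Stream a tab-separated genes x samples matrix, keeping only wanted genes."""
    wanted_set = set(wanted)
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first header cell labels the gene column

    rows: Dict[str, List[Optional[float]]] = {}
    for record in reader:
        if not record:
            continue
        # Assumed identifier format: "TP53|7157" -> symbol "TP53".
        symbol = record[0].split("|")[0]
        if symbol in wanted_set:
            rows[symbol] = [
                None if value in ("", "NA") else float(value)
                for value in record[1:]
            ]
    return samples, rows


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file; the values are made up.
    demo = (
        "gene_id\tTCGA-OR-A5J1-01A\tTCGA-OR-A5J2-01A\n"
        "TP53|7157\t1.5\tNA\n"
        "EGFR|1956\t20.0\t3.25\n"
    )
    samples, rows = read_selected_genes(io.StringIO(demo), ["TP53"])
    print(samples, rows)
```

For the real file, the handle would come from something like `open(path, newline="")`, and only the requested rows are ever held in memory.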

Adjustment procedure:

  1. All Hi-Seq data from UNC were unchanged, with the exception of PRAD (prostate)

  2. All data from BCGSC, whether Hi-Seq or GA, were unchanged

  3. PRAD batch IDs 312 and 320 were adjusted to remove batch effects. Remaining PRAD data were unchanged. See PCA-plus plot BEFORE correction and the justification for correction

  4. All GA samples from UNC were adjusted to remove platform effects between UNC Hi-Seq and GA samples. The tumor types containing UNC GA samples that were adjusted are UCEC, COAD, and READ.

  5. Genes with mostly zero reads or with residual batch effects (approx. 2-3k or 10% of genes) were removed from the adjusted samples and replaced with NAs. No genes were removed from samples with "No Change" status.

  6. Genes were adjusted using a novel algorithm called EB++, a variant of the Empirical Bayes/ComBat algorithm with training/testing features added.
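Step 5 of the procedure above means that some genes carry NA values in the adjusted samples only, so a pan-cancer analysis usually has to decide what to do with them first. One simple option, sketched below, is to separate genes that are complete across all samples from genes with at least one missing value. This is only an illustration of that bookkeeping, not the EB++ procedure itself; the function name `split_complete_genes` and the dict-of-lists input format are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

Matrix = Dict[str, List[Optional[float]]]


def split_complete_genes(matrix: Matrix) -> Tuple[Matrix, Matrix]:
    """Separate genes with no missing values from genes with at least one NA.

    `matrix` maps gene symbol -> per-sample values, with None standing for NA.
    """
    complete: Matrix = {}
    partial: Matrix = {}
    for gene, values in matrix.items():
        if any(value is None for value in values):
            partial[gene] = values
        else:
            complete[gene] = values
    return complete, partial


if __name__ == "__main__":
    # Made-up values: GENE_B was set to NA in one adjusted sample.
    demo = {
        "GENE_A": [1.0, 2.0, 3.0],
        "GENE_B": [0.5, None, 0.7],
    }
    complete, partial = split_complete_genes(demo)
    print(sorted(complete), sorted(partial))
```

Dropping the partial genes is the most conservative choice; restricting an analysis to "No Change" samples is another, since according to step 5 no genes were removed from those.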

Future adjustments:

  1. Removal of any platform effects in GA samples vs. Hi-Seq from BCGSC. The tumor types potentially affected will be LAML, STAD, and ESCA. Analysis is pending.

  2. Possible adjustment of all samples from BCGSC to remove center effects between BCGSC and UNC. Tumor types potentially affected will be LAML, STAD, ESCA and OV. Analysis is pending.

  3. Addition of microarray samples for GBM and OV.

  4. Potential adjustment of DLBC for removal of batch effects. Analysis is pending.

Web tools

If you have downloaded all these data files but cannot write code, you will have to rely on web tools:

  • Broad Institute FireCloud (The Broad Institute)

  • cBioPortal for Cancer Genomics (Memorial Sloan-Kettering Cancer Center)

  • Next-Generation Clustered Heat Maps (MD Anderson Cancer Center)

The official complete collection of TCGA MAF mutation data


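Final notes

Because this post contains so many links, you may need to click "Read the original" to follow them.
Also, because this introduction to the resources is so brief, it does not qualify for inclusion in my 28-part TCGA tutorial series, so just read it casually.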
