


  • gene and protein expression

  • copy number

  • DNA methylation

  • somatic mutation


The link is https://gdc.cancer.gov/about-data/publications/pancanatlas :

  • RNA (Final) - EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv

  • RPPA (Final) - TCGA-RPPA-pancan-clean.txt

  • DNA Methylation (450K Only) - jhu-usc.edu_PANCAN_HumanMethylation450.betaValue_whitelisted.tsv

  • DNA Methylation (Merged 27K+450K Only) - jhu-usc.edu_PANCAN_merged_HumanMethylation27_HumanMethylation450.betaValue_whitelisted.tsv

  • miRNA (Batch Effects Normalized miRNA data)

  • Sample List - PanCanAtlas_miRNA_sample_information_list.txt

  • Protocol Platform - pancanMiRs_EBadjOnProtocolPlatformWithoutRepsWithUnCorrectMiRs_08_04_16.csv

  • Copy Number - broad.mit.edu_PANCAN_Genome_Wide_SNP_6_whitelisted.seg

  • ABSOLUTE-annotated MAF - TCGA_consolidated.abs_mafs_truncated.fixed.txt.gz

  • ABSOLUTE-annotated seg file - TCGA_mastercalls.abs_segtabs.fixed.txt

  • ABSOLUTE purity/ploidy file - TCGA_mastercalls.abs_tables_JSedit.fixed.txt

  • Mutations - mc3.v0.2.8.PUBLIC.maf.gz

  • TCGA-Clinical Data Resource (CDR) Outcome* -



  • A curated resource of clinical annotations for TCGA data that provides recommendations for the use of clinical endpoints

  • It is strongly recommended that this file be used first for clinical elements and survival outcome data; for more details, see the TCGA-CDR paper.

  • Clinical with Follow-up - clinical_PANCAN_patient_with_followup.tsv

  • Merged Sample Quality Annotations - merged_sample_quality_annotations.tsv

  • PARADIGM Pathway Inference Matrix - merge_merged_reals.tar.gz


Here is an RNA-seq expression matrix that has already been batch-corrected and normalized:

File: EB++AdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv
Contains batch normalized RNASeqV2 mRNA data.
20531 genes (rows) x 11069 samples (columns). ~1.6 GB file size.
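
Because the file is about 1.6 GB, it can be worth checking its dimensions by streaming it before loading anything into memory. The following is a minimal sketch using only the Python standard library. It assumes the file is tab-separated, with one header row of sample barcodes and the gene identifier in the first column; the function name `matrix_shape` and the tiny in-memory demo table (including its gene and sample names) are hypothetical stand-ins, not taken from the real file.

```python
import csv
import io


def matrix_shape(handle, delimiter="\t"):
    """Return (n_genes, n_samples) for a gene-by-sample table, reading row by row."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    # Assumption: the first column holds the gene identifier, the rest are samples.
    n_samples = len(header) - 1
    n_genes = 0
    for row in reader:
        if len(row) != len(header):
            raise ValueError(
                f"data row {n_genes + 1} has {len(row)} fields, expected {len(header)}"
            )
        n_genes += 1
    return n_genes, n_samples


# Hypothetical miniature table standing in for the real matrix.
demo = io.StringIO(
    "gene_id\tSAMPLE_1\tSAMPLE_2\n"
    "GENE_A\t10.5\t3.2\n"
    "GENE_B\tNA\t7.0\n"
)
shape = matrix_shape(demo)
print(shape)
```

For the real file, one would pass an open file handle, for example `open(path, newline="")`, and compare the result with the 20531 x 11069 stated above.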

File: EB++GeneExpAnnotation.tsv
Contains annotations about exactly which samples were adjusted and which weren't

Adjustment procedure:

  1. All Hi-Seq data from UNC were unchanged, with the exception of PRAD (prostate)

  2. All data from BCGSC, whether Hi-Seq or GA, were unchanged

  3. PRAD batch IDs 312 and 320 were adjusted to remove batch effects. Remaining PRAD data were unchanged. See PCA-plus plot BEFORE correction and the justification for correction

  4. All GA samples from UNC were adjusted to remove platform effects between UNC Hi-Seq and GA samples. The tumor types containing UNC GA samples that were adjusted are UCEC, COAD, and READ.

  5. Genes with mostly zero reads or with residual batch effects (approx. 2-3k or 10% of genes) were removed from the adjusted samples and replaced with NAs. No genes were removed from samples with "No Change" status.

  6. Genes were adjusted using a novel algorithm called EB++, a variant of the Empirical Bayes/ComBat algorithm with training/testing features added.
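
Step 5 means that some genes carry NAs in the adjusted samples, so a pan-cancer analysis usually has to decide which genes to keep. The sketch below counts missing values per gene while streaming the table, then lists the genes that are complete across all samples. It assumes missing values are written as the literal token `NA` (or an empty field) and that the gene identifier is in the first column; `missing_per_gene` and the demo data are hypothetical, and this is not part of the EB++ procedure itself.

```python
import csv
import io


def missing_per_gene(handle, na_tokens=("NA", "")):
    """Map each gene identifier to the number of samples where its value is missing."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row of sample identifiers
    counts = {}
    for row in reader:
        gene, values = row[0], row[1:]
        counts[gene] = sum(token in na_tokens for token in values)
    return counts


# Hypothetical miniature table: GENE_B and GENE_C have NAs in some samples.
demo = io.StringIO(
    "gene_id\tS1\tS2\tS3\n"
    "GENE_A\t1.0\t2.0\t3.0\n"
    "GENE_B\tNA\t5.0\tNA\n"
    "GENE_C\t0.0\tNA\t8.0\n"
)
counts = missing_per_gene(demo)
complete_genes = [gene for gene, n_missing in counts.items() if n_missing == 0]
print(counts)
print(complete_genes)
```

Only the per-gene counts are kept in memory, so the same approach should scale to the full matrix. Whether to drop incomplete genes entirely or to restrict the analysis to samples with "No Change" status depends on the question being asked.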

Future adjustments:

  1. Removal of any platform effects in GA samples vs. Hi-Seq from BCGSC. The tumor types potentially affected will be LAML, STAD, and ESCA. Analysis is pending.

  2. Possible adjustment of all samples from BCGSC to remove center effects between BCGSC and UNC. Tumor types potentially affected will be LAML, STAD, ESCA and OV. Analysis is pending.

  3. Addition of microarray samples for GBM and OV.

  4. Potential adjustment of DLBC for removal of batch effects. Analysis is pending.



  • Broad Institute FireCloud (The Broad Institute)

  • cBioPortal for Cancer Genomics (Memorial Sloan-Kettering Cancer Center)

  • Next-Generation Clustered Heat Maps (MD Anderson Cancer Center)




Since this overview of the resources is so basic, it does not qualify for my 28-part TCGA tutorial series, so feel free to just skim it.

