使用sequenza软件判定肿瘤纯度

Original jimmy 生信技能树 2022-06-07

软件官方主页是：http://www.cbs.dtu.dk/biotools/sequenza/ 算法描述及使用说明均可以在官网找到。

软件结构介绍

该软件包括两个部分:

a python-based preprocessing tool
an R package implementing the model fitting and visualization functions

python脚本部分可以分成2个步骤：

First, it calculates the GC content in sliding windows from a genome reference file in FASTA format.
Second, it processes the sequencing data from the tumor and normal specimens, which must be in the Pileup format, as output by SAMtools

得到的结果就需要导入到R里面进行后续处理！！！

R包的流程也可以分成3个步骤：

first, sequenza. extract efficiently reads the input file into R, performs GC-content normalization of the tumor versus normal depth ratio, and performs allele-specific segmentation using the ‘copynumber’ package.
Second, **sequenza.fit** applies the model to infer cellularity and ploidy parameters and copy number profiles.
Finally, sequenza.results returns the results of the estimation together with alternative solutions and visualization of the data and the model along the genome and the individual chromosomes.

发表该软件的文章当时使用了10 个 ovarian serous carcinomas (OVCA) 和 20 个clear-cell renal cell carcinomas (KIRC) 肿瘤样本做例子。并且与SNP array data analyzed by allele-specific copy number analysis of tumors (ASCAT)检测结果的一致性，还跟两个流行的软件absCN-seq，ABSOLUTE做了对比分析说明自己开发的sequenza工具表现最优！

软件使用方法

python部分非常好安装：

From Pypi

pip install sequenza-utils

From Sources

git clone https://bitbucket.org/sequenza_tools/sequenza-utils
cd sequenza-utils
python setup.py test
python setup.py install

安装完毕即可根据说明文档开始对自己的数据一步步处理：http://sequenza-utils.readthedocs.io/en/latest/ 老实说，这个是我见过最懒的说明书。后来发现我找错了，https://bitbucket.org/sequenza_tools/sequenza/ 其实有比较详尽的说明书。

sequenza-utils -h
if [  ! -f  hg38.gc50Base.txt.gz ]; then
sequenza-utils  gc_wiggle -f $reference  -w 50 -o - | gzip > hg38.gc50Base.txt.gz
fi  
sequenza-utils bam2seqz -gc hg38.gc50Base.txt.gz \
-F $reference -n $normal_bam -t  $tumor_bam | gzip > ${sample}.seqz.gz
sequenza-utils seqz_binning -w 50 -s ${sample}.seqz.gz | gzip > ${sample}_small.seqz.gz

得到的文件如下，取决于自己实验的wes数据量

 182M May  1 17:47 hg38.gc50Base.txt.gz
 54K May  1 19:36 S12_1_small.seqz.gz
 39K May  1 23:07 S12_13_small.seqz.gz
 1.6M May  2 01:03 S12_2_small.seqz.gz
 9.0M May  2 03:28 S12_22_small.seqz.gz

其实可以直接用varscan结果，不走自己的python上游流程

利用varscan的somatic mutation calling的结果，导入R，如下；

snp <- read.table("varscan.snp", header = TRUE, sep = "\t")
cnv <- read.table("varscan.copynumber", header = TRUE, sep = "\t")
seqz.data <- VarScan2seqz(varscan.somatic = snp, varscan.copynumber = cnv)
write.table(seqz.data, "my.sample.seqz", col.names = TRUE, row.names = FALSE, sep = "\t")

制作得到的my.sample.seqz文件即可进入R流程！

R 语言部分也是非常好安装，只是一个R包而已

Reference manual:	sequenza.pdf
Vignettes:	Sequenza user guide

这个时候的说明书详尽程度尚可！

library(sequenza)
#In the package is provided a small seqz file
data.file <-  system.file("data", "example.seqz.txt.gz", package = "sequenza")
data.file
# 根据前面介绍的，三部曲
# sequenza.extract: process seqz data, normalization and segmentation
test <- sequenza.extract(data.file, verbose = FALSE)
#sequenza.fit: run grid-search approach to estimate cellularity and ploidy
CP <- sequenza.fit(test)
#sequenza.results: write files and plots using suggested or selected solution
sequenza.results(sequenza.extract = test,
                 cp.table = CP, sample.id = "Test",
                 out.dir="TEST")
## 然后是理解输出结果文件及各种成图

真的是不能太方便了，所有的输出结果尽在眼前！！！

不过，CP那个步骤可能会比较耗时。

结果文件非常多，需要仔细理解

当然了，我感兴趣的其实只有：

"cellularity"    "ploidy.estimate"   "ploidy.mean.cn"
0.89    2   1.76205114608269
0.89    2   1.76205114608269
0.89    2   1.76205114608269

一些炫目的图：

可以看到这个软件顺便找了CNV，所以：

生信技能树GATK4系列教程
GATK4的gvcf流程
你以为的可能不是你以为的
新鲜出炉的GATK4培训教材全套PPT，赶快下载学习吧
曾老湿最新私已：GATK4实战教程
GATK4的CNV流程-hg38

我不想赚你的钱，不行吗？（推荐阅读）

值得一提的是，对肿瘤外显子来分析CNV，我测试过很多工具了，这个GATK的值得一试！

WES的CNV探究-conifer软件使用

单个样本NGS数据如何做拷贝数变异分析呢

肿瘤配对样本用varscan 做cnv分析

使用cnvkit来对大批量wes样本找cnv

法明传[2024]173号：1月1日起，未用示范文本提交起诉状，部分法院将不予立案

法明传[2024]173号：1月1日起，未用示范文本提交起诉状，部分法院将不予立案

2025.1.1起，全国法院全面推进应用民事起诉状、答辩状示范文本(附下载链接)

法明传[2024]173号：关于加快推进起诉状、答辩状示范文本全面应用工作的通知(附下载链接)

2025.1.1起，全国法院全面推进应用民事起诉状、答辩状示范文本(附下载链接)

使用sequenza软件判定肿瘤纯度

软件结构介绍

软件使用方法

其实可以直接用varscan结果，不走自己的python上游流程

您可能也对以下帖子感兴趣

法明传[2024]173号：1月1日起，未用示范文本提交起诉状，部分法院将不予立案

法明传[2024]173号：1月1日起，未用示范文本提交起诉状，部分法院将不予立案

2025.1.1起，全国法院全面推进应用民事起诉状、答辩状示范文本(附下载链接)

法明传[2024]173号：关于加快推进起诉状、答辩状示范文本全面应用工作的通知(附下载链接)

2025.1.1起，全国法院全面推进应用民事起诉状、答辩状示范文本(附下载链接)

生成图片，分享到微信朋友圈

使用sequenza软件判定肿瘤纯度

软件结构介绍

软件使用方法

其实可以直接用varscan结果，不走自己的python上游流程

您可能也对以下帖子感兴趣