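Tumor exome data processing tutorial series (8): comparing annotation tools (part 1), installation and usage
Hello everyone. In the previous installment we visualized the MAF file produced from the ANNOVAR annotation. In this one we compare the installation and usage of four annotation tools: VEP, ANNOVAR, GATK (Funcotator) and SnpEff.
First, as is my habit, every script reads a config file with one sample ID per line, so that samples can be processed in batch: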
$ cat config
case1_biorep_A_techrep
case2_biorep_A
case3_biorep_A
case4_biorep_A
case5_biorep_A
......
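If you do not have such a config file yet, one way to build it is to derive the sample IDs from the file names. This is only a sketch: it assumes the filtered VCFs are named <sample>_filter.vcf, as in the scripts later in this post, and make_config is a hypothetical helper of mine, not part of any of the tools.

```shell
# Sketch: print sample IDs by stripping the "_filter.vcf" suffix from file names.
# make_config is a hypothetical helper; the naming pattern is an assumption.
make_config() {
    dir=${1:-.}
    for f in "$dir"/*_filter.vcf; do
        [ -e "$f" ] || continue            # skip when the glob matches nothing
        f=${f##*/}                         # drop the directory part
        printf '%s\n' "${f%_filter.vcf}"   # drop the suffix
    done
}

# Demo in a throwaway directory
demo_dir=$(mktemp -d)
touch "$demo_dir/case1_biorep_A_techrep_filter.vcf" "$demo_dir/case2_biorep_A_filter.vcf"
make_config "$demo_dir"
rm -rf "$demo_dir"
```

In a real run this would be something like make_config . > config, executed in the directory that holds the VCF files.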
ANNOVAR
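Installation
Go to the official download page (http://annovar.openbioinformatics.org/en/latest/user-guide/download/). Registration requires an institutional (.edu) e-mail address. Once you have the package, unpack it and it is ready to use. You still need to download the annotation databases, for example: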
nohup ./annotate_variation.pl -downdb -webfrom annovar gnomad_genome --buildver hg38 humandb/ >down.log 2>&1 &
I only downloaded data for the human hg38 reference genome, and that is already very large:
41G humandb/
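Annotation
According to the ANNOVAR website, a new version was released on 2019-10-24; this tutorial still uses an older one. Annotation with the older ANNOVAR is fairly involved. Teacher Zeng's blog has several posts on its usage; the main one to read is "ANNOVAR usage can get even more complicated": http://www.bio-info-trainee.com/4007.html
There are three main types of annotation: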
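1. Gene-based annotation: exonic, splicing, ncRNA, UTR5, UTR3, intronic, upstream, downstream, intergenic. Run with the geneanno option.
2. Region-based annotation: cytoBand, TFBS, SV, bed, GWAS, ENCODE, enhancers, repressors, promoters. Run with the regionanno option. Only the coordinates of the site are considered.
3. Filter-based annotation against databases: dbSNP, ExAC, ESP6500, cosmic, gnomad, 1000genomes, clinvar. Run with the filter option. Both the coordinates of the site and the alternate allele are considered.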
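As a rough illustration of the three annotation types, the sketch below prints one annotate_variation.pl command line for each. The input file name (case1.avinput) and the chosen databases are assumptions for illustration, each database would have to be present in humandb/ already, and the helper annovar_cmds is hypothetical: it only echoes the commands and does not run ANNOVAR.

```shell
# Sketch: print (not run) one annotate_variation.pl command per annotation type.
# annovar_cmds is a hypothetical helper; file and database names are assumptions.
annovar_cmds() {
    input=$1
    db=$2
    # gene-based: where the variant falls relative to genes
    echo "annotate_variation.pl -geneanno -dbtype refGene -buildver hg38 $input $db"
    # region-based: which annotated region overlaps the coordinates
    echo "annotate_variation.pl -regionanno -dbtype cytoBand -buildver hg38 $input $db"
    # filter-based: exact match on position and alleles in a database
    echo "annotate_variation.pl -filter -dbtype clinvar_20170905 -buildver hg38 $input $db"
}

annovar_cmds case1.avinput humandb/
```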
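In this tutorial series, what I run on the VCF files with ANNOVAR is mainly gene-based annotation (refGene and knownGene), plus one filter-based lookup against ClinVar. The annotation script, annovar.sh, is as follows: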
cat config | while read id
do
echo "start ANNOVAR for ${id} " `date`
~/biosoft/ANNOVAR/annovar/table_annovar.pl ${id}_filter.vcf ~/biosoft/ANNOVAR/annovar/humandb/ \
-buildver hg38 \
-out annovar/${id} \
-remove \
-protocol refGene,knownGene,clinvar_20170905 \
-operation g,g,f \
-nastring . \
-vcfinput
echo "end ANNOVAR for ${id} " `date`
done
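In the script above, the -protocol and -operation lists are matched up by position: the first database gets the first operation letter, and so on. A small sketch to make that pairing visible (pair_protocols is a hypothetical helper of mine; the letters g, r and f stand for gene-based, region-based and filter-based annotation):

```shell
# Sketch: pair each -protocol entry with the meaning of its -operation letter.
# pair_protocols is a hypothetical helper, not part of ANNOVAR.
pair_protocols() {
    awk -v p="$1" -v o="$2" 'BEGIN {
        n = split(p, P, ",")
        m = split(o, O, ",")
        if (n != m) {
            print "protocol/operation count mismatch" > "/dev/stderr"
            exit 1
        }
        name["g"] = "gene-based"
        name["r"] = "region-based"
        name["f"] = "filter-based"
        for (i = 1; i <= n; i++) printf "%s\t%s\n", P[i], name[O[i]]
    }'
}

pair_protocols refGene,knownGene,clinvar_20170905 g,g,f
```

If the two lists have different lengths the helper prints an error and returns a non-zero status, which is a cheap check to run before starting a long batch.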
Annotation is very fast, about 6 to 7 seconds per sample. Each sample produces three output files:
case1_biorep_A_techrep.avinput
case1_biorep_A_techrep.hg38_multianno.txt
case1_biorep_A_techrep.hg38_multianno.vcf
In case1_biorep_A_techrep.hg38_multianno.txt, the annotation for one variant record looks like this:
chr1 6146376 6146376 G T exonic CHD5 . nonsynonymous SNV CHD5:NM_015557:exon11:c.C1638A:p.N546K exonic CHD5 . nonsynonymous SNV CHD5:uc001amb.3:exon11:c.C1638A:p.N546K,CHD5:uc057btb.1:exon11:c.C1638A:p.N546K . . . . . 0.25 . 75 chr1 6146376 . G T . PASS DP=184;ECNT=1;NLOD=22.56;N_ART_LOD=-1.888e+00;POP_AF=1.000e-05;P_CONTAM=0.00;P_GERMLINE=-4.514e+01;TLOD=5.41 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:97,3:0.058:48,1:49,2:37,38:154,210:60:20:0.020,0.030,0.030:0.018,3.223e-03,0.979 0/0:75,0:0.020:45,0:30,0:36,0:154,0:0:0
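A record like this is hard to read as one long line. Since the multianno.txt table is tab-separated and has a header row, one option is to pull out just the columns of interest by name. The sketch below assumes the usual header names such as Gene.refGene and ExonicFunc.refGene; select_cols is a hypothetical helper, and the inline table is a cut-down copy of the record above.

```shell
# Sketch: select columns from a tab-separated table by header name.
# select_cols is a hypothetical helper; column names are assumptions.
select_cols() {
    # $1 = comma-separated column names; the table is read from stdin
    awk -F'\t' -v want="$1" '
        NR == 1 {
            n = split(want, names, ",")
            for (i = 1; i <= NF; i++) idx[$i] = i
            for (j = 1; j <= n; j++)
                if (!(names[j] in idx)) {
                    print "missing column: " names[j] > "/dev/stderr"
                    exit 1
                }
        }
        {
            line = $(idx[names[1]])
            for (j = 2; j <= n; j++) line = line "\t" $(idx[names[j]])
            print line
        }'
}

# Demo on a two-line table
{
    printf 'Chr\tStart\tEnd\tRef\tAlt\tFunc.refGene\tGene.refGene\tGeneDetail.refGene\tExonicFunc.refGene\n'
    printf 'chr1\t6146376\t6146376\tG\tT\texonic\tCHD5\t.\tnonsynonymous SNV\n'
} | select_cols Gene.refGene,ExonicFunc.refGene
```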
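GATK annotation
GATK's annotation step was already mentioned in part 5 of this series (GATK best practices), but at that point we had not yet covered how to run it:
Installation
Annotation needs only one tool, Funcotator (documentation: https://software.broadinstitute.org/gatk/documentation/tooldocs/4.0.6.0/org_broadinstitute_hellbender_tools_funcotator_Funcotator.php). However, Funcotator in gatk-4.0.6.0 is a BETA release and not yet very stable; in my tests its results differed considerably from those of ANNOVAR and VEP. So here we use Funcotator from gatk-4.1.4.0, the latest version at the time of writing. The procedure is simple: download the gatk-4.1.4.0.zip package and unzip it, and it is ready to use: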
wget -c https://github.com/broadinstitute/gatk/releases/download/4.1.4.0/gatk-4.1.4.0.zip
unzip gatk-4.1.4.0.zip
Next, download the data sources. They come as a single compressed archive of 237M, which unpacks to 3.4G:
nohup wget -c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/funcotator/funcotator_dataSources.v1.6.20190124g.tar.gz &
tar -zxvf funcotator_dataSources.v1.6.20190124g.tar.gz
## 237M funcotator_dataSources.v1.6.20190124g.tar.gz
## 3.4G funcotator_dataSources.v1.6.20190124g/
Annotation
Then annotate with GATK's Funcotator against the data sources downloaded above. The script, Funcotator.sh, is as follows:
GATK=/home/hcguo/biosoft/gatk/gatk-4.1.4.0/gatk
ref=/public/biosoft/GATK/resources/bundle/hg38/Homo_sapiens_assembly38.fasta
bed=/home/llwu/reference/index/human/CCDS/GRCh38.bed
source=/home/hcguo/database/gatk_funcotator/funcotator_dataSources.v1.6.20190124g
cat config | while read id
do
sample=$id
echo "start Funcotator for ${id} " `date`
$GATK Funcotator -R $ref \
-V ${sample}_filter.vcf \
-O gatk/${sample}_funcotator.tmp.maf \
--data-sources-path ${source} \
--output-file-format MAF \
--ref-version hg38
echo "end Funcotator for ${id} " `date`
done
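Annotation is also quite fast, about 10 seconds per sample. Again taking sample case1_biorep_A_techrep as the example, let's look at the result, skipping the header comment lines: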
$ less case1_biorep_A_techrep_funcotator.maf|grep -v '^#'|head -2
Hugo_Symbol Entrez_Gene_Id Center NCBI_Build Chromosome Start_Position End_Position Strand Variant_Classification Variant_Type Reference_Allele Tumor_Seq_Allele1 Tumor_Seq_Allele2 dbSNP_RS dbSNP_Val_Status Tumor_Sample_Barcode Matched_Norm_Sample_Barcode Match_Norm_Seq_Allele1 Match_Norm_Seq_Allele2 Tumor_Validation_Allele1 Tumor_Validation_Allele2 Match_Norm_Validation_Allele1 Match_Norm_Validation_Allele2 Verification_Status Validation_Status Mutation_Status Sequencing_Phase Sequence_Source Validation_Method Score BAM_File Sequencer Tumor_Sample_UUID Matched_Norm_Sample_UUID Genome_Change Annotation_Transcript Transcript_Strand Transcript_Exon Transcript_Position cDNA_Change Codon_Change Protein_Change Other_Transcripts Refseq_mRNA_Id Refseq_prot_Id SwissProt_acc_Id SwissProt_entry_Id Description UniProt_AApos UniProt_Region UniProt_Site UniProt_Natural_Variations UniProt_Experimental_Info GO_Biological_Process GO_Cellular_Component GO_Molecular_Function COSMIC_overlapping_mutations COSMIC_fusion_genes COSMIC_tissue_types_affected COSMIC_total_alterations_in_gene Tumorscape_Amplification_Peaks Tumorscape_Deletion_Peaks TCGAscape_Amplification_Peaks TCGAscape_Deletion_Peaks DrugBank ref_context gc_content CCLE_ONCOMAP_overlapping_mutations CCLE_ONCOMAP_total_mutations_in_gene CGC_Mutation_Type CGC_Translocation_Partner CGC_Tumor_Types_Somatic CGC_Tumor_Types_Germline CGC_Other_Diseases DNARepairGenes_Activity_linked_to_OMIM FamilialCancerDatabase_Syndromes MUTSIG_Published_Results OREGANNO_ID OREGANNO_Values tumor_f t_alt_count t_ref_count n_alt_count n_ref_count Gencode_27_secondaryVariantClassification ACMGLMMLof_LOF_Mechanism ACMGLMMLof_Mode_of_Inheritance ACMGLMMLof_Notes ACMG_recommendation_Disease_Name ClinVar_VCF_AF_ESP ClinVar_VCF_AF_EXAC ClinVar_VCF_AF_TGP ClinVar_VCF_ALLELEID ClinVar_VCF_CLNDISDB ClinVar_VCF_CLNDISDBINCL ClinVar_VCF_CLNDN ClinVar_VCF_CLNDNINCL ClinVar_VCF_CLNHGVS ClinVar_VCF_CLNREVSTAT ClinVar_VCF_CLNSIG ClinVar_VCF_CLNSIGCONF ClinVar_VCF_CLNSIGINCL ClinVar_VCF_CLNVC ClinVar_VCF_CLNVCSO ClinVar_VCF_CLNVI ClinVar_VCF_DBVARID ClinVar_VCF_GENEINFO ClinVar_VCF_MC ClinVar_VCF_ORIGIN ClinVar_VCF_RS ClinVar_VCF_SSR LMMKnown_LMM_FLAGGED DP ECNT IN_PON NLOD N_ART_LOD POP_AF P_CONTAM P_GERMLINE RPA RU STR TLOD
CHD5 __UNKNOWN__ __UNKNOWN__ hg38 1 6146376 6146376 + Nonsense_Mutation SNP G G T __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ NA NA __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ g.chr1:6146376G>T ENST00000262450.7 - 11 173c.1638C>A c.(1636-1638)tgC>tgA p.C546* __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ CATCCATGTCGTTCTTTCTTT 0.5860349127182045 __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ __UNKNOWN__ 0.058 3 97 0 75 false 184 1 22.56 -1.888e+00 1.000e-05 0.00 -4.514e+01 5.41
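As you can see, this tool annotates against dozens of data sources in one go and yields more than a hundred columns, although many of the values are __UNKNOWN__.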
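To put a number on how many of those columns are actually filled in, something like the following could be used. It is only a sketch: count_unknown is a hypothetical helper, it assumes a tab-separated MAF whose comment lines start with # and whose first remaining line is the header, and the inline data is a made-up four-column miniature rather than real Funcotator output.

```shell
# Sketch: per record, count how many fields equal "__UNKNOWN__".
# count_unknown is a hypothetical helper; prints: first column, unknown count, total fields.
count_unknown() {
    awk -F'\t' '
        /^#/ { next }                          # skip comment lines
        !seen_header { seen_header = 1; next } # skip the column header row
        {
            n = 0
            for (i = 1; i <= NF; i++) if ($i == "__UNKNOWN__") n++
            printf "%s\t%d\t%d\n", $1, n, NF
        }'
}

# Demo on a miniature MAF
{
    printf '#version 2.4\n'
    printf 'Hugo_Symbol\tCenter\tNCBI_Build\tdbSNP_RS\n'
    printf 'CHD5\t__UNKNOWN__\thg38\t__UNKNOWN__\n'
} | count_unknown
```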
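VEP annotation
In the past, installing VEP was rather troublesome and could even require administrator privileges: http://www.bio-info-trainee.com/1600.html
You can also follow the installation guide on the Ensembl site: http://asia.ensembl.org/info/docs/tools/vep/script/vep_download.html
It describes several methods. The last one installs VEP with Docker. It is a little more work, though not hard: you need to learn some Docker, and you also need administrator privileges. The advantage is that the VEP Docker image already has all the required modules configured, so once the container is started you can annotate without installing any dependencies yourself. If you have administrator privileges, give this method a try. If you do not, you can do what I did and rent an Alibaba Cloud server; the lowest configuration is enough, and on that server you have administrator rights. The installation steps are in the link above, so I will not walk through them in detail here. (I did try several of the other methods, but gave up on each because of missing Perl module dependencies.)
Installing Docker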
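# Add Docker's GPG key and apt repository first, then install ("artful" is the Ubuntu release used here)
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu artful stable"
sudo apt update
sudo apt install docker-ce
docker --version
sudo docker run hello-world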
Installing the VEP image with Docker
docker pull ensemblorg/ensembl-vep
docker run -t -i ensemblorg/ensembl-vep ./vep
# Create a directory on your machine:
mkdir $HOME/vep_data
# Make sure that the created directory on your machine has read and write access granted
# so the docker container can write in the directory (VEP output):
chmod a+rwx $HOME/vep_data
docker run -t -i -v $HOME/vep_data:/opt/vep/.vep ensemblorg/ensembl-vep
docker run -t -i -v $HOME/vep_data:/opt/vep/.vep ensemblorg/ensembl-vep perl INSTALL.pl -a cfp -s homo_sapiens -y GRCh38 -g all
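Starting the container
Following the steps above, we created a vep_data directory under the home directory. Copy the VCF files to be annotated into vep_data, then start a container from the VEP image: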
sudo docker run -t -i -v $HOME/vep_data:/opt/vep/.vep ensemblorg/ensembl-vep
Annotation
The VEP annotation script, vep.sh, is as follows:
cat /opt/vep/.vep/SRP070662_filter_vcf/config | while read id
do
echo "start vep_annotation for ${id} " `date`
./vep --cache --offline --format vcf --vcf --force_overwrite \
--dir_cache /opt/vep/.vep/ \
--dir_plugins /opt/vep/.vep/Plugins/ \
--input_file /opt/vep/.vep/SRP070662_filter_vcf/${id}_filter.vcf \
--output_file /opt/vep/.vep/VEP_annotation/${id}_vep.vcf
echo "end vep_annotation for ${id} " `date`
done
After annotation, each sample has two output files, a VCF file and an HTML summary, for example:
case1_biorep_A_techrep_vep.vcf
case1_biorep_A_techrep_vep.vcf_summary.html
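Likewise, looking at the annotated result in case1_biorep_A_techrep_vep.vcf, we can see that VEP appends its annotations to the INFO column of the original VCF as a CSQ entry, with the fields separated by the | character: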
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT case1_biorep_A_techrep case1_germline
chr1 6146376 . G T . PASS DP=184;ECNT=1;NLOD=22.56;N_ART_LOD=-1.888e+00;POP_AF=1.000e-05;P_CONTAM=0.00;P_GERMLINE=-4.514e+01;TLOD=5.41;CSQ=T|missense_variant|MODERATE|CHD5|ENSG00000116254|Transcript|ENST00000262450|protein_coding|11/42||||1936|1638|546|N/K|aaC/aaA|||-1||HGNC|HGNC:16816,T|upstream_gene_variant|MODIFIER|CHD5|ENSG00000116254|Transcript|ENST00000462991|nonsense_mediated_decay|||||||||||2272|-1|cds_start_NF|HGNC|HGNC:16816,T|missense_variant&NMD_transcript_variant|MODERATE|CHD5|ENSG00000116254|Transcript|ENST00000496404|nonsense_mediated_decay|11/34||||1638|1638|546|N/K|aaC/aaA|||-1||HGNC|HGNC:16816 GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:97,3:0.058:48,1:49,2:37,38:154,210:60:20:0.020,0.030,0.030:0.018,3.223e-03,0.979 0/0:75,0:0.020:45,0:30,0:36,0:154,0:0:0
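Because each CSQ entry packs one transcript into a |-separated string, and several transcripts are joined by commas, it can help to unpack them into one row per transcript. The sketch below does that with awk. The field positions (2 = consequence, 4 = gene symbol, 7 = transcript ID) are read off the record above and are an assumption; the authoritative order is whatever the ##INFO=<ID=CSQ line in your VCF header declares. split_csq is a hypothetical helper and the demo line is a shortened version of the record.

```shell
# Sketch: print CHROM, POS, gene symbol, transcript and consequence for every CSQ entry.
# split_csq is a hypothetical helper; field positions are assumed from the record above.
split_csq() {
    awk -F'\t' '
        /^#/ { next }                                  # skip VCF header lines
        {
            n = split($8, info, ";")                   # column 8 is INFO
            for (i = 1; i <= n; i++) {
                if (info[i] !~ /^CSQ=/) continue
                m = split(substr(info[i], 5), entries, ",")   # one entry per transcript
                for (j = 1; j <= m; j++) {
                    split(entries[j], f, "|")
                    printf "%s\t%s\t%s\t%s\t%s\n", $1, $2, f[4], f[7], f[2]
                }
            }
        }'
}

# Demo on a shortened record
printf 'chr1\t6146376\t.\tG\tT\t.\tPASS\tDP=184;CSQ=T|missense_variant|MODERATE|CHD5|ENSG00000116254|Transcript|ENST00000262450|protein_coding,T|upstream_gene_variant|MODIFIER|CHD5|ENSG00000116254|Transcript|ENST00000462991|nonsense_mediated_decay\n' | split_csq
```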
SnpEff annotation
Installation
Installing SnpEff is also fairly simple; see:
http://snpeff.sourceforge.net/download.html#source
# Download latest version
wget http://sourceforge.net/projects/snpeff/files/snpEff_latest_core.zip
# Unzip file
unzip snpEff_latest_core.zip
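As for the database, we need the annotation data for the hg38 version of the human reference genome (Homo_sapiens). However, as the documentation says: "By default SnpEff automatically downloads and installs the database for you, so you don't need to do it manually." The GRCh38.86 database used below is fetched automatically, so we do not have to download it ourselves. If you need another database, the command below lists what is available together with the download links. There are more than 20,000 databases, which is plenty, but pay attention to the version: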
java -jar snpEff.jar databases | less -S
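Annotation
SnpEff has a dedicated mode for cancer samples:
http://snpeff.sourceforge.net/SnpEff_manual.version_4_0.html#cancer
It needs a samples_cancer.txt file that pairs each normal sample with its tumor sample, with the following content: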
cat snpeff/samples_cancer.txt
case1_germline case1_biorep_A_techrep
case2_germline case2_biorep_A
case3_germline case3_biorep_A
case4_germline case4_biorep_A
case5_germline case5_biorep_A
......
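With many samples, this pairing file can be generated from the config file instead of being typed by hand. The sketch below assumes the naming used in this dataset, where the normal sample is named <case>_germline and <case> is the part of the tumor ID before the first underscore; it also assumes a tab between the two columns. make_cancer_pairs is a hypothetical helper.

```shell
# Sketch: turn tumor sample IDs (one per line) into "<normal><TAB><tumor>" pairs.
# make_cancer_pairs is a hypothetical helper; the "_germline" naming is an assumption.
make_cancer_pairs() {
    while IFS= read -r id; do
        [ -n "$id" ] || continue                        # skip empty lines
        printf '%s_germline\t%s\n' "${id%%_*}" "$id"    # case prefix = text before first "_"
    done
}

printf 'case1_biorep_A_techrep\ncase2_biorep_A\n' | make_cancer_pairs
```

In practice that would be something like make_cancer_pairs < config > snpeff/samples_cancer.txt, after checking that the generated normal names really match the sample names in the VCF header.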
The annotation script, snpeff.sh, is as follows:
dir=~/SRP070662/8.mutect2/SRP070662_filter_vcf
config=~/SRP070662/8.mutect2/SRP070662_filter_vcf/snpeff/config
cat ${config} | while read id
do
echo "start snpeff_annotation for ${id} " `date`
java -Xmx4g -jar snpEff.jar -v \
-cancer -cancerSamples ${dir}/snpeff/samples_cancer.txt \
GRCh38.86 \
${dir}/${id}_filter.vcf > ${dir}/snpeff/${id}_eff.vcf
echo "end snpeff_annotation for ${id} " `date`
done
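This one runs more slowly, about one minute per sample. Let's look at the annotated result, again using sample case1_biorep_A_techrep, and print its first variant record:
grep -v '^#' case1_biorep_A_techrep_eff.vcf | head -1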
chr1 6461445 . G T . PASS DP=21;ECNT=1;NLOD=2.70;N_ART_LOD=-1.049e+00;POP_AF=1.000e-05;P_CONTAM=0.00;P_GERMLINE=-1.853e+00;TLOD=6.33;ANN=T|missense_variant|MODERATE|TNFRSF25|ENSG00000215788|transcript|ENST00000377782.7|protein_coding|10/10|c.1270C>A|p.Arg424Ser|1338/1632|1270/1281|424/426||,T|missense_variant|MODERATE|TNFRSF25|ENSG00000215788|transcript|ENST00000356876.7|protein_coding|10/10|c.1243C>A|p.Arg415Ser|1331/1625|1243/1254|415/417||,T|missense_variant|MODERATE|TNFRSF25|ENSG00000215788|transcript|ENST00000351959.9|protein_coding|9/9|c.1132C>A|p.Arg378Ser|1200/1441|1132/1143|378/380||,T|missense_variant|MODERATE|TNFRSF25|ENSG00000215788|transcript|ENST00000348333.7|protein_coding|9/9|c.1108C>A|p.Arg370Ser|1108/1119|1108/1119|370/372||,T|missense_variant|MODERATE|TNFRSF25|ENSG00000215788|transcript|ENST00000351748.7|protein_coding|5/5|c.694C>A|p.Arg232Ser|694/705|694/705|232/234||,T|3_prime_UTR_variant|MODIFIER|TNFRSF25|ENSG00000215788|transcript|ENST00000480393.5|nonsense_mediated_decay|9/9|c.*582C>A|||||1463|,T|3_prime_UTR_variant|MODIFIER|TNFRSF25|ENSG00000215788|transcript|ENST00000485036.5|nonsense_mediated_decay|9/9|c.*530C>A|||||1215|,T|3_prime_UTR_variant|MODIFIER|TNFRSF25|ENSG00000215788|transcript|ENST00000414040.6|nonsense_mediated_decay|9/9|c.*530C>A|||||1215|,T|3_prime_UTR_variant|MODIFIER|TNFRSF25|ENSG00000215788|transcript|ENST00000502588.5|nonsense_mediated_decay|7/7|c.*530C>A|||||1215|,T|3_prime_UTR_variant|MODIFIER|TNFRSF25|ENSG00000215788|transcript|ENST00000502730.5|nonsense_mediated_decay|5/5|c.*446C>A|||||677|,T|3_prime_UTR_variant|MODIFIER|TNFRSF25|ENSG00000215788|transcript|ENST00000510563.5|nonsense_mediated_decay|8/8|c.*530C>A|||||1215|,T|downstream_gene_variant|MODIFIER|ESPN|ENSG00000187017|transcript|ENST00000377828.5|protein_coding||c.*1299G>T|||||501|,T|downstream_gene_variant|MODIFIER|ESPN|ENSG00000187017|transcript|ENST00000461727.5|protein_coding||c.*1299G>T|||||516|,T|downstream_gene_variant|MODIFIER|TNFRSF25|ENSG00000215788|transcript|ENST00000461703.6|processed_transcript||n.*2091C>A|||||2091|,T|downstream_gene_variant|MODIFIER|TNFRSF25|ENSG00000215788|transcript|ENST00000453341.1|retained_intron||n.*2666C>A|||||2666|,T|downstream_gene_variant|MODIFIER|PLEKHG5|ENSG00000171680|transcript|ENST00000537245.5|protein_coding||c.*6118C>A|||||4647|,T|downstream_gene_variant|MODIFIER|ESPN|ENSG00000187017|transcript|ENST00000477679.1|retained_intron||n.*1034G>T|||||1034|,T|downstream_gene_variant|MODIFIER|ESPN|ENSG00000187017|transcript|ENST00000468561.1|processed_transcript||n.*75G>T|||||75|,T|downstream_gene_variant|MODIFIER|TNFRSF25|ENSG00000215788|transcript|ENST00000453260.6|retained_intron||n.*35C>A|||||35|,T|downstream_gene_variant|MODIFIER|PLEKHG5|ENSG00000171680|transcript|ENST00000489097.5|retained_intron||n.*4647C>A|||||4647|,T|downstream_gene_variant|MODIFIER|PLEKHG5|ENSG00000171680|transcript|ENST00000535355.5|protein_coding||c.*6118C>A|||||4647|,T|downstream_gene_variant|MODIFIER|PLEKHG5|ENSG00000171680|transcript|ENST00000377748.5|protein_coding||c.*6118C>A|||||4647|,T|downstream_gene_variant|MODIFIER|ESPN|ENSG00000187017|transcript|ENST00000434576.1|protein_coding||c.*260G>T|||||75|WARNING_TRANSCRIPT_NO_START_CODON,T|downstream_gene_variant|MODIFIER|TNFRSF25|ENSG00000215788|transcript|ENST00000475730.5|processed_transcript||n.*311C>A|||||311|,T|downstream_gene_variant|MODIFIER|TNFRSF25|ENSG00000215788|transcript|ENST00000469691.5|retained_intron||n.*549C>A|||||549|,T|downstream_gene_variant|MODIFIER|TNFRSF25|ENSG00000215788|transcript|ENST00000515145.1|retained_intron||n.*3122C>A|||||3122|,T|downstream_gene_variant|MODIFIER|ESPN|ENSG00000187017|transcript|ENST00000416731.5|protein_coding||c.*1299G>T|||||1047|,T|downstream_gene_variant|MODIFIER|ESPN|ENSG00000187017|transcript|ENST00000475228.5|protein_coding||c.*1393G>T|||||1393|WARNING_TRANSCRIPT_INCOMPLETE,T|downstream_gene_variant|MODIFIER|ESPN|ENSG00000187017|transcript|ENST00000633239.1|protein_coding||c.*1299G>T|||||519|WARNING_TRANSCRIPT_NO_START_CODON,T|downstream_gene_variant|MODIFIER|TNFRSF25|ENSG00000215788|transcript|ENST00000481401.5|protein_coding||c.*1192C>A|||||1192|WARNING_TRANSCRIPT_INCOMPLETE,T|downstream_gene_variant|MODIFIER|PLEKHG5|ENSG00000171680|transcript|ENST00000400913.5|protein_coding||c.*6118C>A|||||4647|,T|downstream_gene_variant|MODIFIER|PLEKHG5|ENSG00000171680|transcript|ENST00000340850.9|protein_coding||c.*6118C>A|||||4647|,T|downstream_gene_variant|MODIFIER|ESPN|ENSG00000187017|transcript|ENST00000636330.1|protein_coding||c.*5248G>T|||||4774|,T|downstream_gene_variant|MODIFIER|ESPN|ENSG00000187017|transcript|ENST00000636644.1|protein_coding||c.*4182G>T|||||4182|WARNING_TRANSCRIPT_INCOMPLETE,T|non_coding_transcript_exon_variant|MODIFIER|TNFRSF25|ENSG00000215788|transcript|ENST00000473343.5|retained_intron|4/4|n.1297C>A||||||,T|non_coding_transcript_exon_variant|MODIFIER|TNFRSF25|ENSG00000215788|transcript|ENST00000513135.5|retained_intron|6/6|n.3189C>A|||||| GT:AD:AF:F1R2:F2R1:MBQ:MFRL:MMQ:MPOS:SA_MAP_AF:SA_POST_PROB 0/1:6,2:0.278:3,2:3,0:29,39:204,184:60:14:0.00,0.253,0.250:0.023,0.025,0.952 0/0:9,0:0.126:3,0:6,0:24,0:169,0:0:0
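Again the annotations are added to the INFO column of the original VCF, here as an ANN entry. This is only the first variant in this sample's VCF, and it already carries a large amount of annotation, because a single variant overlaps many transcripts. The result closely resembles what VEP produces.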
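With this many transcripts per variant, a quick tally of effect types gives a better overview than reading the raw ANN string. The following is a sketch under the assumption that the second |-separated field of each ANN entry is the effect (as in the record above); count_ann_effects is a hypothetical helper and the demo record is made up.

```shell
# Sketch: count ANN entries per effect type across a SnpEff-annotated VCF read from stdin.
# count_ann_effects is a hypothetical helper; field 2 of each entry is assumed to be the effect.
count_ann_effects() {
    awk -F'\t' '
        /^#/ { next }                                  # skip VCF header lines
        {
            n = split($8, info, ";")                   # column 8 is INFO
            for (i = 1; i <= n; i++) {
                if (info[i] !~ /^ANN=/) continue
                m = split(substr(info[i], 5), entries, ",")
                for (j = 1; j <= m; j++) {
                    split(entries[j], f, "|")
                    count[f[2]]++
                }
            }
        }
        END { for (e in count) printf "%d\t%s\n", count[e], e }' |
        sort -k1,1nr -k2,2                             # most frequent effect first
}

# Demo on a made-up record with three transcripts
printf 'chr1\t100\t.\tG\tT\t.\tPASS\tDP=21;ANN=T|missense_variant|MODERATE|GENE1,T|missense_variant|MODERATE|GENE1,T|downstream_gene_variant|MODIFIER|GENE2\n' | count_ann_effects
```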