SweeD: a handy tool for selective sweep analysis of large-scale data
1
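Introduction
Selective sweep analysis is used to study how populations adapt. Many programs and underlying methods exist; this post introduces one suited to selection analysis of a single population with a large sample size. SweeD scans the whole genome for selective sweeps using a composite likelihood ratio (CLR) test. It builds on the SweepFinder algorithm, and its authors report that it runs faster and scales to much larger samples than SweepFinder.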
2
Installation
Download: https://cme.h-its.org/exelixis/resource/download/software/SweeD_v3.2.1_Linux.tar.gz
tar -xzvf SweeD_v3.2.1_Linux.tar.gz
cd SweeD_v3.2.1_Linux
make -f Makefile.gcc
Check the build: ./SweeD -help prints a description of the parameters.
3
Input files
Five input file formats are supported:
1
The SweepFinder format
It has 4 columns:
location: the position of a SNP
x: the number of sequences that carry the derived allele at a SNP
n: the number of valid sequences at a SNP
folded: a binary character which denotes if the SNP is unfolded (0) or folded (1).
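The four columns above can be written out with a few lines of Python. This is only a minimal sketch: the function name `format_sf`, the header names, and the tab delimiter are assumptions made for illustration, so check them against the SweeD manual before using the output as real input.

```python
from typing import Iterable, Tuple

# One SNP record: (location, x, n, folded), following the four columns above.
SnpRecord = Tuple[int, int, int, int]


def format_sf(records: Iterable[SnpRecord],
              header: Tuple[str, ...] = ("position", "x", "n", "folded")) -> str:
    """Render SNP records as tab-separated text, one SNP per line.

    Assumption: the header names and the tab delimiter are illustrative;
    adjust them to whatever the SweeD manual specifies.
    """
    lines = ["\t".join(header)]
    for location, x, n, folded in records:
        if folded not in (0, 1):
            raise ValueError(f"folded must be 0 or 1, got {folded}")
        if not 0 <= x <= n:
            raise ValueError(f"x must lie in [0, n], got x={x}, n={n}")
        lines.append(f"{location}\t{x}\t{n}\t{folded}")
    return "\n".join(lines) + "\n"


# Two made-up unfolded SNPs observed in 20 valid sequences each.
example = format_sf([(1200, 3, 20, 0), (4510, 11, 20, 0)])
```

The two sanity checks reflect the column definitions: x can never exceed n, and folded is a binary flag.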
2
FASTA format
Most readers already know this format, so it is not described further here.
3
ms-like format
Hudson’s ms outputs binary data (0 and 1) instead of DNA data (A, C, G, or T). Usually, state 1 is called ‘derived’ and state 0 is called ‘ancestral’.
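To make the 0/1 coding concrete, here is a rough sketch that counts derived alleles (state 1) per site in one ms replicate. The function name `derived_counts` and the sample data are made up for illustration, and this is not a description of how SweeD itself reads ms files.

```python
from typing import List, Tuple


def derived_counts(ms_block: str) -> List[Tuple[float, int, int]]:
    """Return (position, x, n) for each segregating site of one ms replicate.

    x is the number of haplotypes in state 1 (derived) and n is the number
    of haplotypes. Hypothetical helper, written only for illustration.
    """
    positions: List[float] = []
    haplotypes: List[str] = []
    for raw in ms_block.splitlines():
        line = raw.strip()
        if line.startswith("positions:"):
            positions = [float(p) for p in line.split()[1:]]
        elif positions and line and set(line) <= {"0", "1"}:
            # Haplotype rows follow the positions line and contain only 0/1.
            haplotypes.append(line)
    n = len(haplotypes)
    return [(pos, sum(1 for h in haplotypes if h[i] == "1"), n)
            for i, pos in enumerate(positions)]


# A made-up replicate with 4 haplotypes and 3 segregating sites.
sample = """//
segsites: 3
positions: 0.1042 0.4871 0.9023
010
110
011
000
"""
counts = derived_counts(sample)
```

Each returned tuple lines up with the location, x and n columns of the SweepFinder format described earlier.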
4
MaCS-like format
MaCS [Chen et al., 2009] is a Markovian coalescent simulator. This format is uncommon, so it is not covered in detail here.
5
VCF format
VCF is a format most of us know well; using it as input is simple and quick.
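For readers who want to see how a VCF record relates to the x and n counts of the SweepFinder format, the sketch below tallies alleles from the GT field. It is illustrative only: `vcf_allele_counts` is a hypothetical helper, treating the ALT allele as derived is an assumption, and SweeD's own VCF handling may differ.

```python
from typing import Iterator, Tuple


def vcf_allele_counts(vcf_text: str) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (chrom, pos, x, n) for each biallelic VCF record.

    x counts ALT alleles (assumed derived here) and n counts all
    non-missing alleles. Multiallelic records are skipped.
    """
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # meta-information and header lines
        fields = line.split("\t")
        chrom, pos, alt, fmt = fields[0], int(fields[1]), fields[4], fields[8]
        if "," in alt:
            continue  # more than one ALT allele
        gt_index = fmt.split(":").index("GT")
        x = n = 0
        for sample in fields[9:]:
            genotype = sample.split(":")[gt_index]
            for allele in genotype.replace("|", "/").split("/"):
                if allele == ".":
                    continue  # missing call
                n += 1
                if allele == "1":
                    x += 1
        yield chrom, pos, x, n


# Made-up records for three diploid samples.
vcf = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\t./.",
    "chr1\t250\t.\tC\tT,G\t.\tPASS\t.\tGT\t0/1\t1/2\t0/0",
    "chr1\t300\t.\tT\tC\t.\tPASS\t.\tGT:DP\t0|0\t0|1\t1|1",
])
rows = list(vcf_allele_counts(vcf))
```

Missing genotypes lower n for that site, which matches the idea of counting only valid sequences.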
4
Running the program
SweeD -name test -input input.file -grid 10000
The parameters are:
-name: Specifies a name for the run and the output files.
-input: Specifies the name of the input alignment file, in one of the formats listed under Input files (for example, the SF (SweepFinder) format).
-grid: Specifies the number of positions in the alignment where the CLR will be computed.
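When many runs need to be launched from a script, the command above can be assembled programmatically. The sketch below uses only the three flags described here; the helper name `build_sweed_command` is hypothetical, and the default executable name assumes SweeD is on the PATH.

```python
import shlex
from typing import List


def build_sweed_command(run_name: str, input_path: str, grid: int,
                        executable: str = "SweeD") -> List[str]:
    """Build the argument list for a SweeD run with -name, -input and -grid.

    Assumption: `executable` resolves to the SweeD binary (use "./SweeD"
    when running from the build directory).
    """
    if grid <= 0:
        raise ValueError("grid must be a positive integer")
    return [executable, "-name", run_name,
            "-input", input_path, "-grid", str(grid)]


cmd = build_sweed_command("test", "input.file", 10000)
print(shlex.join(cmd))  # shows the command as it would be typed in a shell
```

The list can then be passed to `subprocess.run(cmd, check=True)` to start the run.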
5
Output
Two files are written:
1) Information file (SweeD_Info.runName), which contains information related to the run of the program (the command line, for instance).
2) Report file (SweeD_Report.runName), the main output of the program, which holds the score of the statistic at each position. This is the results file we are after.
It has 3 main columns:
Column 1: the alignment positions where the SweeD score is calculated
Column 2: the corresponding likelihood value
Column 3: the corresponding α value, which is a function of the selection coefficient, the recombination rate and the effective population size.
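A report with these three columns can be loaded and ranked in a few lines. This is a sketch under assumptions: the columns are taken to be whitespace-separated, any line that is not three numbers (such as a header) is skipped, and the sample values are invented. `parse_report` and `top_positions` are hypothetical helper names.

```python
from typing import List, Tuple

Row = Tuple[float, float, float]  # (position, likelihood, alpha)


def parse_report(text: str) -> List[Row]:
    """Parse report text into (position, likelihood, alpha) rows.

    Assumption: columns are whitespace-separated; lines that do not hold
    exactly three numbers are treated as headers and skipped.
    """
    rows: List[Row] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        try:
            position, likelihood, alpha = (float(p) for p in parts)
        except ValueError:
            continue  # non-numeric line, e.g. column names
        rows.append((position, likelihood, alpha))
    return rows


def top_positions(rows: List[Row], k: int = 1) -> List[Row]:
    """Return the k rows with the highest likelihood value."""
    return sorted(rows, key=lambda r: r[1], reverse=True)[:k]


# Invented numbers, for illustration only.
report = """//1
Position\tLikelihood\tAlpha
1000.0\t0.52\t1.2e-03
2000.0\t7.81\t4.5e-04
3000.0\t2.10\t9.9e-04
"""
rows = parse_report(report)
best = top_positions(rows, k=1)
```

Positions with the highest likelihood values are the natural starting point when looking for candidate sweep regions.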
References
Gary K Chen, Paul Marjoram, and Jeffrey D Wall. Fast and flexible simulation of DNA sequence data. Genome Res, 19(1):136-142, Jan 2009. doi: 10.1101/gr.083634.108. URL http://dx.doi.org/10.1101/gr.083634.108.
Richard R Hudson. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics, 18(2):337-338, Feb 2002.
Pavlos Pavlidis, Daniel Živković, Alexandros Stamatakis, and Nikolaos Alachiotis. SweeD: likelihood-based detection of selective sweeps in thousands of genomes. Mol Biol Evol, 30(9):2224-2234, 2013.
Author: 小龙
Reviewer: 童蒙
Layout: amethyst