A handy tool for large-scale selective sweep analysis

Original. 生信阿拉丁, 2022-05-16



Introduction

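Selective sweep analysis studies how populations adapt by looking for genomic regions shaped by positive selection. Many tools and underlying methods exist; this post introduces one designed for large samples from a single population. SweeD scans the whole genome for selective sweeps using the composite likelihood ratio (CLR) test. It builds on the SweepFinder algorithm and, according to its authors, is faster and scales to thousands of sequences.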

Installation

Download: https://cme.h-its.org/exelixis/resource/download/software/SweeD_v3.2.1_Linux.tar.gz

tar -xzvf SweeD_v3.2.1_Linux.tar.gz
cd SweeD_v3.2.1_Linux
make -f Makefile.gcc

Check the build: ./SweeD -help prints the parameter descriptions.

Input files

SweeD supports five input file formats:

The SweepFinder format

It has four columns:

location: the position of the SNP
x: the number of sequences that carry the derived allele at the SNP
n: the number of valid sequences at the SNP
folded: a binary flag indicating whether the SNP is unfolded (0) or folded (1)
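The four columns above can be written out with a few lines of Python. This is a minimal sketch, not SweeD's own tooling: the header names, the tab delimiter, and the example SNPs are assumptions made for illustration, so check the SweeD manual for the exact layout it expects.

```python
import csv
import io

# Assumed header names and tab delimiter (not confirmed by this post).
SF_HEADER = ["position", "x", "n", "folded"]


def write_sf(records, handle):
    """Write (position, x, n, folded) tuples as tab-separated rows."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(SF_HEADER)
    for position, x, n, folded in records:
        # x counts derived-allele carriers, so it cannot exceed n.
        if not 0 <= x <= n:
            raise ValueError(f"x must lie between 0 and n at position {position}")
        if folded not in (0, 1):
            raise ValueError(f"folded must be 0 or 1 at position {position}")
        writer.writerow([position, x, n, folded])


# Hypothetical data: three SNPs, the last one folded with two missing sequences.
example_records = [(1200, 3, 20, 0), (4510, 17, 20, 0), (9033, 5, 18, 1)]
buffer = io.StringIO()
write_sf(example_records, buffer)
sf_text = buffer.getvalue()
```

Writing to a file instead only requires passing an open file handle in place of the `io.StringIO` buffer.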

FASTA format

Most readers already know this format, so it is not covered further here.

ms-like format

Hudson’s ms outputs binary data (0 and 1) instead of DNA data (A, C, G, or T). Usually, state 1 is called ‘derived’ and state 0 is called ‘ancestral’.
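To make the 0/1 coding concrete, the sketch below counts how many sequences carry state 1 (derived) at each site, which corresponds to the x column of the SweepFinder format. The haplotype strings are hypothetical, and the function reads only the haplotype rows, not a full ms output file.

```python
def derived_counts(haplotypes):
    """Count state 1 at each site across equal-length 0/1 haplotype strings."""
    if not haplotypes:
        return []
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("all haplotypes must have the same number of sites")
    # One count per site: the number of sequences showing the derived state.
    return [sum(h[i] == "1" for h in haplotypes) for i in range(length)]


# Hypothetical sample: 4 sequences, 5 segregating sites.
sample = ["01101", "11000", "01011", "00001"]
counts = derived_counts(sample)
```

Here n would be 4 at every site, because simulated data has no missing sequences.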

MaCS-like format

MaCS [Chen et al., 2009] is a Markovian coalescent simulator. This format is uncommon, so it is not described in detail here.

VCF format

VCF is a format most of us know well, and using it as input is quick and simple.

Running the command

SweeD -name test -input input.file -grid 10000

The parameters are:
-name: Specifies a name for the run and the output files.

-input: Specifies the name of the input alignment file. The supported file formats are the five listed in the input files section.

-grid: Specifies the number of positions in the alignment where the CLR will be computed.
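When SweeD is called from a pipeline, it can help to assemble the command programmatically. The sketch below uses only the three flags described above; the helper name `build_sweed_command` and the `./SweeD` executable path (taken from the installation step) are assumptions, not part of SweeD.

```python
import shlex


def build_sweed_command(run_name, input_path, grid, executable="./SweeD"):
    """Return the SweeD argument list for the three flags described above."""
    if grid <= 0:
        raise ValueError("grid must be a positive number of positions")
    return [executable, "-name", run_name, "-input", input_path, "-grid", str(grid)]


command = build_sweed_command("test", "input.file", 10000)
command_line = shlex.join(command)  # printable form of the same command

# To launch the run, pass the list to subprocess.run(command, check=True).
```

Passing a list rather than a single string avoids quoting problems when file names contain spaces.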

Output

SweeD writes two files:
1) The information file (SweeD_Info.runName), which contains information about the run, such as the command line.

2) The report file (SweeD_Report.runName), which is the main output of the program and holds the score of the statistic at each position. This is the result file we are after.

It has three main columns:

Column 1: the alignment positions where the SweeD score is calculated
Column 2: the corresponding likelihood value
Column 3: the corresponding α value, which is a function of the selection coefficient, the recombination rate and the effective population size
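A report with these three columns can be read and ranked with a short script. This is a sketch under assumptions: it treats the report as whitespace-separated text and skips any line that does not consist of three numbers, and the example content is invented for illustration rather than taken from a real run.

```python
def parse_report(text):
    """Return (position, likelihood, alpha) tuples from report text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # blank lines or separators
        try:
            position, likelihood, alpha = (float(f) for f in fields)
        except ValueError:
            continue  # header row or other non-numeric line
        rows.append((position, likelihood, alpha))
    return rows


def top_peak(rows):
    """Return the row with the highest likelihood value."""
    return max(rows, key=lambda row: row[1])


# Hypothetical report content, not output from an actual SweeD run.
example_report = """//1
Position\tLikelihood\tAlpha
1000.0\t0.52\t1.2e-03
2000.0\t8.91\t4.0e-04
3000.0\t2.10\t9.9e-04
"""
rows = parse_report(example_report)
peak = top_peak(rows)
```

For a real analysis, read the SweeD_Report file into a string and pass it to `parse_report`; positions with the highest likelihood values are the candidate sweep regions.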

References

Gary K Chen, Paul Marjoram, and Jeffrey D Wall. Fast and flexible simulation of DNA sequence data. Genome Res, 19(1):136-142, Jan 2009. doi: 10.1101/gr.083634.108. URL http://dx.doi.org/10.1101/gr.083634.108.
Richard R Hudson. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics, 18(2):337-338, Feb 2002.
Pavlos Pavlidis, Daniel Živković, Alexandros Stamatakis, and Nikolaos Alachiotis. SweeD: Likelihood-based detection of selective sweeps in thousands of genomes. Mol Biol Evol, 30(9):2224-2234, 2013.

Author: 小龙

Reviewer: 童蒙

Layout: amethyst
