3SEQ Recombination Detection Algorithm
This is the official home page for the 3SEQ recombination detection algorithm. C++ source code, p-value tables, and a manual for the latest version are available from the tabs on the left.
3SEQ is a command-line program that reads in a nucleotide sequence file, in PHYLIP or aligned FASTA format, and tests all sequence triplets in the file for a mosaic recombination signal indicating that one of the three sequences (the child) is a recombinant of the other two (the parents). The statistical test used is a non-parametric test for mosaicism whose p-values need to be pre-computed once with the 3SEQ executable (or downloaded as a ready-made table).
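To make the idea of "all sequence triplets" concrete, the following Python sketch enumerates every ordered assignment of two parents and one child from a list of sequence names. It only illustrates the counting and is not 3SEQ's internal enumeration; the function names and sequence labels are hypothetical, and treating the two parents as ordered is an assumption chosen to match the 262 × 261 × 260 count quoted in the mtDNA example on this page.

```python
from itertools import permutations


def ordered_triplets(names):
    """Return an iterator over (parent_p, parent_q, child) tuples
    of three distinct sequence names."""
    return permutations(names, 3)


def triplet_count(n):
    """Number of ordered triplets of distinct sequences among n sequences."""
    return n * (n - 1) * (n - 2)


# Hypothetical sequence labels, for illustration only.
names = ["seqA", "seqB", "seqC", "seqD"]
triplets = list(ordered_triplets(names))

print(len(triplets))        # 24
print(triplet_count(262))   # 17779320
```

The count grows roughly with the cube of the number of sequences, which is why run time rises quickly for large alignments.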
For bug reports or questions, please feel free to email Maciej Boni at [email protected] or Ha Minh Lam at [email protected]
You can contribute to the repository here.
Source code for command-line version of 3SEQ
The latest version is v1.7 (build 170612). The C++ source files can be compiled on Linux/Unix systems and Mac OS X.
3seq_build_170612.zip
This version includes the ability to self-build large p-value tables, faster breakpoint calculations, and a mode where the sequence alignment can be repeatedly subsampled and tested for recombination.
Extract the files by typing
unzip 3seq_build_170612.zip
at the command prompt. Then navigate into the newly extracted directory, type “make” to compile the source code (warnings can be ignored), and type “./3seq” to check that everything compiled correctly; if it did, you should see a list of usage modes. The downloaded archive contains five example sequence files. To make sure 3SEQ is working correctly, you can type
./3seq -i den2.aln
and this should give you some basic information on these dengue virus sequences.
Read the manual to see how to set up your p-value table file, or see the quick start tab on the left.
Type
./3seq -i mtDNA.aln
and this should give you some basic information about these 262 human mitochondrial DNA sequences (see Kivisild et al., 2006).
To test these sequences for recombination, you need to build a p-value table the first time you use 3SEQ. Do this with
./3seq -g myPvalueTable500 500
Building this 500 × 500 × 500 table should take just a few minutes. You can move the resulting file to another location if you like.
To analyze the mitochondrial data using this new p-value table, type
./3seq -f mtDNA.aln -ptable myPvalueTable500 -id myFirstRun
and 3SEQ will test all 262 × 261 × 260 triplets in this sequence alignment. This should take less than ten minutes. From now on, 3SEQ will be associated with myPvalueTable500 and you no longer need to use the -ptable option.
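If you prefer to script these quick-start steps, the commands above can be assembled from Python. This is a minimal sketch, assuming the 3seq executable sits in the current directory; the helper names build_table_command, full_run_command, and run are hypothetical, and only the options shown on this page (-g, -f, -ptable, -id) are used.

```python
import subprocess


def build_table_command(table_file, size, exe="./3seq"):
    """Command that generates a size x size x size p-value table."""
    return [exe, "-g", table_file, str(size)]


def full_run_command(alignment, run_id, ptable=None, exe="./3seq"):
    """Command that tests all triplets in an alignment for recombination."""
    cmd = [exe, "-f", alignment]
    if ptable is not None:
        # Needed only until 3SEQ has been associated with a table.
        cmd += ["-ptable", ptable]
    cmd += ["-id", run_id]
    return cmd


def run(cmd):
    """Run a command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


table_cmd = build_table_command("myPvalueTable500", 500)
run_cmd = full_run_command("mtDNA.aln", "myFirstRun", ptable="myPvalueTable500")

# Print the commands rather than executing them, so the sketch can be
# tried without 3SEQ installed.
print(" ".join(table_cmd))
print(" ".join(run_cmd))
```

Passing each list to run(...) would execute it. Leaving ptable unset mirrors later runs, where the -ptable option is no longer needed.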
You will see that 3SEQ generates some output to the screen and four files whose names start with ‘myFirstRun’. The screen output should say that clonal (non-recombinant) evolution should be rejected with p = 0.0046. The screen output is also stored in ‘myFirstRun.3s.log’.
The other file to note right now is ‘myFirstRun.3s.rec’, a tab-delimited file showing which sequences were identified as recombinant (third column). For each recombinant, this file also shows the most likely parents (first two columns), p-values in various formats (columns 7 to 11), and breakpoint ranges (final columns).
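Because the ‘.3s.rec’ file is tab-delimited, it is easy to post-process. The Python sketch below pulls the parent and child names out of each row using the column positions described above. It assumes the file begins with a single header row (check your own output and pass has_header=False if it does not). The sample rows, sequence names, and the read_recombinants helper are made up for illustration; they are not real 3SEQ output.

```python
import csv
import io


def read_recombinants(handle, has_header=True):
    """Return a list of dicts with the parents and child from each row.

    Columns 1-2 are taken as the parents and column 3 as the recombinant
    child; the remaining columns are kept as raw strings.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # assumption: exactly one header row
    records = []
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        records.append({
            "parents": (row[0].strip(), row[1].strip()),
            "child": row[2].strip(),
            "other_columns": row[3:],
        })
    return records


# Made-up rows standing in for a real 'myFirstRun.3s.rec' file.
sample = (
    "parent1\tparent2\tchild\textra\n"
    "seqA\tseqB\tseqC\t0.001\n"
    "seqD\tseqE\tseqF\t0.020\n"
)
records = read_recombinants(io.StringIO(sample))
children = sorted({r["child"] for r in records})
print(children)  # ['seqC', 'seqF']
```

For a real run, replace the io.StringIO object with open("myFirstRun.3s.rec", newline="").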
Windows Version
Darren Martin very kindly coded up a Windows version of 3SEQ that can be found as part of the RDP4 package.
Note that the Windows and Linux/Mac versions of 3SEQ may behave slightly differently. If you have any questions, please email Darren or me.
P-value tables for 3SEQ
A pre-computed binary-format table of p-values can be downloaded from the link below:
pvaluetable.2017.700.tgz (~200 MB; extracts to 438 MB)
The above link is to a public Dropbox file, so you can also simply add this p-value table to your Dropbox and wait for it to download. This file will work on 64-bit systems only. Please let us know if you are using 3SEQ on a 32-bit system and need help.
An important note on filesystems: you should keep your p-value table file on the same type of filesystem (e.g. NTFS, FAT32, ext3) as your 3SEQ executable file; otherwise there may be a read error when 3SEQ attempts to read in the p-values.
Slowly-evolving manual
Download the manual by clicking on this link.
How to cite 3SEQ
When using 3SEQ please cite
Lam HM, Ratmann O, Boni MF. Improved algorithmic complexity for the 3SEQ recombination detection algorithm. Mol Biol Evol, 35(1):247-251, 2018.
When referring to certain core parts of the statistics used, you may want to cite
Boni MF, Posada D, Feldman MW. An exact nonparametric method for inferring mosaic structure in sequence triplets. Genetics, 176:1035-1047, 2007.
If you are making considerable use of the Hogan-Siegmund approximations, you should cite
Hogan ML, Siegmund D. Large deviations for the maxima of some random fields. Adv Appl Math, 7:2-22, 1986.
June 5 2017
A major upgrade occurred from version 1.1 to version 1.7. The old version 1.1 can be downloaded from
source files for v1.1
and you will need to download this older p-value table
PVT_COMPACT_700.tgz (~200 MB; extracts to 438 MB)
The above link is to a public Dropbox file, so you can also simply add this P-value table to your Dropbox and wait for it to download.
The manual for v1.1 is here.
Unless you have a compelling reason to use the old version, we recommend you use the new version.
Funding Acknowledgement
Since 2004, the development of 3SEQ has been supported by a National Science Foundation PhD fellowship, NIH grant R01 GM28016 to Marcus W Feldman, NIH grant HG000205 to the Stanford Genome Technology Center, Medical Research Council grant G0600718 to Dominic Kwiatkowski, and Wellcome Trust grant 077078/Z/05/Z for the Major Overseas Programme in Vietnam.
Special thanks to Serafim Batzoglou for his lectures on hidden Markov models, which inspired the original design of the algorithm.