Summary:
Given a nucleotide, amino acid, or codon data file and a relevant
model, Sequential Addition heuristically reconstructs a phylogenetic
tree for the sequences in the data using the method of sequential
(or stepwise) addition under the maximum likelihood criterion.
For a description of the method, refer to:
Molecular Systematics, edited by David M. Hillis, Craig Moritz, and
Barbara K. Mable, 2nd edition (January 1996), Sinauer Assoc;
ISBN: 0878932828, pp. 482-483.
Input:
A nucleotide, amino acid, or codon data file in any recognizable format. HyPhy
uses a translation table to interpret nucleotide ambiguities (or amino acid
characters). For codon files, any of the predefined
genetic code translation tables can be used to interpret
the data.
The method of sequential addition
starts with a three-taxon tree and adds sequences to the tree one
at a time. The user can choose to add the sequences in the
given order, i.e. the order in which they appear in the data file,
or in random order, in which case the sequences are
randomly shuffled before the algorithm is run.
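The procedure described above can be sketched in a few lines of Python. This is only an illustration, not HyPhy's implementation: trees are nested tuples of sequence names, and `score` is a hypothetical callback standing in for the maximum likelihood evaluation of a candidate tree. In this sketch the first three names in the (possibly shuffled) order form the starting tree.

```python
import random

def insertions(tree, leaf):
    """Yield each tree formed by attaching `leaf` to one branch of `tree`.

    `tree` is a tuple of subtrees; a leaf is a plain string.
    """
    def attach_below(node):
        # Attach on the branch directly above `node` ...
        yield (node, leaf)
        # ... or on any branch further down inside `node`.
        if isinstance(node, tuple):
            for i, child in enumerate(node):
                for new_child in attach_below(child):
                    yield node[:i] + (new_child,) + node[i + 1:]

    for i, child in enumerate(tree):
        for new_child in attach_below(child):
            yield tree[:i] + (new_child,) + tree[i + 1:]

def stepwise_addition(names, score, randomize=False, seed=None):
    """Build a tree by adding one sequence at a time at its best position.

    `score` is a hypothetical stand-in for a likelihood function:
    it takes a tree and returns a number (higher is better).
    """
    order = list(names)
    if randomize:
        random.Random(seed).shuffle(order)  # random addition order
    tree = tuple(order[:3])                 # starting three-taxon tree
    for leaf in order[3:]:
        tree = max(insertions(tree, leaf), key=score)
    return tree
```

An unrooted binary tree with k sequences has 2k - 3 branches, so `insertions` yields 3 candidates when the fourth sequence is added, 5 for the fifth, and so on.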
The user will be prompted to choose
a branch swapping method for the tree reconstruction analysis.
- No Swapping: No branch swapping is performed.
- Complete NNI: Greedy NNI is performed
after each sequence is added (starting with 4 sequence trees). The number
of additional trees examined is proportional to the square of the number of
sequences. Recommended.
- Complete SPR: SPR is performed
after each sequence is added (starting with 4 sequence trees). The number
of additional trees examined is proportional to the cube of the number of
sequences. This is the slowest, but most thorough method.
- Global NNI: Greedy NNI is performed
after ALL sequences are added, i.e. on the final tree. The number
of additional trees examined is proportional to the number of
sequences.
- Global SPR: SPR is performed
after ALL sequences are added, i.e. on the final tree. The number
of additional trees examined is proportional to the square of the number of
sequences.
- NNI+SPR: Greedy NNI is performed
after each sequence is added, except the final tree, which is
subjected to SPR. The number
of additional trees examined is proportional to the square of the number of
sequences.
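The growth rates quoted for the swapping options can be checked with a rough count. The sketch below assumes a single pass over the full NNI or SPR neighborhood of an unrooted binary tree, using the standard neighborhood sizes 2(n - 3) for NNI and 2(n - 3)(2n - 7) for SPR. The function names are illustrative, and the number of trees HyPhy actually examines may differ, for example when a greedy search makes several passes.

```python
def nni_neighbors(n):
    """Number of NNI neighbors of an unrooted binary tree with n leaves."""
    return 2 * (n - 3)

def spr_neighbors(n):
    """Number of SPR neighbors of an unrooted binary tree with n leaves."""
    return 2 * (n - 3) * (2 * n - 7)

def complete_total(n, per_tree):
    """Trees examined when one swapping pass follows every addition step,
    from the first four-sequence tree up to the final n-sequence tree."""
    return sum(per_tree(k) for k in range(4, n + 1))

if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n,
              nni_neighbors(n),                  # Global NNI: linear in n
              spr_neighbors(n),                  # Global SPR: quadratic in n
              complete_total(n, nni_neighbors),  # Complete NNI: quadratic in n
              complete_total(n, spr_neighbors))  # Complete SPR: cubic in n
```

Summing a linear per-step cost over all addition steps gives the quadratic total for Complete NNI, and summing a quadratic per-step cost gives the cubic total for Complete SPR.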
The selection of the starting three-taxon tree is likely
to affect the final result, and there are four possible strategies
for choosing the three sequences that make up the initial tree.
- First 3: First three sequences in the data file are
included in the initial tree.
- Choose 3: The user selects three sequences from the
data file to be included in the initial tree.
- Best 3: HyPhy examines all possible combinations
of three sequences from the data and chooses the one that yields
the highest likelihood tree.
- Random: The three starting sequences are chosen at random.
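The starting strategies listed above can be illustrated with a short Python sketch. It is not HyPhy code: `starting_triple` and its arguments are hypothetical names, and `score` stands in for the likelihood of the three-taxon tree built from a given triple. "Choose 3" is omitted because it is simply the user's own selection.

```python
import random
from itertools import combinations
from math import comb

def starting_triple(names, strategy, score=None, seed=None):
    """Pick the three sequences that form the initial tree.

    strategy: "first", "best", or "random".
    score:    hypothetical callback mapping a triple of names to a
              number (higher is better); only used by "best".
    """
    names = list(names)
    if strategy == "first":
        return tuple(names[:3])
    if strategy == "best":
        # Exhaustive search over all C(n, 3) triples.
        return max(combinations(names, 3), key=score)
    if strategy == "random":
        return tuple(random.Random(seed).sample(names, 3))
    raise ValueError(f"unknown strategy: {strategy}")

def triples_examined(n):
    """Number of three-sequence combinations among n sequences."""
    return comb(n, 3)
```

Because the number of triples grows as n(n - 1)(n - 2)/6, "Best 3" evaluates 120 starting trees for 10 sequences and 161700 for 100 sequences.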
Models: Any of the standard nucleotide, amino acid,
or codon models can
be selected for the analysis. The selected model is used
to estimate the maximum likelihood parameter values and the
likelihood of each tree being tested.
Output: The standard
output depends on the "Likelihood
Output" option selected in "Preferences". By default, that option prints the maximum
ln-likelihood followed by the inferred tree string, with branch
lengths representing the expected number of substitutions per
codon. For a complete list of output options, refer to the Output Formats
page. The estimated vector of equilibrium frequencies is also
displayed. The inferred tree topology can optionally be saved
to a file. GUI versions of HyPhy will also display the
current best tree in a tree panel.
Result Processing Tools: After the analysis is finished, the
following options are available in the "Results" submenu of the "Analyses" menu:
- View Results.
- Save Results.
- (Co)Variance Estimates.
- Nonparametric Bootstrap.
For the nonparametric bootstrap, the user specifies a number of iterations.
Each iteration resamples (with replacement) the original
sequence data, reconstructs the phylogeny for the resampled data set, and writes
the tree string to the user-specified output file. Each resampled tree is also checked
against the tree obtained from the original data, and all the clades present in both
trees are noted. At the end of the run, a summary tree is printed. This tree has
the same topology as the tree reconstructed from the original data, and its internal
branch lengths represent the proportion (or raw count) of resampled trees that
contained the clade starting at that internal node.
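The two bookkeeping steps of the bootstrap described above, resampling the data and tallying clade support, can be sketched as follows. This is an illustration under stated assumptions rather than HyPhy's code: the alignment is a dictionary of equal-length strings, sites are resampled one column at a time (codon data would be resampled in whole codons), and each clade is represented as a frozenset of sequence names. Tree reconstruction itself is left out.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> sequence string,
               all of the same length.
    rng:       a random.Random instance.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

def clade_support(reference_clades, replicate_clade_sets, as_proportion=True):
    """Count how many replicate trees contain each reference clade.

    reference_clades:     clades (frozensets of names) of the original tree.
    replicate_clade_sets: one set of clades per bootstrap tree.
    """
    counts = {clade: sum(clade in rep for rep in replicate_clade_sets)
              for clade in reference_clades}
    if as_proportion:
        total = len(replicate_clade_sets)
        return {clade: k / total for clade, k in counts.items()}
    return counts
```

The values returned by `clade_support` correspond to the numbers reported on the internal branches of the summary tree, either as proportions or as raw counts.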
The (co)variance
estimates refer to the parameters in the inferred tree.