Summary:
Given a nucleotide, amino acid or codon data file and a relevant
model, NeighborJoining computes the maximum likelihood pairwise
distance matrix for the data and reconstructs an additive phylogenetic
tree for the sequences, using the method of neighbor joining
(Saitou and Nei, 1987).
For details on the specifics of the method refer to:
Molecular Systematics, 2nd edition (January 1996), edited by
David M. Hillis, Craig Moritz and Barbara K. Mable; Sinauer Assoc;
ISBN: 0878932828, pp. 488-490.
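
As an illustration of the tree-building step, here is a minimal Python sketch of neighbor joining in its commonly used Q-criterion form, applied to an already computed distance matrix. It is not HYPHY's implementation: the function name neighbor_joining, the tie-breaking rule (first minimum wins) and the final three-way join are assumptions made for this example.

```python
def neighbor_joining(names, dist):
    """Return a Newick string for the NJ tree of a symmetric distance matrix.

    names: list of taxon labels; dist: square matrix (list of lists).
    Illustrative sketch only; requires at least three taxa.
    """
    nodes = list(names)
    # Distances stored as a dict of dicts keyed by node label.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    while len(nodes) > 3:
        n = len(nodes)
        # Row sums over the currently active nodes.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Pick the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node to a and b.
        len_a = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        new = f"({a}:{len_a:g},{b}:{len_b:g})"
        # Distances from the new node to every remaining node.
        d[new] = {new: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[new][c] = d[c][new] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    # Join the last three nodes at a single trifurcating root.
    x, y, z = nodes
    len_x = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    len_y = 0.5 * (d[x][y] + d[y][z] - d[x][z])
    len_z = 0.5 * (d[x][z] + d[y][z] - d[x][y])
    return f"({x}:{len_x:g},{y}:{len_y:g},{z}:{len_z:g});"


# Small additive example matrix (made up for illustration).
taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(taxa, matrix)
print(tree)
```

Because the example matrix is additive, the branch lengths recovered by the sketch reproduce the input distances exactly; with real data the fit is only approximate and negative branch lengths can occur.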
Input:
A nucleotide, amino acid or codon data file in any format that HYPHY
recognizes. HYPHY uses the following table to translate nucleotide
ambiguities (or amino acid characters). For codon files, any of the
predefined genetic code translation tables can be used to interpret
the data.
The user is presented with an option box for choosing what HYPHY
should do if NJ encounters a negative branch length. Two strategies
are possible:
- Allow negative branch lengths: negative branch lengths are kept
as computed and are not suppressed.
- Force negative branch lengths to zero: negative branch lengths are
set to 0 and the 'balance' is transferred to the other branch at
that node.
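
The second strategy can be sketched as follows. This is a hedged reading, not HYPHY's code: it assumes that 'balance' means the sibling branch absorbs the negative amount, so that the path length between the two joined nodes is unchanged. The function name clamp_negative_pair is hypothetical.

```python
def clamp_negative_pair(len_a, len_b):
    """Force a negative branch to zero and move the balance to its sibling.

    Assumption: the sum len_a + len_b (the distance between the two
    joined nodes) is preserved. If both lengths were negative, the
    sibling would still end up negative; this sketch does not handle
    that case.
    """
    if len_a < 0:
        return 0.0, len_b + len_a
    if len_b < 0:
        return len_a + len_b, 0.0
    return len_a, len_b


# A branch of -0.25 next to a sibling of 1.0 becomes 0 and 0.75.
print(clamp_negative_pair(-0.25, 1.0))
```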
Models: Any of the standard nucleotide, amino acid or codon models can
be selected for the analysis. The entries in the initial distance
matrix represent the expected number of substitutions per unit
of evolution (nucleotide, amino acid or codon, respectively)
between each pair of sequences computed according to the selected
model.
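
For intuition about what one matrix entry means, the Jukes-Cantor (JC69) nucleotide model is a convenient special case: its maximum likelihood pairwise distance has the closed form d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The sketch below assumes that model and simply skips any site with a character other than A, C, G or T; this is a simplification for the example and not how HYPHY resolves ambiguities. Other models generally require numerical optimization of the likelihood. The function name jc69_distance is hypothetical.

```python
import math


def jc69_distance(seq_a, seq_b):
    """Expected substitutions per nucleotide site under JC69.

    Sites where either sequence has a gap or ambiguity code are skipped
    (a simplification made for this example).
    """
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        # Saturated: the distance estimate is unbounded.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# One difference in ten sites gives a distance slightly above 0.1.
print(jc69_distance("ACGTACGTAC", "ACGTACGTAT"))
```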
Output: The output is the Newick tree string for the inferred
phylogenetic tree, including branch lengths. The tree string can
optionally be saved to a file.
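
To show what a Newick string with branch lengths looks like, here is a small sketch that serializes a nested tree. The node layout (name, branch length, list of children) and the function name to_newick are assumptions made for this example; they do not describe HYPHY's internal tree representation.

```python
def to_newick(node):
    """Serialize a (name, length, children) node, without the final ';'."""
    name, length, children = node
    if children:
        # Internal node: children in parentheses, then an optional label.
        label = "(" + ",".join(to_newick(c) for c in children) + ")" + name
    else:
        label = name
    # The root carries no branch length (length is None).
    return label if length is None else f"{label}:{length:g}"


# A four-taxon tree with made-up branch lengths.
example = ("", None, [
    ("A", 0.1, []),
    ("B", 0.2, []),
    ("", 0.05, [("C", 0.3, []), ("D", 0.4, [])]),
])
newick = to_newick(example) + ";"
print(newick)
```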
Result Processing Tools:
Nonparametric Bootstrap: given a number of iterations, for each
iteration: resample (with replacement) the original sequence data,
reconstruct the phylogeny for the resampled data set, and write the
tree string to the user-specified output file. Each resampled tree is
also checked against the tree yielded by the original data, and all
the clades present in both trees are noted. At the end of the run, a
summary tree is printed; this tree has the same topology as the tree
reconstructed from the original data, and its internal branch lengths
represent the proportion (or raw count) of resampled trees which
contained the clade starting at that internal node.
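
The bootstrap loop can be sketched as below. This is a simplified illustration: alignment columns are assumed to be the resampling unit, clades are represented as frozensets of taxon names, and tree reconstruction is delegated to a caller-supplied build_clades function, which is a hypothetical stand-in for the distance and neighbor-joining steps. Writing each replicate tree to a file is omitted.

```python
import random


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement, keeping the same length."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][i] for i in picks)
            for name in names}


def bootstrap_support(alignment, build_clades, iterations,
                      seed=None, proportion=True):
    """Count how often each clade of the original tree recurs.

    build_clades(alignment) must return a set of frozensets, one per
    clade of the reconstructed tree (hypothetical callback).
    """
    rng = random.Random(seed)
    reference = build_clades(alignment)
    counts = {clade: 0 for clade in reference}
    for _ in range(iterations):
        replicate = build_clades(resample_columns(alignment, rng))
        for clade in reference:
            if clade in replicate:
                counts[clade] += 1
    if proportion:
        return {clade: k / iterations for clade, k in counts.items()}
    return counts


# Toy run with a stand-in tree builder that always returns one clade.
toy = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}
always_ab = lambda aln: {frozenset({"A", "B"})}
support = bootstrap_support(toy, always_ab, iterations=10, seed=1)
print(support)
```

Passing proportion=False returns raw counts instead, matching the two reporting options described above.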