Summary:
Given a nucleotide, amino acid or codon data file and a relevant
model, ClusterAnalysis computes the maximum likelihood pairwise
distance matrix for the data and reconstructs an ultrametric
(i.e. subject to a global molecular clock) phylogenetic tree for
the sequences. Four methods of reconstruction can be used:
- UPGMA (Unweighted Pair Group Method using Arithmetic averages)
- WPGMA (Weighted Pair Group Method using Arithmetic averages)
- Complete Linkage (maximum linkage)
- Minimum Linkage (single linkage)
For details on the specifics of each method, refer to:
Molecular Systematics, by David M. Hillis (Editor), Craig Moritz
(Editor), Barbara K. Mable (Editor), 2nd edition (January 1996),
Sinauer Associates; ISBN: 0878932828, pp. 486-487.
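The four clustering rules differ only in how the distance from a newly merged cluster to every other cluster is recomputed. A minimal Python sketch, assuming the pairwise distance matrix has already been computed; the function name `cluster` and the method strings are hypothetical and are not HyPhy's interface, and HyPhy's own implementation may break ties or format branch lengths differently. Under the molecular clock each new internal node is placed at half the distance between the two clusters it joins, and the result is written as a Newick string with branch lengths.

```python
from typing import Dict, List, Tuple

METHODS = ("UPGMA", "WPGMA", "complete", "single")

def cluster(names: List[str], dist: List[List[float]], method: str = "UPGMA") -> str:
    """Build an ultrametric tree from a symmetric distance matrix by
    agglomerative clustering and return it as a Newick string."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method}")
    # Active clusters: id -> (Newick subtree, node height, number of leaves).
    clusters: Dict[int, Tuple[str, float, int]] = {
        i: (name, 0.0, 1) for i, name in enumerate(names)
    }
    # Pairwise distances keyed by (smaller id, larger id).
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j]
        for i in range(len(names)) for j in range(i + 1, len(names))
    }

    def get(a: int, b: int) -> float:
        return d[(a, b)] if a < b else d[(b, a)]

    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(
            ((i, j) for i in clusters for j in clusters if i < j),
            key=lambda pair: get(*pair),
        )
        height = get(a, b) / 2.0  # clock: new node sits at half the pair distance
        ta, ha, na = clusters.pop(a)
        tb, hb, nb = clusters.pop(b)
        subtree = f"({ta}:{height - ha:g},{tb}:{height - hb:g})"
        # Distance from the merged cluster to each remaining cluster.
        for k in clusters:
            dak, dbk = get(a, k), get(b, k)
            if method == "UPGMA":        # average over all leaf pairs
                new = (na * dak + nb * dbk) / (na + nb)
            elif method == "WPGMA":      # plain mean of the two cluster distances
                new = (dak + dbk) / 2.0
            elif method == "complete":   # maximum linkage
                new = max(dak, dbk)
            else:                        # single (minimum) linkage
                new = min(dak, dbk)
            d[(k, next_id)] = new        # every existing id is smaller than next_id
        clusters[next_id] = (subtree, height, na + nb)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"

M = [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]]
print(cluster(["A", "B", "C", "D"], M))  # (D:5,(C:3,(A:1,B:1):2):2);
```

For this small matrix all four rules agree: A and B join at height 1, C joins them at height 3, and D joins at height 5. On less regular matrices the linkage rules generally produce different trees.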
Input:
A nucleotide, amino acid or codon data file in any recognizable format. HYPHY
uses a built-in table to translate nucleotide ambiguity codes (or amino acid
characters). For codon files, any of the predefined genetic code translation
tables can be used to interpret the data.
The user will be presented with a dialog box to select which of the
tree-building methods listed above to invoke.
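A common way to use an ambiguity code in a likelihood calculation is to treat it as the set of nucleotides it could stand for, giving the tip a state vector with 1 for every compatible base. A minimal Python sketch using the standard IUPAC nucleotide codes; this is an illustration and not necessarily HyPhy's exact table, and treating gaps and `?` as fully ambiguous (like `N`) is an assumption. The names `IUPAC` and `tip_vector` are hypothetical.

```python
from typing import List

# Standard IUPAC nucleotide codes -> compatible bases (U read as T).
# Assumption: gaps and '?' are treated as missing data, i.e. like N.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT", "-": "ACGT", "?": "ACGT",
}
STATES = "ACGT"

def tip_vector(char: str) -> List[float]:
    """Return a 0/1 vector over A, C, G, T marking bases compatible with char."""
    compatible = IUPAC[char.upper()]
    return [1.0 if s in compatible else 0.0 for s in STATES]

print(tip_vector("R"))  # [1.0, 0.0, 1.0, 0.0]
```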
Models: Any of the standard nucleotide, amino acid or codon models can
be selected for the analysis. Each entry in the initial distance
matrix is the expected number of substitutions per site (nucleotide,
amino acid or codon, respectively) between a pair of sequences,
computed according to the selected model.
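For the simplest nucleotide model, Jukes-Cantor (JC69), the maximum likelihood pairwise distance has a closed form: if p is the proportion of differing sites, the expected number of substitutions per site is d = -3/4 ln(1 - 4p/3), valid for p < 3/4. Richer models are fitted numerically instead. A minimal Python sketch; `jc69_distance` is a hypothetical name, and skipping sites with ambiguity codes or gaps is a simplifying assumption rather than how HyPhy handles them.

```python
import math

def jc69_distance(seq1: str, seq2: str) -> float:
    """ML distance under Jukes-Cantor: expected substitutions per site."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    # Assumption: compare only sites where both sequences have A, C, G or T.
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in sites) / len(sites)  # proportion of differences
    if p >= 0.75:
        return math.inf  # saturated: the ML estimate is unbounded
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

print(jc69_distance("ACGTACGTAC", "ACGTACGTAA"))  # p = 0.1, d is about 0.107
```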
Output: The output is the Newick tree string for the inferred phylogenetic
tree, including branch lengths. The tree string can optionally be saved to a file.
Result Processing Tools:
Nonparametric Bootstrap: given a number of iterations, for each iteration:
- resample (with replacement) the original sequence data;
- reconstruct the phylogeny for the resampled data set and write the tree
  string to the user-specified output file;
- compare the resampled tree against the tree inferred from the original
  data, and note all clades present in both trees.
At the end of the run, a summary tree is printed. This tree has the same
topology as the tree reconstructed from the original data, and its internal
branch lengths represent the proportion (or raw count) of resampled trees
that contained the clade defined by that internal node.
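The bootstrap loop above can be sketched in Python as follows, assuming a hypothetical `build_tree` callable that maps a list of aligned sequences to a rooted tree written as nested tuples of leaf names (for example, one of the clustering methods). Each clade is represented by the set of leaves below an internal node, so checking whether a clade appears in both trees becomes a set intersection. Writing each replicate tree to a file is omitted, and all names here are illustrative, not HyPhy's.

```python
import random
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Set, Union

Tree = Union[str, tuple]  # a leaf name or a tuple of subtrees

def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set below every internal node (root included)."""
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found

def resample(alignment: List[str], rng: random.Random) -> List[str]:
    """Draw alignment columns with replacement, keeping the original length."""
    n = len(alignment[0])
    cols = [rng.randrange(n) for _ in range(n)]
    return ["".join(seq[c] for c in cols) for seq in alignment]

def bootstrap_support(alignment: List[str],
                      build_tree: Callable[[List[str]], Tree],
                      replicates: int, seed: int = 0) -> Dict[FrozenSet[str], int]:
    """Count, for each clade of the original tree, the replicates containing it."""
    rng = random.Random(seed)
    reference = clades(build_tree(alignment))
    support: Counter = Counter()
    for _ in range(replicates):
        shared = reference & clades(build_tree(resample(alignment, rng)))
        for clade in shared:
            support[clade] += 1
    return {clade: support[clade] for clade in reference}
```

Dividing each raw count by `replicates` gives the proportion reported on the summary tree; the root clade trivially has full support.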