  HyPhy Documentation: Standard Analyses: NeighborJoining.bf

      Summary: Given a nucleotide, amino acid or codon data file and a relevant model, NeighborJoining computes the maximum likelihood pairwise distance matrix for the data and reconstructs an additive phylogenetic tree for the sequences using the neighbor-joining method (Saitou and Nei, 1987).

For details on the specifics of the method refer to:

        Molecular Systematics, 2nd edition, edited by David M. Hillis, Craig Moritz and Barbara K. Mable. Sinauer Assoc, January 1996. ISBN: 0878932828, pp. 488-490.
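
The joining procedure summarized above can be sketched in a few lines of Python. This is a minimal illustration of textbook neighbor joining on a precomputed distance matrix, not HYPHY's own code: the function name neighbor_joining, the "#k" labels for internal nodes, and the example matrix are all assumptions made for this sketch.

```python
def neighbor_joining(names, dist):
    """Return a Newick string for the neighbor-joining tree of a symmetric
    distance matrix (illustrative sketch, requires at least 3 taxa)."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    labels = list(names)
    # Pairwise distances keyed by label pair, in both orders.
    d = {(a, b): dist[i][j]
         for i, a in enumerate(labels) for j, b in enumerate(labels)}
    subtree = {a: a for a in labels}  # Newick fragment for each active node
    next_id = 0
    while len(labels) > 3:
        n = len(labels)
        # Row sums of the current distance matrix.
        r = {a: sum(d[a, b] for b in labels if b != a) for a in labels}
        # Choose the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = labels[i], labels[j]
                q = (n - 2) * d[a, b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node to a and to b.
        len_a = 0.5 * d[a, b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a, b] - len_a
        u = f"#{next_id}"  # hypothetical internal-node label
        next_id += 1
        subtree[u] = f"({subtree[a]}:{len_a:g},{subtree[b]}:{len_b:g})"
        # Distances from the new node to every remaining node.
        for c in labels:
            if c not in (a, b):
                d[u, c] = d[c, u] = 0.5 * (d[a, c] + d[b, c] - d[a, b])
        labels = [c for c in labels if c not in (a, b)] + [u]
    # Resolve the final three nodes around a central trifurcation.
    a, b, c = labels
    len_a = 0.5 * (d[a, b] + d[a, c] - d[b, c])
    len_b = 0.5 * (d[a, b] + d[b, c] - d[a, c])
    len_c = 0.5 * (d[a, c] + d[b, c] - d[a, b])
    return (f"({subtree[a]}:{len_a:g},{subtree[b]}:{len_b:g},"
            f"{subtree[c]}:{len_c:g});")


# Made-up additive distances for five taxa, for illustration only.
example = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(["a", "b", "c", "d", "e"], example))
# prints (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

Because the example distances are additive, the path lengths in the returned tree reproduce the input matrix exactly. Ties in the Q criterion are broken here by taking the first pair encountered, which is an arbitrary choice of this sketch.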

     Input: A nucleotide, amino acid or codon data file in any format HYPHY recognizes. HYPHY uses a lookup table to translate nucleotide ambiguities (or amino acid characters). For codon files, any of the predefined genetic code translation tables can be used to interpret the data.

    The user is prompted to choose what HYPHY should do if NJ produces a negative branch length. Two strategies are available:

  • Allow negative branch lengths - negative branch lengths are kept as computed.
  • Negative branch lengths are forced to zero - negative branch lengths are set to 0 and the 'balance' is transferred to the other branch at that node.
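
One plausible reading of the second strategy is sketched below: when one of the two branches created by a join is negative, it is set to 0 and the same amount is subtracted from its sibling, so the distance between the two joined nodes is unchanged. The function name force_nonnegative is hypothetical, and HYPHY's exact rule may differ in detail.

```python
def force_nonnegative(len_a, len_b):
    """Clamp a negative branch to 0 and shift its (negative) value onto the
    sibling branch, keeping len_a + len_b constant. Illustrative sketch."""
    if len_a < 0:
        return 0.0, len_b + len_a
    if len_b < 0:
        return len_a + len_b, 0.0
    return len_a, len_b
```

For example, a pair of branch lengths (-0.5, 2.0) becomes (0.0, 1.5), while a pair with no negative entry is returned unchanged.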

    Models: Any of the standard nucleotide, amino acid or codon models can be selected for the analysis. Each entry in the initial distance matrix is the expected number of substitutions per unit of evolution (nucleotide, amino acid or codon, respectively) between a pair of sequences, computed according to the selected model.
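
To make "expected number of substitutions" concrete, the sketch below uses the Jukes-Cantor (JC69) nucleotide model, chosen only because its maximum likelihood pairwise distance has a closed form; for a general model the distance would be found by numerical optimization. The function name jc69_distance and the rule of skipping sites that are not A, C, G or T in both sequences are assumptions of this example.

```python
import math


def jc69_distance(seq_a, seq_b):
    """Expected substitutions per nucleotide site under JC69 for two aligned
    sequences. Sites with gaps or ambiguity codes are skipped (an assumption
    of this sketch)."""
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    # Proportion of sites at which the two sequences differ.
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("sequences too divergent for a finite JC69 distance")
    return -0.75 * math.log(1 - 4 * p / 3)
```

The estimate always exceeds the raw proportion of differing sites when that proportion is positive, because the model corrects for multiple substitutions at the same site.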

    Output: The output is the Newick tree string for the inferred phylogenetic tree including branch lengths. The tree string can optionally be saved to a file.

     Result Processing Tools:
      Nonparametric Bootstrap: given a number of iterations, for each iteration:
      resample (with replacement) the original sequence data, reconstruct the phylogeny for the resampled data set, and write the tree string to the user-specified output file. Each resampled tree is also checked against the tree obtained from the original data, and all clades present in both trees are noted. At the end of the run, a summary tree is printed; this tree has the same topology as the tree reconstructed from the original data, and its internal branch lengths represent the proportion (or raw count) of resampled trees that contained the clade starting at that internal node.
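
The resampling and clade-counting steps of the bootstrap can be sketched as follows. This is an illustration under stated assumptions rather than HYPHY's implementation: the alignment is a dict of equal-length strings resampled one column at a time (codon data would instead be resampled in units of three columns), each tree is represented only by its set of clades (frozensets of taxon names), and the tree-building step between the two functions is left out. The names resample_columns and clade_support are hypothetical.

```python
import random


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement to build one bootstrap
    replicate with the same number of sites as the original."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


def clade_support(reference_clades, replicate_clade_sets):
    """For each clade of the original tree, return the proportion of
    replicate trees that also contain it."""
    n = len(replicate_clade_sets)
    return {clade: sum(clade in rep for rep in replicate_clade_sets) / n
            for clade in reference_clades}


# Usage with made-up data: three replicate trees, two of which contain {A, B}.
rng = random.Random(0)
replicate = resample_columns({"A": "ACGT", "B": "ACGA", "C": "TCGA"}, rng)
ab = frozenset({"A", "B"})
support = clade_support([ab], [{ab}, {ab}, {frozenset({"B", "C"})}])
```

Multiplying each proportion by the number of replicates gives the raw count mentioned above.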

 
Sergei L. Kosakovsky Pond and Spencer V. Muse, 1997-2002