  HyPhy Documentation: Standard Analyses: TopologySearchConstrained.bf

      Summary: Given a nucleotide or amino-acid data file and a suitable substitution model, Topology Search reconstructs a phylogenetic tree for the sequences in the data by exhaustively searching through all possible unrooted trees that match a constraint. The constraint is specified as a Newick tree string. A typical constraint looks like: ((a,b,c),(d,e,f)). This constraint states that species a,b,c and species d,e,f should each be monophyletic. For instance, the tree ((a,(b,c)),(d,e),f) matches the constraint, but ((a,b),(c,d),(e,f)) does not. Presently, this is the only type of constraint supported by TopologySearchConstrained.bf. Because the search is exhaustive, this method is very slow and should not be used for data sets with more than 9-10 sequences.
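
The matching rule can be illustrated with a short Python sketch. It is not HyPhy's own implementation. It assumes simple Newick strings without quoted labels or whitespace, and it treats a tree as matching when every non-trivial split (bipartition of the taxa) implied by the constraint also occurs in the tree. The helper names (parse_newick, leaves, splits, matches_constraint, unrooted_tree_count) are hypothetical. The last helper evaluates the standard count of unrooted binary trees on n taxa, (2n-5)!!, which suggests why an exhaustive search becomes impractical beyond about 10 sequences unless the constraint removes most of the trees.

```python
def parse_newick(s):
    """Parse a simple Newick string into nested lists of leaf names.

    Assumption: no quoted labels and no whitespace inside the string.
    Internal node labels and branch lengths are skipped.
    """
    s = s.strip().rstrip(";")
    pos = 0

    def skip_label():
        nonlocal pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # consume "("
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
            skip_label()                  # e.g. "Node3:0.017"
            return children
        start = pos
        skip_label()
        return s[start:pos].split(":")[0]  # drop any branch length

    return node()


def leaves(n):
    """Return the leaf names below a parsed node."""
    return [n] if isinstance(n, str) else [x for c in n for x in leaves(c)]


def splits(tree):
    """Return the set of non-trivial bipartitions of an unrooted tree."""
    all_leaves = frozenset(leaves(tree))
    found = set()

    def walk(n):
        if isinstance(n, str):
            return frozenset([n])
        clade = frozenset().union(*(walk(c) for c in n))
        other = all_leaves - clade
        if len(clade) > 1 and len(other) > 1:
            found.add(frozenset([clade, other]))
        return clade

    walk(tree)
    return found


def matches_constraint(tree_str, constraint_str):
    """True if the tree contains every split required by the constraint."""
    tree, constraint = parse_newick(tree_str), parse_newick(constraint_str)
    if set(leaves(tree)) != set(leaves(constraint)):
        return False
    return splits(constraint) <= splits(tree)


def unrooted_tree_count(n):
    """Number of unrooted binary trees on n labelled taxa: (2n-5)!!."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    constraint = "((a,b,c),(d,e,f))"
    print(matches_constraint("((a,(b,c)),(d,e),f)", constraint))   # True
    print(matches_constraint("((a,b),(c,d),(e,f))", constraint))   # False
    print(unrooted_tree_count(9), unrooted_tree_count(10))         # 135135 2027025
```

With no constraint, 9 taxa already give 135135 candidate trees and 10 taxa give 2027025, each of which would need its own likelihood optimization.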

     Input: A nucleotide, amino-acid or codon data file in any recognizable format. HyPhy translates nucleotide ambiguity codes (and amino-acid characters) using its standard character table. The analysis also requires a constraint file containing a single Newick tree string.

    Models: Any of the standard nucleotide or amino-acid models can be selected for the analysis. The selected model is used to estimate the maximum likelihood parameter values and the likelihood of each tree being tested.

    Output: The standard output depends on the "Likelihood Output" option selected in "Preferences". By default, that option prints the maximum ln-likelihood followed by the inferred tree string, with branch lengths representing the expected number of substitutions per site. For a complete list of output options, refer to the Output Formats page. The inferred tree topology can optionally be saved to a file.

    TopologySearchConstrained.bf will print out the scores and the topologies of the 10 best trees found in the analysis and a table of likelihood score distributions, and can optionally save all the trees and their scores to a file. A run looks like this:

*********** RUNNING CONSTRAINED TREE SEARCH ***********
Constraint:(Human,Chimpanzee,(Gorilla,Orangutan,Gibbon));


Tree#0 (Human,Chimpanzee,((Gorilla,Gibbon),Orangutan)) ==> logLhd = -2696.11
Tree#1 (Human,Chimpanzee,(Gorilla,(Orangutan,Gibbon))) ==> logLhd = -2665.43
Tree#2 (Human,Chimpanzee,((Gorilla,Orangutan),Gibbon)) ==> logLhd = -2696.24

 --------------------- RESULTS --------------------- 

 Total tree count =3

 BestTree =(Human,Chimpanzee,(Gorilla,(Orangutan,Gibbon)))

Tree Constraint:
(Human,Chimpanzee,(Gorilla,Orangutan,Gibbon)Node3);


**************************
*     TREE REPORT        *
**************************


#### BEST TREES #####

1).
(Human,Chimpanzee,(Gorilla,(Orangutan,Gibbon)))
Log-likelihood = -2665.43

2). Worse by: -30.6847
(Human,Chimpanzee,((Gorilla,Gibbon),Orangutan))
Log-likelihood = -2696.11

3). Worse by: -30.8144
(Human,Chimpanzee,((Gorilla,Orangutan),Gibbon))
Log-likelihood = -2696.24


#### STATISTICS #####



+---------------+---------------+---------------+---------------+
| From Best +   |  To Best +    |   Tree Count  |  % of total   |
+---------------+---------------+---------------+---------------+
|             0 |           0.1 |             0 |    0.00000000 |
+---------------+---------------+---------------+---------------+
|           0.1 |           0.5 |             0 |    0.00000000 |
+---------------+---------------+---------------+---------------+
|           0.5 |           1.0 |             0 |    0.00000000 |
+---------------+---------------+---------------+---------------+
|           1.0 |           5.0 |             0 |    0.00000000 |
+---------------+---------------+---------------+---------------+
|           5.0 |          10.0 |             0 |    0.00000000 |
+---------------+---------------+---------------+---------------+
|          10.0 |          50.0 |             2 |   66.66666667 |
+---------------+---------------+---------------+---------------+
|          50.0 |         100.0 |             0 |    0.00000000 |
+---------------+---------------+---------------+---------------+
|         100.0 |        1000.0 |             0 |    0.00000000 |
+---------------+---------------+---------------+---------------+
|        1000.0 |       10000.0 |             0 |    0.00000000 |
+---------------+---------------+---------------+---------------+
|       10000.0 |      Infinity |             0 |    0.00000000 |
+---------------+---------------+---------------+---------------+

Log Likelihood = -2665.42999469179;
Shared Parameters:
R=9.26199

Tree tr=(Human:0.0415676,Chimpanzee:0.0540152,(Gorilla:0.0578091,
(Orangutan:0.100049,Gibbon:0.138828)Node5:0.0532975)Node3:0.0175692);
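
For post-processing outside HyPhy, the per-tree lines of a run like the one above can be collected with a few lines of Python. This is a sketch only: it assumes the "Tree#N topology ==> logLhd = value" layout shown in the sample run, and the names TREE_LINE, parse_scores and rank_trees are hypothetical.

```python
import re

# Matches lines such as:
#   Tree#1 (Human,Chimpanzee,(Gorilla,(Orangutan,Gibbon))) ==> logLhd = -2665.43
TREE_LINE = re.compile(
    r"^Tree#(\d+)\s+(\S+)\s+==>\s+logLhd\s*=\s*(-?\d+(?:\.\d+)?)$"
)


def parse_scores(text):
    """Return (index, topology, log-likelihood) for every tree line."""
    rows = []
    for line in text.splitlines():
        m = TREE_LINE.match(line.strip())
        if m:
            rows.append((int(m.group(1)), m.group(2), float(m.group(3))))
    return rows


def rank_trees(rows):
    """Sort best-first and report how far each tree is below the best."""
    best = max(score for _, _, score in rows)
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)
    return [(topo, score, best - score) for _, topo, score in ranked]


if __name__ == "__main__":
    sample = """\
Tree#0 (Human,Chimpanzee,((Gorilla,Gibbon),Orangutan)) ==> logLhd = -2696.11
Tree#1 (Human,Chimpanzee,(Gorilla,(Orangutan,Gibbon))) ==> logLhd = -2665.43
Tree#2 (Human,Chimpanzee,((Gorilla,Orangutan),Gibbon)) ==> logLhd = -2696.24
"""
    for topo, score, gap in rank_trees(parse_scores(sample)):
        print(f"{score:10.2f}  worse by {gap:6.2f}  {topo}")
```

The gaps computed this way are positive log-likelihood differences from the best tree, which is the quantity binned in the "From Best" / "To Best" columns of the statistics table; they are derived from the rounded scores, so they agree with the report only to two decimal places.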

     Result Processing Tools: After the analysis is finished, the following options are available in the "Results" submenu of the "Analyses" menu:

  • View Results.
  • Save Results.
  • (Co)Variance Estimates.
  • Nonparametric Bootstrap. Given a number of iterations, each iteration does the following:
          resample (with replacement) the original sequence data, reconstruct the phylogeny for the simulated data set, and write out the tree string to the user-specified output file. Each simulated tree is also checked against the tree yielded by the original data, and all the clades present in both trees are noted. At the end of the run, a summary tree is printed; this tree has the same topology as the tree reconstructed from the original data, and its internal branch lengths represent the proportion (or raw count) of simulated trees that contained the clade starting at that internal node.

       The (co)variance estimates refer to the parameters in the inferred tree.
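
The two bookkeeping steps of the nonparametric bootstrap described above (resampling alignment columns with replacement, then tallying how often each clade of the original tree recurs) can be sketched in Python. This is an illustration under stated assumptions, not HyPhy's code: the alignment is a plain dictionary of equal-length strings, the tree reconstruction for each replicate is left out entirely, and the function names and the replicate clade sets in the usage example are hypothetical.

```python
import random


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement, keeping the original length.

    alignment: dict mapping sequence name -> sequence string (equal lengths).
    rng: a random.Random instance, so that runs can be made reproducible.
    """
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


def clade_support(reference_clades, replicate_clade_sets):
    """Proportion of replicate trees that contain each reference clade.

    reference_clades: iterable of frozensets of taxon names (original tree).
    replicate_clade_sets: one set of frozensets per bootstrap replicate tree.
    """
    n = len(replicate_clade_sets)
    return {clade: sum(clade in rep for rep in replicate_clade_sets) / n
            for clade in reference_clades}


if __name__ == "__main__":
    rng = random.Random(2002)
    toy = {"Human": "ACGTAC", "Chimpanzee": "ACGTAT", "Gorilla": "ACGAAT"}
    print(resample_columns(toy, rng))

    # Hypothetical clade sets standing in for four reconstructed replicates.
    og = frozenset({"Orangutan", "Gibbon"})
    gg = frozenset({"Gorilla", "Gibbon"})
    print(clade_support([og], [{og}, {gg}, {og}, {og}]))
```

In the usage example the clade (Orangutan,Gibbon) appears in three of the four made-up replicates, so its support is 0.75; multiplying by the number of replicates gives the raw count instead of the proportion.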

 
Sergei L. Kosakovsky Pond and Spencer V. Muse, 1997-2002