
     Summary: Given a nucleotide data file and a tree, ModelTest.bf applies the hierarchical testing method of ModelTest, a program by David Posada and Keith Crandall (with some slight variations described below). The original citation for the methodology is:

Posada, D. and Crandall, K.A. 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14(9):817-818.

Please include it with any results arising out of the use of this analysis.

     There are two methods for model selection:

  1. AIC - Akaike Information Criterion, a procedure for ranking non-nested models for goodness of fit. A number of models are fitted to the data and the tree, and the one with the best AIC score is selected. While usually reliable, the asymptotic assumptions of AIC may not always be satisfied in molecular data analysis. Also, about 50 models are fitted to the data, which may be time-consuming for large data sets.
  2. Hierarchical Model Testing - A series of comparisons of nested models, which are tested using robust chi-squared (or mixed chi-squared, for boundaries of parameter space) approximations to the likelihood ratio statistics. This method is faster than AIC, since at most 7 models are fitted to the data, but it takes an "all-or-nothing" approach at each step, so the "correct" model may never be tested if the path leading to it was (incorrectly) rejected early on.
     For a much more detailed exposition of the methodology, refer to the documentation for 'ModelTest'.
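
As a minimal sketch of the AIC ranking described above (not the code ModelTest.bf runs), the score can be computed as AIC = 2k - 2 lnL, where k is the parameter count and lnL the maximized log-likelihood; the model with the smallest score is selected. The function name `aic` and the `fits` list are illustrative; the four rows are copied from the sample output further down this page.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 lnL (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood

# (model, number of parameters, lnL): a few rows from the sample AIC table
fits = [
    ("JC69", 9, -1236.777),
    ("F81+G", 13, -1210.744),
    ("HKY85+G", 14, -1207.119),
    ("GTR+G", 18, -1188.696),
]

# Score every fitted model, then pick the one with the lowest AIC
scores = {name: aic(lnl, k) for name, k, lnl in fits}
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:10s} {score:10.3f}")
print("AIC based model:", best)
```

Because every candidate must be fitted before ranking, the cost grows with the number of models, which is why the AIC option is the slower of the two.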

     Please note that the implementation of invariant rate classes in HyPhy may differ from that in other packages. When the invariant class is present, HyPhy allows every site (not just constant sites) to contribute to the 'invariant' component of the likelihood. For example, gamma rate variation combined with an invariant rate class is implemented as a mixture of the gamma distribution and a point mass at zero, as shown here.
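
The mixture form can be illustrated with a toy case. The sketch below is an assumption-laden illustration, not HyPhy's implementation: it uses a two-sequence alignment under JC69 (where the site likelihood has a closed form), hypothetical discretized gamma rates with equal weights, and a hypothetical invariant proportion `p_inv`. Every site is evaluated under both components; under the zero-rate class a constant site has likelihood 1/4 and a variable site has likelihood 0.

```python
import math

def jc69_pair_likelihood(same, distance):
    """Likelihood of one site in a two-sequence alignment under JC69."""
    decay = math.exp(-4.0 * distance / 3.0)
    p = 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay
    return 0.25 * p  # equilibrium frequency times transition probability

def mixture_site_likelihood(same, distance, p_inv, rates, weights):
    """Mix a point mass at rate 0 (weight p_inv) with discrete gamma classes."""
    invariant_part = jc69_pair_likelihood(same, 0.0)
    variable_part = sum(
        w * jc69_pair_likelihood(same, distance * r)
        for r, w in zip(rates, weights)
    )
    return p_inv * invariant_part + (1.0 - p_inv) * variable_part

# Hypothetical values chosen for illustration only
rates = [0.1, 0.5, 1.0, 2.4]   # stand-in for discretized gamma rates (mean 1)
weights = [0.25] * 4           # equiprobable rate classes
p_inv = 0.3                    # weight of the zero-rate class

constant = mixture_site_likelihood(True, 0.2, p_inv, rates, weights)
variable = mixture_site_likelihood(False, 0.2, p_inv, rates, weights)
print(constant, variable)
```

Whether the gamma rates are rescaled after adding the invariant class differs between packages; this sketch leaves them unscaled.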

     Input: A nucleotide data file in any recognizable format. HYPHY uses the following table to translate nucleotide ambiguities.
    A starting tree is also needed. It may either be present in the data file, or reside in a separate (Newick string) file.
    The user will be prompted to choose which model testing regime to use:

  • AIC based.
  • Hierarchical Testing.
  • Both: AIC followed by Hierarchical Testing.

    The user will specify how many rate classes should be used for discretizing the gamma distribution. For hierarchical testing, the user will also provide a hypothesis rejection level. This level should be multiplied by the number of comparison steps performed by the analysis to arrive at the overall rejection level for the compound test.
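
A single hierarchical step can be sketched as follows. This is a hedged illustration rather than the analysis code: `chi2_sf` and `lrt_p_value` are names introduced here, and the boundary case assumes a 50:50 mixture of a point mass at zero and a chi-squared distribution with 1 degree of freedom, an assumption that is consistent with the p-values in the sample output below. The log-likelihoods are taken from steps 1 and 6 of that output, and 7 is the number of steps it shows.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of chi-squared for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series in odd half-powers of x
    total, term = 0.0, math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total

def lrt_p_value(lnl_null, lnl_alt, df, boundary=False):
    """Likelihood ratio statistic and p-value; halve p for a boundary test."""
    stat = 2.0 * (lnl_alt - lnl_null)
    p = chi2_sf(stat, df)
    return stat, (0.5 * p if boundary else p)

level = 0.01                      # per-step rejection level entered by the user
n_steps = 7                       # comparison steps in the sample run
compound_level = level * n_steps  # overall level for the compound test

stat1, p1 = lrt_p_value(-1236.7767, -1220.4626, df=3)                 # JC69 vs F81
stat6, p6 = lrt_p_value(-1220.4626, -1210.7442, df=1, boundary=True)  # F81 vs F81+G
print(stat1, p1, p1 < level)
print(stat6, p6, p6 < level)
```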

    Models: Selected by the analysis

    Output: Model Test will display detailed progress of the analysis, followed by the model that was selected, its rate matrix form, and the MLEs for model parameters. For instance:


+--------------------------------+
| RUNNING MODEL TESTING ANALYSIS |
| Based on the program ModelTest |
|              by                |
|        David  Posada           |
|             and                |
|        Keith Crandall          |
|                                |
|    If you use this analysis,   |
| be sure to cite the original   |
| reference, which can be found  |
| in the HyPhy documentation     |
| page for this analysis.        |
+--------------------------------+


Data File is read from "HAL 9000:Programming:DNAProject-Old:liqing:Done:ubiq:ubiq.flt"
6 species:{BLYMUB1,BLYMUB2,RICRMA630,ZMU29160,ZMU29161,ZMUBIS27F};
Total Sites:468;
Distinct Sites:59

Number of rate classes in rate variation models (e.g. 4):8

Model rejection level (e.g. 0.05):0.01


****** RUNNING AIC BASED MODEL SELECTION ******


|  Model     | # prm |    lnL    |    AIC     |
|------------|-------|-----------|------------|
| JC69       |     9 | -1236.777 |   2491.553 |
| JC69+G     |    10 | -1228.443 |   2476.887 |
| JC69+G+I   |    11 | -1228.444 |   2478.887 |
| JC69+I     |    10 | -1228.544 |   2477.089 |
| F81        |    12 | -1220.463 |   2464.925 |
| F81+G      |    13 | -1210.744 |   2447.488 |
| F81+G+I    |    14 | -1210.744 |   2449.488 |
| F81+I      |    13 | -1210.833 |   2447.665 |
| K80        |    10 | -1234.768 |   2489.536 |
| K80+G      |    11 | -1226.374 |   2474.747 |
| K80+G+I    |    12 | -1226.374 |   2476.747 |
| K80+I      |    11 | -1226.501 |   2475.002 |
| HKY85      |    13 | -1217.756 |   2461.512 |
| HKY85+G    |    14 | -1207.119 |   2442.237 |
| HKY85+G+I  |    15 | -1207.119 |   2444.237 |
| HKY85+I    |    14 | -1207.191 |   2442.382 |
| TrNef      |    11 | -1225.174 |   2472.348 |
| TrNef+G    |    12 | -1218.329 |   2460.658 |
| TrNef+G+I  |    13 | -1218.328 |   2462.657 |
| TrNef+I    |    12 | -1218.556 |   2461.112 |
| TrN        |    14 | -1202.380 |   2432.759 |
| TrN+G      |    15 | -1192.128 |   2414.256 |
| TrN+G+I    |    16 | -1192.126 |   2416.251 |
| TrN+I      |    15 | -1192.166 |   2414.332 |
| K81        |    11 | -1232.883 |   2487.767 |
| K81+G      |    12 | -1224.479 |   2472.958 |
| K81+G+I    |    13 | -1224.479 |   2474.959 |
| K81+I      |    12 | -1224.588 |   2473.177 |
| K81uf      |    14 | -1216.034 |   2460.068 |
| K81uf+G    |    15 | -1205.876 |   2441.752 |
| K81uf+G+I  |    16 | -1205.875 |   2443.751 |
| K81uf+I    |    15 | -1205.967 |   2441.934 |
| TIMef      |    12 | -1223.306 |   2470.612 |
| TIMef+G    |    13 | -1216.472 |   2458.943 |
| TIMef+G+I  |    14 | -1216.470 |   2460.941 |
| TIMef+I    |    13 | -1216.679 |   2459.357 |
| TIM        |    15 | -1200.662 |   2431.324 |
| TIM+G      |    16 | -1190.839 |   2413.677 |
| TIM+G+I    |    17 | -1190.838 |   2415.676 |
| TIM+I      |    16 | -1190.852 |   2413.703 |
| TVMef      |    12 | -1224.219 |   2472.437 |
| TVMef+G    |    13 | -1218.046 |   2462.092 |
| TVMef+G+I  |    14 | -1218.047 |   2464.094 |
| TVMef+I    |    13 | -1218.347 |   2462.695 |
| TVM        |    15 | -1212.043 |   2454.086 |
| TVM+G      |    16 | -1203.637 |   2439.273 |
| TVM+G+I    |    17 | -1203.634 |   2441.267 |
| TVM+I      |    16 | -1203.906 |   2439.813 |
| SYM        |    14 | -1213.931 |   2455.863 |
| SYM+G      |    15 | -1208.462 |   2446.925 |
| SYM+G+I    |    16 | -1208.471 |   2448.942 |
| SYM+I      |    15 | -1208.912 |   2447.825 |
| GTR        |    17 | -1196.680 |   2427.360 |
| GTR+G      |    18 | -1188.696 |   2413.391 |
| GTR+G+I    |    19 | -1188.698 |   2415.396 |
| GTR+I      |    18 | -1188.898 |   2413.797 |
|------------|-------|-----------|------------|

AIC based model: GTR+G      , AIC = 2413.39

****** RUNNING HIERARCHICAL MODEL TESTING ******


1). Checking for equilibrium frequencies equality.

    Null:JC69
        Log-likelihood =   -1236.7767, 9 parameters.
    Alt :F81
        Log-likelihood =   -1220.4626, 12 parameters.

    LR statistic       : 32.6282
    Degrees of freedom : 3
    P-Value            : 3.85804e-07

    Null hypothesis rejected
    F81 chosen.

2). Checking for equality of transition and transversion rates.

    Null:F81
        Log-likelihood =   -1220.4626, 12 parameters.
    Alt :HKY85
        Log-likelihood =   -1217.7561, 13 parameters.

    LR statistic       : 5.41302
    Degrees of freedom : 1
    P-Value            : 0.0199872

    Null hypothesis accepted
    F81 chosen.

...Skipping steps 3 through 5...

6). Checking for evidence of rate variation.

    Null:F81
        Log-likelihood =   -1220.4626, 12 parameters.
    Alt :F81+G
        Log-likelihood =   -1210.7442, 13 parameters.

    LR statistic       : 19.437
    Degrees of freedom : 1
    P-Value            : 5.19878e-06

    Null hypothesis rejected
    F81+G chosen.

7). Checking for evidence of an invariant rate class.

    Null:F81+G
        Log-likelihood =   -1210.7442, 13 parameters.
    Alt :F81+G+I
        Log-likelihood =   -1210.7440, 14 parameters.

    LR statistic       : 0.00031819
    Degrees of freedom : 1
    P-Value            : 0.492884

    Null hypothesis accepted
    F81+G chosen.


 ******* Hierarchical Testing based model (F81+G) ******** 

Rate Matrix:
+---+-------+-------+-------+-------+
|   |   A   |   C   |   G   |   T   |
+---+-------+-------+-------+-------+
| A |   *   | 1 *t  | 1 *t  | 1 *t  |
+---+-------+-------+-------+-------+
| C | 1 *t  |   *   | 1 *t  | 1 *t  |
+---+-------+-------+-------+-------+
| G | 1 *t  | 1 *t  |   *   | 1 *t  |
+---+-------+-------+-------+-------+
| T | 1 *t  | 1 *t  | 1 *t  |   *   |
+---+-------+-------+-------+-------+


MLEs for the model selected by Hierarchical Testing:
Log Likelihood = -1210.7441560791;
Shared Parameters:
alpha=0.38137

Tree AIC_TREE=((BLYMUB1:0.0140963,BLYMUB2:0.023026)Node2:0.0187021,RICRMA630:0.0387926,
((ZMU29161:0.0212183,ZMU29160:0.0571707)Node7:0.0159509,ZMUBIS27F:0.0420282)Node6:0.0214617);


 ******* AIC based model (GTR+G      ) ******** 

Rate Matrix:
+---+-------+-------+-------+-------+
|   |   A   |   C   |   G   |   T   |
+---+-------+-------+-------+-------+
| A |   *   | 1 *t  | R1*t  | R2*t  |
+---+-------+-------+-------+-------+
| C | 1 *t  |   *   | R3*t  | R4*t  |
+---+-------+-------+-------+-------+
| G | R1*t  | R3*t  |   *   | R5*t  |
+---+-------+-------+-------+-------+
| T | R2*t  | R4*t  | R5*t  |   *   |
+---+-------+-------+-------+-------+


MLEs for the model selected by AIC:
Log Likelihood = -1303.0656431646;
Shared Parameters:
alpha=0.849356
R1=5.15865
R2=1.87317
R3=0.55211
R4=0.703657
R5=0.455595

Tree AIC_TREE=((BLYMUB1:0.046776,BLYMUB2:0.017686)Node2:0.0589616,RICRMA630:0.025151,
((ZMU29161:0.0413736,ZMU29160:0.0218393)Node7:0.0234768,ZMUBIS27F:0.0243365)Node6:0.0150669);

     Result Processing Tools:

  • View Results. 
  • Save Results. 
  • (Co)Variance Estimates. 
  • Ancestors. 
     HYPHY will present the user with a choice of two likelihood functions. Select AIC_LF; it refers to the likelihood function with the model chosen by AIC, if that method was run, or with the model chosen by hierarchical testing, if only that method was run.

 
Sergei L. Kosakovsky Pond and Spencer V. Muse, 1997-2002