Summary:
Given a nucleotide data file and a tree, ModelTest.bf applies
the hierarchical testing method of ModelTest, a program
by David Posada and Keith Crandall (with some slight variations
described below). The original citation for the methodology is:
Posada, D. and Crandall, K.A. 1998. Modeltest:
testing the model of DNA substitution. Bioinformatics
14(9):817-818.
Please include it with any results arising out of the use of
this analysis.
There are two methods for model selection:
- AIC - Akaike Information Criterion, a procedure
for ranking non-nested models for goodness of fit. A number
of models are fitted to the data and the tree, and the one with
the best AIC score is selected. While usually reliable, the
asymptotic assumptions of AIC may not always be satisfied in
molecular data analysis. Also, about 50 models are fitted to
the data, which may be time-consuming for large data sets.
- Hierarchical Model Testing - A series of
comparisons of nested models, which are tested using robust
chi-squared (or mixed chi-squared, for boundaries of parameter
space) approximations to the likelihood ratio statistics.
This method is faster than AIC, since at most 7 models are fitted to the data,
but since it takes an "all-or-nothing" approach at each
step, the "correct" model may never be tested, because
the path leading to it was (incorrectly) rejected early on.
For a much more detailed exposition of the methodology, refer
to the documentation for 'ModelTest'.
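The AIC ranking described above can be sketched in a few lines of Python. This is only an illustration, not the code ModelTest.bf runs: the function name `aic` and the list `fits` are made up for this example, and the parameter counts and log-likelihoods are a handful of rows copied from the sample output further down this page. The score used is the standard AIC = 2k - 2 lnL, where k is the number of estimated parameters; the lowest score wins.

```python
# Minimal sketch of AIC-based model ranking (names are illustrative,
# not part of HyPhy). AIC = 2k - 2 lnL; lower is better.

def aic(log_likelihood, n_params):
    """Akaike Information Criterion for one fitted model."""
    return 2 * n_params - 2 * log_likelihood

# (model, number of parameters, log-likelihood), taken from a few rows
# of the sample AIC table in the Output section.
fits = [
    ("JC69", 9, -1236.777),
    ("HKY85+G", 14, -1207.119),
    ("TIM+G", 16, -1190.839),
    ("GTR+G", 18, -1188.696),
    ("GTR+G+I", 19, -1188.698),
]

scores = {name: aic(lnl, k) for name, k, lnl in fits}
best = min(scores, key=scores.get)

# Print the models from best to worst score.
for name in sorted(scores, key=scores.get):
    print(f"{name:10s} AIC = {scores[name]:.3f}")
print("Selected:", best)
```

Because the table reports log-likelihoods rounded to three decimals, scores recomputed this way can differ from the printed AIC column in the last digit.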
Please note that the implementation
of invariant rate classes in HyPhy may differ from that in other
packages. When the invariant class is present, HyPhy allows
every site (not just constant sites) to contribute
to the 'invariant' component of the likelihood. For example,
gamma rate variation with an invariant rate class is implemented
as a mixture of the gamma distribution and a point mass at zero.
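One way to write down the mixture just described is given below. The notation is an assumption made for this illustration: P is the weight of the invariant class, \delta_0 is a point mass at rate zero, and \alpha and \beta are the shape and rate parameters of the gamma component. How HyPhy constrains \beta or normalizes the mean rate is not stated on this page, so the formula should be read as a generic sketch rather than the exact parameterization.

```latex
f(r) \;=\; P\,\delta_0(r) \;+\; (1 - P)\,
      \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, r^{\alpha - 1} e^{-\beta r},
\qquad r \ge 0,\quad 0 \le P \le 1 .
```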
Input:
A nucleotide data file in any recognizable format. HYPHY
uses the following table
to translate nucleotide ambiguities.
A starting tree is also needed. It may either be present
in the data file, or reside in a separate (Newick string) file.
The user will be prompted to choose which
model testing regime to use:
- AIC based.
- Hierarchical Testing.
- Both: AIC followed by Hierarchical Testing.
The user will specify how many rate classes
should be used for discretizing the gamma distribution. For hierarchical
testing, the user will provide a hypothesis rejection level. This
level should be multiplied by the number of comparison steps
performed by the analysis to arrive at a conservative overall
rejection level for the compound test.
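A single step of the hierarchical procedure, and the compound rejection level, can be sketched as follows. This is an illustration under stated assumptions, not HyPhy code: the function names `chi2_sf` and `lrt` are invented here, the chi-squared tail is only implemented for 1 and 3 degrees of freedom (enough for the steps shown in the sample output), and the boundary case is assumed to use a 50:50 mixture of a point mass at zero and a chi-squared with 1 degree of freedom, which appears consistent with the p-values printed in the sample output. The log-likelihoods are copied from that output.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable (df = 1 or 3 only)."""
    if x <= 0:
        return 1.0
    tail = math.erfc(math.sqrt(x / 2.0))
    if df == 1:
        return tail
    if df == 3:
        return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 3")

def lrt(lnl_null, lnl_alt, df, boundary=False):
    """Likelihood ratio test of a null model nested in an alternative.

    boundary=True halves the p-value (assumed 50:50 mixed chi-squared),
    for tests where the null sits on the edge of parameter space.
    """
    lr = 2.0 * (lnl_alt - lnl_null)
    p = chi2_sf(lr, df)
    if boundary and lr > 0:
        p *= 0.5
    return lr, p

alpha = 0.01   # per-step rejection level entered by the user
n_steps = 7    # number of comparison steps in the sample run

# Step 1 of the sample output: JC69 (null) vs F81 (alternative), 3 df.
lr1, p1 = lrt(-1236.7767, -1220.4626, df=3)
# Step 2: F81 vs HKY85, 1 df.
lr2, p2 = lrt(-1220.4626, -1217.7561, df=1)
# Step 6: F81 vs F81+G, treated here as a boundary test.
lr6, p6 = lrt(-1220.4626, -1210.7442, df=1, boundary=True)

for label, lr, p in [("JC69 vs F81", lr1, p1),
                     ("F81 vs HKY85", lr2, p2),
                     ("F81 vs F81+G", lr6, p6)]:
    verdict = "rejected" if p < alpha else "accepted"
    print(f"{label}: LR = {lr:.4f}, p = {p:.6g}, null {verdict}")

# Per-step level times the number of steps bounds the overall level.
compound_level = alpha * n_steps
print("Compound rejection level (upper bound):", compound_level)
```

With a per-step level of 0.01 and 7 steps, the compound level is bounded by 0.07, which is why a stricter per-step level is worth considering when the overall error rate matters.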
Models: Selected by the analysis
Output:
Model Test will display detailed progress of the analysis,
followed by the model which was selected, its rate matrix form,
and the MLEs for model parameters. For instance:
+--------------------------------+
| RUNNING MODEL TESTING ANALYSIS |
| Based on the program ModelTest |
| by |
| David Posada |
| and |
| Keith Crandall |
| |
| If you use this analysis, |
| be sure to cite the original |
| reference, which can be found |
| in the HyPhy documentation |
| page for this analysis. |
+--------------------------------+
Data File is read from "HAL 9000:Programming:DNAProject-Old:liqing:Done:ubiq:ubiq.flt"
6 species:{BLYMUB1,BLYMUB2,RICRMA630,ZMU29160,ZMU29161,ZMUBIS27F};
Total Sites:468;
Distinct Sites:59
Number of rate classes in rate variation models (e.g. 4):8
Model rejection level (e.g. 0.05):0.01
****** RUNNING AIC BASED MODEL SELECTION ******
| Model | # prm | lnL | AIC |
|------------|-------|-----------|------------|
| JC69 | 9 | -1236.777 | 2491.553 |
| JC69+G | 10 | -1228.443 | 2476.887 |
| JC69+G+I | 11 | -1228.444 | 2478.887 |
| JC69+I | 10 | -1228.544 | 2477.089 |
| F81 | 12 | -1220.463 | 2464.925 |
| F81+G | 13 | -1210.744 | 2447.488 |
| F81+G+I | 14 | -1210.744 | 2449.488 |
| F81+I | 13 | -1210.833 | 2447.665 |
| K80 | 10 | -1234.768 | 2489.536 |
| K80+G | 11 | -1226.374 | 2474.747 |
| K80+G+I | 12 | -1226.374 | 2476.747 |
| K80+I | 11 | -1226.501 | 2475.002 |
| HKY85 | 13 | -1217.756 | 2461.512 |
| HKY85+G | 14 | -1207.119 | 2442.237 |
| HKY85+G+I | 15 | -1207.119 | 2444.237 |
| HKY85+I | 14 | -1207.191 | 2442.382 |
| TrNef | 11 | -1225.174 | 2472.348 |
| TrNef+G | 12 | -1218.329 | 2460.658 |
| TrNef+G+I | 13 | -1218.328 | 2462.657 |
| TrNef+I | 12 | -1218.556 | 2461.112 |
| TrN | 14 | -1202.380 | 2432.759 |
| TrN+G | 15 | -1192.128 | 2414.256 |
| TrN+G+I | 16 | -1192.126 | 2416.251 |
| TrN+I | 15 | -1192.166 | 2414.332 |
| K81 | 11 | -1232.883 | 2487.767 |
| K81+G | 12 | -1224.479 | 2472.958 |
| K81+G+I | 13 | -1224.479 | 2474.959 |
| K81+I | 12 | -1224.588 | 2473.177 |
| K81uf | 14 | -1216.034 | 2460.068 |
| K81uf+G | 15 | -1205.876 | 2441.752 |
| K81uf+G+I | 16 | -1205.875 | 2443.751 |
| K81uf+I | 15 | -1205.967 | 2441.934 |
| TIMef | 12 | -1223.306 | 2470.612 |
| TIMef+G | 13 | -1216.472 | 2458.943 |
| TIMef+G+I | 14 | -1216.470 | 2460.941 |
| TIMef+I | 13 | -1216.679 | 2459.357 |
| TIM | 15 | -1200.662 | 2431.324 |
| TIM+G | 16 | -1190.839 | 2413.677 |
| TIM+G+I | 17 | -1190.838 | 2415.676 |
| TIM+I | 16 | -1190.852 | 2413.703 |
| TVMef | 12 | -1224.219 | 2472.437 |
| TVMef+G | 13 | -1218.046 | 2462.092 |
| TVMef+G+I | 14 | -1218.047 | 2464.094 |
| TVMef+I | 13 | -1218.347 | 2462.695 |
| TVM | 15 | -1212.043 | 2454.086 |
| TVM+G | 16 | -1203.637 | 2439.273 |
| TVM+G+I | 17 | -1203.634 | 2441.267 |
| TVM+I | 16 | -1203.906 | 2439.813 |
| SYM | 14 | -1213.931 | 2455.863 |
| SYM+G | 15 | -1208.462 | 2446.925 |
| SYM+G+I | 16 | -1208.471 | 2448.942 |
| SYM+I | 15 | -1208.912 | 2447.825 |
| GTR | 17 | -1196.680 | 2427.360 |
| GTR+G | 18 | -1188.696 | 2413.391 |
| GTR+G+I | 19 | -1188.698 | 2415.396 |
| GTR+I | 18 | -1188.898 | 2413.797 |
|------------|-------|-----------|------------|
AIC based model: GTR+G , AIC = 2413.39
****** RUNNING HIERARCHICAL MODEL TESTING ******
1). Checking for equilibrium frequencies equality.
Null:JC69
Log-likelihood = -1236.7767, 9 parameters.
Alt :F81
Log-likelihood = -1220.4626, 12 parameters.
LR statistic : 32.6282
Degrees of freedom : 3
P-Value : 3.85804e-07
Null hypothesis rejected
F81 chosen.
2). Checking for equality of transition and transversion rates.
Null:F81
Log-likelihood = -1220.4626, 12 parameters.
Alt :HKY85
Log-likelihood = -1217.7561, 13 parameters.
LR statistic : 5.41302
Degrees of freedom : 1
P-Value : 0.0199872
Null hypothesis accepted
F81 chosen.
...Skipping steps 3 through 5...
6). Checking for evidence of rate variation.
Null:F81
Log-likelihood = -1220.4626, 12 parameters.
Alt :F81+G
Log-likelihood = -1210.7442, 13 parameters.
LR statistic : 19.437
Degrees of freedom : 1
P-Value : 5.19878e-06
Null hypothesis rejected
F81+G chosen.
7). Checking for evidence of an invariant rate class.
Null:F81+G
Log-likelihood = -1210.7442, 13 parameters.
Alt :F81+G+I
Log-likelihood = -1210.7440, 14 parameters.
LR statistic : 0.00031819
Degrees of freedom : 1
P-Value : 0.492884
Null hypothesis accepted
F81+G chosen.
******* Hierarchical Testing based model (F81+G) ********
Rate Matrix:
+---+-------+-------+-------+-------+
| | A | C | G | T |
+---+-------+-------+-------+-------+
| A | * | 1 *t | 1 *t | 1 *t |
+---+-------+-------+-------+-------+
| C | 1 *t | * | 1 *t | 1 *t |
+---+-------+-------+-------+-------+
| G | 1 *t | 1 *t | * | 1 *t |
+---+-------+-------+-------+-------+
| T | 1 *t | 1 *t | 1 *t | * |
+---+-------+-------+-------+-------+
MLEs for the model selected by Hierarchical Testing:
Log Likelihood = -1210.7441560791;
Shared Parameters:
alpha=0.38137
Tree AIC_TREE=((BLYMUB1:0.0140963,BLYMUB2:0.023026)Node2:0.0187021,RICRMA630:0.0387926,
((ZMU29161:0.0212183,ZMU29160:0.0571707)Node7:0.0159509,ZMUBIS27F:0.0420282)Node6:0.0214617);
******* AIC based model (GTR+G ) ********
Rate Matrix:
+---+-------+-------+-------+-------+
| | A | C | G | T |
+---+-------+-------+-------+-------+
| A | * | 1 *t | R1*t | R2*t |
+---+-------+-------+-------+-------+
| C | 1 *t | * | R3*t | R4*t |
+---+-------+-------+-------+-------+
| G | R1*t | R3*t | * | R5*t |
+---+-------+-------+-------+-------+
| T | R2*t | R4*t | R5*t | * |
+---+-------+-------+-------+-------+
MLEs for the model selected by AIC:
Log Likelihood = -1303.0656431646;
Shared Parameters:
alpha=0.849356
R1=5.15865
R2=1.87317
R3=0.55211
R4=0.703657
R5=0.455595
Tree AIC_TREE=((BLYMUB1:0.046776,BLYMUB2:0.017686)Node2:0.0589616,RICRMA630:0.025151,
((ZMU29161:0.0413736,ZMU29160:0.0218393)Node7:0.0234768,ZMUBIS27F:0.0243365)Node6:0.0150669);
Result Processing Tools:
- View Results.
- Save Results.
- (Co)Variance Estimates.
- Ancestors.
HYPHY will present the user with a choice of two likelihood functions.
Select AIC_LF - it will refer to the likelihood function with the model
chosen by AIC, if that method was run, or with the model chosen
by hierarchical testing, if only that method was run.