  HyPhy Documentation: HyPhy GUI Examples: Testing non-nested hypotheses

Description
This example demonstrates HyPhy's ability to test non-nested models via the bootstrap. We take a small coding data set for the alpha-spectrin gene and analyze it using two non-nested models: a codon model (MG94 with 3x4 frequencies) and a nucleotide model (F81x3) that has a separate set of parameters for each of the three codon positions. We compute the likelihood ratio (LR) statistic for the null (F81x3) versus the alternative (MG94_3x4) hypothesis and obtain a p-value via the parametric bootstrap.
Define the models
Open the data file a_spectrin.cod from the 'data' folder in 'GUIExamples'. Follow the instructions from this example to set up three partitions, one for each position within a codon; assign F81 to all partitions and obtain MLEs.

Open the data file a_spectrin.cod from the 'data' folder in 'GUIExamples' again (do not close the previous one). HyPhy will open the same file but change its name so that there is no duplication. Make a single partition with all the data in it and change 'Partition Type' to Codon (select Universal codon when a dialog box pops up). Assign the tree topology and the MG94_3x4 model to it, then obtain MLEs.

Set up the bootstrap
We have now defined two models on the same data set: F81x3 (null) and MG94_3x4 (alternative). Because the models are not nested and treat the data differently (as nucleotides versus codons), the LR statistic has no convenient asymptotic distribution. We will simulate this distribution using the bootstrap. Simulated data sets will be generated using F81x3 and reanalyzed with both F81x3 and MG94_3x4. When data are generated with a nucleotide model, some stop codons will inevitably appear. To resolve this issue, codon models treat stop codons as missing data.
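To make the stop-codon handling concrete, here is a minimal Python sketch that masks in-frame stop codons in a simulated nucleotide sequence. It is only an illustration of the idea, not HyPhy's internal implementation; the function name `mask_stop_codons` and the use of `---` as the missing-data symbol are assumptions made for this example. The three stop codons listed are those of the universal genetic code.

```python
# Stop codons of the universal genetic code.
UNIVERSAL_STOPS = {"TAA", "TAG", "TGA"}

def mask_stop_codons(seq, missing="---"):
    """Return seq with every in-frame stop codon replaced by `missing`.

    Hypothetical helper for illustration; `missing` is an assumed
    placeholder for a codon treated as missing data.
    """
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Split into non-overlapping codons, reading from the first position.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join(missing if c in UNIVERSAL_STOPS else c for c in codons)

print(mask_stop_codons("ATGTAACCCTGA"))  # ATG---CCC---
```

Only codons read in frame are masked; a stop triplet that straddles two codons (for example the `TGA` inside `ATGACC`) is left untouched.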

To begin the bootstrap, switch to the F81x3 (null) data panel and select 'General Bootstrap' in the 'Data' menu. In the window that appears, select a_spectrin2 (the name of the data panel that defines the alternative) as the alternative hypothesis, and start 'Parametric Bootstrap' with the first button of the bootstrap window.

Do 100 bootstrap iterations.
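The procedure that the bootstrap window carries out can be sketched outside the GUI roughly as follows. This is a hedged Python outline, not HyPhy code: `bootstrap_lr_values` and its callable arguments are hypothetical names, the LR is assumed to be defined as twice the log-likelihood difference, and the two toy models (an exponential null and a half-normal alternative) are stand-ins chosen only because they are non-nested and have closed-form maximum likelihood fits. They are not the F81x3 and MG94_3x4 models.

```python
import math
import random

def bootstrap_lr_values(simulate_null, fit_null, fit_alt, n_replicates, rng):
    """Simulate under the fitted null and refit both models to each replicate.

    simulate_null(rng) -> one simulated data set
    fit_null(data), fit_alt(data) -> maximized log-likelihoods
    Returns one LR = 2 * (lnL_alt - lnL_null) per replicate (assumed definition).
    """
    values = []
    for _ in range(n_replicates):
        data = simulate_null(rng)
        values.append(2.0 * (fit_alt(data) - fit_null(data)))
    return values

# Toy stand-in models (hypothetical, not phylogenetic models).
def fit_exponential(xs):
    """Maximized log-likelihood of an exponential model (MLE rate = 1/mean)."""
    n = len(xs)
    mean = sum(xs) / n
    return -n * (math.log(mean) + 1.0)

def fit_half_normal(xs):
    """Maximized log-likelihood of a half-normal model (MLE scale^2 = mean of x^2)."""
    n = len(xs)
    s2 = sum(x * x for x in xs) / n
    return 0.5 * n * (math.log(2.0 / (math.pi * s2)) - 1.0)

rng = random.Random(1)
fitted_rate = 2.0  # stands in for the null MLEs obtained from the original data

def simulate(r):
    # Draw one replicate data set of 50 observations from the fitted null.
    return [r.expovariate(fitted_rate) for _ in range(50)]

lr_values = bootstrap_lr_values(simulate, fit_exponential, fit_half_normal, 100, rng)
print(len(lr_values), min(lr_values), max(lr_values))
```

Passing the simulator and the two fitting routines as callables keeps the loop independent of any particular pair of models, which is the point of a general bootstrap for non-nested hypotheses.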

Interpreting bootstrap results
As bootstrap LR values begin filling the bootstrap table, you will notice that some values are negative, i.e. the null provides a better fit to the simulated data than the alternative. This isn't a cause for concern: the models are not nested, and it is conceivable that a codon model may fit nucleotide data worse than a good nucleotide model.

However, the simulated p-value turns out to be 0, because the LR of the original data is almost 84, larger than every bootstrapped value. My bootstrapped LR histogram looked like this:

[Figure: histogram of the bootstrapped LR values]

This small simulation suggests that, for coding data, codon models (with fewer model parameters) offer significantly better fits than parameter-rich nucleotide models. This conclusion is quite believable.
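For readers who want to see how a simulated p-value of 0 arises, the following Python sketch computes it as the fraction of bootstrap LR values that reach the observed LR. The function name `bootstrap_p_value`, the use of `>=` rather than `>`, and the replicate values are all assumptions for illustration; they are not taken from HyPhy or from the run described above. Only the observed LR of about 84 comes from the text.

```python
def bootstrap_p_value(observed_lr, simulated_lrs):
    """Fraction of simulated LR values at least as large as the observed one."""
    if not simulated_lrs:
        raise ValueError("need at least one bootstrap replicate")
    exceed = sum(1 for v in simulated_lrs if v >= observed_lr)
    return exceed / len(simulated_lrs)

# Hypothetical replicate values, for illustration only.
simulated = [-6.2, -1.5, 0.8, 3.4, 12.9]

print(bootstrap_p_value(84.0, simulated))  # 0.0
print(bootstrap_p_value(3.0, simulated))   # 0.4
```

With 100 replicates the smallest nonzero value this estimate can take is 0.01, so a reported p-value of 0 is best read as p < 0.01.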

 
Sergei L. Kosakovsky Pond and Spencer V. Muse, 1997-2002