Description
|
This example demonstrates HyPhy's ability to easily test
non-nested models via the bootstrap. We will take a small
coding dataset for the alpha-spectrin gene and
analyze it using two non-nested models: a
codon model (MG94 with 3x4 frequencies) and a nucleotide
model (F81x3), with a separate set of parameters for each
of the three codon positions. We compute the likelihood
ratio (LR) statistic for the null (F81x3) vs the alternative
(MG94_3x4) hypothesis and obtain a p-value via the parametric
bootstrap.
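
For reference, the statistic being computed can be written as below. This is a sketch under the usual convention of doubling the log-likelihood difference; the factor of 2 and the symbols are assumptions, since the text above does not spell out the formula.

```latex
\[
  \mathrm{LR} \;=\; 2\left(\ln L_{\mathrm{MG94\_3x4}} - \ln L_{\mathrm{F81x3}}\right)
\]
```

Here each $\ln L$ is the maximized log-likelihood of the corresponding model on the same alignment.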
|
Define the models
|
Open the data file a_spectrin.cod from the 'data' folder in 'GUIExamples'.
Follow the instructions from this example
to set up three partitions, one for each position within a codon,
assign F81 to all partitions, and obtain MLEs.
Open the data file a_spectrin.cod from the 'data' folder in 'GUIExamples' again
(do not close the previous data panel).
HyPhy will open the same file but give the new data panel a different name,
so that names are not duplicated. Make a single partition with all the data in it and change
'Partition Type' to Codon (select the Universal genetic code when a dialog box pops up).
Assign the tree topology and the MG94_3x4 model to it. Obtain MLEs.
|
Set up the bootstrap
|
We have now defined two models on the same data set: F81x3 (null) and
MG94_3x4 (alternative). Because the models are not nested and treat the
data differently, the usual asymptotic distribution of
the LR statistic is not available. We will simulate this distribution using the
bootstrap. Simulated data sets will be generated using F81x3 and
reanalyzed with both F81x3 and MG94_3x4. When data is generated
with a nucleotide model, there will inevitably be some stop codons
in it. To resolve this issue, codon models treat stop codons as
missing data.
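
To make the stop-codon point concrete, here is a minimal Python sketch of what "treating stop codons as missing data" means for one simulated sequence. It is an illustration only, not HyPhy's internal code: the function name `mask_stop_codons` and the choice of `---` as the missing-data symbol are assumptions.

```python
# Stop codons of the universal genetic code.
UNIVERSAL_STOP_CODONS = {"TAA", "TAG", "TGA"}


def mask_stop_codons(sequence, missing="---"):
    """Replace every in-frame stop codon with a missing-data triplet.

    Hypothetical helper for illustration; a trailing partial codon
    (fewer than three nucleotides) is left unchanged.
    """
    sequence = sequence.upper()
    # Split the sequence into consecutive in-frame triplets.
    codons = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
    return "".join(missing if c in UNIVERSAL_STOP_CODONS else c for c in codons)


# A sequence simulated under a nucleotide model may contain stops by chance.
print(mask_stop_codons("ATGTAAGGCTGA"))  # ATG---GGC---
```
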
To begin the bootstrap, switch to the F81x3 (null)
data panel and select 'General Bootstrap' in the 'Data' menu.
In the window that appears, select a_spectrin2 (the
name of the data panel which defines the alternative)
as the alternative hypothesis, and start 'Parametric
Bootstrap' with the first button of the bootstrap window.
Do 100 bootstrap iterations.
|
Interpreting bootstrap results
|
As bootstrap LR values begin filling the bootstrap table,
you will notice that some values are negative, i.e.
the null provides a better fit to the simulated data than the
alternative. This isn't a cause for concern, because
the models are not nested, and it is conceivable that
a codon model may fit nucleotide data worse than a
good nucleotide model.
However, the simulated p-value turns out to be 0, because
the LR of the original data is almost 84, larger than every
simulated LR value. My bootstrapped
LR histogram looked like this:
This little simulation suggests that for coding data, codon
models (with fewer model parameters) offer significantly better
fits than parameter-rich nucleotide models. This conclusion
is quite believable.
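
The simulated p-value is simply the fraction of bootstrap LR values that are at least as large as the LR of the original data. A minimal Python sketch of that calculation follows. The function name `bootstrap_p_value` and the simulated numbers are made up for illustration; only the observed LR of about 84 comes from the text, and the plain proportion (rather than a corrected estimator) is assumed because it is consistent with a reported p-value of 0.

```python
def bootstrap_p_value(observed_lr, simulated_lrs):
    """Fraction of simulated LR values that reach or exceed the observed LR."""
    if not simulated_lrs:
        raise ValueError("need at least one simulated LR value")
    exceed = sum(1 for lr in simulated_lrs if lr >= observed_lr)
    return exceed / len(simulated_lrs)


# Made-up bootstrap values, including negative ones as described above.
simulated = [-2.1, 0.4, 3.7, 11.2, -0.8]
print(bootstrap_p_value(84.0, simulated))  # 0.0
```

With 100 iterations, a p-value of 0 is best read as "smaller than 1 in 100", since no finite simulation can resolve smaller probabilities.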
|
|