Prerequisites | This example uses the result of Simple MLE Analysis. Open the file 'Example1.bf' located in folder 'Saves' in the 'GUI Examples' directory, using 'Open Batch File' (Command-O) HyPhy menu option. This will load a dataset, define
a single partition containing the entire dataset, assign a tree and an evolutionary model
(HKY85 with gamma rate variation) to the data, and load MLEs of model parameters from
the previous analysis. In the data window, click on the last button in the first column (the tooltip for the button is: 'Display Table of Parameter Values' or press Command-H, or select the menu
item: 'Show Parameters' from the 'Likelihood Value'. This should open a table with parameter estimates. For demonstration purposes we will also open a tree window (double click on
the brown_tree column in the parameter table to do that), and arrange the two windows side by side, as shown here:
|
Define the hypotheses | Switch back to the parameter table window (by clicking
in it), if it isn't already the frontmost window. We can think of a hypothesis as a set of
constraints on model parameters - each set of constraints reduces the number of
degrees of freedom in the model, and each two sets of constraints which are nested (i.e.
one can be obtained from the other by adding/removing constraints from only ONE of
the sets) can be tested for significance using likelihood ratio testing methods.
Presently we have an unconstrained model, which is a suitable alternative
hypothesis for our example. To define the unconstrained model as an alternative
hypothesis, select pulldown menu from the parameter window (it shows 'Current LF')
by default and choose 'Save LF State'. HyPhy will display a warning message!
Why is that? The reason for it is that in order for the likelihood ratio test
to work, we must ensure that all the parameter values are actually maximum likelihood
estimates. When we saved the analysis from the previous example and then loaded it
back, HyPhy didn't know whether parameter values are MLEs or not. Since we
know that the parameter values are indeed MLEs, we can ignore the warning.
HyPhy will then prompt you to enter the name for the likelihood state; choose
'Full Model' (any other descriptive name will be fine). HyPhy will prefix the name
we just chose by '[-]' to remind you that it can't guarantee that parameter values
we saved in that likelihood function state are indeed MLEs. Alternatively,
we could choose cancel in the warning, then reoptimize the likelihood function
- using 'Optimize LF' from the 'Likelihood Menu' and then save the likelihood state.
Next, we should define the constraints which will form our null hypothesis. We begin
with a very simple hypothesis: the length of the 'Human' branch is
the same as the length of the 'Chimpanzee' branch. Since the length of
the branch is a function of branch model parameters, setting the model
parameters equal, will constrain branch lengths to be equal as well. In
the parameter table, select the branch parameters for 'Human' and 'Chimpanzee',
using Shift-click to select more than one row. To constrain the parameters to be equal,
choose the ':=' button (2nd in the button row). A pull-down menu will appear,
letting you choose the right hand side of the constraint.
Both choices are equivalent, so it is a matter of preference which parameter
you wish to leave independent. Several things happened:
- the icon of one of the parameters changed, indicating that it is now a constrained parameter
- the constraint column in the parameter table changed to reflect the constraint
we just imposed
- the tree display updated to show equal branch lengths for 'Human'
and 'Chimpanzee'
- the log-likelihood and parameter count numbers in the status bar
at the bottom of the parameter window changed as well
- parameter estimates in the table are no longer MLEs
The windows should now look like this:
We now optimize the constrained function (Command-T or 'Optimize LF' from
the 'Likelihood Menu'). This will take a few seconds, and you can watch the
progress in the status bar of the console window. Finally, save the constrained
likelihood state (the same way we saved the unconstrained state before), choosing
the name 'Human=Chimp'. Note that no warning messages are displayed now,
since HyPhy knows that we just optimized the LF, so the parameter values
are in fact MLEs. You can toggle between saved states to restore parameter values
and compare them (try it). For our example the differences will be minute.
|
Test the hypothesis | Finally, we set the hypotheses as null
and alternative. Switch to 'Human=Chimp' (if it isn't the present state)
and select 'Select as null' from the pulldown menu. Note that the name
of the hypothesis changed to refect that it is now the null. Switch to
'[-]Full Model' and selec 'Select as alternative' from the pulldown menu.
Engage the pulldown menu again and observe that the testing items at the
bottom of the menu have been activated.
First, let us conduct a simple LR test. Choose the 'LRT' menu item from
the pulldown. The results will be printed to the console.
2*LR = 0.299696
DF = 1
P-Value = 0.584073
2*LR (2 times the log-likelihood ratio) is the definition of the likelihood ratio statistic, DF
is the number of parameters, and P-Value is calculated using the Chi-squared approximation
to the distribution of the likelihood ratio statistic for nested
hypotheses This P-Value will be invalid for non-nested hypotheses
(but we can use bootstrap for those). The P-Value of 0.58 suggests that we accept
the null hypothesis, i.e. that the branch lengths are indeed equal. We would
typically reject the null, if the P-Value is less than a small number (significance
level), for instance 0.01.
|
Bootstrap |
Next, try the bootstrap. Parametric bootstrap simulates data under the
null hypothesis and computes the LR statistic for each data replicate.
We can tabulate the (simulated) distribution of the LR and use it to
compute a bootstrap p-value. Non-parametric bootstrap simulates data
using the general multinomial model (i.e. by sampling with replacement).
In that regard it is not really non-parametric, just the number
of parameters is very large - 4^5=1024 in the case of our 5 sequence
nucleotide dataset without deletions. Choose 'Parametric Bootstrap'
from the pulldown menu and enter 100 as the number of replicates. You
can also choose to save each replicate data set to a file, so that
if an abnormal result shows up, the dataset which produced it can be examined.
The bootstrap will take a few minutes, so now may be a good time to get a cup of
coffee:) A new window pops up as soon as you start the bootstrap - the bootstrap
window. As soon as each iterate is finished, the LR statistic for
that iterate is added to the table. If it is larger than the LR statistic
for the original data set, the entry is in bold. At the top of
the window, you will find a dynamically updated P-Value. You can stop
the run at any time by pressing the stop button in the bootstrap window
(the bootstrap will stop as soon as the replicate in progress has finished).
The final bootstrap window looks like this (you will have different numbers,
of course, because each simulated sample is [pseudo]random).
Once the bootstrap is finished, the first result to observe is the
simulated p-value (in our example it should be close to the asymptotic
p-value). This p-value, for a sufficiently large number of iterates, is valid
for all types of model comparisons, nested or not.
The bootstrap result table can be saved to a file, and
you can also examine the simulated distribution of the LR statistic,
by using the histogram button in the bootstrap window. Select
'Histogram [Default]' to let HyPhy decide how many bins to place
the data in (choose [Custom] to do that yourself).
The green portion represents all the simulated LR values less
than the original LR, and the red values are the values which
are greater. Simulated p-value is the ratio of the red area
to the total area. This distribution has a shape very similar to the
chi-squared distribution with one degree of freedom (which is the asymptotic
distribution of the LR statistic in our example).
|
|