Simple Hypothesis Testing

Prerequisites
This example uses the result of the Simple MLE Analysis example. Open the file 'Example1.bf', located in the 'Saves' folder of the 'GUI Examples' directory, using the 'Open Batch File' (Command-O) menu option. This will load a dataset, define a single partition containing the entire dataset, assign a tree and an evolutionary model (HKY85 with gamma rate variation) to the data, and load the MLEs of model parameters from the previous analysis. In the data window, click on the last button in the first column (its tooltip reads 'Display Table of Parameter Values'), press Command-H, or select 'Show Parameters' from the 'Likelihood' menu. This should open a table with parameter estimates. For demonstration purposes we will also open a tree window (double-click on the 'brown_tree' entry in the parameter table to do that) and arrange the two windows side by side, as shown here:
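For readers curious about what the GUI restores behind the scenes, here is a minimal HyPhy batch language sketch of a comparable setup. The file name, sequence names, and starting value of kappa are illustrative assumptions, and gamma rate variation is omitted for brevity; the actual 'Example1.bf' is a GUI-generated file with considerably more bookkeeping.

     /* Sketch only: approximates the model that 'Example1.bf' restores.      */
     DataSet       ds           = ReadDataFile ("brown.seq"); /* assumed name  */
     DataSetFilter filteredData = CreateFilter (ds, 1);  /* one partition: all sites */
     HarvestFrequencies (obsFreqs, filteredData, 1, 1, 1);  /* observed base frequencies */

     global kappa = 4.0;                /* transition/transversion rate ratio */
     /* HKY85 rate matrix; rows and columns are ordered A, C, G, T; 't' is    */
     /* the per-branch length parameter; '*' marks the diagonal entries.      */
     HKY85RateMatrix = {{*      , t      , kappa*t, t      }
                        {t      , *      , t      , kappa*t}
                        {kappa*t, t      , *      , t      }
                        {t      , kappa*t, t      , *      }};

     Model HKY85      = (HKY85RateMatrix, obsFreqs);
     Tree  brown_tree = ((Human,Chimpanzee),Gorilla,Orangutan,Gibbon);

     LikelihoodFunction lf = (filteredData, brown_tree);
     Optimize (mles, lf);  /* mles[1][0] = log-likelihood, mles[1][1] = parameter count */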

Define the hypotheses
Switch back to the parameter table window (by clicking in it), if it isn't already the frontmost window. We can think of a hypothesis as a set of constraints on model parameters: each set of constraints reduces the number of degrees of freedom in the model, and any two nested sets of constraints (i.e., one can be obtained from the other purely by adding or removing constraints) can be compared using likelihood ratio testing methods. Presently we have an unconstrained model, which is a suitable alternative hypothesis for our example.

To define the unconstrained model as the alternative hypothesis, open the pulldown menu in the parameter window (it shows 'Current LF' by default) and choose 'Save LF State'. HyPhy will display a warning message! Why is that? For the likelihood ratio test to be valid, all the parameter values must actually be maximum likelihood estimates. When we saved the analysis in the previous example and then loaded it back, HyPhy had no way of knowing whether the parameter values were MLEs or not. Since we know that they are indeed MLEs, we can ignore the warning. HyPhy will then prompt you for a name for the likelihood state; enter 'Full Model' (any other descriptive name will do). HyPhy will prefix the name we just chose with '[-]' to remind you that it cannot guarantee that the parameter values saved in that likelihood function state are indeed MLEs. Alternatively, we could click Cancel in the warning dialog, reoptimize the likelihood function using 'Optimize LF' from the 'Likelihood' menu, and then save the likelihood state.

Next, we should define the constraints which will form our null hypothesis. We begin with a very simple hypothesis: the length of the 'Human' branch is the same as the length of the 'Chimpanzee' branch. Since the length of a branch is a function of the branch's model parameters, setting the model parameters equal will constrain the branch lengths to be equal as well. In the parameter table, select the branch parameters for 'Human' and 'Chimpanzee', using Shift-click to select more than one row. To constrain the parameters to be equal, click the ':=' button (2nd in the button row); the batch-language equivalent of this constraint is sketched after the list below. A pulldown menu will appear, letting you choose the right-hand side of the constraint. Both choices are equivalent, so it is a matter of preference which parameter you wish to leave independent. Several things happened:

  • the icon of one of the parameters changed, indicating that it is now a constrained parameter
  • the constraint column in the parameter table changed to reflect the constraint we just imposed
  • the tree display updated to show equal branch lengths for 'Human' and 'Chimpanzee'
  • the log-likelihood and parameter count numbers in the status bar at the bottom of the parameter window changed as well
  • parameter estimates in the table are no longer MLEs
The windows should now look like this:
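Incidentally, the ':=' button is the GUI counterpart of the batch language constraint operator ':='. Continuing the sketch from the Prerequisites section (the branch parameter names are illustrative; the parameter table shows the actual names), the same constraint can be written as:

     /* Make the 'Human' branch parameter track the 'Chimpanzee' one; after  */
     /* this, brown_tree.Human.t is no longer estimated independently.       */
     brown_tree.Human.t := brown_tree.Chimpanzee.t;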

We now optimize the constrained likelihood function (Command-T or 'Optimize LF' from the 'Likelihood' menu). This will take a few seconds; you can watch the progress in the status bar of the console window. Finally, save the constrained likelihood state (the same way we saved the unconstrained state before) under the name 'Human=Chimp'. Note that no warning message is displayed this time, since HyPhy knows that we just optimized the likelihood function, so the parameter values are in fact MLEs. You can toggle between saved states to restore parameter values and compare them (try it). For our example the differences will be minute.
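In batch language terms, this step is simply another Optimize call on the same likelihood function, which now carries the constraint. A sketch, with variable names continuing those used above:

     Optimize (mlesNull, lf);        /* refit with the ':=' constraint in place */
     lnL_null  = mlesNull[1][0];     /* constrained (null) log-likelihood       */
     npar_null = mlesNull[1][1];     /* independent parameters under the null   */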
Test the hypothesis
Finally, we designate the hypotheses as null and alternative. Switch to 'Human=Chimp' (if it isn't the current state) and choose 'Select as null' from the pulldown menu. Note that the name of the hypothesis changed to reflect that it is now the null. Switch to '[-]Full Model' and choose 'Select as alternative' from the pulldown menu. Open the pulldown menu again and observe that the testing items at the bottom of the menu have been activated.

First, let us conduct a simple LR test. Choose the 'LRT' menu item from the pulldown. The results will be printed to the console.

     2*LR = 0.299696
     DF = 1
     P-Value = 0.584073 
2*LR (twice the log-likelihood ratio) is the likelihood ratio test statistic, DF is the difference in the number of independent parameters between the two hypotheses (here, a single constraint), and P-Value is calculated using the chi-squared approximation to the distribution of the likelihood ratio statistic for nested hypotheses. This P-Value is invalid for non-nested hypotheses (but we can use the bootstrap for those). The P-Value of 0.58 means that we cannot reject the null hypothesis, i.e. the data are consistent with equal branch lengths. We would typically reject the null if the P-Value were less than a chosen significance level, for instance 0.01.
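These numbers can be reproduced by hand from the two fits. Continuing the earlier sketch (CChi2 is HyPhy's chi-squared cumulative distribution function; the log-likelihood and parameter-count variables follow the illustrative naming used above):

     lnL_alt  = mles[1][0];               /* unconstrained log-likelihood      */
     npar_alt = mles[1][1];
     LRT    = 2 * (lnL_alt - lnL_null);   /* likelihood ratio test statistic   */
     DF     = npar_alt - npar_null;       /* one constraint => DF = 1          */
     pValue = 1 - CChi2 (LRT, DF);        /* upper tail of chi-squared with DF */
     fprintf (stdout, "2*LR = ", LRT, "\nDF = ", DF, "\nP-Value = ", pValue, "\n");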
Bootstrap

Next, try the bootstrap. The parametric bootstrap simulates data under the null hypothesis and computes the LR statistic for each replicate dataset. We can tabulate the (simulated) distribution of the LR and use it to compute a bootstrap p-value. The non-parametric bootstrap simulates data using the general multinomial model (i.e. by sampling site columns with replacement). In that regard it is not really non-parametric; rather, the number of parameters is very large: 4^5 = 1024 possible site patterns (1023 free frequencies) in the case of our 5-sequence nucleotide dataset without deletions.

Choose 'Parametric Bootstrap' from the pulldown menu and enter 100 as the number of replicates. You can also choose to save each replicate dataset to a file, so that if an abnormal result shows up, the dataset which produced it can be examined. The bootstrap will take a few minutes, so now may be a good time to get a cup of coffee:) A new window - the bootstrap window - pops up as soon as you start the bootstrap. As each iterate finishes, the LR statistic for that iterate is added to the table; if it is larger than the LR statistic for the original dataset, the entry is shown in bold. At the top of the window you will find a dynamically updated P-Value. You can stop the run at any time by pressing the stop button in the bootstrap window (the bootstrap will stop as soon as the replicate in progress has finished). The final bootstrap window looks like this (you will have different numbers, of course, because each simulated sample is [pseudo]random):
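The essence of the procedure can be sketched in the batch language as follows (the loop bookkeeping is illustrative; the built-in bootstrap also handles progress display, replicate saving, and so on). SimulateDataSet draws a replicate dataset from the likelihood function at its current parameter values, which at this point are the null-hypothesis MLEs:

     hits = 0;
     for (rep = 0; rep < 100; rep = rep + 1)
     {
         DataSet       simData   = SimulateDataSet (lf);  /* simulate under the null */
         DataSetFilter simFilter = CreateFilter (simData, 1);
         Tree          simTree   = ((Human,Chimpanzee),Gorilla,Orangutan,Gibbon);
         LikelihoodFunction simLF = (simFilter, simTree);

         simTree.Human.t := simTree.Chimpanzee.t;         /* null fit        */
         Optimize (simNull, simLF);
         ClearConstraints (simTree);                      /* alternative fit */
         Optimize (simAlt, simLF);

         simLRT = 2 * (simAlt[1][0] - simNull[1][0]);
         if (simLRT >= LRT)   /* compare to the LRT observed for the real data */
         {
             hits = hits + 1;
         }
     }
     fprintf (stdout, "Bootstrap P-Value = ", hits/100, "\n");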

Once the bootstrap is finished, the first result to observe is the simulated p-value (in our example it should be close to the asymptotic p-value). For a sufficiently large number of iterates, this p-value is valid for all types of model comparisons, nested or not. The bootstrap result table can be saved to a file, and you can also examine the simulated distribution of the LR statistic by using the histogram button in the bootstrap window. Select 'Histogram [Default]' to let HyPhy decide how many bins to place the data in (choose [Custom] to set that yourself).

The green portion of the histogram represents all the simulated LR values less than the original LR, and the red portion those that are greater. The simulated p-value is the ratio of the red area to the total area. This distribution has a shape very similar to the chi-squared distribution with one degree of freedom (which is the asymptotic distribution of the LR statistic in our example).

 
Sergei L. Kosakovsky Pond and Spencer V. Muse, 1997-2002