  HyPhy Documentation: HyPhy GUI Examples: How to define a model

Description
This example illustrates how to define a complicated model using the model component of HyPhy GUI. The model component can be used to define rate matrices, and, to some extent, equilibrium frequencies.

We will define a codon model, which is an extension of the MG94 model, and whose rate matrix is a cross between MG94 and the general reversible rate matrix for nucleotide substitution. While this model is quite tedious to define, it will illustrate some concepts that are useful for defining simpler models.

The rate matrix
Each model in HyPhy is defined by its rate matrix and the vector of equilibrium frequencies. The first step in model definition is to define the rate matrix.

For our example, the general rate matrix entry for substituting codon x with codon y is:

  • t if x->y is a one step synonymous A->C or C->A
  • t*AG if x->y is a one step synonymous A->G or G->A
  • t*AT if x->y is a one step synonymous A->T or T->A
  • t*CG if x->y is a one step synonymous C->G or G->C
  • t*CT if x->y is a one step synonymous C->T or T->C
  • t*GT if x->y is a one step synonymous G->T or T->G
  • R*t if x->y is a one step non-synonymous A->C or C->A
  • R*t*AG if x->y is a one step non-synonymous A->G or G->A
  • R*t*AT if x->y is a one step non-synonymous A->T or T->A
  • R*t*CG if x->y is a one step non-synonymous C->G or G->C
  • R*t*CT if x->y is a one step non-synonymous C->T or T->C
  • R*t*GT if x->y is a one step non-synonymous G->T or T->G
Here t is the branch length, and all other terms are rates shared by all branches in the tree. R is the non-synonymous/synonymous ratio, which we will assume is drawn from the general gamma distribution. We represent that by replacing R with c*R, where c is drawn from the unit mean gamma. Rate matrix entries should also be multiplied by the equilibrium frequency of the target nucleotide; we omit that factor for brevity, because the multiplication is taken care of automatically when we select the equilibrium frequencies.
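
The twelve cases above can be sketched as a small Python function. This is only an illustration of the rules in the list, not HyPhy code: the function name rate_expression is hypothetical, the universal genetic code is assumed, and the non-synonymous formulas are written with c*R in place of R, as described above.

```python
from itertools import product

# Universal genetic code, codons enumerated in TCAG order (assumption: universal code).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rate_expression(x, y):
    """Return the rate formula for codon x -> codon y as a string.

    Returns None when either codon is a stop codon or when the two
    codons do not differ by exactly one nucleotide.
    """
    if CODE[x] == "*" or CODE[y] == "*":
        return None
    diffs = [(a, b) for a, b in zip(x, y) if a != b]
    if len(diffs) != 1:
        return None
    pair = "".join(sorted(diffs[0]))          # e.g. ("G", "A") -> "AG"
    # Synonymous changes get t; non-synonymous ones get R*c*t.
    factors = ["t"] if CODE[x] == CODE[y] else ["R", "c", "t"]
    if pair != "AC":                          # A<->C is the reference rate
        factors.append(pair)
    return "*".join(factors)


print(rate_expression("AAA", "AAC"))  # non-synonymous A<->C
print(rate_expression("AAA", "AAG"))  # synonymous A<->G
```

The A<->C rate carries no extra factor, so it acts as the reference against which the other five nucleotide rates are measured.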

In HyPhy terms, t is a local variable, R, AG, AT, CG, CT and GT are global variables, and c is a category variable. (The A<->C rate has no parameter of its own; it serves as the reference rate.) We will need this classification later.

Starting a new model
To begin defining a new model, select 'New Model' from the 'New' submenu of the 'File' menu. Make the following selections in the dialog that appears:

We could start from the MG94 matrix and modify it, but it is more instructive to start from scratch. The genetic code will be hardwired into the model, so it can only be used with codon data with that code.

Defining matrix entries
After the initial dialog, the following 61x61 blank table appears:

The main pane of the window shows the rate matrix, and the cell at the intersection of row x and column y represents the rate of change from x to y. Diagonal entries are not editable, because their values are determined by the other entries in the row. All models are assumed to be time-reversible, so the rate matrix is forced to be symmetric in rates.
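
As a rough numeric illustration of these two rules, the Python sketch below mirrors every off-diagonal entry and then fills in the diagonal. It assumes the usual convention for rate matrices, namely that each row sums to zero; the text above does not spell the rule out. The function names set_rate and fill_diagonal are hypothetical, and a 3x3 matrix stands in for the 61x61 table.

```python
def set_rate(q, x, y, value):
    """Set the x -> y rate and mirror it to y -> x, as the editor does."""
    if x == y:
        raise ValueError("diagonal entries are not editable")
    q[x][y] = value
    q[y][x] = value


def fill_diagonal(q):
    """Set each diagonal entry to minus the sum of the rest of its row
    (assumed convention: rows of a rate matrix sum to zero)."""
    for i, row in enumerate(q):
        row[i] = -sum(v for j, v in enumerate(row) if j != i)


q = [[0.0] * 3 for _ in range(3)]
set_rate(q, 0, 1, 0.2)
set_rate(q, 0, 2, 0.1)
set_rate(q, 1, 2, 0.4)
fill_diagonal(q)

for row in q:
    print(row)
```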

The simplest way to start defining rate entries is to double click in a cell and enter the rate directly. Let's try that. AAA->AAC is a one step non-synonymous substitution, so the rate should be "R*c*t". Double click in the cell and type that in. HyPhy displays a warning. Why? At this point no model parameters have been defined at all, and the formula you entered contains three (c, R and t). HyPhy is simply asking you to confirm that you want to add these parameters (a safeguard against typos). Dismiss the warning. Note that the cell for AAC->AAA has also been set, to maintain symmetry. Annoyingly, the column is now too narrow to display the entire formula. You can resize the column manually, by clicking in the column header on the line between AAA and AAC and dragging it a bit to the right, or let HyPhy auto-fit the columns, by clicking on the leftmost button in the button bar at the bottom of the window.

The three parameters we just added should be of types local (t), global (R) and category (c). When HyPhy added them to the model, it assumed they were all local. To see that, pull down the 'Parameters' menu. 'R' is selected (checked). Select 'Edit Parameter' from the same menu. A dialog appears, showing that 'R' is classed as local, which is not what we want. Change that to global:

Pull down the 'Parameters' menu again. Observe that R is now in the section of global variables. Select 'c', then pull down the 'Parameters' menu once more, select 'Edit Parameter' and change the class of 'c' to category. Pull down the 'Parameters' menu one last time. It should look like this:

Selecting matrix entries
We could, in theory, select rate matrix entries one at a time and enter the rates manually, but with over 250 non-trivial cells to define, that would rapidly become painful and very error prone. Fortunately, HyPhy provides a way to quickly select cells by type. Say we want to select all cells that should have the rate "R*t*c". Those cells correspond to one step non-synonymous A<->C substitutions. We can do that in a few mouse clicks!

Look at the bottom of the window and notice the pulldowns 'Class' and 'To'. Click on 'Class' - a long menu appears. Choose 'One-step' from that menu. Hmm, nothing happened. To actually select all 'One-step' substitutions, click on the 7th button (the first in the second cluster, tooltip 'Replace Current Selection'). Voila, a number of cells become highlighted. Scroll around the table and convince yourself that the selected cells are indeed all one-step substitutions. This is a step in the right direction, but we need to refine this selection. Go back to the 'Class' menu, and find 'Non-synonymous'. We want to take our current selection (all one-step) and intersect it with all non-synonymous substitutions. That will produce all non-synonymous one-step substitutions. To accomplish this, click on the intersect button (3rd in the 2nd cluster). Observe that the current selection has been reduced. The cells which are now selected are all non-synonymous one-step substitutions. Lastly, choose 'A-C' from the 'Class' menu and intersect again. We have now selected all one-step non-synonymous A<->C substitutions.

You can use other set operations (Union, Difference and Symmetric Difference) to build up complicated selections from a set of predefined ones.
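
The selection logic can be mimicked with ordinary Python sets. The sketch below is an illustration only: it assumes the universal genetic code, treats each cell as an ordered (row, column) codon pair, and takes the 'A-C' class to mean "some differing position is an A<->C change", which is a guess about how HyPhy defines that class for multi-step pairs (the guess does not matter once the selection has been intersected with 'One-step').

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [codon for codon in CODE if CODE[codon] != "*"]   # the 61 sense codons


def diff(x, y):
    """Positions at which codons x and y differ, as (from, to) pairs."""
    return [(a, b) for a, b in zip(x, y) if a != b]


# Every off-diagonal cell of the 61x61 table.
cells = [(x, y) for x in SENSE for y in SENSE if x != y]

# Predefined classes, modelled as sets of cells.
one_step = {(x, y) for x, y in cells if len(diff(x, y)) == 1}
non_synonymous = {(x, y) for x, y in cells if CODE[x] != CODE[y]}
a_c = {(x, y) for x, y in cells
       if any({a, b} == {"A", "C"} for a, b in diff(x, y))}

selection = set(one_step)        # 'Replace Current Selection'
selection &= non_synonymous      # intersect
selection &= a_c                 # intersect again

print(len(one_step) // 2, "one-step cells above the diagonal")
print(("AAA", "AAC") in selection)
```

Union, difference and symmetric difference correspond to the `|`, `-` and `^` operators on the same sets.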

Setting values of multiple cells.
Now that we have selected all the cells that have rate "R*t*c", we'd like to assign that rate to all of them. To that end, locate the text box in the bottom right corner of the model window and type "R*t*c" there. The expression is shown in red, meaning that HyPhy hasn't yet verified whether it is a valid formula. To verify the formula, click on the check button (or press 'Enter'). The expression turns green - all is good. Now click on the paste button (3rd in the 1st cluster) and watch all the cells in the selection acquire the proper rate. Auto-fit the columns again. The table should look like this now:

Repeat the same procedure for the remaining 11 substitution types: select the appropriate cells using 'Class' and intersection, type the rate into the text box, and paste it to the selection. HyPhy will prompt you to confirm the addition of new parameters every time a new rate is added. Accept all the warnings (or turn them off).
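
To keep track of the twelve passes, it can help to write them out as a checklist. The Python sketch below prints, for each substitution type, the classes to intersect and the formula to paste. It is only a bookkeeping aid: the function name checklist is hypothetical, and the class name 'Synonymous' is an assumption ('One-step', 'Non-synonymous' and 'A-C' are the only class names quoted in this example).

```python
PAIRS = ["AC", "AG", "AT", "CG", "CT", "GT"]


def checklist():
    """Return (classes to intersect, rate formula) for all 12 substitution types."""
    rows = []
    for kind, prefix in (("Synonymous", ["t"]),
                         ("Non-synonymous", ["R", "c", "t"])):
        for pair in PAIRS:
            classes = ("One-step", kind, f"{pair[0]}-{pair[1]}")
            # A<->C is the reference rate, so it gets no extra factor.
            factors = prefix + ([] if pair == "AC" else [pair])
            rows.append((classes, "*".join(factors)))
    return rows


for classes, expression in checklist():
    print(" & ".join(classes), "->", expression)
```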

Change the type of parameters AG,AT,CG,CT,GT to global, the same way we changed the type of R above.

Selecting equilibrium frequencies.
Select 'Observed Nuc 9 params' from the Eq.Fr. menu, i.e. approximate the frequency of codon (ijk) by the product of the observed frequency of nucleotide i in position 1, j in position 2, k in position 3 (corrected for stop codons). There is a way to add more options to this menu, but it is beyond the scope of this example.
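
The product rule just described can be sketched in Python as follows. This is an illustration under stated assumptions, not HyPhy's implementation: the function name codon_frequencies is hypothetical, the universal genetic code is assumed, and "corrected for stop codons" is taken to mean that the three stop codons are dropped and the remaining 61 products are rescaled to sum to one.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}   # assumption: universal genetic code


def codon_frequencies(position_freqs):
    """Approximate codon frequencies from position-specific nucleotide frequencies.

    position_freqs is a list of three dicts, one per codon position,
    each mapping a nucleotide (A, C, G, T) to its observed frequency.
    """
    raw = {}
    for codon in map("".join, product("ACGT", repeat=3)):
        if codon in STOP_CODONS:
            continue
        raw[codon] = (position_freqs[0][codon[0]]
                      * position_freqs[1][codon[1]]
                      * position_freqs[2][codon[2]])
    total = sum(raw.values())          # less than 1, because stop codons were dropped
    return {codon: value / total for codon, value in raw.items()}


# Made-up frequencies, for illustration only.
observed = [
    {"A": 0.30, "C": 0.20, "G": 0.30, "T": 0.20},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.10, "C": 0.40, "G": 0.40, "T": 0.10},
]
freqs = codon_frequencies(observed)
print(len(freqs), round(sum(freqs.values()), 6))
```

The "9 params" in the menu name are consistent with this picture: each of the three positions has four nucleotide frequencies that sum to one, leaving three free values per position.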

Selecting rate distribution.
Select 'Unit Gamma' from 'Rate Variation' in the 'Model' menu. There is a way to add more options to this menu, but it is beyond the scope of this example.
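
To see what "unit mean" means for the category variable c, the Python sketch below draws values from a gamma distribution whose scale is the reciprocal of its shape, which fixes the mean at 1. The shape value alpha and the function name are hypothetical, chosen for illustration; HyPhy's own treatment of category variables may differ (for example, by working with a small number of discrete rate classes rather than random draws).

```python
import random


def unit_gamma_samples(alpha, n, seed=0):
    """Draw n values from a gamma distribution with shape alpha and scale 1/alpha.

    The mean of a gamma distribution is shape * scale, so it equals 1 here.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n)]


samples = unit_gamma_samples(alpha=2.0, n=20000)
mean = sum(samples) / len(samples)
print(round(mean, 2))
```

Because c has mean 1, multiplying R by c spreads the non-synonymous/synonymous ratio around R without changing its average.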

Saving the model.
Choose 'Save' from the 'File' menu, and give the model a descriptive name (no spaces though), for example: 'MG94xREV_3x4_Univ'.

Editing the model later.
You can edit the model later by using 'Object Inspector' from the 'Windows' menu, choosing 'Models' and double clicking on the model you wish to edit. If you try it with the model we just saved, you will notice that the names of global variables have been prefixed with 'globalVariable_', and the names of category variables with 'categoryVariable_'. This is normal; HyPhy does it for technical reasons.

You can also delete a model from the Object Inspector.

Using the model.
Once you have saved the model, it becomes available for all analyses with the proper data type (codon in our example). The name of the model will appear in the list of choices in the model column of the data panel.

 
Sergei L. Kosakovsky Pond and Spencer V. Muse, 1997-2002