Description
|
This example illustrates how to define a complicated model
using the model component of the HyPhy GUI. The model component can
be used to define rate matrices, and, to some extent, equilibrium
frequencies.
We will define a codon model, which is an extension of the MG94
model, and whose rate matrix is a cross between MG94 and the
general reversible rate matrix for nucleotide substitution.
While this model is quite tedious to define, it will illustrate some
concepts that are useful for defining simpler models.
|
The rate matrix
|
Each model in HyPhy is defined by its rate matrix and the vector
of equilibrium frequencies. The first step in model definition
is to define the rate matrix.
For our example, the general rate matrix entry for substituting codon x with codon y is:
- t if x->y is a one step synonymous A->C or C->A
- t*AG if x->y is a one step synonymous A->G or G->A
- t*AT if x->y is a one step synonymous A->T or T->A
- t*CG if x->y is a one step synonymous C->G or G->C
- t*CT if x->y is a one step synonymous C->T or T->C
- t*GT if x->y is a one step synonymous G->T or T->G
- R*t if x->y is a one step non-synonymous A->C or C->A
- R*t*AG if x->y is a one step non-synonymous A->G or G->A
- R*t*AT if x->y is a one step non-synonymous A->T or T->A
- R*t*CG if x->y is a one step non-synonymous C->G or G->C
- R*t*CT if x->y is a one step non-synonymous C->T or T->C
- R*t*GT if x->y is a one step non-synonymous G->T or T->G
t is the branch length, and all other terms are rates shared by
all branches in the tree. R is the non-synonymous/synonymous
ratio, which we will assume is drawn from the general gamma distribution.
We represent that by replacing R with c*R where
c is drawn from the unit-mean gamma distribution. Rate matrix entries
should also be multiplied by the equilibrium frequency of the target nucleotide.
We omit that factor for brevity; HyPhy applies it
automatically once we select the equilibrium frequencies.
In HyPhy terms, t is a local variable, R, AG, AT, CG, CT and GT
are global variables, and c is a category variable.
We will need this classification later.
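The twelve cases above can be collapsed into one rule. The following
Python sketch is only an illustration, not HyPhy code: the function name
rate_expression and the table layout are hypothetical, and it assumes the
universal genetic code. It returns the formula one would type into a cell,
or None for cells that stay empty (the diagonal and multi-step changes).

```python
from itertools import product

# Universal genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = sorted(c for c, aa in CODE.items() if aa != "*")  # the 61 sense codons


def rate_expression(x, y):
    """Formula for the cell x->y, or None if the cell is left empty.

    Assumes x and y are both sense codons (hypothetical helper).
    """
    diffs = [(a, b) for a, b in zip(x, y) if a != b]
    if len(diffs) != 1:              # diagonal or multi-step change
        return None
    pair = "".join(sorted(diffs[0]))  # e.g. ('G', 'A') -> 'AG'
    terms = []
    if CODE[x] != CODE[y]:           # non-synonymous: scale by R and the category c
        terms += ["R", "c"]
    terms.append("t")
    if pair != "AC":                 # A<->C is the reference rate, so no extra factor
        terms.append(pair)
    return "*".join(terms)


# Number of distinct one-step codon pairs (each pair counted once).
one_step_pairs = sum(
    1 for x in SENSE for y in SENSE if x < y and rate_expression(x, y)
)
```

For instance, rate_expression("AAA", "AAC") gives "R*c*t" (lysine to
asparagine, an A->C change), while rate_expression("AAA", "AAG") gives
"t*AG" (both codons encode lysine).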
|
Starting a new model
|
To begin defining a new model, select 'New Model' from the 'New' submenu
of the 'File' menu. Make the following selections in the dialog that appears:
We could start from the MG94 matrix and modify it, but it is
more instructive to start from scratch. The genetic code will
be hardwired into the model, so it can only be used with codon
data with that code.
|
Defining matrix entries
|
After the initial dialog, the following 61x61 blank table appears:
The main pane of the window shows the rate matrix, and the cell at
the intersection of row x and column y represents the rate of
change from x to y. Diagonal entries are not editable,
because their values are determined by the rest of the row entries.
All models are assumed to be time-reversible, so the rate matrix
is forced to be symmetric in rates.
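As a rough illustration of these two constraints (not HyPhy's internal
code), the Python sketch below builds a small rate matrix from symmetric
rates. The function name build_rate_matrix and the 3-state toy values are
made up for the example. It assumes that each off-diagonal entry is the
symmetric rate times the equilibrium frequency of the target state, and
that each diagonal entry is minus the sum of the rest of its row, so that
every row sums to zero.

```python
def build_rate_matrix(sym_rates, freqs):
    """Build Q with q[i][j] = sym_rates[i][j] * freqs[j] for i != j.

    sym_rates must be symmetric; its diagonal is ignored.
    """
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = sym_rates[i][j] * freqs[j]
        # The diagonal is determined by the rest of the row.
        q[i][i] = -sum(q[i])
    return q


# Toy 3-state example (values are arbitrary).
rates = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 0.5],
    [2.0, 0.5, 0.0],
]
freqs = [0.2, 0.3, 0.5]
q = build_rate_matrix(rates, freqs)
```

With symmetric rates, freqs[i] * q[i][j] equals freqs[j] * q[j][i] for every
pair of states, which is the time-reversibility condition.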
The simplest way to start defining rate entries is to double click
in a cell and enter the rate directly. Let's try that. AAA->AAC is
a one step non-synonymous substitution, so the rate should be "R*c*t".
Double click in the cell and type that in. HyPhy displays a warning.
Why? Presently, there are no model parameters defined at all. The
formula you entered contains three (c, R and t). HyPhy is simply confirming
whether you want to add these parameters (to safeguard against
typos). Dismiss the warning. Note that the cell in AAC->AAA has also
been set to maintain symmetry. Annoyingly, the column is now too narrow
to display the entire formula. You can resize the column manually,
by clicking in the column header on the line between AAA and AAC and
dragging it a bit to the right, or let HyPhy auto fit the columns,
by clicking on the leftmost button in the buttonbar at the bottom
of the window.
The three parameters we just added should be of types local (t),
global (R) and category (c). When HyPhy added them to the model,
they were all assumed to be local. To see that, pull down the
'Parameters' menu. 'R' is selected (checked). Select 'Edit Parameter'
from the same menu. A dialog appears. We see that 'R' is classed
as local, which is not what we want. Change that to global:
Pull down the 'Parameters' menu again. Observe that R is now
in the section of global variables. Select 'c', then pull down
the 'Parameters' menu once more, select 'Edit Parameter' and
change the class of 'c' to category. Pull down the 'Parameters'
menu one last time. It should look like this:
|
Selecting matrix entries
|
We could, in theory, select rate matrix entries one at a time and enter
the rates manually, but with over 250 non-trivial cells to define, that
would rapidly become painful and very error-prone. Fortunately, HyPhy
provides a way to quickly select cells by type. Say we want
to select all cells that should have the rate "R*t*c". Those cells
would correspond to one step non-synonymous A<->C substitutions.
We can do that in a few mouse clicks!
Look at the bottom of the window and notice the pulldowns 'Class'
and 'To'. Click on 'Class' - a long menu appears. Choose 'One-step'
from that menu. Hmm, nothing happened. To choose all 'One-step'
substitutions, click on the 7th button (the first in the second
cluster, tooltip 'Replace Current Selection'). Voila, a number
of cells became highlighted. Scroll around the table and convince
yourself that the selected cells are indeed all one-step substitutions.
This is a step in the right direction, but we need to refine this selection.
Go back to the 'Class' menu, and find 'Non-synonymous'. We want
to take our current selection (all one-step) and intersect
it with all non-synonymous substitutions. That will produce
all non-synonymous one-step substitutions. To accomplish this,
click on the intersect button (3rd in the 2nd cluster). Observe
that the current selection has been reduced. The cells which are now selected
are all non-synonymous one-step substitutions. Lastly, choose 'A-C'
from the 'Class' menu and intersect again. We have now selected all
one-step non-synonymous A<->C substitutions.
You can use other set operations (Union, Difference and Symmetric
Difference) to build up complicated selections from a set
of predefined ones.
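The selection steps above amount to ordinary set operations on matrix
cells. The Python sketch below mimics them with sets of (row, column)
codon pairs. It is an illustration only: the variable names are invented,
the universal genetic code is assumed, and the way the 'Non-synonymous'
and 'A-C' classes treat multi-step cells is a guess that does not affect
the final result.

```python
from itertools import product

# Universal genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [c for c in CODE if CODE[c] != "*"]

# Every off-diagonal cell of the 61x61 table.
CELLS = [(x, y) for x in SENSE for y in SENSE if x != y]


def changes(x, y):
    """Nucleotide changes (from, to) between codons x and y."""
    return [(a, b) for a, b in zip(x, y) if a != b]


# Predefined classes (assumed definitions).
one_step = {(x, y) for x, y in CELLS if len(changes(x, y)) == 1}
non_synonymous = {(x, y) for x, y in CELLS if CODE[x] != CODE[y]}
a_c = {
    (x, y) for x, y in CELLS
    if any(set(ch) == {"A", "C"} for ch in changes(x, y))
}

selection = set(one_step)    # 'Replace Current Selection' with One-step
selection &= non_synonymous  # intersect with Non-synonymous
selection &= a_c             # intersect with A-C
```

Union, difference and symmetric difference correspond to the Python
operators |, - and ^ on the same sets.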
|
Setting values of multiple cells
|
Now that we have selected all the cells that have rate "R*t*c", we'd like
to assign that rate to all of them. To that end, locate the text box
in the bottom right corner of the model window and type in "R*t*c" there.
The expression is in red, meaning that HyPhy hasn't verified whether
it is a valid formula or not. To verify the formula, click on the check
button (or press 'Enter'). The expression turns green - all is good. Now click
on the paste button (3rd in the 1st cluster) and watch all the
cells in the selection acquire the proper rate. Auto-fit the columns again.
The table should look like this now:
Repeat the same procedure for the remaining 11 substitution types: select
appropriate cells using 'Class' and intersection, type in the rate
into the expression holder, paste to selection. HyPhy will prompt
you to confirm the addition of new parameters every time a new
rate is added. Accept all the warnings (or turn them off).
Change the type of parameters AG,AT,CG,CT,GT to global, the same
way we changed the type of R above.
|
Selecting equilibrium frequencies
|
Select 'Observed Nuc 9 params' from the Eq.Fr. menu, i.e.
approximate the frequency of codon (ijk) by the product
of the observed frequency of nucleotide i in position 1,
j in position 2, k in position 3 (corrected for stop codons).
There is a way to add more options to this menu, but it is
beyond the scope of this example.
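The approximation described above can be sketched in a few lines of
Python. This is an illustration under stated assumptions rather than
HyPhy's own routine: the function name codon_frequencies is hypothetical,
the universal genetic code is assumed, and the stop codon correction is
taken to be a renormalization over the 61 sense codons. The nine free
parameters are presumably the three independent nucleotide frequencies at
each of the three codon positions.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # universal genetic code
SENSE = [
    "".join(c) for c in product("ACGT", repeat=3)
    if "".join(c) not in STOP_CODONS
]


def codon_frequencies(pos_freqs):
    """Approximate codon frequencies from position-specific nucleotide frequencies.

    pos_freqs is a list of three dicts (one per codon position) mapping
    each nucleotide to its observed frequency at that position.
    """
    raw = {
        c: pos_freqs[0][c[0]] * pos_freqs[1][c[1]] * pos_freqs[2][c[2]]
        for c in SENSE
    }
    total = sum(raw.values())  # less than 1 because stop codons are excluded
    return {c: v / total for c, v in raw.items()}


# With uniform nucleotide frequencies every sense codon gets the same weight.
uniform = {b: 0.25 for b in "ACGT"}
freqs = codon_frequencies([uniform, uniform, uniform])
```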
|
Selecting rate distribution
|
Select 'Unit Gamma' from 'Rate Variation' in the 'Model' menu.
There is a way to add more options to this menu, but it is
beyond the scope of this example.
|
Saving the model
|
Choose 'Save' from the 'File' menu, and give the model
a descriptive name (no spaces though), for example:
'MG94xREV_3x4_Univ'.
|
Editing the model later
|
You can edit the model later by using 'Object
Inspector' from the 'Windows' menu, choosing
'Models' and double clicking on the model
you wish to edit. If you try it with the model
we just saved, you will notice that the names
of global variables have been prefixed with
'globalVariable_' and for category variables:
'categoryVariable_'. This is normal; HyPhy
does it for technical reasons.
You can also delete a model from the Object
Inspector.
|
Using the model
|
Once you have saved the model, it becomes
available for all analyses with the proper
data type (codon in our example). The name
of the model will appear in the
list of choices in the model column
of the data panel.
|
|