Hasegawa-Kishino-Yano

Hasegawa Kishino Yano (HKY) [1] is a nucleotide substitution model which accounts for nucleotide equilibrium distribution rates of transitions (\(\alpha\)) and transversions(\(\beta\)).

\[ Q_{ij} = \begin{bmatrix} . & \alpha \pi_C & \beta \pi_A & \beta \pi_G \\ \alpha \pi_T & . & \beta \pi_A & \beta \pi_G \\ \beta \pi_T & \beta \pi_C & . & \alpha \pi_G \\ \beta \pi_T & \beta \pi_C & \alpha \pi_A & . \\ \end{bmatrix} \] In PalantiR The HKY model is most commonly used to define nucleotide transition rates for Mutation Selection model, but it can also be used as a stand-alone simulation model.

Example

Load model parameters:

# Load phylogeny
mammals <- Phylogeny("../data/mammals.newick")

# Load nucleotide equilibrium distribution
# Nucleotide order is T,C,A,G
pi <- c(.3, .2, .25, .25)

Create model:

hky <- HasegawaKishinoYano(
    equilibrium = pi,
    transition_rate = 2,
    transversion_rate = 1)

We first need to provide a sequence that to start off the simulation at the root. We can sample a sequence from the equilibrium distribution of the model we are simulating:

s <- sample_sequence(model = hky, length = 100)

Simulate:

sim <- simulate_over_phylogeny(
    phylogeny = mammals, 
    model = hky, 
    sequence = s)

Substitution history

Plot substitution history:

plot(sim, sites = 0:10)

Transition substitutions are shown in green, transversions in red.

Check substitution history table:

head(sim$substitutions)
site node time from to nucleotide_from nucleotide_to transition color
0 2 0.0559405 1 3 C G FALSE #E84B2A
0 4 0.0201200 3 2 G A TRUE #16A35E
0 27 0.1591636 2 1 A C FALSE #E84B2A
0 30 0.0073097 2 0 A T FALSE #E84B2A
0 36 0.1160653 0 2 T A FALSE #E84B2A
0 38 0.0052137 2 0 A T FALSE #E84B2A

We can change the colors on the plot by changing columns in the substitutions dataframe:

transition_substitutions <- sim$substitutions$transition == TRUE
sim$substitutions$color[transition_substitutions] <- "orange"
sim$substitutions$color[!transition_substitutions] <- "blue"
plot(sim, sites = 0:10)

Alignment

To export FASTA sequence:

as.fasta(sim$alignment, file = "palantir_sim.fa")

Show alignment in a scrollable viewer:

plot(sim$alignment)

References

  1. Hasegawa, M., Kishino, H. and Yano, T. (1985). Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22:160-174.

de Koning Lab. 2016-2017 lab.jasondk.io