Markov-modulated Mutation Selection (MMMS) is a multi-model extension of the Mutation Selection model. The model contains several mutation selection models, and is allowed to switch between them according to a Markov process. In essence, the MS models are nested within the larger parent Markov process.
The transition rates matrix \(S\) of the MMMS model can be written as:
\[ S = Q^* + R = \begin{bmatrix} Q_1 & 0 & 0 & 0 \\ 0 & Q_2 & 0 & 0 \\ 0 & 0 & Q_3 & 0 \\ 0 & 0 & 0 & Q_4 \\ \end{bmatrix} + \left \{ \begin{bmatrix} . & \alpha I & \beta I & \gamma I \\ \alpha I & . & \delta I & \epsilon I \\ \beta I & \delta I & . & \zeta I \\ \gamma I & \epsilon I & \zeta I & . \\ \end{bmatrix} \begin{bmatrix} \pi_1 & 0 & 0 & 0 \\ 0 & \pi_2 & 0 & 0 \\ 0 & 0 & \pi_3 & 0\\ 0 & 0 & 0 &\pi_4 \\ \end{bmatrix} \right \} \]
\(Q_i\) are the individual Mutation Selection models. The term in curly brackets is the GTR model, which is the parent model, governing the switches between the individual MS models. While GTR is used here, any switching scheme can be applied.
Markov-Modulated processes can be used to model temporal heterogeneity in MS parameters. For example, we can model changing population size by assigning a different population size parameter to each of the sub-models. It is possible to simulate fitness heterogeneity in the same way.
We will simulate switching between models of four different population sizes: \({100, 1000, 10000, 10000}\).
Load model parameters:
# Load phylogeny
mammals <- Phylogeny("../data/mammals.newick")
# Load nucleotide equilibrium distribution
# Nucleotide order is T,C,A,G
nuc_pi <- c(.3, .2, .25, .25)
# Load amino acid equilibrium vector
# Amino acid order is:
# A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y
aa_pi <- as.matrix(read.csv("../data/amino_acid_equilibrium.csv"))
Create model:
# Create nucleotide substitution model
hky <- HasegawaKishinoYano(nuc_pi)
# Parent switching model
gtr <- GeneralTimeReversible(
equilibrium = rep(1/4, 4),
exchangeability = matrix(rep(1/16, 16), nrow = 4))
# Create codon substitution models
pop_sizes <- c(100, 1000, 10000, 10000)
pop_models <- lapply(pop_sizes, function(pop_size) {
psi <- equilibrium_to_fitness(aa_pi[5,], pop_size)
MutationSelection(pop_size, 1e-8, hky, psi)
})
# Markov-Modulated MS
mmms <- MarkovModulatedMutationSelection(pop_models, gtr)
Sample sequence from model equilibrium distribution to start at root:
s <- sample_sequence(model = mmms, length = 100)
Simulate:
sim <- simulate_over_phylogeny(
phylogeny = mammals,
model = mmms,
sequence = s)
Plot substitution history:
plot(sim, sites = 0:10)
Synonymous substiutions are shown in green, non-synonymous in red. Switches between models shown in blue.
Check substitution history table:
head(sim$substitutions)
site | node | time | from | to | model_from | model_to | codon_from | codon_to | synonymous | model_switch | color |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0.0090680 | 165 | 164 | 2 | 2 | AGA | AGC | FALSE | FALSE | #E84B2A |
0 | 2 | 0.0131823 | 164 | 148 | 2 | 2 | AGC | CGC | FALSE | FALSE | #E84B2A |
0 | 2 | 0.0822084 | 148 | 147 | 2 | 2 | CGC | CGT | TRUE | FALSE | #16A35E |
0 | 2 | 0.1295953 | 147 | 149 | 2 | 2 | CGT | CGA | TRUE | FALSE | #16A35E |
0 | 2 | 0.1563959 | 149 | 145 | 2 | 2 | CGA | CAA | FALSE | FALSE | #E84B2A |
0 | 2 | 0.1843145 | 145 | 137 | 2 | 2 | CAA | CTA | FALSE | FALSE | #E84B2A |
To export FASTA
sequence:
as.fasta(sim$alignment, file = "palantir_sim.fa")
Show alignment in a scrollable viewer:
plot(sim$alignment)
de Koning Lab. 2016-2017 lab.jasondk.io