Shared Substitution Times

Shared substitution times is a switching simulation type similar to Markov-Modulated Mutation-Selection. However, instead of simulating each site with independent switching in the parent model, the switch times are shared across sites. In essence, this is a conditioned simulation, given times of parent model switches.

Unlike Shared Model Substitution History, this simulation does not share switch identity.

This type of simulation is useful to emulate fitness switches, such that fitness changes at every site at the same time. However, the identity of the fitness switch might differ.

Example

Setup

Load model parameters:

# Load phylogeny
mammals <- Phylogeny("../data/mammals.newick")

# Load nucleotide equilibrium distribution
# Nucleotide order is T,C,A,G
nuc_pi <- c(.3, .2, .25, .25)

# Load amino acid equilibrium frequencies
# These will be converted to population-specific fitness vectors
aa_pi <- as.matrix(read.table("../data/psi-c50-1.txt"))

psi_1 <- equilibrium_to_fitness(aa_pi[1, ], 1000)
psi_2 <- equilibrium_to_fitness(aa_pi[2, ], 1000)

Create models:

# Create nucleotide substitution model
hky <- HasegawaKishinoYano(nuc_pi)

# Parent switching model
gtr <- GeneralTimeReversible(
    equilibrium = rep(1/2, 2),
    exchangeability = matrix(rep(1/4, 4), nrow = 2))

# Create codon models

# First fitness vector
ms_1 <- MutationSelection(
    population_size = 1000, 
    mutation_rate = 1e-8, 
    nucleotide_model = hky,
    fitness = psi_1)

# Second fitness vector
ms_2 <- MutationSelection(
    population_size = 1000,
    mutation_rate = 1e-8, 
    nucleotide_model = hky,
    fitness = psi_2)

Sample sequence from model equilibrium distribution to start at root:

s <- sample_sequence(ms_1, 25)

Simulate:

sim <- simulate_with_shared_time_heterogeneity(
    phylogeny = mammals,
    switching_model = gtr,
    switching_rate = 1,
    substitution_models = list(ms_1, ms_2),
    sequence = s,
    start_mode = 0)

This simulation will take considerably more time than any other type. The switch modes are not shared across sites, so the sampling matrices have to be rescaled for each site separately.

Note that we have to specify which mode the root starts with using start_mode argument. We can additionally accelerate switching model substitution rate with switching_rate.

Substitution history

Plot substitution history:

plot(sim, circle_opacity = 0.1)

Synonymous substitutions are shown in green, non-synonymous in red. Since the switch types differ across sites, they are not shown on the phylogeny. After every model switch, there is a burst of non-synonymous substitutions.

Check substitution history table:

head(sim$substitutions)
site node time from to codon_from codon_to amino_acid_from amino_acid_to synonymous color
0 1 0.0484343 4 7 TCT TCG S S TRUE #16A35E
0 2 0.0135992 7 52 TCG GCG S A FALSE #E84B2A
0 2 0.1352313 52 51 GCG GCA A A TRUE #16A35E
0 3 0.0172231 51 59 GCA GGA A G FALSE #E84B2A
0 26 0.0794740 59 60 GGA GGG G G TRUE #16A35E
0 27 0.0722890 59 57 GGA GGT G G TRUE #16A35E

Check model switching history table. Each site will have its own set of intervals. Each row corresponds to a tree node (branch). state is the index of the model, from and to are branch time points where the model applies:

head(sim$intervals[[1]])
node state from to
1 0 0 0.071664
2 0 0 0.234728
3 0 0 0.023664
4 0 0 0.020593
5 0 0 0.004937
6 0 0 0.015494

Alignment

To export FASTA sequence:

as.fasta(sim$alignment, file = "palantir_sim.fa")

Show alignment in a scrollable viewer:

plot(sim$alignment)

de Koning Lab. 2016-2017 lab.jasondk.io