Shared Substitution History

Shared substitution history is a switching simulation type similar to Markov-Modulated Mutation-Selection. However, instead of simulating each site with independent switching in the parent model, the switches are shared across sites. In essence, this is a conditioned simulation, given identities and times of parent model switches.

Unlike Shared Model Time History, this simulation shares times and identities.

This type of simulation is useful to emulate population size switches, such that population size change is common for all sites in the alignment.

Example

Setup

Load model parameters:

# Load phylogeny
mammals <- Phylogeny("../data/mammals.newick")

# Load nucleotide equilibrium distribution
# Nucleotide order is T,C,A,G
nuc_pi <- c(.3, .2, .25, .25)

# Load amino acid equilibrium frequencies
# These will be converted to population-specific fitness vectors
aa_pi <- as.matrix(read.table("../data/psi-c50-1.txt"))

psi <- equilibrium_to_fitness(aa_pi[1, ], 1000)

Create models:

# Create nucleotide substitution model
hky <- HasegawaKishinoYano(nuc_pi)

# Parent switching model
gtr <- GeneralTimeReversible(
    equilibrium = rep(1/2, 2),
    exchangeability = matrix(rep(1, 4), nrow = 2))

# Create codon models

# Small population size
ms_1k <- MutationSelection(
    population_size = 1000, 
    mutation_rate = 1e-8, 
    nucleotide_model = hky,
    fitness = psi)

# Large population size
ms_20k <- MutationSelection(
    population_size = 20000,
    mutation_rate = 1e-8, 
    nucleotide_model = hky,
    fitness = psi)

Sample sequence from model equilibrium distribution to start at root:

s <- sample_sequence(ms_1k, 100)

Simulate:

sim <- simulate_with_shared_substitution_heterogeneity(
    phylogeny = mammals,
    switching_model = gtr,
    switching_rate = 10,
    substitution_models = list(ms_1k, ms_20k),
    sequence = s,
    start_mode = 0)

Note that we have to specify which mode the root starts with using start_mode argument. We can additionally accelerate switching model substitution rate with switching_rate.

Substitution history

Plot substitution history:

plot(sim, circle_opacity = 0.1)

Synonymous substitutions are shown in green, non-synonymous in red. Branch modes are shown as colored branches. Each mode has its own color. After every model switch, there is a burst of non-synonymous substitutions.

Check substitution history table:

head(sim$substitutions)
site node time from to codon_from codon_to amino_acid_from amino_acid_to synonymous color
0 23 0.0748879 58 59 GGC GGA G G TRUE #16A35E
0 35 0.0347219 58 59 GGC GGA G G TRUE #16A35E
0 40 0.0386773 58 59 GGC GGA G G TRUE #16A35E
0 70 0.0067702 58 57 GGC GGT G G TRUE #16A35E
0 76 0.0060759 58 57 GGC GGT G G TRUE #16A35E
0 81 0.0362307 58 50 GGC GCC G A FALSE #E84B2A

Check model switching history table. Each row corresponds to a tree node (branch). state is the index of the model, from and to are branch time points where the model applies:

head(sim$intervals)
node state from to
1 0 0 0.071664
2 0 0 0.234728
3 0 0 0.023664
4 0 0 0.020593
5 0 0 0.004937
6 0 0 0.015494

Alignment

To export FASTA sequence:

as.fasta(sim$alignment, file = "palantir_sim.fa")

Show alignment in a scrollable viewer:

plot(sim$alignment)

de Koning Lab. 2016-2017 lab.jasondk.io