Load model parameters:
# Load phylogeny
mammals <- Phylogeny("../data/mammals.newick")
# Load nucleotide equilibrium distribution
# Nucleotide order is T,C,A,G
nuc_pi <- c(.3, .2, .25, .25)
# Load amino acid equilibrium frequencies
# These will be converted to population-specific fitness vectors
aa_pi <- as.matrix(read.table("../data/psi-c50-1.txt"))
psi <- equilibrium_to_fitness(aa_pi[1, ], 1000)
Create models:
# Create nucleotide substitution model
hky <- HasegawaKishinoYano(nuc_pi)
# Parent switching model
gtr <- GeneralTimeReversible(
equilibrium = rep(1/2, 2),
exchangeability = matrix(rep(1, 4), nrow = 2))
# Create codon models
# Small population size
ms_1k <- MutationSelection(
population_size = 1000,
mutation_rate = 1e-8,
nucleotide_model = hky,
fitness = psi)
# Large population size
ms_20k <- MutationSelection(
population_size = 20000,
mutation_rate = 1e-8,
nucleotide_model = hky,
fitness = psi)
Sample sequence from model equilibrium distribution to start at root:
s <- sample_sequence(ms_1k, 100)
Simulate:
sim <- simulate_with_shared_substitution_heterogeneity(
phylogeny = mammals,
switching_model = gtr,
switching_rate = 10,
substitution_models = list(ms_1k, ms_20k),
sequence = s,
start_mode = 0)
Note that we have to specify which mode the root starts with using start_mode
argument. We can additionally accelerate switching model substitution rate with switching_rate
.
Plot substitution history:
plot(sim, circle_opacity = 0.1)
Synonymous substitutions are shown in green, non-synonymous in red. Branch modes are shown as colored branches. Each mode has its own color. After every model switch, there is a burst of non-synonymous substitutions.
Check substitution history table:
head(sim$substitutions)
site | node | time | from | to | codon_from | codon_to | amino_acid_from | amino_acid_to | synonymous | color |
---|---|---|---|---|---|---|---|---|---|---|
0 | 23 | 0.0748879 | 58 | 59 | GGC | GGA | G | G | TRUE | #16A35E |
0 | 35 | 0.0347219 | 58 | 59 | GGC | GGA | G | G | TRUE | #16A35E |
0 | 40 | 0.0386773 | 58 | 59 | GGC | GGA | G | G | TRUE | #16A35E |
0 | 70 | 0.0067702 | 58 | 57 | GGC | GGT | G | G | TRUE | #16A35E |
0 | 76 | 0.0060759 | 58 | 57 | GGC | GGT | G | G | TRUE | #16A35E |
0 | 81 | 0.0362307 | 58 | 50 | GGC | GCC | G | A | FALSE | #E84B2A |
Check model switching history table. Each row corresponds to a tree node (branch). state
is the index of the model, from
and to
are branch time points where the model applies:
head(sim$intervals)
node | state | from | to |
---|---|---|---|
1 | 0 | 0 | 0.071664 |
2 | 0 | 0 | 0.234728 |
3 | 0 | 0 | 0.023664 |
4 | 0 | 0 | 0.020593 |
5 | 0 | 0 | 0.004937 |
6 | 0 | 0 | 0.015494 |
To export FASTA
sequence:
as.fasta(sim$alignment, file = "palantir_sim.fa")
Show alignment in a scrollable viewer:
plot(sim$alignment)
de Koning Lab. 2016-2017 lab.jasondk.io