Load model parameters:
# Load phylogeny
mammals <- Phylogeny("../data/mammals.newick")
# Load nucleotide equilibrium distribution
# Nucleotide order is T,C,A,G
nuc_pi <- c(.3, .2, .25, .25)
# Load amino acid equilibrium frequencies
# These will be converted to population-specific fitness vectors
aa_pi <- as.matrix(read.table("../data/psi-c50-1.txt"))
psi_1 <- equilibrium_to_fitness(aa_pi[1, ], 1000)
psi_2 <- equilibrium_to_fitness(aa_pi[2, ], 1000)
Create models:
# Create nucleotide substitution model
hky <- HasegawaKishinoYano(nuc_pi)
# Parent switching model
gtr <- GeneralTimeReversible(
equilibrium = rep(1/2, 2),
exchangeability = matrix(rep(1/4, 4), nrow = 2))
# Create codon models
# First fitness vector
ms_1 <- MutationSelection(
population_size = 1000,
mutation_rate = 1e-8,
nucleotide_model = hky,
fitness = psi_1)
# Second fitness vector
ms_2 <- MutationSelection(
population_size = 1000,
mutation_rate = 1e-8,
nucleotide_model = hky,
fitness = psi_2)
Sample sequence from model equilibrium distribution to start at root:
s <- sample_sequence(ms_1, 25)
Simulate:
sim <- simulate_with_shared_time_heterogeneity(
phylogeny = mammals,
switching_model = gtr,
switching_rate = 1,
substitution_models = list(ms_1, ms_2),
sequence = s,
start_mode = 0)
This simulation will take considerably more time than any other type. The switch modes are not shared across sites, so the sampling matrices have to be rescaled for each site separately.
Note that we have to specify which mode the root starts with using start_mode
argument. We can additionally accelerate switching model substitution rate with switching_rate
.
Plot substitution history:
plot(sim, circle_opacity = 0.1)
Synonymous substitutions are shown in green, non-synonymous in red. Since the switch types differ across sites, they are not shown on the phylogeny. After every model switch, there is a burst of non-synonymous substitutions.
Check substitution history table:
head(sim$substitutions)
site | node | time | from | to | codon_from | codon_to | amino_acid_from | amino_acid_to | synonymous | color |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0.0484343 | 4 | 7 | TCT | TCG | S | S | TRUE | #16A35E |
0 | 2 | 0.0135992 | 7 | 52 | TCG | GCG | S | A | FALSE | #E84B2A |
0 | 2 | 0.1352313 | 52 | 51 | GCG | GCA | A | A | TRUE | #16A35E |
0 | 3 | 0.0172231 | 51 | 59 | GCA | GGA | A | G | FALSE | #E84B2A |
0 | 26 | 0.0794740 | 59 | 60 | GGA | GGG | G | G | TRUE | #16A35E |
0 | 27 | 0.0722890 | 59 | 57 | GGA | GGT | G | G | TRUE | #16A35E |
Check model switching history table. Each site will have its own set of intervals. Each row corresponds to a tree node (branch). state
is the index of the model, from
and to
are branch time points where the model applies:
head(sim$intervals[[1]])
node | state | from | to |
---|---|---|---|
1 | 0 | 0 | 0.071664 |
2 | 0 | 0 | 0.234728 |
3 | 0 | 0 | 0.023664 |
4 | 0 | 0 | 0.020593 |
5 | 0 | 0 | 0.004937 |
6 | 0 | 0 | 0.015494 |
To export FASTA
sequence:
as.fasta(sim$alignment, file = "palantir_sim.fa")
Show alignment in a scrollable viewer:
plot(sim$alignment)
de Koning Lab. 2016-2017 lab.jasondk.io