Example

Setup

Load model parameters:

# Load phylogeny
mammals <- Phylogeny("../data/mammals.newick")

# Load nucleotide equilibrium distribution
# Nucleotide order is T,C,A,G
nuc_pi <- c(.3, .2, .25, .25)

# Load amino acid equilibrium frequencies
# These will be converted to population-specific fitness vectors
aa_pi <- as.matrix(read.table("../data/psi-c50-1.txt"))

psi_1 <- equilibrium_to_fitness(aa_pi[1, ], 1000)
psi_2 <- equilibrium_to_fitness(aa_pi[2, ], 1000)

Create models:

# Create nucleotide substitution model
hky <- HasegawaKishinoYano(nuc_pi)

# Parent switching model
gtr <- GeneralTimeReversible(
    equilibrium = rep(1/2, 2),
    exchangeability = matrix(rep(1/4, 4), nrow = 2))

# Create codon models

# First fitness vector
ms_1 <- MutationSelection(
    population_size = 1000, 
    mutation_rate = 1e-8, 
    nucleotide_model = hky,
    fitness = psi_1)

# Second fitness vector
ms_2 <- MutationSelection(
    population_size = 1000,
    mutation_rate = 1e-8, 
    nucleotide_model = hky,
    fitness = psi_2)

Sample sequence from model equilibrium distribution to start at root:

s <- sample_sequence(ms_1, 25)

Simulate:

sim <- simulate_with_shared_time_heterogeneity(
    phylogeny = mammals,
    switching_model = gtr,
    switching_rate = 1,
    substitution_models = list(ms_1, ms_2),
    sequence = s,
    start_mode = 0)

This simulation will take considerably more time than any other type. The switch modes are not shared across sites, so the sampling matrices have to be rescaled for each site separately.

Note that we have to specify which mode the root starts with using start_mode argument. We can additionally accelerate switching model substitution rate with switching_rate.

Substitution history

Plot substitution history:

plot(sim, circle_opacity = 0.1)

Synonymous substitutions are shown in green, non-synonymous in red. Since the switch types differ across sites, they are not shown on the phylogeny. After every model switch, there is a burst of non-synonymous substitutions.

Check substitution history table:

head(sim$substitutions)

node	time	from	to	codon_from	codon_to	amino_acid_from	amino_acid_to	synonymous	color
1	0.0484343	4	7	TCT	TCG	S	S	TRUE	#16A35E
2	0.0135992	7	52	TCG	GCG	S	A	FALSE	#E84B2A
2	0.1352313	52	51	GCG	GCA	A	A	TRUE	#16A35E
3	0.0172231	51	59	GCA	GGA	A	G	FALSE	#E84B2A
26	0.0794740	59	60	GGA	GGG	G	G	TRUE	#16A35E
27	0.0722890	59	57	GGA	GGT	G	G	TRUE	#16A35E

Check model switching history table. Each site will have its own set of intervals. Each row corresponds to a tree node (branch). state is the index of the model, from and to are branch time points where the model applies:

head(sim$intervals[[1]])

node	state	from	to
1	0	0	0.071664
2	0	0	0.234728
3	0	0	0.023664
4	0	0	0.020593
5	0	0	0.004937
6	0	0	0.015494

Alignment

To export FASTA sequence:

as.fasta(sim$alignment, file = "palantir_sim.fa")

Show alignment in a scrollable viewer:

plot(sim$alignment)

Shared Substitution Times

Example

Setup

Substitution history

Alignment