Example

Setup

Load model parameters:

# Load phylogeny
mammals <- Phylogeny("../data/mammals.newick")

# Load nucleotide equilibrium distribution
# Nucleotide order is T,C,A,G
nuc_pi <- c(.3, .2, .25, .25)

# Load amino acid equilibrium frequencies
# These will be converted to population-specific fitness vectors
aa_pi <- as.matrix(read.table("../data/psi-c50-1.txt"))

psi <- equilibrium_to_fitness(aa_pi[1, ], 1000)

Create models:

# Create nucleotide substitution model
hky <- HasegawaKishinoYano(nuc_pi)

# Parent switching model
gtr <- GeneralTimeReversible(
    equilibrium = rep(1/2, 2),
    exchangeability = matrix(rep(1, 4), nrow = 2))

# Create codon models

# Small population size
ms_1k <- MutationSelection(
    population_size = 1000, 
    mutation_rate = 1e-8, 
    nucleotide_model = hky,
    fitness = psi)

# Large population size
ms_20k <- MutationSelection(
    population_size = 20000,
    mutation_rate = 1e-8, 
    nucleotide_model = hky,
    fitness = psi)

Sample sequence from model equilibrium distribution to start at root:

s <- sample_sequence(ms_1k, 100)

Simulate:

sim <- simulate_with_shared_substitution_heterogeneity(
    phylogeny = mammals,
    switching_model = gtr,
    switching_rate = 10,
    substitution_models = list(ms_1k, ms_20k),
    sequence = s,
    start_mode = 0)

Note that we have to specify which mode the root starts with using start_mode argument. We can additionally accelerate switching model substitution rate with switching_rate.

Substitution history

Plot substitution history:

plot(sim, circle_opacity = 0.1)

Synonymous substitutions are shown in green, non-synonymous in red. Branch modes are shown as colored branches. Each mode has its own color. After every model switch, there is a burst of non-synonymous substitutions.

Check substitution history table:

head(sim$substitutions)

node	time	from	to	codon_from	codon_to	amino_acid_from	amino_acid_to	synonymous	color
23	0.0748879	58	59	GGC	GGA	G	G	TRUE	#16A35E
35	0.0347219	58	59	GGC	GGA	G	G	TRUE	#16A35E
40	0.0386773	58	59	GGC	GGA	G	G	TRUE	#16A35E
70	0.0067702	58	57	GGC	GGT	G	G	TRUE	#16A35E
76	0.0060759	58	57	GGC	GGT	G	G	TRUE	#16A35E
81	0.0362307	58	50	GGC	GCC	G	A	FALSE	#E84B2A

Check model switching history table. Each row corresponds to a tree node (branch). state is the index of the model, from and to are branch time points where the model applies:

head(sim$intervals)

node	state	from	to
1	0	0	0.071664
2	0	0	0.234728
3	0	0	0.023664
4	0	0	0.020593
5	0	0	0.004937
6	0	0	0.015494

Alignment

To export FASTA sequence:

as.fasta(sim$alignment, file = "palantir_sim.fa")

Show alignment in a scrollable viewer:

plot(sim$alignment)

Shared Substitution History

Example

Setup

Substitution history

Alignment