Branch heterogeneous simulations specify different models (called modes
) for each branch on a phylogeny. This is done by providing a newick tree with model indexes instead of branch length, and a list of models.
For more details on time-heterogeneous simulation, see Temporal Heterogeneity Rescaling.
Here is an example file, which represents a model switch on the primate lineage in the mammalian phylogeny:
scan("../data/mammals_switch.newick", what = character(), quiet = T)
## [1] "((((((((((((((Human:1,Pan_troglodytes:1):1,Gorilla_gorilla:1):1,Pongo_abelii:1):1,Nomascus_leucogenys:1):1,(Macaca_mulatta:1,Papio_hamadryas:1):1):1,Callithrix_jacchus:1):1,Tarsius_syrichta:1):1,(Microcebus_murinus:1,Otolemur_garnettii:1):1):1,Tupaia_belangeri:1):1,(((((Mus_musculus:0,Rattus_norvegicus:0):0,Dipodomys_ordii:0):0,Cavia_porcellus:0):0,Ictidomys_tridecemlineatus:0):0,(Oryctolagus_cuniculus:0,Ochotona_princeps:0):0):0):0,((((Vicugna_pacos:0,(Tursiops_truncatus:0,(Bos_taurus:0,Ovis_aries:0):0):0):0,Sus_scrofa:0.):0,((Equus_caballus:0,(Felis_catus:0,((Ailuropoda_melanoleuca:0,Mustela_putorius_furo:0):0,Canis_familiaris:0):0):0):0,(Myotis_lucifugus:0,Pteropus_vampyrus:0):0):0):0,(Erinaceus_europaeus:0,Sorex_araneus:0):0):0):0,(((Loxodonta_africana:0,Procavia_capensis:0):0,Echinops_telfairi:0):0,(Dasypus_novemcinctus:0,Choloepus_hoffmanni:0):0):0):0,(Monodelphis_domestica:0,(Macropus_eugenii:0,Sarcophilus_harrisii:0):0):0):0,Ornithorhynchus_anatinus:0);"
This phylogeny can be loaded with the Phylogeny
constructor, with an additional type = "mode"
argument.
# Load phylogeny
mammals <- Phylogeny("../data/mammals.newick")
# Load branch modes
mammal_switches <- Phylogeny("../data/mammals_switch.newick", type = "mode")
We can view the phylogeny and the assigned modes:
plot(mammals, mammal_switches)
In the example below, we simulate on a mammalian phylogeny with a population size expansion on the primate lineage.
Load model parameters:
# Load nucleotide equilibrium distribution
# Nucleotide order is T,C,A,G
nuc_pi <- c(.3, .2, .25, .25)
# Load amino acid equilibrium frequencies
# These will be converted to population-specific fitness vectors
aa_pi <- as.matrix(read.table("../data/psi-c50-1.txt"))
psi <- equilibrium_to_fitness(aa_pi[1, ], 1000)
Create models:
# Create nucleotide substitution model
hky <- HasegawaKishinoYano(nuc_pi)
# Create codon models
# Small population size
ms_1k <- MutationSelection(
population_size = 1000,
mutation_rate = 1e-8,
nucleotide_model = hky,
fitness = psi)
# Large population size
ms_20k <- MutationSelection(
population_size = 20000,
mutation_rate = 1e-8,
nucleotide_model = hky,
fitness = psi)
Sample sequence from model equilibrium distribution to start at root:
s <- sample_sequence(ms_1k, 100)
Simulate:
sim <- simulate_over_interval_phylogeny(
phylogeny = mammals,
mode_phylogeny = mammal_switches,
models = list(ms_1k, ms_20k),
sequence = s,
start_mode = 0)
Note that we have to specify which mode the root starts with using start_mode
argument.
Plot substitution history:
plot(sim, circle_opacity = 0.1)
Synonymous substiutions are shown in green, non-synonymous in red. Branch modes are shown as colored branches. According to expectation, we observe a burst of non-synonymous substitutions at the start of the population size shift. As time progresses, the number of non-synonymous substitutions goes down.
Check substitution history table:
head(sim$substitutions)
site | node | time | from | to | codon_from | codon_to | amino_acid_from | amino_acid_to | synonymous | color |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | 0.2177547 | 26 | 25 | CGC | CGT | R | R | TRUE | #16A35E |
0 | 4 | 0.0202598 | 25 | 41 | CGT | AGT | R | S | FALSE | #E84B2A |
0 | 7 | 0.0025960 | 41 | 57 | AGT | GGT | S | G | FALSE | #E84B2A |
0 | 25 | 0.0197621 | 41 | 57 | AGT | GGT | S | G | FALSE | #E84B2A |
0 | 26 | 0.0127671 | 41 | 37 | AGT | AAT | S | N | FALSE | #E84B2A |
0 | 27 | 0.0152121 | 41 | 57 | AGT | GGT | S | G | FALSE | #E84B2A |
To export FASTA
sequence:
as.fasta(sim$alignment, file = "palantir_sim.fa")
Show alignment in a scrollable viewer:
plot(sim$alignment)
de Koning Lab. 2016-2017 lab.jasondk.io