Branch Heterogeneity

Branch heterogeneous simulations specify different models (called modes) for each branch on a phylogeny. This is done by providing a newick tree with model indexes instead of branch length, and a list of models.

For more details on time-heterogeneous simulation, see Temporal Heterogeneity Rescaling.

Here is an example file, which represents a model switch on the primate lineage in the mammalian phylogeny:

scan("../data/mammals_switch.newick", what = character(), quiet = T)
## [1] "((((((((((((((Human:1,Pan_troglodytes:1):1,Gorilla_gorilla:1):1,Pongo_abelii:1):1,Nomascus_leucogenys:1):1,(Macaca_mulatta:1,Papio_hamadryas:1):1):1,Callithrix_jacchus:1):1,Tarsius_syrichta:1):1,(Microcebus_murinus:1,Otolemur_garnettii:1):1):1,Tupaia_belangeri:1):1,(((((Mus_musculus:0,Rattus_norvegicus:0):0,Dipodomys_ordii:0):0,Cavia_porcellus:0):0,Ictidomys_tridecemlineatus:0):0,(Oryctolagus_cuniculus:0,Ochotona_princeps:0):0):0):0,((((Vicugna_pacos:0,(Tursiops_truncatus:0,(Bos_taurus:0,Ovis_aries:0):0):0):0,Sus_scrofa:0.):0,((Equus_caballus:0,(Felis_catus:0,((Ailuropoda_melanoleuca:0,Mustela_putorius_furo:0):0,Canis_familiaris:0):0):0):0,(Myotis_lucifugus:0,Pteropus_vampyrus:0):0):0):0,(Erinaceus_europaeus:0,Sorex_araneus:0):0):0):0,(((Loxodonta_africana:0,Procavia_capensis:0):0,Echinops_telfairi:0):0,(Dasypus_novemcinctus:0,Choloepus_hoffmanni:0):0):0):0,(Monodelphis_domestica:0,(Macropus_eugenii:0,Sarcophilus_harrisii:0):0):0):0,Ornithorhynchus_anatinus:0);"

This phylogeny can be loaded with the Phylogeny constructor, with an additional type = "mode" argument.

# Load phylogeny
mammals <- Phylogeny("../data/mammals.newick")

# Load branch modes
mammal_switches <- Phylogeny("../data/mammals_switch.newick", type = "mode")

We can view the phylogeny and the assigned modes:

plot(mammals, mammal_switches)

Example

In the example below, we simulate on a mammalian phylogeny with a population size expansion on the primate lineage.

Setup

Load model parameters:

# Load nucleotide equilibrium distribution
# Nucleotide order is T,C,A,G
nuc_pi <- c(.3, .2, .25, .25)

# Load amino acid equilibrium frequencies
# These will be converted to population-specific fitness vectors
aa_pi <- as.matrix(read.table("../data/psi-c50-1.txt"))

psi <- equilibrium_to_fitness(aa_pi[1, ], 1000)

Create models:

# Create nucleotide substitution model
hky <- HasegawaKishinoYano(nuc_pi)

# Create codon models

# Small population size
ms_1k <- MutationSelection(
    population_size = 1000, 
    mutation_rate = 1e-8, 
    nucleotide_model = hky,
    fitness = psi)

# Large population size
ms_20k <- MutationSelection(
    population_size = 20000,
    mutation_rate = 1e-8, 
    nucleotide_model = hky,
    fitness = psi)

Sample sequence from model equilibrium distribution to start at root:

s <- sample_sequence(ms_1k, 100)

Simulate:

sim <- simulate_over_interval_phylogeny(
    phylogeny = mammals,
    mode_phylogeny = mammal_switches,
    models = list(ms_1k, ms_20k),
    sequence = s,
    start_mode = 0)

Note that we have to specify which mode the root starts with using start_mode argument.

Substitution history

Plot substitution history:

plot(sim, circle_opacity = 0.1)

Synonymous substiutions are shown in green, non-synonymous in red. Branch modes are shown as colored branches. According to expectation, we observe a burst of non-synonymous substitutions at the start of the population size shift. As time progresses, the number of non-synonymous substitutions goes down.

Check substitution history table:

head(sim$substitutions)
site node time from to codon_from codon_to amino_acid_from amino_acid_to synonymous color
0 2 0.2177547 26 25 CGC CGT R R TRUE #16A35E
0 4 0.0202598 25 41 CGT AGT R S FALSE #E84B2A
0 7 0.0025960 41 57 AGT GGT S G FALSE #E84B2A
0 25 0.0197621 41 57 AGT GGT S G FALSE #E84B2A
0 26 0.0127671 41 37 AGT AAT S N FALSE #E84B2A
0 27 0.0152121 41 57 AGT GGT S G FALSE #E84B2A

Alignment

To export FASTA sequence:

as.fasta(sim$alignment, file = "palantir_sim.fa")

Show alignment in a scrollable viewer:

plot(sim$alignment)

de Koning Lab. 2016-2017 lab.jasondk.io