Mutation-Selection

Mutation Selection (MS) model (1, 2) is the core model implemented in PalantiR. Mutation Selection is a codon substitution model that describes the interplay between mutation rate, selection strength, and population size.

For a substitution from codon \(i\) to codon \(j\): \[ s_{ij} = \psi_j - \psi_i \\ Q_{ij} = 2N_e \mu_{ij} \frac{1-e^{s_{ij}}}{1-e^{-2N_es_{ij}}} \] Only codons differing in one nucleotide positions are considered. Rates of transitions for multiple codon differences are \(0\).

The substitution rates are also scaled to assign particular meaning to branch length of a phylogeny. For more detail, see Rate Scaling.

Example

Setup

Load model parameters:

# Load phylogeny
mammals <- Phylogeny("../data/mammals.newick")

# Load nucleotide equilibrium distribution
# Nucleotide order is T,C,A,G
nuc_pi <- c(.3, .2, .25, .25)

# Load amino acid fitness vector
# Amino acid order is:
# A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y
aa_psi <- as.matrix(read.csv("../data/amino_acid_fitness_N_1000.csv"))

Create model:

# Create nucleotide substitution model
hky <- HasegawaKishinoYano(nuc_pi)

# Create codon substitution model
ms <- MutationSelection(
    population_size = 1000,
    mutation_rate = 1e-8,
    nucleotide_model = hky,
    fitness = aa_psi[1,])

Sample sequence from model equilibrium distribution to start at root:

s <- sample_sequence(model = ms, length = 100)

Simulate:

sim <- simulate_over_phylogeny(
    phylogeny = mammals,
    model = ms,
    sequence = s)

Substitution history

Plot substitution history:

plot(sim, sites = 0:10)

Synonymous substiutions are shown in green, non-synonymous in red.

Check substitution history table:

head(sim$substitutions)
site node time from to codon_from codon_to amino_acid_from amino_acid_to synonymous color
0 7 0.0077853 39 38 AAA AAC K N FALSE #E84B2A
0 8 0.0408456 38 9 AAC TAC N Y FALSE #E84B2A
0 23 0.1364336 38 22 AAC CAC N H FALSE #E84B2A
0 25 0.0172276 39 38 AAA AAC K N FALSE #E84B2A
0 25 0.0749027 38 42 AAC AGC N S FALSE #E84B2A
0 25 0.0794302 42 43 AGC AGA S R FALSE #E84B2A

Alignment

To export FASTA sequence:

as.fasta(sim$alignment, file = "palantir_sim.fa")

Show alignment in a scrollable viewer:

plot(sim$alignment)

References

  1. Golding, B. and Felsenstein, J. (1990). A maximum likelihood approach to the detection of selection from a phylogeny. Journal of Molecular Evolution, 31(6):511–523.
  2. Yang, Z. and Nielsen, R. (2008). Mutation-Selection Models of Codon Substitution and Their Use to Estimate Selective Strengths on Codon Usage. Molecular Biology and Evolution. 25(3):568-579.
  3. Hasegawa, M., Kishino, H. and Yano, T. (1985). Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22:160-174.

de Koning Lab. 2016-2017 lab.jasondk.io