Co-Evolution

Co-Evolution is an extension of a Mutation-Selection codon model, which considers pairs of codons.

The model adds an amino-acid correlation matrix \(\delta\), which expresses how “linked” two amino acid states are within a pair.

For example, \(\delta_{A, K} > 1\) means that \(AK\) is a favoured pair, while \(\delta_{W, Y} < 1\) means \(WY\) is disfavoured. \(\delta = 1\) signifies lack of association.

The selection coefficient is calculated for a pair of sites, assuming additive fitness.

For a substitution from codon pair \(ij\) to codon pair \(kl\):

\[ s_{ij, kl} = (\psi_k + \psi_l)\delta_{kl} - (\psi_i + \psi_j)\delta_{ij} \\ Q_{ij, kl} = 2N_e \mu_{ik} \mu_{jl} \frac{1-e ^ {s_{ij, kl}}}{1-e^{-2 N_e s_{ij, kl}}} \] Only codons differing in one nucleotide positions are considered. Rates of transitions for multiple codon differences are \(0\).

Example

Setup

Load model parameters:

# Load phylogeny
mammals <- Phylogeny("../data/mammals.newick")

# Load nucleotide equilibrium distribution
# Nucleotide order is T,C,A,G
nuc_pi <- c(.3, .2, .25, .25)

# Load amino acid fitness vector
# Amino acid order is:
# A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y
aa_psi <- as.matrix(read.csv("../data/amino_acid_fitness_N_1000.csv"))

# Create a "flat" delta matrix - all pairs are independent
delta <- matrix(rep(1, 20 * 20), nrow = 20)

Create model:

# Create nucleotide substitution model
hky <- HasegawaKishinoYano(nuc_pi)

# Create codon pair substitution model
co <- CoEvolution(
    population_size = 1000,
    mutation_rate = 1e-8,
    nucleotide_mode = hky,
    fitness_1 = aa_psi[1,],
    fitness_2 = aa_psi[2,],
    delta = delta)

Sample sequence from model equilibrium distribution to start at root:

s <- sample_sequence(model = co, length = 100)

Simulate:

sim <- simulate_over_phylogeny(
    phylogeny = mammals,
    model = co,
    sequence = s)

Substitution history

Plot substitution history:

plot(sim)

Synonymous substiutions are shown in green, non-synonymous in red.

Check substitution history table:

head(sim$substitutions)
site node time from to pair_from pair_to pair_amino_acid_from pair_amino_acid_to first_synonymous second_synonymous first_color second_color
0 66 0.2115673 3288 2800 GAT,GAA GTT,GAA D,E V,E FALSE TRUE #E84B2A #16A35E
3 45 0.0371289 1338 850 CAT,GGT CTT,GGT H,G L,G FALSE TRUE #E84B2A #16A35E
4 37 0.0562539 3359 3372 GAA,TCT GAA,CCT E,S E,P TRUE FALSE #16A35E #E84B2A
10 24 0.0287241 1193 1254 CCA,ACC CCG,ACC P,T P,T TRUE TRUE #16A35E #16A35E
10 61 0.0048337 1193 3145 CCA,ACC GCA,ACC P,T A,T FALSE TRUE #E84B2A #16A35E
10 66 0.0363491 1193 2169 CCA,ACC ACA,ACC P,T T,T FALSE TRUE #E84B2A #16A35E

Alignment

To export FASTA sequence:

as.fasta(sim$alignment, file = "palantir_sim.fa")

Show alignment in a scrollable viewer:

plot(sim$alignment)

de Koning Lab. 2016-2017 lab.jasondk.io