General Time-Reversible

General Time-Reversible (GTR) [1] is the most general nucleotide substitution model which holds the time-reversibility property.

\[ Q_{ij} = \begin{bmatrix} . & \alpha I & \beta I & \gamma I \\ \alpha I & . & \delta I & \epsilon I \\ \beta I & \delta I & . & \zeta I \\ \gamma I & \epsilon I & \zeta I & . \\ \end{bmatrix} \begin{bmatrix} \pi_1 & 0 & 0 & 0 \\ 0 & \pi_2 & 0 & 0 \\ 0 & 0 & \pi_3 & 0\\ 0 & 0 & 0 &\pi_4 \\ \end{bmatrix} \] In PalantiR, GTR is used as a general state-transition model. It can also be simulated from directly.

Example

Load model parameters:

# Load phylogeny
mammals <- Phylogeny("../data/mammals.newick")

# Load nucleotide equilibrium distribution
# Nucleotide order is T,C,A,G
pi <- c(.3, .2, .25, .25)

# Load exchangeability parameters
e <- c(a = 1, b = 2, g = 2, d = 1, e = 1, z = 2)
ex <- matrix(c(0,      e["a"], e["b"], e["g"],
               e["a"], 0     , e["d"], e["e"],
               e["b"], e["d"], 0     , e["z"],
               e["g"], e["e"], e["z"], 0), nrow = 4)

Create model:

gtr <- GeneralTimeReversible(
    equilibrium = pi,
    exchangeability = ex)

# To simulate from GTR as a nucleotide model, model type needs to be set manually:
gtr$type <- "nucleotide"

Sample sequence from model equilibrium distribution to start at root:

s <- sample_sequence(model = gtr, length = 100)

Simulate:

sim <- simulate_over_phylogeny(
    phylogeny = mammals, 
    model = gtr, 
    sequence = s)

Substitution history

Plot substitution history:

plot(sim, sites = 0:10)

Transition substitutions are shown in green, transversions in red.

Check substitution history table:

head(sim$substitutions)
site node time from to nucleotide_from nucleotide_to transition color
0 1 0.0089000 3 2 G A TRUE #16A35E
0 2 0.1208661 2 3 A G TRUE #16A35E
0 50 0.0218375 3 0 G T FALSE #E84B2A
0 72 0.0017142 3 1 G C FALSE #E84B2A
0 73 0.0382366 3 2 G A TRUE #16A35E
0 73 0.1461714 2 3 A G TRUE #16A35E

Alignment

To export FASTA sequence:

as.fasta(sim$alignment, file = "palantir_sim.fa")

Show alignment in a scrollable viewer:

plot(sim$alignment)

References

  1. Tavaré, S. (1986). Some Probabilistic and Statistical Problems in the Analysis of DNA Sequences (PDF). Lectures on Mathematics in the Life Sciences. American Mathematical Society 17:57–86.

de Koning Lab. 2016-2017 lab.jasondk.io