This vignette provides information
about how the package stores and outputs information about fossil
sampling times via the fossils
object and available options
for simulating fossil recovery.
The fossils
object contains information about fossil
sampling times and their relationship to the corresponding
phylo
and taxonomy
objects. In some simulation
conditions we may know the age of fossils precisely. Alternatively,
there may be some uncertainty associated with specimen ages, i.e. we
only know the age to within some stratigraphic interval, which can be
incorporated into the fossils
object.
The object contains a dataframe with species and age information for each fossil with the following 4 fields:
sp
the label of the corresponding species. This
label matches the edge labels in the corresponding phylo
object or the species labels in the corresponding taxonomy
object if taxonomic information was provided
edge
the label of the sampled node or tip in the
phylogeny, i.e the node at the end of the edge along which the fossil
was sampled
hmin
the age of the fossil or the youngest bound of
the time interval in which the fossil was sampled
hmax
the oldest bound of the time interval in which
the fossil was sampled. This is equal to hmin
if exact
sampling times are known
The object also contains a logical variable
from.taxonomy
indicating whether the fossil record was
simulated using a taxonomy
object, in which case
from.taxonomy = TRUE
, or from a phylo
object
in which case from.taxonomy = FALSE
(the default).
The simplest model of fossil recovery is a Poisson sampling process,
where fossils are sampled along lineages with constant rate ψ (function argument
rate
).
Fossils can be simulated using a phylo
object or an
existing taxonomy
object generated using the function
sim.taxonomy
.
If a phylo
rather than a taxonomy
object is
passed to the function, it will assume that all speciation has occurred
via symmetric (i.e. bifurcating, β = 1) speciation, i.e. every edge
in the tree represents a unique species. This applies to all functions
used to simulate fossils.
# set the random number generator seed to generate the same results using the same code
set.seed(1234)
# simulate a tree using TreeSim conditioned on tip number
lambda = 1
mu = 0.2
tips = 8
t = TreeSim::sim.bd.taxa(n = tips, numbsim = 1, lambda = lambda, mu = mu)[[1]]
# t is an object of class phylo
t
#>
#> Phylogenetic tree with 8 tips and 7 internal nodes.
#>
#> Tip labels:
#> t8, t4, t3, t1, t6, t2, ...
#>
#> Rooted; includes branch lengths.
# use t$edge, t$edge.length, t$root.edge to see the tree attributes
# Simulate fossils
rate = 3 # poisson sampling rate
f = sim.fossils.poisson(rate = rate, tree = t)
f
#> sp edge hmin hmax
#> 1 1 1 0.58684752 0.58684752
#> 2 1 1 0.17795617 0.17795617
#> 3 1 1 0.58613520 0.58613520
#> 4 1 1 0.57453591 0.57453591
#> 5 2 2 0.14585530 0.14585530
#> 6 2 2 0.70851792 0.70851792
#> 7 2 2 0.72224865 0.72224865
#> 8 6 6 0.82173813 0.82173813
#> 9 6 6 0.35835122 0.35835122
#> 10 7 7 0.04035432 0.04035432
#> ...
#> Fossil record with 19 occurrences representing 9 species
#> Fossil record not simulated using taxonomy: all speciation events are assumed to be symmetric
Each entry in the fossils
dataframe corresponds to a
fossil sampling time. sp
and edge
are the
species and edge labels, respectively, which will be the same here since
we assume symmetric speciation.
hmin
and hmax
are the youngest and oldest
sampling ages associated with each fossil, respectively. Since this
function returns exact fossil sampling times hmin
and
hmax
are equal.
# simulate taxonomy under mixed speciation
beta = 0.5 # probability of symmetric speciation
lambda.a = 1.2 # rate of anagenetic speciation
s = sim.taxonomy(tree = t, beta = beta, lambda.a = lambda.a)
# simulate fossils
f = sim.fossils.poisson(rate = rate, taxonomy = s)
f
#> sp edge hmin hmax
#> 1 1 1 1.1276605 1.1276605
#> 2 1 1 0.7431218 0.7431218
#> 3 1 1 0.1493434 0.1493434
#> 4 4 15 0.8912792 0.8912792
#> 5 4 15 0.4560055 0.4560055
#> 6 7 7 0.4670495 0.4670495
#> 7 7 7 0.4936152 0.4936152
#> 8 7 7 0.6791209 0.6791209
#> 9 7 7 0.3928259 0.3928259
#> 10 16 2 0.3277610 0.3277610
#> ...
#> Fossil record with 31 occurrences representing 12 species
#> Fossils record simulated from or assigned to an existing taxonomy
# plot the output and color fossils by taxonomy
plot(f, tree = t, taxonomy = s, show.taxonomy = TRUE)
Note that sp
and edge
are not all equal,
since some species are assigned to multiple edges, and some edges are
associated with multiple species.
Under the interval-dependent model, fossil recovery changes in a
piece-wise manner across a set of user defined intervals. Interval ages
can be specified using a fixed set of uniform length intervals, by
passing the function the maximum interval age (max.age
) and
the number of intervals (strata
), or a vector of
non-uniform intervals (interval.ages
).
The function tree.max
can be used to assign a maximum
age based on tree age. If the tree has a root.edge
the
function will return the origin time, otherwise the function returns the
root age.
# define max interval age based on tree age
max.age = tree.max(t)
# define the number of intervals
strata = 4
# define a vector of sampling rates, where the first entry corresponds the youngest interval
rates = c(1, 0.1, 1, 0.1)
# simulate fossils
f = sim.fossils.intervals(tree = t, rates = rates, max.age = max.age, strata = strata)
# plot the output
plot(f, tree = t, show.strata = TRUE, strata = strata)
If no maximum is provided to the plotting function and
strata
> 1, the function tree.max
is used
to define max.age
.
# the following will sample a random set of interval ages between 0 and max.age
times = c(0, sort(runif(3, min = 0, max = max.age)), max.age)
# define a vector of sampling rates from youngest to oldest
rates = c(1, 0.1, 1, 0.1)
# simulate fossils
f = sim.fossils.intervals(tree = t, rates = rates, interval.ages = times)
# plot the output and show sampling or proxy data
plot(f, tree = t, show.strata = TRUE, interval.ages = times, show.proxy = TRUE, proxy = rates)
Fossil recovery can also be specified using a set of per-interval probabilities. At most one fossil per species will be sampled per interval using this approach.
# define max interval age based on tree age
max.age = tree.max(t)
# define the number of intervals
strata = 4
# define a vector of sampling probabilities from youngest to oldest
probabilities = c(1.0, 0.5, 0.2, 0.1)
# simulate fossils
f = sim.fossils.intervals(tree = t, probabilities = probabilities, max.age = max.age, strata = strata)
# plot the output
plot(f, tree = t, show.strata = TRUE, strata = strata)
To incorporate age uncertainty into the output using the function
sim.fossils.intervals
set
use.exact.times = FALSE
. In this case hmin
and
hmax
will represent the youngest and oldest age associated
with the fossil, respectively. When we plot the output the function will
place the fossils at the mid-point of the corresponding interval. This
means in some cases fossils will not appear along their corresponding
edge.
# simulate fossils
f = sim.fossils.intervals(tree = t, probabilities = probabilities, max.age = max.age, strata = strata, use.exact.times = FALSE)
f
#> sp edge hmin hmax
#> 1 1 1 0.0000000 0.5640402
#> 2 1 1 0.5640402 1.1280804
#> 3 2 2 0.0000000 0.5640402
#> 4 3 3 0.0000000 0.5640402
#> 5 6 6 0.0000000 0.5640402
#> 6 7 7 0.0000000 0.5640402
#> 7 8 8 0.0000000 0.5640402
#> 8 13 13 0.5640402 1.1280804
#> Fossil record with 8 occurrences representing 7 species
#> Fossil record not simulated using taxonomy: all speciation events are assumed to be symmetric
# plot the output
plot(f, tree = t, show.strata = TRUE, strata = strata)
Alternatively, if you simulate fossils using a function that only
returns exact fossil sampling times, e.g. using the
sim.fossils.poisson
function, you can later assign fossils
to different intervals using the sim.interval.ages
function. You can also generate output using this function that respect
the species duration times by setting
use.species.ages = TRUE
, i.e. the function will not return
a value for hmin
or hmax
that are younger or
older than the true species age. To use this option you also need to
pass a phylo
or taxonomy
object to the
function.
The function sim.fossils.environment
can be used to
simulate fossil recovery that reflects species environmental
preferences. The variables that control the distribution and abundance
of species also control depositional environment, meaning changes in
ecological or environmental gradient across a given area will be
reflected in the patterns of deposition. A species response curve, which
describes the abundance of a species relative to an environmental
gradient, can therefore be used to model distributions of fossil
occurrences. See the plot below for examples.
Holland (1995) described a three
parameter Gaussian model for simulating environment-dependent fossil
recovery. In this model each species has three parameters, species
preferred depth (PD),
depth tolerance (DT)
and peak abundance (PA) (function arguments
PD
, DT
and PA
). The probability
of fossil recovery during a given interval of time is a function of
water depth, given by,
Prcollectioni(d) = PA * exp(−(d − PD)2/(2 * DT2)),
where d is current water
depth, PD and DT are the mean and the
standard deviation of the distribution, respectively, and PA describes the amplitude
of the distribution. Here the model is described in the context of
marine taxa, however the model and the function
sim.fossils.environment
can be applied to any environmental
or depositional context, in which some measure of environmental gradient
can be generated.
The following code snippet illustrate how to explore the role of the model parameters PD, DT and PA in determining species response curves.
# define a set of colors for 4 species
# color palette chosen using RColorBrewer
cols = c("#E41A1C", "#377EB8", "#984EA3", "#4DAF4A") # red, blue, purple, green
# species response curve function
response.curve = function(x, pd, dt, pa) pa * exp(-(x - pd)^2/ (2*dt^2))
# define species variables # species 1
PA = 1 # peak abundance
PD = 2 # preferred depth
DT = 1 # depth tolerance
# plot species resonse curve
curve(response.curve(x, PD, DT, PA), -2, 6, bty = "n", xlab = "relative depth", ylab = "Pr(collection)", xlim = c(-6,6), lwd = 1.5, col = cols[1])
# redefine depth tolerance and compare the output # species 2
DT = 0.5
curve(response.curve(x, PD, DT, PA), -1, 5, add = TRUE, lwd = 1.5, col = cols[2])
# redefine preferred depth and compare the output # species 3
PD = -2
curve(response.curve(x, PD, DT, PA), -5, 1, add = TRUE, lwd = 1.5, col = cols[3])
# redefine prferred depth and peak abundance # species 4
PD = -4
PA = 0.5
curve(response.curve(x, PD, DT, PA), -6, -2.2, add = TRUE, lwd = 1.5, col = cols[4])
The user is free to use a vector containing any gradient values to
simulate fossil occurrences. In the following example we will use a
vector of gradient values generated using the function
sim.gradient
, which return values for the following simple
sine wave function, y = d * sin(cycles * pi * (x − 1/4)),
where x is the number of
intervals or strata. The function output can be used to represent
changes in relative water depth, where the number of cycles may be
equivalent to the number of transgression/regression events.
Interval ages can be specified in the same way as the function
sim.fossils.intervals
via max.age
and
strata
or the intervals
vector.
# define the interval parameters
strata = 20
times = seq(0, tree.max(t), length.out = strata+1)
# simulate gradient values
wd = sim.gradient(strata, depth = 6)
# wd is a vector representing relative water depth
wd
#> [1] -6.0000000 -4.7348431 -1.4729129 2.4101725 5.2768425 5.9181678
#> [7] 4.0636894 0.4954761 -3.2816889 -5.6749035 -5.6749035 -3.2816889
#> [13] 0.4954761 4.0636894 5.9181678 5.2768425 2.4101725 -1.4729129
#> [19] -4.7348431 -6.0000000
# define species trait values # species 1
PD = 1
DT = 2
PA = 1
# simulate fossils
f = sim.fossils.environment(tree = t, interval.ages = times, proxy.data = wd, PD = PD, DT = DT, PA = PA)
# plot output and include proxy data & preferred depth
plot(f, tree = t, interval.ages = times, show.strata = TRUE, show.proxy = TRUE, proxy.data = wd, show.preferred.environ = TRUE, preferred.environ = PD, fossil.col = cols[1])
# redefine species trait values # species 2
DT = 0.5
# simulate fossils
f = sim.fossils.environment(tree = t, interval.ages = times, proxy.data = wd, PD = PD, DT = DT, PA = PA)
# plot output
plot(f, tree = t, interval.ages = times, show.strata = TRUE, show.proxy = TRUE, proxy.data = wd, show.preferred.environ = TRUE, preferred.environ = PD, fossil.col = cols[2])
# redefine species trait values # species 3
PD = -2
# simulate fossils
f = sim.fossils.environment(tree = t, interval.ages = times, proxy.data = wd, PD = PD, DT = DT, PA = PA)
# plot output
plot(f, tree = t, interval.ages = times, show.strata = TRUE, show.proxy = TRUE, proxy.data = wd, show.preferred.environ = TRUE, preferred.environ = PD, fossil.col = cols[3])
# define species trait values # species 4
PD = -4
PA = 0.5
# simulate fossils
f = sim.fossils.environment(tree = t, interval.ages = times, proxy.data = wd, PD = PD, DT = DT, PA = PA)
# plot output
plot(f, tree = t, interval.ages = times, show.strata = TRUE, show.proxy = TRUE, proxy.data = wd, show.preferred.environ = TRUE, preferred.environ = PD, fossil.col = cols[4])
Variation in fossil recovery parameters across different lineages or
species can be generated using the function
sim.trait.values
. The function output is a vector of
simulated trait values that can be used to specify the rate
parameter in sim.fossils.poisson
or the PD
,
DT
and PA
parameters in
sim.fossils.environment
.
Trait values can be simulated for a given phylo
or
taxonomy
object. If the function is provided with a
phylo
object, trait values are simulated assuming
bifurcating speciation and simulated trait values are output for each
edge in the order in which they appear in the phylo
object
dataframe. If the tree also has a root edge
(tree$root.edge
), the first entry in the vector will
correspond to the first entry in the vector. If the function is provided
with a taxonomy
object, simulated trait values are output
for each species in the order in which they appear in the
taxonomy
object dataframe.
Under the independent
model a new trait value is drawn
for each species from any valid user-specified distribution
dist
. To be valid the function just has to return a single
value.
# define the initial rate at the root or origin
rate = 1
# define the distribution used to sample new rates
# in this case an exponential with mean = 1
dist = function() { rexp(1) }
# simulate trait values under the independent trait values model
rates = sim.trait.values(init = rate, tree = t, model = "independent", dist = dist)
# simulate fossils
f = sim.fossils.poisson(tree = t, rate = rates)
# plot the output
plot(f, tree = t)
The parameter change.pr
is the probability that changes
in trait values are coincident with speciation events. At each
speciation event change occurs with probability change.pr
and new values are drawn from any valid user-specified distribution
dist
. If change.pr = 1
a change will occur at
every speciation event and the model will be the same as above.
# define the initial value at the root or origin
rate = 1
# define the distribution used to sample new rates
# in this case an exponential with a mean ~ 3
dist = function() { rexp(1, 1/4) }
# define the probability of the trait value changing at each speciation event
change.pr = 0.5
# simulate trait values under the independent trait values model
rates = sim.trait.values(init = rate, tree = t, model = "independent", dist = dist, change.pr = change.pr)
# simulate fossils
f = sim.fossils.poisson(tree = t, rate = rates)
# plot the output
plot(f, tree = t)
Lineage specific values generated using sim.trait.values
can also be passed to the the function
sim.fossils.environment
to define the PD
,
DT
, and PA
parameters.
# define constant values for preferred depth and depth tolerance
PD = 2
DT = 0.5
# simulate lineage variable peak abundance values
# define the distribution used to sample new PA values
# in this case a uniform in the interval 0, 1
dist = function() { runif(1) }
# simulate trait values under the independent model
PA = sim.trait.values(init = 0.1, tree = t, model = "independent", dist = dist)
# simulate fossils
f = sim.fossils.environment(tree = t, interval.ages = times, proxy.data = wd, PD = PD, DT = DT, PA = PA)
# plot the output
plot(f, tree = t, show.strata = TRUE, interval.ages = times)
The functions sim.extant.samples
and
sim.tip.samples
can be used to simulate incomplete extant
species and tip sampling, respectively. The parameter rho
is the probability that an extant species or a tip will be sampled.
See the paleotree
vignette to see how
FossilSim
objects can be converted into
paleotree
objects.