sdmTMB model structure
The complete sdmTMB model can be written as
$$
\begin{aligned}
\mathbb{E}[y_{\boldsymbol{s},t}] &= \mu_{\boldsymbol{s},t},\\
\mu_{\boldsymbol{s},t} &=
f^{-1} \left( \boldsymbol{X}^{\mathrm{main}}_{\boldsymbol{s},t}
\boldsymbol{\beta} +
O_{\boldsymbol{s},t} +
\alpha_g +
\boldsymbol{X}^{\mathrm{tvc}}_{\boldsymbol{s},t} \boldsymbol{\gamma_t} +
\boldsymbol{X}^{\mathrm{svc}}_{\boldsymbol{s},t} \zeta_{\boldsymbol{s}}
+
\omega_{\boldsymbol{s}} +
\epsilon_{\boldsymbol{s},t} \right),
\end{aligned}
$$
where
- ys, t
represents the response data at point s and time t;
- μ represents the
mean;
- f represents a link
function (e.g., log or logit) and f−1 represents its
inverse;
- Xmain,
Xtvc, and
Xsvc
represent design matrices (the superscript identifiers ‘main’ = main
effects, ‘tvc’ = time varying coefficients, and ‘svc’ = spatially
varying coefficients);
- β represents a
vector of fixed-effect coefficients;
- Os, t
represents an offset: a covariate (usually log transformed) with a
coefficient fixed at one;
- αg
represents random intercepts by group g, αg ∼ N(0, σα2);
- γt
represents time-varying coefficients (a random walk), γt ∼ N(γt − 1, σγ2);
- ζs
represents spatially varying coefficients (a random field), ζs ∼ MVN(0, Σζ);
- ωs
represents a spatial component (a random field), ωs ∼ MVN(0, Σω);
and
- ϵs, t
represents a spatiotemporal component (a random field), ϵs, t ∼ MVN(0, Σϵ).
A single sdmTMB model will rarely, if ever, contain all of the above
components. Next, we will split the model to describe the various parts
in more detail using ‘…’ to represent
the other optional components.
Main effects
$$
\begin{aligned}
\mu_{\boldsymbol{s},t} &= f^{-1} \left(
\boldsymbol{X}^{\mathrm{main}}_{\boldsymbol{s},t} \boldsymbol{\beta}
\ldots \right)
\end{aligned}
$$
Within sdmTMB()
, Xs, tmainβ
is defined by the formula
argument and represents the
main-effect model matrix and a corresponding vector of coefficients.
This main effect formula can contain optional penalized smoothers or
non-linear functions as defined below.
Smoothers
Smoothers in sdmTMB are implemented with the same formula syntax
familiar to mgcv (Wood 2017) users fitting
GAMs (generalized additive models). Smooths are implemented in the
formula using + s(x)
, which implements a smooth from
mgcv::s()
. Within these smooths, the same syntax commonly
used in mgcv::s()
can be applied, e.g. 2-dimensional
smooths may be constructed with + s(x, y)
; smooths can be
specific to various factor levels, + s(x, by = group)
;
smooths can vary according to a continuous variable,
+ s(x, by = x2)
; the basis function dimensions may be
specified, e.g. + s(x, k = 4)
(see
?mgcv::choose.k
); and various types of splines may be
constructed such as cyclic splines to model seasonality,
e.g. + s(month, bs = "cc", k = 12)
.
While mgcv can fit unpenalized (e.g., B-splines) or penalized splines
(P-splines), sdmTMB only implements penalized splines. The penalized
splines are constructed in sdmTMB using the function
mgcv::smooth2random()
, which transforms splines into random
effects (and associated design matrices) that are estimable in a
mixed-effects modelling framework. This is the same approach as is
implemented in the R packages gamm4 (Wood &
Scheipl 2020) and brms (Bürkner
2017).
Linear break-point threshold models
The linear break-point or “hockey stick” model can be used to
describe threshold or asymptotic responses. This function consists of
two pieces, so that for x < b1, s(x) = x ⋅ b0,
and for x > b1, s(x) = b1 ⋅ b0.
In both cases, b0
represents the slope of the function up to a threshold, and the product
b1 ⋅ b0
represents the value at the asymptote. No constraints are placed on
parameters b0 or
b1.
These models can be fit by including + breakpt(x)
in the
model formula, where x
is a covariate. The formula can
contain a single break-point covariate.
Logistic threshold models
Models with logistic threshold relationships between a predictor and
the response can be fit with the form
s(x) = τ + ψ [1 + e−ln (19) ⋅ (x − s50)/(s95 − s50)]−1,
where s represents the
logistic function, ψ is a
scaling parameter (controlling the height of the y-axis for the
response; unconstrained), τ is
an intercept, s50 is a
parameter controlling the point at which the function reaches 50% of the
maximum (ψ), and s95 is a parameter controlling the
point at which the function reaches 95% of the maximum. The parameter
s50 is unconstrained but s95 is constrained to be larger than
s50.
These models can be fit by including + logistic(x)
in
the model formula, where x
is a covariate. The formula can
contain a single logistic covariate.
Spatial random fields
Spatial random fields, ωs, are
included if spatial = 'on'
(or TRUE
) and
omitted if spatial = 'off'
(or FALSE
).
$$
\begin{aligned}
\mu_{\boldsymbol{s},t} &= f^{-1} \left( \ldots +
\omega_{\boldsymbol{s}} + \ldots \right),\\
\boldsymbol{\omega} &\sim \operatorname{MVNormal} \left(
\boldsymbol{0}, \boldsymbol{\Sigma}_\omega \right),\\
\end{aligned}
$$ The marginal standard deviation of ω is indicated by
Spatial SD
in the printed model output or as
sigma_O
in the output of
sdmTMB::tidy(fit, "ran_pars")
. The ‘O’ is for ‘omega’
(ω).
Internally, the random fields follow a Gaussian Markov random field
(GMRF)
ω ∼ MVNormal(0, σω2Qω−1),
where Qω is a
sparse precision matrix and σω2
is the marginal variance.
Spatiotemporal random fields
Spatiotemporal random fields are included by default if there are
multiple time elements (time
argument is not
NULL
) and can be set to IID (independent and identically
distributed, 'iid'
; default), AR(1) ('ar1'
),
random walk ('rw'
), or off ('off'
) via the
spatiotemporal
argument. These text values are case
insensitive.
Spatiotemporal random fields are represented by ϵt
within sdmTMB. This has been chosen to match the representation in VAST
(Thorson 2019). The marginal standard
deviation of ϵt is
indicated by Spatiotemporal SD
in the printed model output
or as sigma_E
in the output of
sdmTMB::tidy(fit, "ran_pars")
. The ‘E’ is for ‘epsilon’
(ϵ).
IID spatiotemporal random fields
IID spatiotemporal random fields
(spatiotemporal = 'iid'
) can be represented as
$$
\begin{aligned}
\mu_{\boldsymbol{s},t} &= f^{-1} \left( \ldots +
\epsilon_{\boldsymbol{s},t} + \ldots \right),\\
\boldsymbol{\epsilon_{t}} &\sim \operatorname{MVNormal} \left(
\boldsymbol{0}, \boldsymbol{\Sigma}_{\epsilon} \right).
\end{aligned}
$$
where ϵs, t
represent random field deviations at point s and time t. The random fields are assumed
independent across time steps.
Similarly to the spatial random fields, these spatiotemporal random
fields (including all versions described below) are parameterized
internally with a sparse precision matrix (Qϵ)
ϵt ∼ MVNormal(0, σϵ2Qϵ−1).
AR(1) spatiotemporal random fields
First-order auto regressive, AR(1), spatiotemporal random fields
(spatiotemporal = 'ar1'
) add a parameter defining the
correlation between random field deviations from one time step to the
next. They are defined as
$$
\begin{aligned}
\mu_{\boldsymbol{s},t} &= f^{-1} \left( \ldots +
\delta_{\boldsymbol{s},t} \ldots \right),\\
\boldsymbol{\delta}_{t=1} &\sim \operatorname{MVNormal}
(\boldsymbol{0}, \boldsymbol{\Sigma}_{\epsilon}),\\
\boldsymbol{\delta}_{t>1} &= \rho \boldsymbol{\delta}_{t-1} +
\sqrt{1 - \rho^2} \boldsymbol{\epsilon_{t}}, \:
\boldsymbol{\epsilon_{t}} \sim \operatorname{MVNormal}
\left(\boldsymbol{0}, \boldsymbol{\Sigma}_{\epsilon} \right),
\end{aligned}
$$ where ρ is the
correlation between subsequent spatiotemporal random fields. The $\rho \boldsymbol{\delta}_{t-1} + \sqrt{1 -
\rho^2}$ term scales the spatiotemporal variance by the
correlation such that it represents the steady-state marginal variance.
The correlation ρ allows for
mean-reverting spatiotemporal fields, and is constrained to be −1 < ρ < 1. Internally, the
parameter is estimated as ar1_phi
, which is unconstrained.
The parameter ar1_phi
is transformed to ρ with ρ = 2(logit−1(ar1_phi
) − 1).
Random walk spatiotemporal random fields (RW)
Random walk spatiotemporal random fields
(spatiotemporal = 'rw'
) represent a model where the
difference in spatiotemporal deviations from one time step to the next
are IID. They are defined as
$$
\begin{aligned}
\mu_{\boldsymbol{s},t} &= f^{-1} \left( \ldots +
\delta_{\boldsymbol{s},t} + \ldots \right),\\
\boldsymbol{\delta}_{t=1} &\sim \operatorname{MVNormal}
(\boldsymbol{0}, \boldsymbol{\Sigma}_{\epsilon}),\\
\boldsymbol{\delta}_{t>1} &= \boldsymbol{\delta}_{t-1}
+ \boldsymbol{\epsilon_{t}}, \:
\boldsymbol{\epsilon_{t}} \sim \operatorname{MVNormal}
\left(\boldsymbol{0}, \boldsymbol{\Sigma}_{\epsilon} \right),
\end{aligned}
$$
where the distribution of the spatiotemporal field in the initial
time step is the same as for the AR(1) model, but the absence of the
ρ parameter allows the
spatiotemporal field to be non-stationary in time. Note that, in
contrast to the AR(1) parametrization, the variance is no longer the
steady-state marginal variance.
Time-varying regression parameters
Parameters can be modelled as time-varying according to a random walk
or first-order autoregressive, AR(1), process. The time-series model is
defined by time_varying_type
. For all types:
$$
\begin{aligned}
\mu_{\boldsymbol{s},t} &= f^{-1} \left( \ldots
+ \boldsymbol{X}^{\mathrm{tvc}}_{\boldsymbol{s},t}
\boldsymbol{\gamma_{t}} + \ldots \right),
\end{aligned}
$$ where γt
is an optional vector of time-varying regression parameters and Xs, ttvc
is the corresponding model matrix with covariate values. This is defined
via the time_varying
argument, assuming that the
time
argument is also supplied a column name.
time_varying
takes a one-sided formula.
~ 1
implies a time-varying intercept.
For time_varying_type = 'rw'
, the first time step is
estimated independently:
$$
\begin{aligned}
\gamma_{t=1} &\sim \operatorname{Uniform} \left(-\infty, \infty
\right),\\
\gamma_{t>1} &\sim \operatorname{Normal} \left(\gamma_{t-1},
\sigma^2_{\gamma} \right).
\end{aligned}
$$
In this case, the first time-step value is given an implicit uniform
prior. I.e., the same variable should not appear in the fixed effect
formula since the initial value is estimated as part of the time-varying
formula. The formula time_varying = ~ 1
implicitly
represents a time-varying intercept (assuming the time
argument has been supplied) and, this case, the intercept should be
omitted from the main effects (formula ~ + 0 + ...
or
formula ~ -1 + ...
).
For time_varying_type = 'rw0'
, the first time step is
estimated from a mean-zero prior:
$$
\begin{aligned}
\gamma_{t=1} &\sim \operatorname{Normal} \left(0,
\sigma^2_{\gamma} \right),\\
\gamma_{t>1} &\sim \operatorname{Normal} \left(\gamma_{t-1},
\sigma^2_{\gamma} \right).
\end{aligned}
$$ In this case, the time-varying variable (including the
intercept) should be included in the main effects. We suggest
using this formulation, but leave the 'rw'
option so that
legacy code works.
For time_varying_type = 'ar1'
:
$$
\begin{aligned}
\gamma_{t=1} &\sim \operatorname{Normal} \left(0,
\sigma^2_{\gamma} \right),\\
\gamma_{t>1} &\sim \operatorname{Normal}
\left(\rho_\gamma\gamma_{t-1}, \sqrt{1 - \rho_\gamma^2}
\sigma^2_{\gamma} \right),
\end{aligned}
$$ where ργ is the
correlation between subsequent time steps. The first time step is given
a mean-zero prior.
Spatially varying coefficients (SVC)
Spatially varying coefficient models are defined as
$$
\begin{aligned}
\mu_{\boldsymbol{s},t} &= f^{-1} \left( \ldots +
\boldsymbol{X}^{\mathrm{svc}}_{\boldsymbol{s}, t} \zeta_{\boldsymbol{s}}
+ \ldots \right),\\
\boldsymbol{\zeta} &\sim \operatorname{MVNormal} \left(
\boldsymbol{0}, \boldsymbol{\Sigma}_\zeta \right),
\end{aligned}
$$
where ζ is a random
field representing a spatially varying coefficient. Usually, Xs, tsvc
would represent a prediction matrix that is constant spatially for a
given time t as defined by a
one-sided formula supplied to spatial_varying
. For example
spatial_varying = ~ 0 + x
, where 0
omits the
intercept.
The random fields are parameterized internally with a sparse
precision matrix (Qζ)
ζ ∼ MVNormal(0, σζ2Qζ−1).
IID random or multi-level intercepts
Multilevel/hierarchical intercepts are defined as
$$
\begin{aligned}
\mu_{\boldsymbol{s},t} &= f^{-1} \left( \ldots + \alpha_{g} +
\ldots \right),\\
\alpha_g &\sim \operatorname{Normal} \left(0, \sigma_\alpha^2
\right),\\
\end{aligned}
$$
where αg is an example
optional “random” intercept—an intercept with mean zero that varies by
level g and is constrained by
σα. This
is defined by the formula
argument via the
(1 | g)
syntax as in lme4 or glmmTMB. There can be multiple
random intercepts, despite only showing one above. E.g.,
(1 | g1) + (1 | g2)
, in which case they are assumed
independent and uncorrelated from each other.
Offset terms
Offset terms can be included through the offset
argument
in sdmTMB()
. These are included in the linear predictor
as
$$
\begin{aligned}
\mu_{\boldsymbol{s},t} &= f^{-1} \left( \ldots +
O_{\boldsymbol{s},t} + \ldots \right),
\end{aligned}
$$
where Os, t
is an offset term—a log transformed variable without a
coefficient (assuming a log link). The offset is not included
in the prediction. Therefore, if offset
represents a
measure of effort, for example, the prediction is for one unit of effort
(log(1) = 0
).
Observation model families
Here we describe the main observation families that are available in
sdmTMB and comment on their parametrization, statistical properties,
utility, and code representation in sdmTMB.
Binomial
Binomial (N, μ)
where N is the size or number
of trials, and μ is the
probability of success for each trial. If N = 1, the distribution becomes the
Bernoulli distribution. Internally, the distribution is parameterized as
the robust
version in TMB, which is numerically stable when probabilities
approach 0 or 1. Following the structure of stats::glm()
,
lme4, and glmmTMB, a binomial family can be specified in one of 4
ways:
- the response may be a factor (and the model classifies the first
level versus all others)
- the response may be binomial (0/1)
- the response can be a matrix of form
cbind(success, failure)
, or
- the response may be the observed proportions, and the
weights
argument is used to specify the Binomial size
(N) parameter
(probabilty ~ ..., weights = N
).
Code defined within
TMB.
Example: family = binomial(link = "logit")
Beta
Beta (μϕ, (1 − μ)ϕ)
where μ is the mean and ϕ is a precision parameter. This
parametrization follows Ferrari &
Cribari-Neto (2004) and the betareg R package (Cribari-Neto & Zeileis 2010). The variance
is μ(1 − μ)/(ϕ + 1).
Code defined within
TMB.
Example: family = Beta(link = "logit")
Gamma
$$
\operatorname{Gamma} \left( \phi, \frac{\mu}{\phi} \right)
$$ where ϕ represents
the Gamma shape and μ/ϕ represents the scale.
The mean is μ and variance is
μ ⋅ ϕ2.
Code defined within
TMB.
Example: family = Gamma(link = "log")
Gaussian
Normal (μ, ϕ2)
where μ is the mean and ϕ is the standard deviation. The
variance is ϕ2.
Example: family = Gaussian(link = "identity")
Code defined within
TMB.
Lognormal
sdmTMB uses the bias-corrected lognormal distribution where ϕ represents the standard deviation
in log-space:
$$
\operatorname{Lognormal} \left( \log \mu - \frac{\phi^2}{2}, \phi^2
\right).
$$ Because of the bias correction, 𝔼[y] = μ and Var[log y] = ϕ2.
Code defined within
sdmTMB based on the TMB dnorm()
normal density.
Example: family = lognormal(link = "log")
Negative Binomial 1 (NB1)
NB1 (μ, ϕ)
where μ is the mean and
ϕ is the dispersion parameter.
The variance scales linearly with the mean Var[y] = μ + μ/ϕ
(Hilbe 2011). Internally, the distribution
is parameterized as the robust
version in TMB.
Code defined within
sdmTMB based on NB2 and borrowed from glmmTMB.
Example: family = nbinom1(link = "log")
Negative Binomial 2 (NB2)
NB2 (μ, ϕ)
where μ is the mean and
ϕ is the dispersion parameter.
The variance scales quadratically with the mean Var[y] = μ + μ2/ϕ
(Hilbe 2011). The NB2 parametrization is
more commonly seen in ecology than the NB1. Internally, the distribution
is parameterized as the robust
version in TMB.
Code defined within
TMB.
Example: family = nbinom2(link = "log")
Poisson
Poisson (μ) where μ represents the mean and Var[y] = μ.
Code defined within
TMB.
Example: family = poisson(link = "log")
Student-t
Student-t (μ, ϕ, ν)
where ν, the degrees of
freedom (df
), is a user-supplied fixed parameter. Lower
values of ν result in heavier
tails compared to the Gaussian distribution. Above approximately
df = 20
, the distribution becomes very similar to the
Gaussian. The Student-t distribution with a low degrees of freedom
(e.g., ν ≤ 7) can be helpful
for modelling data that would otherwise be suitable for Gaussian but
needs an approach that is robust to outliers (e.g., Anderson et al. 2017).
Code defined within
sdmTMB based on the dt()
distribution in TMB.
Example: family = student(link = "log", df = 7)
Tweedie
Tweedie (μ, p, ϕ), 1 < p < 2
where μ is the mean, p is the power parameter constrained
between 1 and 2, and ϕ is the
dispersion parameter. The Tweedie distribution can be helpful for
modelling data that are positive and continuous but also contain
zeros.
Internally, p is
transformed from logit−1(thetaf
) + 1
to constrain it between 1 and 2 and is estimated as an unconstrained
variable.
The source
code is implemented as in the cplm package (Zhang 2013) and is based on Dunn & Smyth (2005). The TMB version is
defined here.
Example: family = tweedie(link = "log")
Gamma mixture
This is a 2 component mixture that extends the Gamma
distribution,
$$
(1 - p) \cdot \operatorname{Gamma} \left( \phi,
\frac{\mu_{1}}{\phi} \right) + p \cdot \operatorname{Gamma} \left(
\phi, \frac{\mu_{2}}{\phi} \right),
$$ where ϕ represents
the Gamma shape, μ1/ϕ represents
the scale for the first (smaller component) of the distribution, μ2/ϕ represents
the scale for the second (larger component) of the distribution, and
p controls the contribution of
each component to the mixture (also interpreted as the probability of
larger events).
The mean is (1 − p) ⋅ μ1 + p ⋅ μ2
and the variance is (1 − p)2 ⋅ μ1 ⋅ ϕ2 + (p)2 ⋅ μ2 ⋅ ϕ2.
Here, and for the other mixture distributions, the probability of the
larger mean can be obtained from
plogis(fit$model$par[["logit_p_mix"]])
and the ratio of the
larger mean to the smaller mean can be obtained from
1 + exp(fit$model$par[["log_ratio_mix"]])
. The standard
errors are available in the TMB sdreport:
fit$sd_report
.
If you wish to fix the probability of a large (i.e., extreme) mean,
which can be hard to estimate, you can use the map
list.
E.g.:
sdmTMB(...,
control = sdmTMBcontrol(
start = list(logit_p_mix = qlogis(0.05)), # 5% probability of 'mu2'
map = list(logit_p_mix = factor(NA)) # don't estimate
)
)
Example: family = gamma_mix(link = "log")
. See also
family = delta_gamma_mix()
for an extension incorporating
this distribution with delta models.
Lognormal mixture
This is a 2 component mixture that extends the lognormal
distribution,
$$
(1 - p) \cdot \operatorname{Lognormal} \left( \log \mu_{1} -
\frac{\phi^2}{2}, \phi^2 \right) + p \cdot \operatorname{Lognormal}
\left( \log \mu_{2} - \frac{\phi^2}{2}, \phi^2 \right).
$$
Because of the bias correction, 𝔼[y] = (1 − p) ⋅ μ1 + p ⋅ μ2
and Var[log y] = (1 − p)2 ⋅ ϕ2 + p2 ⋅ ϕ2.
As with the Gamma mixture, p controls the contribution of each
component to the mixture (also interpreted as the probability of larger
events).
Example: family = lognormal_mix(link = "log")
. See also
family = delta_lognormal_mix()
for an extension
incorporating this distribution with delta models.
Negative binomial 2 mixture
This is a 2 component mixture that extends the NB2 distribution,
(1 − p) ⋅ NB2 (μ1, ϕ) + p ⋅ NB2 (μ2, ϕ)
where μ1 is the
mean of the first (smaller component) of the distribution, μ2 is the mean of the
larger component, and p
controls the contribution of each component to the mixture.
Example: family = nbinom2_mix(link = "log")
Gaussian random fields
Matérn parameterization
The Matérn defines the covariance Φ(sj, sk)
between spatial locations sj and sk as
Φ(sj, sk) = τ2/Γ(ν)2ν − 1(κdjk)νKν(κdjk),
where τ2
controls the spatial variance, ν controls the smoothness, Γ represents the Gamma function,
djk
represents the distance between locations sj and sk, Kν represents
the modified Bessel function of the second kind, and κ represents the decorrelation rate.
The parameter ν is set to 1 to
take advantage of the Stochastic Partial Differential Equation (SPDE)
approximation to the GRF to greatly increase computational efficiency
(Lindgren et al. 2011).
Internally, the parameters κ
and τ are converted to range
and marginal standard deviation σ as $\textrm{range} = \sqrt{8} / \kappa$ and
$\sigma = 1 / \sqrt{4 \pi \exp \left(2
\log(\tau) + 2 \log(\kappa) \right) }$.
In the case of a spatiotemporal model with both spatial and
spatiotemporal fields, if share_range = TRUE
in
sdmTMB()
(the default), then a single κ and range are estimated with
separate σω and σϵ. This often
makes sense since data are often only weakly informative about κ. If
share_range = FALSE
, then separate κω and κϵ are
estimated. The spatially varying coefficient field always shares κ with the spatial random field.
Projection A
matrix
The values of the spatial variables at the knots are multiplied by a
projection matrix A
that bilinearly interpolates from the knot locations to the values at
the locations of the observed or predicted data (Lindgren & Rue 2015)
ω* = Aω,
where ω*
represents the values of the spatial random fields at the observed
locations or predicted data locations. The matrix A has a row for each data
point or prediction point and a column for each knot. Three non-zero
elements on each row define the weight of the neighbouring 3 knot
locations for location s. The same bilinear
interpolation happens for any spatiotemporal random fields
ϵt* = Aϵt.
Anisotropy
TMB allows for anisotropy, where spatial covariance may be asymmetric
with respect to latitude and longitude (full
details). Anisotropy can be turned on or off with the logical
anisotropy
argument to sdmTMB()
. There are a
number of ways to implement anisotropic covariance (Fuglstad et al. 2015), and we adopt a
2-parameter rotation matrix H. The elements of H are defined by the
parameter vector x so
that H1, 1 = x1,
H1, 2 = H2, 1 = x2
and H2, 2 = (1 + x22)/x1.
Once a model is fitted with sdmTMB()
, the anisotropy
relationships may be plotted using the plot_anisotropy()
function, which takes the fitted object as an argument. If a barrier
mesh is used, anisotropy is disabled.
Incorporating physical barriers into the SPDE
In some cases the spatial domain of interest may be complex and
bounded by some barrier such as by land or water (e.g., coastlines,
islands, lakes). SPDE models allow for physical barriers to be
incorporated into the modelling (Bakka et
al. 2019). With sdmTMB()
models, the mesh
construction occurs in two steps: the user (1) constructs a mesh with a
call to sdmTMB::make_mesh()
, and (2) passes the mesh to
sdmTMB::add_barrier_mesh()
. The barriers must be
constructed as sf
objects (Pebesma
2018) with polygons defining the barriers. See
?sdmTMB::add_barrier_mesh
for an example.
The barrier implementation requires the user to select a fraction
value (range_fraction
argument) that defines the fraction
of the usual spatial range when crossing the barrier (Bakka et al. 2019). For example, if
the range was estimated at 10 km, range_fraction = 0.2
would assume that the range was 2 km across the barrier. This would let
the spatial correlation decay 5 times faster with distance. From
experimentation, values around 0.1 or 0.2 seem to work well but values
much lower than 0.1 can result in convergence issues.
This
website by Francesco Serafini and Haakon Bakka provides an
illustration with INLA. The implementation within TMB was borrowed from
code written by Olav Nikolai Breivik and Hans Skaug at the TMB Case Studies
Github site.
Optimization
Optimization details
The sdmTMB model is fit by maximum marginal likelihood. Internally, a
TMB (Kristensen et al. 2016)
model template calculates the marginal log likelihood and its gradient,
and the negative log likelihood is minimized via the non-linear
optimization routine stats::nlminb()
in R (Gay 1990; R Core Team 2021). Random effects are
estimated at values that maximize the log likelihood conditional on the
estimated fixed effects and are integrated over via the Laplace
approximation (Kristensen et al.
2016).
Like AD Model Builder (Fournier et
al. 2012), TMB allows for parameters to be fit in phases and
we include the multiphase
argument in
sdmTMB::sdmTMBcontrol()
to allow this. For high-dimensional
models (many fixed and random effects), phased estimation may be faster
and result in more stable convergence. In sdmTMB, phased estimation
proceeds by first estimating all fixed-effect parameters contributing to
the likelihood (holding random effects constant at initial values). In
the second phase, the random-effect parameters (and their variances) are
also estimated. Fixed-effect parameters are also estimated in the second
phase and are initialized at their estimates from the first phase.
In some cases, a single call to stats::nlminb()
may not
be result in convergence (e.g., the maximum gradient of the marginal
likelihood with respect to fixed-effect parameters is not small enough
yet), and the algorithm may need to be run multiple times. In the
sdmTMB::sdmTMBcontrol()
function, we include an argument
nlminb_loops
that will restart the optimization at the
previous best values. The number of nlminb_loops
should
generally be small (e.g., 2 or 3 initially), and defaults to 1. For some
sdmTMB models, the Hessian may also be unstable and need to be
re-evaluated. We do this optionally with the
stats::optimHess()
routine after the call to
stats::nlminb()
. The stats::optimHess()
function implements a Newton optimization routine to find the Hessian,
and we include the argument newton_loops
in
sdmTMB::sdmTMBcontrol()
to allow for multiple function
evaluations (each starting at the previous best value). By default, this
is not included and newton_loops
is set to 0. If a model is
already fit, the function sdmTMB::run_extra_optimization()
can run additional optimization loops with either routine to further
reduce the maximum gradient.
Assessing convergence
Much of the guidance around diagnostics and glmmTMB also applies to
sdmTMB, e.g. the
glmmTMB vignette on troubleshooting. Optimization with
stats::nlminb()
involves specifying the number of
iterations and evaluations (eval.max
and
iter.max
) and the tolerances (abs.tol
,
rel.tol
, x.tol
, xf.tol
)—a greater
number of iterations and smaller tolerance thresholds increase the
chance that the optimal solution is found, but more evaluations
translates into longer computation time. Warnings of
non-positive-definite Hessian matrices (accompanied by parameters with
NA
s for standard errors) often mean models are improperly
specified given the data. Standard errors can be observed in the output
of print.sdmTMB()
or by checking
fit$sd_report
. The maximum gradient of the marginal
likelihood with respect to fixed-effect parameters can be checked by
inspecting (fit$gradients
). Guidance varies, but the
maximum gradient should likely be at least < 0.001 before assuming the fitting
routine is consistent with convergence. If maximum gradients are already
relatively small, they can often be reduced further with additional
optimization calls beginning at the previous best parameter vector as
described above with sdmTMB::run_extra_optimization()
.
References
Anderson, S.C., Branch, T.A., Cooper, A.B. & Dulvy, N.K. (2017).
Black-swan events in animal populations. Proceedings of the National
Academy of Sciences, 114, 3252–3257.
Bakka, H., Vanhatalo, J., Illian, J., Simpson, D. & Rue, H. (2019).
Non-stationary
Gaussian models with physical barriers.
arXiv:1608.03787 [stat]. Retrieved from
https://arxiv.org/abs/1608.03787
Bürkner, P.-C. (2017).
brms: An R package for
Bayesian multilevel models using Stan.
Journal of Statistical Software,
80, 1–28.
Cribari-Neto, F. & Zeileis, A. (2010). Beta regression in
R. Journal of Statistical Software,
34, 1–24.
Dunn, P.K. & Smyth, G.K. (2005).
Series evaluation of
Tweedie exponential dispersion model densities.
Statistics and Computing,
15, 267–280.
Edwards, A.M. & Auger-Méthé, M. (2019). Some guidance on using
mathematical notation in ecology. Methods in Ecology and
Evolution, 10, 92–99.
Ferrari, S. & Cribari-Neto, F. (2004).
Beta regression for
modelling rates and proportions.
Journal of Applied
Statistics,
31, 799–815.
Fournier, D.A., Skaug, H.J., Ancheta, J., Ianelli, J., Magnusson, A.,
Maunder, M.N., Nielsen, A. & Sibert, J. (2012). AD model builder:
Using automatic differentiation for statistical inference of highly
parameterized complex nonlinear models. Optimization Methods and
Software, 27, 233–249.
Fuglstad, G.-A., Lindgren, F., Simpson, D. & Rue, H. (2015).
Exploring a new class of non-stationary spatial Gaussian
random fields with varying local anisotropy. Statistica Sinica,
25, 115–133.
Gay, D.M. (1990). Usage summary for selected optimization routines.
Computing Science Technical Report, 153, 1–21.
Hilbe, J.M. (2011). Negative Binomial Regression.
Cambridge University Press.
Kristensen, K., Nielsen, A., Berg, C.W., Skaug, H. & Bell, B.M.
(2016).
TMB: Automatic
differentiation and Laplace approximation.
Journal
of Statistical Software,
70, 1–21.
Lindgren, F. & Rue, H. (2015).
Bayesian spatial modelling
with R-INLA.
Journal of Statistical Software,
63, 1–25.
Lindgren, F., Rue, H. & Lindström, J. (2011). An explicit link
between Gaussian fields and Gaussian Markov
random fields: The stochastic partial differential equation approach.
J. R. Stat. Soc. Ser. B Stat. Methodol., 73,
423–498.
Pebesma, E. (2018).
Simple Features for R:
Standardized Support for Spatial Vector Data.
The R
Journal,
10, 439–446. Retrieved from
https://doi.org/10.32614/RJ-2018-009
R Core Team. (2021).
R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria.
Retrieved from
https://www.R-project.org/
Wood, S.N. (2017). Generalized additive models: An introduction with
R, 2nd edn. Chapman and Hall/CRC.
Wood, S. & Scheipl, F. (2020).
gamm4: Generalized Additive
Mixed Models using ’mgcv’ and ’lme4’. Retrieved from
https://CRAN.R-project.org/package=gamm4