nabla differentiates any R function exactly — no
symbolic algebra, no finite-difference grid, no loss of precision. Pass
a function and a point to D(), and you get the exact
derivative.
Functions passed to D() use x[1],
x[2], … indexing — the same convention as R’s
optim(). For multi-parameter functions,
gradient(), hessian(), and
jacobian() provide the standard objects of calculus.
The D operator is the core of nabla.
Applied once, it gives the first derivative. Applied twice (via
order = 2), it gives the second derivative:
f <- function(x) x[1]^2 + sin(x[1])
# First derivative: f'(x) = 2x + cos(x)
D(f, pi / 4)
#> [1] 2.277903
# Verify against the analytical formula
2 * (pi / 4) + cos(pi / 4)
#> [1] 2.277903
# Second derivative: f''(x) = 2 - sin(x)
D(f, pi / 4, order = 2)
#> [,1]
#> [1,] 1.292893
gradient() is equivalent to D(f, x) for scalar-valued functions.
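A quick sketch of that equivalence, reusing f from above (the comments note the value D() reported earlier):
# For a one-parameter function, gradient() and D() return the same number
gradient(f, pi / 4)   # same as D(f, pi/4) above: 2.277903
D(f, pi / 4)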
The function and its AD-computed derivative over \([0, 2\pi]\), with the evaluation point \(x = \pi/4\) marked:
f <- function(x) x[1]^2 + sin(x[1])
xs <- seq(0, 2 * pi, length.out = 200)
# Compute f(x) and f'(x) at each grid point using D()
fx <- sapply(xs, function(xi) f(xi))
fpx <- sapply(xs, function(xi) D(f, xi))
# Mark the evaluation point x = pi/4
x_mark <- pi / 4
oldpar <- par(mar = c(4, 4, 2, 1))
plot(xs, fx, type = "l", col = "steelblue", lwd = 2,
xlab = "x", ylab = "y",
main = expression(f(x) == x^2 + sin(x) ~ "and its derivative"),
ylim = range(c(fx, fpx)))
lines(xs, fpx, col = "firebrick", lwd = 2, lty = 2)
points(x_mark, f(x_mark), pch = 19, col = "steelblue", cex = 1.3)
points(x_mark, D(f, x_mark), pch = 19, col = "firebrick", cex = 1.3)
abline(v = x_mark, col = "grey60", lty = 3)
legend("topleft", legend = c("f(x)", "f'(x) via AD"),
col = c("steelblue", "firebrick"), lty = c(1, 2), lwd = 2,
bty = "n")D() handles control flow, loops, and higher-order
functions transparently. No special annotation is needed — just write
ordinary R:
# if/else branching
safe_log <- function(x) {
if (x[1] > 0) log(x[1]) else -Inf
}
D(safe_log, 2) # 1/2 = 0.5
#> [1] 0.5
D(safe_log, -1) # 0 (constant branch)
#> [1] 0
# for loop
poly <- function(x) {
result <- 0
for (i in 1:5) result <- result + x[1]^i
result
}
D(poly, 2) # 1 + 2*2 + 3*4 + 4*8 + 5*16 = 129
#> [1] 129
# Reduce
f_reduce <- function(x) Reduce("+", lapply(1:4, function(i) x[1]^i))
D(f_reduce, 2)
#> [1] 49
More complex compositions also propagate correctly.
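As one sketch (the helper names below are illustrative), a chain of user-defined functions differentiates like any other R code, and the result can be checked against the analytical derivative:
# Composition of ordinary helper functions
inner <- function(z) z^2 + 1
outer_fn <- function(x) sqrt(inner(x[1])) * exp(-x[1])
D(outer_fn, 1)
# Analytical check: d/dx sqrt(x^2 + 1) * exp(-x)
#   = exp(-x) * (x / sqrt(x^2 + 1) - sqrt(x^2 + 1))
exp(-1) * (1 / sqrt(2) - sqrt(2))   # approximately -0.26013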
For functions of several variables, gradient() returns
the gradient vector, hessian() returns the Hessian matrix,
and jacobian() handles vector-valued functions:
# Gradient of a 2-parameter function
f2 <- function(x) x[1]^2 * x[2]
gradient(f2, c(3, 4)) # c(2*3*4, 3^2) = c(24, 9)
#> [1] 24 9
# Hessian
hessian(f2, c(3, 4))
#> [,1] [,2]
#> [1,] 8 6
#> [2,] 6 0
# Jacobian of a vector-valued function
f_vec <- function(x) list(x[1] * x[2], x[1]^2 + x[2])
jacobian(f_vec, c(3, 4))
#> [,1] [,2]
#> [1,] 4 3
#> [2,] 6 1
The D operator composes for higher-order derivatives. D(f, x, order = 2) returns the Hessian; D(f, x, order = 3) returns a third-order tensor.
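A brief sketch reusing f2 from above; the order-2 call should reproduce the hessian() matrix shown earlier, and the order-3 call returns the array of third partial derivatives (for f2 = x1^2 * x2, the mixed partials \(\partial^3 f / \partial x_1^2 \partial x_2 = 2\) and every other entry is 0):
D(f2, c(3, 4), order = 2)   # same matrix as hessian(f2, c(3, 4)) above
D(f2, c(3, 4), order = 3)   # third-order tensor: mixed x1,x1,x2 partials equal 2, the rest 0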
Three methods for computing the derivative of \(f(x) = x^3 \sin(x)\) at \(x = 2\):
f <- function(x) x[1]^3 * sin(x[1])
f_num <- function(x) x^3 * sin(x) # plain numeric version
x0 <- 2
# 1. Analytical: f'(x) = 3x^2 sin(x) + x^3 cos(x)
analytical <- 3 * x0^2 * sin(x0) + x0^3 * cos(x0)
# 2. Finite differences
h <- 1e-8
finite_diff <- (f_num(x0 + h) - f_num(x0 - h)) / (2 * h)
# 3. Automatic differentiation
ad_result <- D(f, x0)
# Compare
data.frame(
method = c("Analytical", "Finite Diff", "AD (nabla)"),
derivative = c(analytical, finite_diff, ad_result),
error_vs_analytical = c(0, finite_diff - analytical, ad_result - analytical)
)
#> method derivative error_vs_analytical
#> 1 Analytical 7.582394 0.00000e+00
#> 2 Finite Diff 7.582394 -1.17109e-07
#> 3 AD (nabla) 7.582394 0.00000e+00
The AD result matches the analytical derivative to machine precision — zero error — because dual-number arithmetic propagates exact derivative rules through every operation. Finite differences, by contrast, trade truncation error of order \(O(h^2)\) against round-off error of order \(\epsilon_{\mathrm{mach}}/h\). The optimal step for double-precision central differences is roughly \(\epsilon_{\mathrm{mach}}^{1/3} \approx 6 \times 10^{-6}\); our choice of \(h = 10^{-8}\) sits well below that, so round-off dominates and the observed error is about \(10^{-7}\). A bar chart makes this difference vivid:
errors <- abs(c(finite_diff - analytical, ad_result - analytical))
# Use log10 scale; clamp AD error to .Machine$double.eps if exactly zero
errors[errors == 0] <- .Machine$double.eps
oldpar <- par(mar = c(4, 5, 2, 1))
bp <- barplot(log10(errors),
names.arg = c("Finite Diff", "AD (nabla)"),
col = c("coral", "steelblue"),
ylab = expression(log[10] ~ "|error|"),
main = "Absolute error vs analytical derivative",
ylim = c(-16, 0), border = NA)
abline(h = log10(.Machine$double.eps), lty = 2, col = "grey40")
text(mean(bp), log10(.Machine$double.eps) + 0.8,
"machine epsilon", cex = 0.8, col = "grey40")This advantage becomes critical for optimization algorithms that depend on accurate gradients.
Any optimizer can find the MLE — the point estimate is the easy part.
What nabla gives you is the exact Hessian
at the MLE, and that’s where the real statistical payoff lies.
The observed information matrix \(J(\hat\theta) = -H(\hat\theta)\) is the negative Hessian of the log-likelihood at the MLE. Its inverse gives the asymptotic variance-covariance matrix:
\[\text{Var}(\hat\theta) \approx J(\hat\theta)^{-1}\]
Inaccurate Hessians mean inaccurate standard errors and confidence
intervals. Finite-difference Hessians introduce \(O(h^2)\) error that propagates directly
into your CIs. With nabla, the Hessian is exact to machine
precision — so your standard errors and confidence intervals are as
accurate as the asymptotic theory allows.
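A minimal sketch of that workflow on toy Normal data (the names below are illustrative, not part of nabla). The negative log-likelihood is written term-by-term with Reduce() and lapply(), the same pattern as the control-flow examples above, so the one function serves both optim() and hessian():
set.seed(42)
y <- rnorm(200, mean = 2, sd = 1.5)
# Negative log-likelihood of Normal(mu, sigma), up to an additive constant
negll <- function(x) {
  mu <- x[1]; sigma <- x[2]
  ss <- Reduce("+", lapply(y, function(yi) ((yi - mu) / sigma)^2))
  length(y) * log(sigma) + 0.5 * ss
}
fit <- optim(c(mean(y), sd(y)), negll)   # any optimizer finds the point estimate
J <- hessian(negll, fit$par)             # observed information (negll is already -loglik)
se <- sqrt(diag(solve(J)))               # asymptotic standard errors
cbind(estimate = fit$par, std.error = se)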
Higher-order derivatives push further:
D(f, x, order = 3) yields the third-order derivative
tensor, which quantifies the asymptotic skewness of the
MLE’s sampling distribution — a correction that standard Wald intervals
miss entirely.
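As a minimal illustration of the mechanics (the full skewness correction is developed in vignette("mle-skewness")), for a one-parameter function the returned tensor has a single entry, the scalar third derivative:
g <- function(x) x[1]^4
D(g, 2, order = 3)   # single entry: g'''(2) = 24 * 2 = 48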
Under the hood, nabla uses dual numbers
— values of the form \(a +
b\varepsilon\), where \(\varepsilon^2 =
0\). Evaluating a function \(f\)
on a dual number \(x + \varepsilon\)
yields:
\[f(x + \varepsilon) = f(x) + f'(x)\,\varepsilon\]
The “value” part gives \(f(x)\) and the “epsilon” part gives \(f'(x)\), computed automatically by propagating \(\varepsilon\) through every arithmetic operation. This is forward-mode automatic differentiation (AD).
The low-level constructors expose dual numbers directly. In this
style, functions use bare x (not x[1]) because
they receive a single dualr object rather than a
vector:
# A dual variable: value = 3, derivative seed = 1
x <- dual_variable(3)
value(x)
#> [1] 3
deriv(x)
#> [1] 1
# A dual constant: value = 5, derivative seed = 0
k <- dual_constant(5)
value(k)
#> [1] 5
deriv(k)
#> [1] 0
# Explicit constructor
y <- dual(2, 1)
value(y)
#> [1] 2
deriv(y)
#> [1] 1
Setting deriv = 1 means "I'm differentiating with respect to this variable." Setting deriv = 0 means "this is a constant."
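For instance (a small sketch reusing these constructors), seeding one input and holding the other fixed yields a partial derivative via the product rule:
x <- dual_variable(3)   # differentiate with respect to x
y <- dual_constant(5)   # treat y as fixed
deriv(x * y)            # d/dx (x * y) = y = 5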
All standard arithmetic operators propagate derivatives:
x <- dual_variable(3)
# Addition: d/dx(x + 2) = 1
r_add <- x + 2
value(r_add)
#> [1] 5
deriv(r_add)
#> [1] 1
# Subtraction: d/dx(5 - x) = -1
r_sub <- 5 - x
value(r_sub)
#> [1] 2
deriv(r_sub)
#> [1] -1
# Multiplication: d/dx(x * 4) = 4
r_mul <- x * 4
value(r_mul)
#> [1] 12
deriv(r_mul)
#> [1] 4
# Division: d/dx(1/x) = -1/x^2 = -1/9
r_div <- 1 / x
value(r_div)
#> [1] 0.3333333
deriv(r_div)
#> [1] -0.1111111
# Power: d/dx(x^3) = 3*x^2 = 27
r_pow <- x^3
value(r_pow)
#> [1] 27
deriv(r_pow)
#> [1] 27
All standard mathematical functions are supported, with derivatives computed via the chain rule:
x <- dual_variable(1)
# exp: d/dx exp(x) = exp(x)
r_exp <- exp(x)
value(r_exp)
#> [1] 2.718282
deriv(r_exp)
#> [1] 2.718282
# log: d/dx log(x) = 1/x
r_log <- log(x)
value(r_log)
#> [1] 0
deriv(r_log)
#> [1] 1
# sin: d/dx sin(x) = cos(x)
x2 <- dual_variable(pi / 4)
r_sin <- sin(x2)
value(r_sin)
#> [1] 0.7071068
deriv(r_sin) # cos(pi/4)
#> [1] 0.7071068
# sqrt: d/dx sqrt(x) = 1/(2*sqrt(x))
x3 <- dual_variable(4)
r_sqrt <- sqrt(x3)
value(r_sqrt)
#> [1] 2
deriv(r_sqrt) # 1/(2*2) = 0.25
#> [1] 0.25
# Gamma-related
x4 <- dual_variable(3)
r_lgamma <- lgamma(x4)
value(r_lgamma) # log(2!) = log(2)
#> [1] 0.6931472
deriv(r_lgamma) # digamma(3)
#> [1] 0.9227843
Dual numbers integrate with standard R patterns:
# sum() works
a <- dual_variable(2)
b <- dual_constant(3)
total <- sum(a, b, dual_constant(1))
value(total) # 6
#> [1] 6
deriv(total) # 1 (only a has deriv = 1)
#> [1] 1
# prod() works
p <- prod(a, dual_constant(3))
value(p) # 6
#> [1] 6
deriv(p) # 3 (product rule: 3*1 + 2*0)
#> [1] 3
# c() creates a dual_vector
v <- c(a, b)
length(v)
#> [1] 2
# is.numeric returns TRUE (for compatibility)
is.numeric(dual_variable(1))
#> [1] TRUE
In most cases you won’t need these constructors directly —
D(), gradient(), hessian(), and
jacobian() handle dual number creation internally. But
understanding the mechanism helps when debugging or building custom AD
workflows.
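As one sketch of such a workflow, a hand-rolled forward pass seeds a dual variable, evaluates an ordinary expression on it, and reads off both value and derivative; this reproduces the D() computation from the first example:
f_manual <- function(x) x^2 + sin(x)   # bare x: receives a single dualr object
z <- dual_variable(pi / 4)
res <- f_manual(z)
value(res)   # f(pi/4)
deriv(res)   # f'(pi/4) = 2 * (pi/4) + cos(pi/4) = 2.277903, as computed by D() earlier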
This vignette introduced D() and showed that AD computes
exact derivatives where finite differences cannot. For practical
applications:
vignette("mle-workflow") applies
gradient(), hessian(), and the observed
information matrix to five increasingly complex MLE problems, from
Normal to logistic regression, and builds a Newton-Raphson optimizer
powered entirely by AD.vignette("higher-order") demonstrates
D(f, x, order = 2) for second derivatives, with
applications to curvature analysis, Taylor expansion, and exact Hessians
for MLE standard errors.vignette("optimizer-integration")
shows how to plug nabla gradients into R’s built-in
optimizers (optim, nlminb) for
production-quality MLE fitting.vignette("mle-skewness") uses
D(f, x, order = 3) to compute the asymptotic skewness of
MLEs — a third-order analysis that goes beyond standard Wald confidence
intervals.