Statistical inference—constructing confidence intervals, testing hypotheses, and quantifying uncertainty—is central to data analysis. For PMM2 estimators, bootstrap methods provide a powerful, distribution-free approach to inference that:
This vignette provides a comprehensive guide to bootstrap inference for both linear regression and time series models fitted with PMM2.
The bootstrap, introduced by Efron (1979), resamples from the data to estimate the sampling distribution of a statistic. For regression models:
# Generate data with skewed errors
n <- 200
x <- rnorm(n, mean = 5, sd = 2)
# True parameters
beta_0 <- 10
beta_1 <- 2.5
# Skewed errors: chi-squared distribution
errors <- rchisq(n, df = 4) - 4
# Response
y <- beta_0 + beta_1 * x + errors
dat <- data.frame(y = y, x = x)
# Fit PMM2 model
fit_pmm2 <- lm_pmm2(y ~ x, data = dat, verbose = FALSE)
# Perform bootstrap inference
boot_results <- pmm2_inference(fit_pmm2,
formula = y ~ x,
data = dat,
B = 1000,
seed = 123)
# Display summary
summary(boot_results)
#> Estimate Std.Error t.value p.value conf.low
#> Min. :2.563 Min. :0.07524 Min. :22.84 Min. :0 Min. :2.413
#> 1st Qu.:4.352 1st Qu.:0.16282 1st Qu.:25.65 1st Qu.:0 1st Qu.:4.027
#> Median :6.142 Median :0.25041 Median :28.45 Median :0 Median :5.641
#> Mean :6.142 Mean :0.25041 Mean :28.45 Mean :0 Mean :5.641
#> 3rd Qu.:7.931 3rd Qu.:0.33799 3rd Qu.:31.26 3rd Qu.:0 3rd Qu.:7.254
#> Max. :9.721 Max. :0.42558 Max. :34.07 Max. :0 Max. :8.868
#> conf.high
#> Min. : 2.710
#> 1st Qu.: 4.675
#> Median : 6.640
#> Mean : 6.640
#> 3rd Qu.: 8.604
#> Max. :10.569Interpretation:
# Extract bootstrap samples for visualization
# (Note: This requires accessing internals - in practice, use summary output)
# Refit to get bootstrap samples
set.seed(123)
B <- 1000
boot_samples <- matrix(0, nrow = B, ncol = 2)
for(b in 1:B) {
# Resample residuals
res_b <- sample(residuals(fit_pmm2), size = n, replace = TRUE)
# Generate bootstrap data
y_b <- fitted(fit_pmm2) + res_b
dat_b <- dat
dat_b$y <- y_b
# Refit
fit_b <- lm_pmm2(y ~ x, data = dat_b, verbose = FALSE)
boot_samples[b, ] <- coef(fit_b)
}
# Plot bootstrap distributions
par(mfrow = c(2, 2))
# Intercept
hist(boot_samples[, 1], breaks = 30, main = "Bootstrap: Intercept",
xlab = "Intercept", col = "lightblue", border = "white")
abline(v = beta_0, col = "red", lwd = 2, lty = 2)
abline(v = coef(fit_pmm2)[1], col = "darkblue", lwd = 2)
legend("topright", legend = c("True", "Estimate"),
col = c("red", "darkblue"), lty = c(2, 1), lwd = 2, cex = 0.8)
# Q-Q plot for intercept
qqnorm(boot_samples[, 1], main = "Q-Q Plot: Intercept")
qqline(boot_samples[, 1], col = "red", lwd = 2)
# Slope
hist(boot_samples[, 2], breaks = 30, main = "Bootstrap: Slope",
xlab = "Slope", col = "lightgreen", border = "white")
abline(v = beta_1, col = "red", lwd = 2, lty = 2)
abline(v = coef(fit_pmm2)[2], col = "darkblue", lwd = 2)
legend("topright", legend = c("True", "Estimate"),
col = c("red", "darkblue"), lty = c(2, 1), lwd = 2, cex = 0.8)
# Q-Q plot for slope
qqnorm(boot_samples[, 2], main = "Q-Q Plot: Slope")
qqline(boot_samples[, 2], col = "red", lwd = 2)Observations:
The pmm2_inference() function provides multiple CI
methods:
The simplest approach: use empirical quantiles of the bootstrap distribution.
\[ CI_{95\%} = \left[ \hat{\beta}^*_{(0.025)}, \hat{\beta}^*_{(0.975)} \right] \]
Advantages: Simple, transformation-respecting Disadvantages: Can be biased if bootstrap distribution is skewed
Adjusts for bias in the bootstrap distribution:
\[ \text{Bias} = \mathbb{E}[\hat{\beta}^*] - \hat{\beta} \]
Advantages: Corrects systematic bias Disadvantages: Requires larger B (≥ 1000)
# Fit OLS for comparison
fit_ols <- lm(y ~ x, data = dat)
# Extract standard errors and CIs
se_ols <- summary(fit_ols)$coefficients[, "Std. Error"]
ci_ols <- confint(fit_ols)
# Bootstrap CIs (from summary)
boot_summary <- boot_results
# Create comparison table
comparison <- data.frame(
Parameter = c("Intercept", "Slope"),
True = c(beta_0, beta_1),
PMM2_Estimate = coef(fit_pmm2),
OLS_SE = se_ols,
PMM2_Boot_SE = boot_summary$Std.Error,
SE_Ratio = boot_summary$Std.Error / se_ols
)
print(comparison, row.names = FALSE, digits = 3)
#> Parameter True PMM2_Estimate OLS_SE PMM2_Boot_SE SE_Ratio
#> Intercept 10.0 9.72 0.59 0.4256 0.721
#> Slope 2.5 2.56 0.11 0.0752 0.686
cat("\n=== Confidence Interval Comparison ===\n\n")
#>
#> === Confidence Interval Comparison ===
cat("OLS 95% CI (Normal-based):\n")
#> OLS 95% CI (Normal-based):
print(ci_ols)
#> 2.5 % 97.5 %
#> (Intercept) 8.639023 10.967187
#> x 2.330172 2.762934
cat("\nPMM2 95% CI (Bootstrap Percentile):\n")
#>
#> PMM2 95% CI (Bootstrap Percentile):
print(data.frame(
Parameter = rownames(ci_ols),
Lower = boot_summary$conf.low,
Upper = boot_summary$conf.high
))
#> Parameter Lower Upper
#> 1 (Intercept) 8.867981 10.568794
#> 2 x 2.413477 2.710258Key Finding: PMM2 bootstrap standard errors are typically 10-30% smaller than OLS standard errors when errors are skewed, reflecting higher statistical efficiency.
Bootstrap enables flexible hypothesis testing without normality assumptions.
# H0: Slope = 0 (no relationship)
# From bootstrap summary
boot_summary <- boot_results
cat("=== Hypothesis Test: Slope = 0 ===\n\n")
#> === Hypothesis Test: Slope = 0 ===
cat("t-statistic:", boot_summary$t.value[2], "\n")
#> t-statistic: 34.06547
cat("p-value: ", boot_summary$p.value[2], "\n\n")
#> p-value: 0
if(boot_summary$p.value[2] < 0.05) {
cat("Decision: Reject H0 at 5% level\n")
cat("Conclusion: Significant relationship between x and y\n")
} else {
cat("Decision: Fail to reject H0\n")
}
#> Decision: Reject H0 at 5% level
#> Conclusion: Significant relationship between x and yFor multiple predictors, we might test if two slopes are equal.
# Generate multiple regression data
n <- 250
x1 <- rnorm(n)
x2 <- rnorm(n)
errors <- rexp(n, rate = 1) - 1
# True model: similar coefficients
beta <- c(5, 1.5, 1.8, -0.5)
y <- beta[1] + beta[2]*x1 + beta[3]*x2 + beta[4]*x1*x2 + errors
dat_multi <- data.frame(y = y, x1 = x1, x2 = x2)
# Fit model
fit_multi <- lm_pmm2(y ~ x1 + x2 + x1:x2, data = dat_multi, verbose = FALSE)
# Bootstrap
boot_multi <- pmm2_inference(fit_multi,
formula = y ~ x1 + x2 + x1:x2,
data = dat_multi,
B = 1000,
seed = 456)
# Test H0: beta_x1 = beta_x2
summary(boot_multi)
#> Estimate Std.Error t.value p.value
#> Min. :-0.5119 Min. :0.04254 Min. :-12.03 Min. :0
#> 1st Qu.: 1.0492 1st Qu.:0.04258 1st Qu.: 24.63 1st Qu.:0
#> Median : 1.6523 Median :0.04313 Median : 38.29 Median :0
#> Mean : 1.9362 Mean :0.04838 Mean : 35.26 Mean :0
#> 3rd Qu.: 2.5392 3rd Qu.:0.04893 3rd Qu.: 48.92 3rd Qu.:0
#> Max. : 4.9521 Max. :0.06474 Max. : 76.49 Max. :0
#> conf.low conf.high
#> Min. :-0.5944 Min. :-0.4313
#> 1st Qu.: 0.9637 1st Qu.: 1.1299
#> Median : 1.5677 Median : 1.7340
#> Mean : 1.8418 Mean : 2.0284
#> 3rd Qu.: 2.4459 3rd Qu.: 2.6325
#> Max. : 4.8263 Max. : 5.0767Time series bootstrap requires special care due to temporal dependence.
# Generate AR(1) data with skewed innovations
n <- 300
phi <- 0.7
innovations <- rgamma(n, shape = 2, rate = 2) - 1
x <- numeric(n)
x[1] <- innovations[1]
for(i in 2:n) {
x[i] <- phi * x[i-1] + innovations[i]
}
# Fit AR(1) model
fit_ar <- ar_pmm2(x, order = 1, method = "pmm2", include.mean = FALSE)
# Bootstrap inference for time series
boot_ar <- ts_pmm2_inference(fit_ar,
B = 1000,
seed = 789)
# Summary
summary(boot_ar)
#> Estimate Std.Error t.value p.value
#> Min. :0.6787 Min. :0.03457 Min. :19.64 Min. :0
#> 1st Qu.:0.6787 1st Qu.:0.03457 1st Qu.:19.64 1st Qu.:0
#> Median :0.6787 Median :0.03457 Median :19.64 Median :0
#> Mean :0.6787 Mean :0.03457 Mean :19.64 Mean :0
#> 3rd Qu.:0.6787 3rd Qu.:0.03457 3rd Qu.:19.64 3rd Qu.:0
#> Max. :0.6787 Max. :0.03457 Max. :19.64 Max. :0
#> conf.low conf.high
#> Min. :0.6125 Min. :0.7482
#> 1st Qu.:0.6125 1st Qu.:0.7482
#> Median :0.6125 Median :0.7482
#> Mean :0.6125 Mean :0.7482
#> 3rd Qu.:0.6125 3rd Qu.:0.7482
#> Max. :0.6125 Max. :0.7482
# Check if AR coefficient is significantly different from 0.5
cat("\n=== Testing H0: phi = 0.5 ===\n")
#>
#> === Testing H0: phi = 0.5 ===
boot_summary_ar <- boot_ar
ci_lower <- boot_summary_ar$conf.low[1]
ci_upper <- boot_summary_ar$conf.high[1]
cat("95% CI: [", round(ci_lower, 3), ",", round(ci_upper, 3), "]\n")
#> 95% CI: [ 0.612 , 0.748 ]
if(0.5 >= ci_lower && 0.5 <= ci_upper) {
cat("Decision: Cannot reject H0 (0.5 is within CI)\n")
} else {
cat("Decision: Reject H0 (0.5 is outside CI)\n")
}
#> Decision: Reject H0 (0.5 is outside CI)# Generate ARMA(1,1) with heavy-tailed innovations
n <- 400
phi <- 0.5
theta <- 0.4
innovations <- rt(n, df = 4)
x <- arima.sim(n = n,
model = list(ar = phi, ma = theta),
innov = innovations)
# Fit ARMA model
fit_arma <- arma_pmm2(x, order = c(1, 1), method = "pmm2", include.mean = FALSE)
# Bootstrap
boot_arma <- ts_pmm2_inference(fit_arma,
B = 1000,
seed = 999,
method = "block",
block_length = 20)
# Display results
summary(boot_arma)
#> Estimate Std.Error t.value p.value
#> Min. :0.3020 Min. :0.07011 Min. :3.948 Min. :0.000e+00
#> 1st Qu.:0.3682 1st Qu.:0.07170 1st Qu.:4.982 1st Qu.:1.970e-05
#> Median :0.4344 Median :0.07330 Median :6.016 Median :3.941e-05
#> Mean :0.4344 Mean :0.07330 Mean :6.016 Mean :3.941e-05
#> 3rd Qu.:0.5006 3rd Qu.:0.07490 3rd Qu.:7.051 3rd Qu.:5.911e-05
#> Max. :0.5668 Max. :0.07650 Max. :8.085 Max. :7.881e-05
#> conf.low conf.high
#> Min. :0.1181 Min. :0.4155
#> 1st Qu.:0.1823 1st Qu.:0.4751
#> Median :0.2465 Median :0.5347
#> Mean :0.2465 Mean :0.5347
#> 3rd Qu.:0.3107 3rd Qu.:0.5943
#> Max. :0.3749 Max. :0.6539Key Advantage: Bootstrap inference is robust to innovation distribution and does not require Gaussian assumptions.
The choice of B affects precision and computation time:
| Purpose | Recommended B | Computation Time |
|---|---|---|
| Quick check | 200-500 | Seconds |
| Standard inference | 1000-2000 | Minutes |
| Publication-quality | 5000-10000 | Hours |
# Assess stability of bootstrap SEs with different B
B_values <- c(100, 500, 1000, 2000)
se_results <- data.frame(B = B_values, SE_Intercept = NA, SE_Slope = NA)
for(i in seq_along(B_values)) {
boot_temp <- pmm2_inference(fit_pmm2, formula = y ~ x, data = dat,
B = B_values[i], seed = 123)
summary_temp <- boot_temp
se_results$SE_Intercept[i] <- summary_temp$Std.Error[1]
se_results$SE_Slope[i] <- summary_temp$Std.Error[2]
}
print(se_results)Rule of Thumb: Use B ≥ 1000 for confidence intervals, B ≥ 2000 for hypothesis tests.
For large datasets or complex models, use parallel computing:
# Enable parallel bootstrap (requires 'parallel' package)
boot_parallel <- pmm2_inference(fit_pmm2,
formula = y ~ x,
data = dat,
B = 2000,
seed = 123,
parallel = TRUE,
cores = 4) # Use 4 CPU cores
# Check speedup
# Typical speedup: 3-4x on 4 coresPerformance Tip: Parallel bootstrap is most beneficial when:
Check if bootstrap replicates converged successfully:
# Rerun bootstrap with verbose output for diagnostics
boot_verbose <- pmm2_inference(fit_pmm2,
formula = y ~ x,
data = dat,
B = 500,
seed = 321)
# Check for failed bootstrap samples
# (In production code, inspect convergence attribute)
summary(boot_verbose)
#> Estimate Std.Error t.value p.value conf.low
#> Min. :2.563 Min. :0.07911 Min. :21.12 Min. :0 Min. :2.413
#> 1st Qu.:4.352 1st Qu.:0.17442 1st Qu.:23.94 1st Qu.:0 1st Qu.:4.013
#> Median :6.142 Median :0.26974 Median :26.76 Median :0 Median :5.612
#> Mean :6.142 Mean :0.26974 Mean :26.76 Mean :0 Mean :5.612
#> 3rd Qu.:7.931 3rd Qu.:0.36505 3rd Qu.:29.58 3rd Qu.:0 3rd Qu.:7.212
#> Max. :9.721 Max. :0.46037 Max. :32.40 Max. :0 Max. :8.812
#> conf.high
#> Min. : 2.722
#> 1st Qu.: 4.684
#> Median : 6.647
#> Mean : 6.647
#> 3rd Qu.: 8.610
#> Max. :10.573
cat("\nAll bootstrap samples should converge successfully.\n")
#>
#> All bootstrap samples should converge successfully.
cat("If many fail, consider:\n")
#> If many fail, consider:
cat("- Increasing max_iter in lm_pmm2()\n")
#> - Increasing max_iter in lm_pmm2()
cat("- Checking for outliers or influential points\n")
#> - Checking for outliers or influential points
cat("- Simplifying the model\n")
#> - Simplifying the model# For demonstration, regenerate bootstrap samples
set.seed(123)
B <- 1000
sample_size <- nrow(dat)
boot_coefs <- matrix(0, nrow = B, ncol = 2)
for(b in 1:B) {
res_b <- sample(residuals(fit_pmm2), sample_size, replace = TRUE)
y_b <- fitted(fit_pmm2) + res_b
dat_b <- dat
dat_b$y <- y_b
fit_b <- lm_pmm2(y ~ x, data = dat_b, verbose = FALSE)
boot_coefs[b, ] <- coef(fit_b)
}
# Scatter plot of bootstrap estimates
par(mfrow = c(1, 2))
# Bivariate distribution
plot(boot_coefs[, 1], boot_coefs[, 2],
pch = 16, col = rgb(0, 0, 1, 0.1),
xlab = "Intercept", ylab = "Slope",
main = "Joint Bootstrap Distribution")
points(coef(fit_pmm2)[1], coef(fit_pmm2)[2],
pch = 19, col = "red", cex = 2)
# Correlation
cor_boot <- cor(boot_coefs[, 1], boot_coefs[, 2])
cat("Bootstrap correlation between intercept and slope:", round(cor_boot, 3), "\n")
#> Bootstrap correlation between intercept and slope: -0.87
# Density plot
plot(density(boot_coefs[, 2]), main = "Slope Bootstrap Density",
xlab = "Slope", lwd = 2, col = "blue")
abline(v = beta_1, col = "red", lwd = 2, lty = 2)
abline(v = coef(fit_pmm2)[2], col = "darkblue", lwd = 2)
legend("topright", legend = c("True", "Estimate", "Bootstrap"),
col = c("red", "darkblue", "blue"),
lty = c(2, 1, 1), lwd = 2)Recommended:
Alternative methods:
| Problem | Symptom | Solution |
|---|---|---|
| Too few bootstrap samples | Wide variation in CIs across runs | Increase B to ≥ 1000 |
| Non-convergence | Many failed bootstrap fits | Increase max_iter, check data quality |
| Extreme outliers | Very wide bootstrap CIs | Investigate outliers, consider robust methods |
| Small sample size | Unstable bootstrap distribution | Consider analytical methods, increase n if possible |
For even better coverage, use studentized (pivotal) bootstrap:
Bootstrap inference for PMM2 provides:
Efron, B., & Tibshirani, R. J. (1994). An Introduction to the Bootstrap. CRC Press.
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge University Press.
Zabolotnii, S., Warsza, Z.L., Tkachenko, O. (2018). “Polynomial Estimation of Linear Regression Parameters for the Asymmetric PDF of Errors”. Springer, AUTOMATION 2018.
Package functions: ?pmm2_inference,
?ts_pmm2_inference
vignette("01-pmm2-introduction")vignette("02-pmm2-time-series")demo(package = "EstemPMM") for interactive examples