"risk_vs_effect" plot type (and the
internal plot_risk_vs_effect()). The quadrant view
of each stratum’s mean prediction against its stratum random effect
largely duplicated what "effect_decomp" and
"predicted" already show more directly, so it has been
dropped from plot() for both maihda_model and
maihda_analysis objects, from type = "all",
and from the Shiny app’s plot menu."ternary" plot type and the
compute_maihda_ternary_data(),
maihda_ternary_plot(), and
plot_maihda_ternary() functions. The ternary
diagnostic normalised each stratum’s additive, intersection-specific,
and uncertainty signals to sum to 1, which discards effect magnitude –
the quantity MAIHDA exists to measure – and puts two effect components
and an estimation-error term on a single triangle. Its content is
covered more directly by "effect_decomp" (the
additive-vs-intersectional split with magnitudes intact) and the
"predicted" / "upset" views (uncertainty shown
as intervals), so it has been dropped from plot() for both
maihda_model and maihda_analysis objects, from
type = "all", and from the Shiny app, and the optional
ggtern dependency removed."upset" plot type and
maihda_upset_size(). An UpSet-style alternative to
"predicted" that replaces the long intersectional axis
labels with a category matrix (an intersection-size bar, a dot matrix
encoding each stratum’s level on every dimension, and a predicted-value
panel, all sharing one column order). Binary 0/1 dimensions collapse to
a present/absent row; multi-level factors get one row per level. A
quantity argument switches the bottom panel between the
predicted value (fixed + random) and the stratum random effect
(interaction). maihda_upset_size() returns content-scaled
figure dimensions.theme_maihda() now is set as standard theme for
ggplot objectspredicted (and longitudinal
trajectories) plot can now keep the most extreme strata
when it truncates, instead of the first by stratum order. When
there are more strata than the n_strata cap, the view
dropped strata in stratum order – effectively arbitrary with
respect to how far a subgroup sits from the population mean, so the most
striking strata could be the ones omitted. plot() gains an
opt-in select argument:
"order" (default, unchanged) keeps the first n_strata in
stratum order; "deviation" keeps the n_strata furthest from
the reference line (largest |predicted - reference|, so the
most extreme strata in both directions, not just one tail). For
a longitudinal trajectories plot it keeps the strata whose
trajectories swing furthest from the population curve (peak
|random deviation| over the time grid). Selection and
display are kept separate: select changes which
strata appear, but the x-axis stays in stratum order. It composes with
the flag-aware cap (flagged strata are always kept; select
governs the fill) and the caption names the rule used (“the 12 strata
furthest from the reference, of 200”)."predicted" view caps the number drawn
(n_strata, default 50) and kept them in stratum order, so a
stratum flagged by maihda_interactions() that fell past the
cap was dropped from the figure entirely – the highlight could not reach
it. plot() gains
only_flagged: TRUE restricts
the "predicted" and "obs_vs_shrunken" views to
the strata carrying a credibly non-zero interaction (and turns the
highlight on with the stored diagnostic if it was off), with a caption
naming the screen (e.g. “Showing the 7 flagged strata (95% interval,
BH-adjusted) of 200 total”) and a graceful captioned empty panel when
none are flagged. Independently, whenever interactions are highlighted
the n_strata cap on "predicted" is now
flag-aware: every flagged stratum is kept and the
remaining slots are filled in stratum order, so the signal the highlight
exists to surface is never silently truncated away. The across-strata
reference line is still computed from the full set, so filtering never
shifts it. "effect_decomp" deliberately ignores
only_flagged (its waterfall exists to show each flagged
stratum’s place in the full distribution) and says so; the framing stays
on “flagged”, not “significant”, consistent with the diagnostic’s
exploratory, partial-pooling reading.maihda_interactions() now defaults to
adjust = "BH" (Benjamini-Hochberg) rather than
"none": fitting and flagging every stratum in one call is a
screening question, so the flags should control the false-discovery rate
by default. Pass adjust = "none" (or
interactions = "none" to
maihda()/fit_maihda()) for the uncorrected,
per-stratum individual-testing view. A new
rope argument adds an equivalence /
smallest-interaction-of-interest reading (Schuirmann 1987; Kruschke
2018): supply a region of practical equivalence (a half-width
d for c(-d, d), or
c(lower, upper), on the link scale) and each stratum gains
a decision of "relevant" (interval entirely
outside the region), "negligible" (entirely inside), or
"inconclusive" (straddling a bound), reported by
print(). The documentation now also keeps two ideas the
literature distinguishes apart – partial pooling regularises
magnitude/sign (Gelman, Hill & Yajima 2012) while whether to correct
depends on the inferential structure of the claim (Rubin 2021) – rather
than treating shrinkage as itself a multiplicity correction.compare_maihda(ic = TRUE) could show
information criteria on incompatible scales without a warning.
The comparability check only warned when the models differed in outcome,
weights, family/link, analytic sample, or strata, then appended
whichever information-criterion columns existed. Two
same-family models fitted on different engines – e.g. a
Gaussian lme4 fit (reporting AIC/BIC) and a Gaussian
brms fit (reporting WAIC/LOOIC) – agree on all the checked
aspects, so no warning fired, yet the appended table placed the
likelihood AIC/BIC and the Bayesian WAIC/LOOIC side by side. Those are
different scales and are not comparable to each other (as
maihda_ic()’s documentation already notes).
compare_maihda() now emits an explicit warning whenever the
appended criteria span both the likelihood (AIC/BIC) and the Bayesian
(WAIC/LOOIC) scales, so a cross-engine comparison can no longer be read
as if the four numbers were on one ruler. An
all-lme4/all-ordinal (AIC/BIC only) or
all-brms (WAIC/LOOIC only) comparison is
unaffected.
A fixed interaction among the stratum dimensions
(e.g. Gender * Race) silently corrupted the
decomposition. R expands Gender * Race to
Gender + Race + Gender:Race, but the workflow only looked
for the dimensions’ additive main effects when deciding whether
the supplied formula was the fully-specified adjusted model. It found
both Gender and Race, treated the fit as the
adjusted model, and derived the null by removing only those main effects
– leaving the fixed Gender:Race interaction in
both the null and the adjusted formulas. That fixed
cell-means term duplicates the intersectional
(1 | Gender:Race) random intercept: it absorbs the
between-stratum variance into fixed effects, pinning the stratum
variance at a singular boundary so the PCV came back
NULL/invalid (and every per-group PCV came back
NA in compare_maihda_groups() /
maihda(group = )). maihda() and
compare_maihda_groups() now reject such a
formula up front with an actionable error – the MAIHDA adjusted model is
defined to carry only the dimensions’ additive main effects,
with the intersection estimated by the stratum random effect (write
Gender + Race, and use
decomposition = "crossed-dimensions" or
maihda_interactions() to quantify the interaction).
maihda_interactions() likewise now warns when handed a bare
model that carries such a fixed dimension interaction (its stratum
random effects are no longer the pure interaction the diagnostic
reports). Detection is robust to variable order within the interaction
and to non-syntactic names, and a legitimate covariate-by-dimension
interaction (e.g. age * Gender) is left untouched.
Discriminatory accuracy silently dropped the AUC/MOR for
a brms aggregated-binomial (y | trials(n))
fit. The model layer fits such responses, and
summary() attaches the discriminatory accuracy
automatically for any binomial/Bernoulli fit, but
maihda_discriminatory_accuracy() only recognised the
aggregated structure for lme4
cbind(success, failure) fits. A brms
y | trials(n) fit therefore reached
maihda_auc() with the per-row success counts as
the response, errored on the non-0/1 values, and the summary quietly
omitted the DA. It now detects a brms aggregated binomial
via the existing trial-count extraction path
(maihda_brms_trial_counts(), which parses the
trials() addition term off the stored formula –
brms exposes no weights.brmsfit) and computes
the same count-weighted C-statistic the lme4
cbind() path uses. The rows are ranked by the per-trial
probability predict_maihda() returns (now normalised to a
probability on both engines – see below), so the AUC ranks by
probability and not by a count that confounds the probability with the
number of trials; n_case / n_control are the
total successes / failures. Bernoulli and lme4 fits are
unaffected.
calculate_pvc() (and therefore
maihda(), stepwise_pcv(), and
compare_maihda_groups()) could silently report a
REML-based PCV for a singular multi-random-effect model. The ML
refit (lme4::refitML(), needed because REML between-stratum
variances are not comparable across models with different fixed effects)
was skipped whenever the fit was globally singular – but a fit
can be singular because a non-stratum random effect
(e.g. an extra (1 | site) grouping factor) is on the
boundary while the stratum variance is comfortably nonzero. In that case
the package returned the REML PCV instead of the intended ML PCV. The ML
refit is now skipped only when the stratum variance
itself is on the boundary (effectively zero, judged on
lme4::isSingular()’s own relative tolerance), or when
refitML() fails; a non-stratum boundary no longer
suppresses it. The boundary skip for a zero-variance stratum is
preserved, so the zero-variance guard in calculate_pvc() is
not masked.
Aggregated-binomial stratum predictions were row-weighted
instead of trial-weighted. For a
cbind(success, failure) (or y | trials(n)) fit
the rows of an intersectional stratum can carry very different numbers
of binomial trials, but the per-stratum prediction means
averaged the fitted probabilities / linear predictors with
unit weights, so a 5-trial row counted as much as a
200-trial row. The unit weighting is correct for the
observed stratum summaries (which sum successes over
summed trials – the trials are already in the denominator), but wrong
for the model predictions. Prediction aggregation now uses a dedicated
maihda_prediction_weights() that weights each row by its
binomial trial count – read from
stats::weights(model, type = "prior") for an
lme4 cbind() fit, or parsed from the formula’s
trials() addition term for a brms
y | trials(n) fit (which exposes
model.frame.brmsfit but no weights.brmsfit, so
the prior-weights route is unavailable and the counts would otherwise
fall back to unit weights) – while the observed numerator/denominator
path is unchanged. This corrects the predicted stratum means and the
w_sum weights feeding maihda_table(), the
predicted-strata / effect-decomposition plots, and the
prediction-deviation panels; unweighted and non-binomial fits are
unaffected (the weights reduce exactly to the previous values).
brms cumulative (ordinal) stratum
predictions were on the wrong response scale. The stratum-level
prediction helper applied the scalar inverse link for
the engine = "brms", family = "ordinal"
response scale, returning a single cumulative probability in
[0, 1] rather than the expected category
score sum_k k * P(Y = k) in [1, K]
that the rest of the package documents and computes – the individual
predict_maihda() path already collapses the
fitted() category-probability array to that score, and the
clmm (engine = "ordinal") stratum path already
used it. This silently mis-scaled (and mis-ranked)
maihda_table() and the stratum plots for a Bayesian
cumulative model. The brms stratum helper now maps the latent location
to the expected category score with the same shared cumulative helpers
(using the posterior-mean thresholds), so the brms and clmm
cumulative paths agree and match the documented response scale; the link
scale is unchanged.
maihda_interactions() returned a meaningless
scalar diagnostic for a longitudinal fit. A direct call on a
longitudinal maihda() analysis (or a bare longitudinal
model) fell through to the crossed-dimensions branch and reported a
single per-stratum BLUP – which drops the random slope and so describes
a cross-section, not the trajectory. It now errors with a pointer to the
trajectory tools, matching the automatic-attachment path that already
skips longitudinal objects.
The Bayesian pd column was
mislabelled. maihda_interactions() reported
pd = P(BLUP > 0) under the heading “probability of
direction”, so a strongly negative interaction (whose
direction is near-certain) showed pd near 0 rather than
near 1. pd is now the conventional probability of direction
max(P(BLUP > 0), P(BLUP < 0)) (in
[0.5, 1], à la bayestestR::p_direction), with
the sign reported separately in the existing direction
column.
predict_maihda() silently accepted an unseen
stratum on the wemix and ordinal
paths. When newdata already carried a
stratum column, the preparation step returned early and
skipped the unseen-stratum check, so a misspelled or genuinely new
stratum flowed through to the WeMix/clmm linear-predictor
helpers, which map a missing random effect to 0 – yielding a fixed-only
prediction that looked valid and contradicted both the
documented contract (unseen strata are an error, as for
type = "strata") and lme4’s default behaviour.
predict_maihda() now rejects an unseen stratum by default
for every engine and prediction type, whether the
stratum is supplied directly or rebuilt from the grouping variables. A
new allow_new_levels argument (default
FALSE) opts into the previous behaviour
explicitly: for type = "individual" it returns a
population-average (fixed-effects-only) prediction for
unseen strata, dropping the stratum random effect (forwarded as
allow.new.levels to lme4 and allow_new_levels
to brms). Stratum-level predictions have no random effect to report for
an unseen stratum, so they remain an error regardless.
predict_maihda(scale = "response") returned
expected counts, not probabilities, for a brms
aggregated-binomial (y | trials(n)) fit.
brms’s fitted() /
posterior_epred() return the expected number of successes
(trials * p) for a binomial fit with a
trials() term, whereas lme4’s
type = "response" for a
cbind(success, failure) fit returns the per-trial
probability. The same predict_maihda(scale = "response")
call therefore produced two different scales depending on the engine, so
anyone ranking or comparing predicted risks across brms
strata was comparing expected counts (which confound the probability
with the per-row number of trials) rather than probabilities. The
brms response-scale prediction is now normalised by the
per-row trial counts (evaluated on the prediction newdata),
so both engines return a per-trial probability in [0, 1];
the discriminatory-accuracy AUC, the Shiny app’s absolute-prediction
panel, and any downstream ranking now read a true probability. Bernoulli
and non-binomial fits are unaffected (no trials() term
-> no normalisation).
predict_maihda(allow_new_levels = TRUE) did
not give the documented zero-effect prediction for an unseen stratum on
the brms engine. The contract is to drop the
stratum random effect (treat it as zero) for a stratum the model never
saw. The brms path merely forwarded
allow_new_levels = TRUE, but brms’s default
sample_new_levels = "uncertainty" draws a
new-stratum effect from the estimated random-effects distribution rather
than zeroing it, so the prediction silently carried a random per-call
group effect. Unseen-stratum rows are now split off and predicted with
an re_formula that excludes the stratum term, while
seen-stratum rows keep their estimated effect (a blanket exclusion would
have dropped the seen strata’s effects too). Any other
random effect the unseen row participates in – a contextual
(1 | school) intercept from
fit_maihda(context = ), a longitudinal
(time | id) growth term – is kept, exactly
as lme4’s allow.new.levels zeroes only the
unseen level’s effect and retains seen ones; for the usual
single-stratum model the excluding re_formula is
NA, the fixed-effects-only population average.
lme4 and the wemix/ordinal paths
already behaved this way and are unchanged, so the engines now agree on
the same model.
The crossed-dimensions decomposition reported
NaN additive / interaction shares when there was no
between-stratum variance. The shares split the between-strata
variance, so for a degenerate fit whose additive and
interaction variances are both estimated at exactly zero they are
0 / 0 = NaN, which leaked through summary()
and the comparison/tidier outputs. maihda_cc_partition()
now returns NA (a defined “undefined”) for the shares when
between == 0, and for the VPC when the total variance is
0; print() shows
NA (no between-strata variance to split). Non-degenerate
fits are unchanged.
A formula offset() term was silently dropped
from wemix and ordinal predictions.
WeMix::mix() and ordinal::clmm() both honour
an offset(.) term in the model formula when fitting, but
the manual linear-predictor helpers (no predict.clmm
exists, and WeMix’s own predict() offers no
fixed-only/scale control) rebuilt the linear predictor as
X %*% beta (+ u) from the design matrix only – and
model.matrix() never produces a column for an offset term,
so the offset was omitted. Every link-scale prediction (and the
response-scale value derived from it) was therefore off by exactly the
per-row offset: predict_maihda() for both engines, and the
ordinal plot_prediction_deviation_panels() probabilities,
which rebuild the clmm location independently. The helpers
now evaluate the formula’s offset on the prediction data
(stats::model.offset() of the rebuilt model frame) and add
it back, including for an offset-only null model whose
location is otherwise identically zero. Fits with no offset are
unaffected.
maihda() and
compare_maihda_groups() chose the engine from the raw
outcome, before subset/weights. The wrappers pass
engine explicitly to every sub-fit, so they pre-selected
engine = "ordinal" from an ordered-factor outcome – but
they checked the raw column, while fit_maihda()
detects the family on the analytic sample (after
subset and weight filtering). An ordered 3-level outcome
subset to two observed levels is a binary analytic sample: a direct
fit_maihda() fits it as
binomial/lme4, but the wrappers pinned
engine = "ordinal" and then errored (maihda())
or failed every group (compare_maihda_groups()) with an
engine/family contradiction. The ordinal pre-check now runs on the same
analytic sample fit_maihda() uses (the forwarded
subset/weights are resolved against the data mask first),
so the engine choice can no longer contradict the resolved family.
Relatedly, maihda() now forwards the
evaluated subset/weights (not
the raw expressions) to its derived null/adjusted fits, mirroring
compare_maihda_groups(): a response-referencing
subset (e.g. subset = y %in% c("lo", "mid"))
was otherwise re-evaluated by each derived fit against
$original_data, whose response has already been recoded to
0/1, selecting zero rows.
Auto-binning a numeric stratum dimension could silently
overwrite a user column. When make_strata() (or
the (1 | a:b) shorthand) auto-bins a numeric dimension
v, the adjusted-model and prediction machinery add an
internal factor column named .maihda_dim_<v>
(referenced by the adjusted / crossed-dimensions formulae and rebuilt
for prediction newdata). Neither writer checked whether the
user’s data already carried a column of that name, so an existing
.maihda_dim_<v> variable was clobbered – and, because
that column then enters the model, the fit or prediction changed
silently. The .maihda_dim_ prefix is now
reserved: make_strata() rejects a
pre-existing .maihda_dim_<v> column for any dimension
it is about to auto-bin, with an actionable rename hint (or set
autobin = FALSE), mirroring the existing
reserved-weight-column guard. The internal name is now generated through
one shared helper so the fit-time and predict-time writers cannot drift,
and predicting on a fitted model’s own data (which legitimately carries
the package’s copy) is unaffected.
maihda_table() could show REML variance rows
alongside an ML-based PCV without saying so.
calculate_pvc() refits a Gaussian lme4 fit
with maximum likelihood before differencing the between-stratum
variances (REML variances are not comparable across models with
different fixed effects), but the table’s Between-stratum
variance / SD and VPC/ICC rows are each model’s
own (REML) estimate – the quantities summary() reports. The
two are on different variance bases by design, so
PCV != (var_null - var_adj) / var_null read off the table,
which looks like an inconsistency. maihda_table() now
attaches a models_note (shown by print())
explaining this whenever the PCV’s variance basis actually differs from
the displayed rows; it stays silent for already-ML engines
(glmer/brms/wemix/ordinal)
and for a boundary null where calculate_pvc() keeps the
REML fit. The displayed numbers are unchanged, so the table still agrees
exactly with summary().
cli). The print methods across the package –
maihda() analyses, model/summary, PCV, discriminatory
accuracy, the interaction diagnostic, group comparison, tables,
information criteria, and the rest – bold section titles, accent the
headline values, and dim secondary notes, reusing the plot
accent colour so the console and the figures agree. The palette
is deliberately valence-neutral: colour marks emphasis
(a finding to look at, a neutral conclusion, a de-emphasised note),
never good-vs-bad – so, e.g., a negligible equivalence
result is shown in a neutral colour, not green. It degrades to
plain text automatically wherever ANSI is unsupported
(knitr/vignettes, R CMD check, testthat,
NO_COLOR), so rendered and captured output is byte-for-byte
unchanged. cli joins Imports.compare_maihda_groups() (and
maihda(group = )) gain a
var_between_adjusted_ml column: the
adjusted model’s actual between-stratum variance, read straight
off the adjusted fit on the maximum-likelihood scale the PCV is computed
on. The existing var_between_adjusted column is, by design,
a derived coherence quantity (var_between * (1 - pcv), on
the REML var_between/vpc scale so the table
satisfies
pcv = (var_between - var_between_adjusted) / var_between
exactly) and is not the adjusted fit’s own variance;
the documentation now states this explicitly and
var_between_adjusted_ml reports the literal adjusted
variance for anyone who needs it. The two differ only by the small
REML-vs-ML gap in the null variance.maihda() now reports the intersectional
interactions by default. “Which strata genuinely interact” is
the scientific payoff of MAIHDA, so it no longer needs a separate call:
maihda() computes maihda_interactions() on the
adjusted (or crossed-dimensions) model, stores it as the
interactions slot, and surfaces a one-line headline in
print() (flagged count, the strongest stratum, and –
crucially – the multiplicity stance actually used, so an uncorrected
scan is never silently read as corrected). The computation is
essentially free (it reads the stratum estimates the summary already
holds; no refit), and it is skipped for a longitudinal decomposition
(whose interaction is a trajectory, not a scalar). Control it with the
new interactions argument: TRUE (default)
computes it with the diagnostic’s own default correction;
FALSE skips it; or pass a p.adjust method name
to choose the correction. fit_maihda() gains the same
argument as an opt-in
(interactions = FALSE by default, since a single fit is
often a null model where the diagnostic does not apply).
plot(..., highlight_interactions = TRUE) now reuses the
stored diagnostic, so the plot highlights and the printed headline can
no longer disagree.broom-style tidy() and
glance() methods for maihda_model,
maihda_summary, and maihda_analysis, so MAIHDA
results drop straight into tidy data for ggplot2,
gt/flextable tables, and other downstream
tooling. tidy() returns the stratum (intersection)
random-effect estimates by default (with the human-readable
intersectional label), or the variance-components table
(component = "variance") or fixed effects
(component = "fixed"), all in broom’s
estimate/std.error/conf.low/conf.high
shape; tidy() on a maihda() analysis takes
which = c("null", "adjusted"). glance()
returns the MAIHDA headline as a one-row tibble – the
VPC/ICC (with its bootstrap/posterior interval), and for a
maihda() analysis the PCV, the
adjusted-model AUC, the additive/interaction shares, AUC and MOR for a
binary outcome, plus
n_strata/nobs/engine/family
– a row no generic mixed-model tool emits, since the PCV needs the
null+adjusted pair. The layout is uniform across the lme4,
brms, wemix, and ordinal engines.
The generics come from the lightweight
generics package (the same ones
broom/broom.mixed re-export), so
tidy()/glance() dispatch whether the user has
broom, generics, or just MAIHDA
loaded, with no hard broom dependency; the raw
fixed-effect/per-row tidying broom.mixed already does on
the underlying fit is intentionally not duplicated.id, time,
and time_degree arguments on fit_maihda() and
maihda(). Supplying id (the person/unit
measured repeatedly) and time (a numeric measurement-time
column) fits a 3-level growth curve – occasions within
individuals within intersectional strata – with a random intercept
and slope on time at both the individual and stratum
levels (outcome ~ time + (time | id) + (time | stratum);
the growth slopes are added automatically, so you write only the strata
shorthand (1 | var1:var2)). The between-stratum variance,
and therefore the VPC, is then a function of time
(VarS(t) = a(t)' Sigma_s a(t)): summary()
reports the baseline (reference-time) VPC, the full time-varying VPC
trajectory, and the stratum/individual intercept-slope-covariance
blocks, and a new plot type "vpc_trajectory" draws the
VPC-over-time curve (with "trajectories" for the predicted
per-stratum mean trajectories;
plot(type = "vpc")/"all" route there for a
longitudinal fit). maihda(decomposition = "longitudinal")
(selected automatically when id/time are
supplied) fits a null and an adjusted growth model – the adjusted model
adding the dimensions’ main effects and their
dim:time interactions – and reports the
PCV separately for the baseline (intercept) and the slope
variance: the additive-vs-multiplicative split of the
intersectional trajectory inequality (the paper’s “partly
multiplicative but mostly additive” finding), returned as a
maihda_long_pcv object with pcv_intercept,
pcv_slope, and a time-specific pcv_t (and a
"pcv_trajectory" plot). All families with a defined level-1
variance are supported
(gaussian/binomial/poisson/negbinomial,
latent scale for non-Gaussian) on engine = "lme4" (any
degree) and engine = "brms" (linear growth; posterior
credible bands on the VPC trajectory).
predict_maihda(type = "strata") returns each stratum’s
deviation at the baseline (reference) time plus its raw intercept and
slope (a stratum is now a trajectory; the baseline deviation,
evaluated at ref_time = min(time), is the longitudinal
analogue of a cross-sectional stratum BLUP and differs from the raw
time-0 intercept whenever time is not zero-referenced). The
intercept-only guards elsewhere are untouched – a longitudinal fit is
tagged and routed to the time-varying path, while every other model
still rejects random slopes – and
extract_between_variance()/calculate_pvc()
reject a longitudinal model with a pointer to the time-varying
decomposition. Design-weighted, contextual,
wemix/ordinal, and group-comparison
longitudinal models are out of scope (each errors). A new bundled
dataset maihda_long_data (600 persons x 5
waves, gender x ethnicity x education strata, with constructed
mostly-additive trajectory differences) demonstrates it.family = "ordinal"
(alias "cumulative"; probit via the new exported
maihda_cumulative("probit") or
brms::cumulative("probit")) fits a cumulative
(proportional-odds) model: a new
engine = "ordinal" wraps
ordinal::clmm() for the frequentist path (selected
automatically for an ordinal family under the default engine, with a
message) and engine = "brms" uses
brms::cumulative(). An ordered-factor outcome with 3+
levels under the all-default family/engine selects the model
automatically (with a warning; a 2-level ordered factor still takes the
binomial auto-detect path), and an unordered factor response is coerced
to ordered in its declared level order with a message. The VPC/ICC lives
on the latent scale – level-1 variance pi^2/3 (logit) or 1 (probit), the
same latent treatment as binomial models, so cumulative VPCs are
comparable to logistic ones – and summary() gains a
thresholds slot (the cut points that take the intercept’s
place, with Hessian SEs) printed alongside the location fixed effects.
Because predict.clmm does not exist, predictions are built
from the stored coefficients: the link scale is the latent location eta
= x’beta + u, and the response scale is the expected category
score sum_k k*P(Y = k) (the quantity the plots label “Average
Expected Category Score”), with P(Y <= k) = g^-1(alpha_k - eta)
differenced by pure, unit-tested helpers shared with the brms path
(whose fitted() probability array
predict_maihda() now collapses to the same score). The
two-model maihda() decomposition and PCV,
stepwise_pcv(), compare_maihda_groups()
(engine handshakes mirror the sampling_weights precedent),
the stratum plots, obs_vs_shrunken (observed mean category
score), risk_vs_effect, effect_decomp (exact
on the latent scale), and the deviation panels all work;
maihda_mor() now also accepts cumulative-logit fits,
returning the median cumulative odds ratio. Explicit
limits with targeted errors: logit/probit links only, the canonical
single (1 | stratum) structure (no
context/crossed-dimensions – use brms), no
sampling_weights on the clmm path, no parametric bootstrap
(clmm has no simulate/refit; brms provides credible intervals),
AUC/maihda_vpc_response() stay binomial-only, and the
ternary diagnostic is not yet supported. ordinal joins
Suggests.family = "negbinomial" on
fit_maihda() (and therefore maihda(),
stepwise_pcv(), and compare_maihda_groups()).
The dispersion parameter theta is estimated from the data –
lme4::glmer.nb() under the default engine, the
shape parameter under engine = "brms" – and a
fixed-theta MASS::negative.binomial(theta) family object is
also accepted with lme4, honouring the supplied theta. The VPC/ICC uses
the lognormal latent-scale level-1 variance
log(1 + 1/mu + 1/theta) (Nakagawa, Johnson & Schielzeth
2017, J R Soc Interface), the negative-binomial analogue of the
Stryhn et al. (2006) Poisson approximation already used by the package
(it reduces to it as theta grows); for brms the shape draws
are propagated into the VPC credible interval. The two-model
maihda() decomposition, calculate_pvc(),
parametric-bootstrap intervals (via lme4::refit(), which
holds theta fixed at its estimate – the interval is conditional on
theta, as documented), prediction, and the stratum plots (routed to the
count branch) all work; the log link is required, and the wemix engine,
maihda_discriminatory_accuracy(), and
maihda_vpc_response() reject the family with targeted
messages. Internally, engine-specific family labels are now
canonicalised so the family/link comparability checks in
calculate_pvc() and compare_maihda() no longer
depend on raw label strings: two fits whose theta is estimated
(lme4’s theta-embedding "Negative Binomial(<theta>)",
brms’s shape) compare equal, since each label otherwise
carried its own theta estimate and could never match. A fixed,
user-specified theta (MASS::negative.binomial(theta)) is
treated differently – it is part of the model specification, so it stays
in the comparability key and two fits with different fixed thetas are
(correctly) rejected as incomparable.sampling_weights argument on
fit_maihda(), maihda(),
stepwise_pcv(), and compare_maihda_groups().
Sampling weights are not the same thing as lme4’s
weights= (precision weights, which rescale the residual
variance), so feeding survey weights to lmer/glmer maximises the wrong
objective and gives invalid population estimates; supplying
sampling_weights with engine = "lme4" is
therefore an error, and supplying it with the default
engine switches (with a message) to the new
engine = "wemix": weighted
pseudo-maximum-likelihood via WeMix::mix() (Rabe-Hesketh
& Skrondal 2006), the estimator built for NAEP/PISA-style survey
analysis. The individual weights enter at level 1 unchanged and the
level-2 (stratum) weights are 1, because intersectional strata are
exhaustive population cells sampled with certainty. The wemix engine
supports the canonical MAIHDA structure –
gaussian(identity) or binomial(logit) with the
single (1 | stratum) intercept – and flows through the
whole toolkit: summary() reports the design-weighted
VPC/ICC (latent-scale pi^2/3 level-1 variance for logistic fits,
matching the other engines) and design-consistent (sandwich)
fixed-effect standard errors; stratum estimates carry analytic
conditional SEs (the design-weighted analogue of lme4’s
condVar, reducing to it at unit weights);
predict_maihda(), the stratum plots (aggregated with the
sampling weights, so stratum summaries are population-representative),
calculate_pvc() (which now also refuses to compare fits
with different sampling weights), the maihda()
two-model decomposition, stepwise_pcv(), and
compare_maihda_groups() all work. For a binomial fit,
maihda_discriminatory_accuracy() computes the
design-weighted AUC (each observation contributes its
weight as case/control mass) and flags it in the print method.
Alternatively engine = "brms" accepts
sampling_weights as likelihood weights
(y | weights(w), normalized to mean 1), giving a
pseudo-posterior: point estimates are design-consistent but
credible intervals are not design-based (a message says so). Limitations
are explicit rather than silent: no parametric bootstrap for wemix (a
design-based interval would need replicate weights – a possible future
extension), and no crossed random effects, so context = and
decomposition = "crossed-dimensions" require lme4/brms.
Rows with missing or non-positive weights are dropped with a warning. A
unit-weight wemix fit reproduces the lme4 ML fit to numerical precision.
WeMix joins Suggests.context argument on fit_maihda() and
maihda(). context = "school" (one or more
column names) enters each context as a crossed random intercept
alongside the intersectional stratum effect,
outcome ~ covars + (1 | stratum) + (1 | school), and the
summaries then partition the unexplained variance into
between-stratum vs. between-context vs. residual: the
headline VPC/ICC becomes the between-stratum share net of the
context, each context gets its own Context: <name>
row in the variance-components table, and a new $context
summary element carries the per-context variances and shares (with
parametric-bootstrap intervals for lme4 via
bootstrap = TRUE, and posterior credible intervals for
brms). In maihda() the context random intercept is carried
by both the null and the adjusted model, so the PCV is computed
with the context partialled out; context also composes with
decomposition = "crossed-dimensions" (the context variance
then enters the single fit and its VPC denominator). A new plot type
"context_vpc" (on both maihda_model and
maihda_analysis) bars the stratum vs. context variances
with their shares, and plot(type = "vpc") renders the
contextual split automatically. context cannot be combined
with group (a stratified per-level comparison
vs. a joint crossed model are different designs; supplying both
errors), a context variable may not be a stratum dimension or appear as
a fixed effect, and a context with few levels weakly identifies its
variance (consider engine = "brms"). A manually written
... + (1 | school) still fits and is summarised generically
as “Other random effects”; only context = activates the
labelled partition.decomposition value
"cross-classified" to
"crossed-dimensions" in
maihda() and compare_maihda_groups() (and in
the Shiny app’s decomposition toggle), because that mode crosses the
stratum dimensions’ main effects as random intercepts – whereas
“cross-classified MAIHDA” in the literature means the contextual
stratum-by-place model now fitted via context (see above).
The old value is accepted as a deprecated alias that
warns and maps to "crossed-dimensions". Note for code that
inspects results: a maihda_analysis from this mode now has
mode = "crossed-dimensions" (was
"cross-classified"), and
compare_maihda_groups() results carry
attr(, "decomposition") == "crossed-dimensions"; printed
output and plot titles say “crossed-dimensions” accordingly.maihda(), a single high-level entry point that
runs the standard two-model MAIHDA workflow in one call. It fits the
null model (the formula you supply) and the
adjusted model (the same model plus the additive main
effects of the stratum dimensions), summarises the null model’s VPC/ICC,
and reports the PCV – the proportional change in
between-stratum variance from null to adjusted (the additive share of
the intersectional inequality). When a group is supplied it
also runs this decomposition within each group (the
compare_maihda_groups() result gains a pcv
column). It returns one consistent maihda_analysis object
with new
model_adjusted/summary_adjusted/pcv
slots (alongside the unchanged null-model
model/summary), and print,
summary, and plot methods; plot()
routes the VPC/shrinkage views to the null model and the
additive-vs-intersectional views (effect_decomp,
risk_vs_effect, ternary) to the adjusted
model, and gains type = "group_pcv". maihda()
is intrinsically a decomposition and has no single-model mode – use
fit_maihda() for a single fit. A numeric dimension
auto-binned for the strata enters the adjusted model as its
reconstructed tertile factor (not a linear term). Because it cannot
decompose without an intersection, maihda() errors (rather
than returning a half-result) when the stratum-defining variables are
unidentifiable (e.g. a hand-built stratum column) or define
only one dimension; the shorthand (1 | var1:var2) and
make_strata() paths both record the dimensions and
decompose normally.maihda_country_data dataset (OECD PISA 2018,
accessed via the learningtower package): 3,600 students
across six countries with gender x socioeconomic-status strata and
mathematics-achievement outcomes. It showcases
compare_maihda_groups() /
maihda(group = "country"), since intersectional inequality
(VPC/ICC) genuinely differs across the countries.maihda_sparse_data dataset and a new
vignette, “Bayesian MAIHDA for sparse intersections”,
showing why engine = "brms" is the safer estimator when
intersectional cells are small. The (simulated) data carry a
known interaction – 40% of the between-stratum variance, on a
Gaussian outcome y and a binary outcome event
– across 36 strata with deliberately skewed sizes (median 6, half the
cells below 5). Under that sparsity the maximum-likelihood (lme4)
crossed-dimensions fit is singular and over-shrinks the interaction (to
~23% Gaussian, ~3% binary – i.e. a real interaction read as “fully
additive”), with no uncertainty attached; a weakly-informative prior on
the random-effect SDs regularises the variance off the boundary and
returns a calibrated credible interval that covers the truth. The
vignette also documents the precompute-and-cache pattern for shipping
brms results in a build (Stan cannot run on CRAN’s / pkgdown’s
builders).maihda_table(), a one-call
publication-ready results export that assembles the two
canonical MAIHDA write-up deliverables (cf. Evans et al. 2024’s
tutorial) from a fitted maihda() analysis (or a single
fit_maihda() model): (a) a model-results
table contrasting the null (Model 1) and adjusted (Model 2)
fits – intercept, between-stratum variance and SD, VPC/ICC, the PCV,
and, for a binary outcome, the AUC and Median Odds Ratio – and (b) a
ranked-strata table ordering every intersectional
stratum by its model-predicted outcome (their Table 4), with the
predicted value’s conditional interval, the stratum size, and the
stratum random effect. It introduces no new estimator – the
model-results table reuses the quantities from summary()
and the ranked-strata table reuses
plot(type = "predicted")’s stratum predictions, so the
table agrees exactly with summary()/plot().
The $models data frame is numeric and export-ready
(statistics in rows, an estimate +
*_lower/*_upper interval columns per model:
the VPC bootstrap/posterior interval and the bootstrap PCV interval are
carried, other rows are point estimates), and the print()
method renders the familiar “estimate [low, high]” layout plus the
top/bottom strata (n_strata). It adapts to every fit type:
a crossed-dimensions analysis gets “Additive share”/“Interaction share”
rows instead of the PCV, a contextual cross-classified analysis
(context =) gets a “Context share (VPC)” row, an ordinal
fit leaves the intercept row NA (its category thresholds
are reported by summary(), not the table), and
which = "adjusted" ranks the strata by the adjusted rather
than the null model. Works across the lme4, brms, wemix, and ordinal
engines.summary() of a binomial/Bernoulli MAIHDA model – and
therefore maihda(), whose summaries it builds – now reports
the discriminatory accuracy automatically: the AUC /
C-statistic and Median Odds Ratio (the “DA” in MAIHDA), as a new
discriminatory_accuracy slot shown by the
print() methods, so the strata’s discriminatory accuracy
sits alongside the VPC without a separate call. For
maihda(), the headline print() shows the null
model’s AUC/MOR with the adjusted model’s AUC for comparison. The
response-scale (probability-scale) VPC is attached on request via
summary(model, response_vpc = TRUE, seed = ) or
maihda(..., response_vpc = TRUE, seed = ) (it is
simulation-based, hence opt-in and seeded). Both are computed from the
already-fitted model with no refit, and are skipped for non-binomial
outcomes and for the cross-classified fit (whose single-stratum
between-variance the MOR/response-VPC require is not defined across
crossed random effects). The standalone
maihda_discriminatory_accuracy(),
maihda_auc(), maihda_mor(), and
maihda_vpc_response() are unchanged.maihda_ic(), which surfaces relative-fit
information criteria for choosing between model
structures (different covariate sets, strata definitions, or
families) – the question the VPC/PCV do not answer. It reports
AIC/BIC for the likelihood engines (lme4,
ordinal clmm) and the Bayesian
WAIC/LOOIC for brms, takes one or more
maihda_models (or a maihda_analysis, which
expands into its null and adjusted rows), and adds a delta
column (gap from the best model on the primary criterion). Crucially it
handles the REML pitfall: lmer fits
Gaussian models by REML, whose AIC/BIC are not comparable
across models with different fixed effects (the canonical
null-vs-adjusted MAIHDA case), so when more than one model is supplied
any REML fit is refitted with maximum likelihood via
lme4::refitML() before the criteria are read – matching
what anova() does on lme4 models – and the
estimator column records it. Design-weighted
(wemix) pseudo-likelihood fits report NA (no
standard AIC/BIC is defined). compare_maihda() now appends
these criteria alongside the VPC/ICC by default
(ic = TRUE); set ic = FALSE for the lean
VPC-only table.stepwise_pcv() now also reports the
discriminatory-accuracy trajectory for a binary
(binomial/Bernoulli) outcome, alongside the between-stratum-variance /
PCV trajectory it already produced – the stepwise
discriminatory-accuracy analysis of Merlo et al. (2016). Each step gains
an AUC (that model’s C-statistic; step 0 is the strata-only
discriminatory accuracy), Step_AUC and
Total_AUC (the absolute change in AUC –
delta-AUC – versus the previous step and versus the null, in contrast to
the proportional
Step_PCV/Total_PCV), and MOR (the
Median Odds Ratio, logit link only) column. No extra models are fit: the
columns are read off each step’s already-fitted model via
maihda_discriminatory_accuracy(), so a design-weighted
stepwise (sampling_weights) reports the design-weighted
AUC, and the columns are simply absent for non-binary outcomes (the
gaussian/poisson/ordinal table is unchanged). A new print()
method for the maihda_stepwise table notes the
proportional-vs-absolute distinction.maihda_interactions(), a diagnostic that flags
which strata carry a credibly non-zero intersectional
interaction – the heart of “where is there genuine
intersectionality”. It reads each stratum’s interaction BLUP (the
stratum random effect of the adjusted /
crossed-dimensions model, i.e. the departure from the additive
main-effects prediction) and flags the strata whose effect is credibly
different from zero, returning a ranked maihda_interactions
table (flagged strata first, by interaction magnitude). For the
frequentist engines
(lme4/wemix/ordinal) it uses the
BLUP’s conditional standard error with an optional multiplicity
correction (adjust, default "none"; the docs
steer to FDR "BH" for screening many strata, with the full
set of p.adjust methods available for a reviewer who needs
a specific one); for brms it uses the exact
posterior tail – a credible interval and the probability of
direction pd = P(BLUP > 0) – rather than a normal
approximation, and adjust is inert (the Bayesian answer is
multiplicity-free). Passing a maihda() analysis uses the
right (adjusted) model automatically; passing a bare null model is
caught with a warning, since its stratum effects mix the additive and
interaction parts. The help explains why a correction is optional on
already-shrunken BLUP estimates (Gelman, Hill & Yajima 2012; Gelman
& Carlin 2014).plot() gains a highlight_interactions
argument (on both a maihda_model and a
maihda() analysis) that focuses and stars the
maihda_interactions()-flagged strata on the BLUP-based
views (effect_decomp, predicted,
obs_vs_shrunken). Pass TRUE (flags computed
with defaults), a multiple-testing method such as "BH", or
a precomputed maihda_interactions object to reuse a
specific conf_level/adjust; for an analysis
the flags are computed once from the adjusted model and reused across
views. In effect_decomp, labels follow the selected flagged
set, so a BH screen labels only strata that survive the BH adjustment.
FALSE (default) leaves every plot unchanged.maihda_mor() now requires the logit
link, not merely the binomial family. The Median Odds Ratio is an
odds-ratio-scale quantity derived from the logistic latent variance, so
applying its exp(sqrt(2 * V_A) * qnorm(0.75)) formula to a
binomial(link = "probit") fit returned a number that was
not on the model’s scale. maihda_mor() now errors for a
non-logit binomial link, and
maihda_discriminatory_accuracy() reports the
(link-agnostic) AUC with mor = NA for such fits, recording
the link and explaining the NA in its print()
method.maihda_vpc_response() collapses the
fixed part to its mean linear predictor before simulating, so for an
adjusted (covariate) model the response-scale VPC is a
conditional-at-mean estimate (evaluated at the average covariate
profile), not one marginalised over the covariate distribution. It is
exact for the canonical null/strata-only model; the documentation now
states this and recommends reading it from the null model.maihda_discriminatory_accuracy(),
maihda_auc(), and maihda_mor(), and points to
maihda_vpc_response() for the response-scale VPC.shinycssloaders dependency to the
README’s interactive-dashboard list (it is in DESCRIPTION
Suggests and used by run_maihda_app()).plot() generic.
compare_maihda() and
compute_maihda_ternary_data() now return classed objects,
so plot() dispatches automatically:
plot() on a compare_maihda() result (was
plot_comparison())plot() on a compare_maihda_groups()
result, with type = "vpc"/“components” (was
plot_group_comparison())plot() on a compute_maihda_ternary_data()
result (was plot_maihda_ternary())plot_comparison(),
plot_group_comparison(), and
plot_maihda_ternary() functions still work but are
deprecated and emit a one-time warning pointing to
plot().fit_maihda() now records lme4 fit-quality diagnostics
(singular fit and convergence warnings) on the model object;
print() on the model and summary() surface a
“Fit diagnostics” note so a boundary/non-converged fit – which makes the
VPC/PCV unreliable – is no longer silent.compare_maihda_groups() now raises a single aggregated
warning naming any group whose lme4 fit was singular or failed to
converge (per-group fits on small strata are where this is most likely),
so an unreliable per-group VPC – often pinned at 0 by a boundary fit –
is no longer silent."risk_vs_effect" plot no longer frames the outcome
axis as “risk”. A higher predicted value is not universally “bad” (it
depends on the outcome), so the axis and title now read as a neutral
“mean predicted value/probability”, with a note that the direction
depends on the outcome.plot_prediction_deviation_panels() to match the
implementation: the labelled points use a per-type metric – deviation
from the mean prediction for Gaussian/Poisson (and the ordinal
expected-score mode), the absolute deviance residual for binomial, and
surprise for the ordinal surprise mode – and are not statistical
outliers or model-misfit “deviants”.compare_maihda_groups() is the between-stratum
share of variance, which can differ across groups because of
the residual variance as well as the between-stratum variance. The
documentation now points to the var_between column for
comparing the absolute amount of intersectional variation, and notes
that overlap of separate per-group intervals is not a valid test of
whether two groups’ VPCs differ.plot() on a compare_maihda_groups() result
gains type = "between_variance" (and plot() on
a maihda(group = ) analysis gains
type = "group_between_variance"), which bars the absolute
between-stratum variance by group – the magnitude of
intersectional variation that the VPC share cannot convey. All
group plots now name any groups omitted because their VPC was not
estimable instead of dropping them silently.calculate_pvc(),
stepwise_pcv(), and the print method): the PCV is a
model-dependent change in between-stratum variance and equals variance
“explained” only when the second model nests the first; the stepwise PCV
is order-dependent and not a variable’s unique contribution. The
vignette and Shiny app no longer describe PCV as variance causally
“explained” by main effects or treat a negative PCV as evidence of
hidden structural inequality.summary() VPC/ICC documentation: the
denominator is the total unexplained variance, which includes the
variance of any additional random effects (not just between-stratum +
residual), and a note on the weighted-Gaussian level-1 variance was
added.gaussian("identity"), binomial/Bernoulli with
logit/probit, poisson("log")); other families such as
Gamma(link = "log") will fit but
summary()/VPC/PCV will stop with a clear “not implemented”
error rather than returning an invalid number.future, promises, and haven (for
SPSS/Stata uploads).vignettes/*.html); these are build artifacts generated
from the .Rmd sources and had drifted out of sync with the
corrected text. They are now git-ignored and added to
.Rbuildignore, so a locally rendered HTML is never shipped
in the tarball and R CMD build regenerates inst/doc from
the .Rmd.R CMD check --as-cran: the maihda_country_data
@source linked to
https://www.oecd.org/pisa/data/, which returns HTTP 403. It
now links to the CRAN page of the learningtower package
(the reproducible access route used to build the dataset), keeping the
OECD PISA 2018 attribution in the text.data-raw/maihda_health_data.R regeneration script
no longer calls install.packages("NHANES") as a side
effect; like the country-data script it now stops with a clear message
asking the developer to install the dependency.make_strata() (and the prediction-time stratum lookup)
now matches rows to strata with a single vectorised, collision-safe key
match instead of an O(rows x strata x variables) row-by-row scan, so it
scales to large survey datasets. Behaviour is unchanged, including the
correct handling of values that contain the stratum-label
separator.calculate_pvc() and compare_maihda() now
apply their analytic-sample identity checks to design-weighted
(wemix) fits. A WeMixResults object
exposes no nobs()/model.frame(), so the
row-count, row-identity and outcome-fingerprint guards previously fell
through to a silent pass: two wemix fits sharing a formula,
n and strata but fit to completely different
outcome values compared as if they were the same analytic
sample (calculate_pvc() returned a PCV,
compare_maihda() reported VPCs, neither warned). The guards
now fall back to the wrapper’s stored analytic $data (the
rows the engine actually fit), so a mismatched outcome is caught –
calculate_pvc() errors and compare_maihda()
warns – exactly as for the lme4 path. Genuinely matched wemix fits are
unaffected.ref_time = min(time)), not the raw time-0 random-intercept
variance. maihda_longitudinal_pcv() (and the
maihda_long_pcv print method) evaluated
Sigma[1, 1] directly, which is the variance at
time = 0 – correct only when time is centred so the
baseline is 0. For a model whose time does not start at zero (e.g. waves
10:12) this reported the PCV of an extrapolated time-0 variance, which
is meaningless and can even be negative; the baseline PCV is now
a(t)'Sigma a(t) evaluated at ref_time,
matching how summary() reports the baseline VPC. The
slope-variance PCV is unchanged (it is invariant to where time is
zeroed).maihda_table()’s ranked-strata table and
plot(type = "predicted" / "obs_vs_shrunken" / "risk_vs_effect" / "effect_decomp" / "prediction_deviation" / "ternary")
build (which adds only the random intercept and drops the slope)
misrepresents it. These paths now error – or, for
maihda_table(), omit the ranking with an explanatory note –
and point to the trajectory tools
(predict(type = "strata"),
plot(type = "trajectories"),
plot(type = "vpc_trajectory")). summary()’s
stratum-estimates print is relabelled “baseline (intercept) deviations”
for a longitudinal fit, with the same pointer.maihda_ic() now reports the analytic sample size
n for a design-weighted (wemix)
fit instead of NA. WeMixResults has
no nobs() method, so the IC table fell back to
NA; it now uses nrow(model$data) (the rows the
fit used), matching the nobs that glance()
already reports for the same model.lmer models (the
REML pitfall). The proportional change in between-stratum
variance compares the stratum variance of a null and an
adjusted model that differ in their fixed effects (the adjusted
model adds the dimensions’ additive main effects). lmer
fits Gaussian models by REML, and a REML variance estimate is
not comparable across models with different fixed
effects, so the PCV was biased – it overstated the adjusted model’s
residual between-stratum variance and therefore understated the
additive share (the PCV). calculate_pvc() – and
hence maihda()’s PCV, stepwise_pcv(), and the
per-group PCV in compare_maihda_groups() – now refits any
REML lmer model with maximum likelihood
(lme4::refitML()) before reading the variances (and before
the parametric bootstrap, so the interval matches), exactly as
maihda_ic() already does for AIC/BIC and as
anova() does on lme4 models. In simulations
with a known 60% additive share (40% interaction), the reported PCV
moved from ~50% (biased, REML) to ~58–60% (ML), recovering the truth.
glmer (GLMM) fits, the brms/wemix/ordinal engines,
single-model VPC/ICC summaries (which correctly stay REML, being
comparison-free), and singular/boundary fits are unaffected. In
compare_maihda_groups(), var_between_adjusted
is now reported as var_between * (1 - pcv) so it shares the
REML scale of the VPC’s var_between and the table stays
internally coherent.gaussian(link = "log")). The
residual variance is on the response scale while the between-stratum
variance is on the link scale, so summary(),
compare_maihda(), and calculate_pvc() now
raise a clear error instead of silently returning an invalid variance
partition.log(x)),
dropping rows with missing values, dropping rows with a missing prior
weight, and applying any subset= (including
negative/character row indices) – instead of the raw outcome column. An
outcome that is only 0/1 once excluded rows are removed is now correctly
fit with family = "binomial"....
(e.g. weights=, subset=, offset=)
now work at any nesting depth, including through the
maihda() and compare_maihda_groups() wrappers.
Arguments are captured as quosures and resolved with the data mask
(rlang::eval_tidy), fixing the previous “object not found”
/ “..1 used in an incorrect context” failures (a direct
fit_maihda() call worked, but the wrappers did not).subset= expression referencing the original
response labels (e.g. subset = y %in% c("no", "yes")): the
subset is evaluated against the original response before recoding.compare_maihda_groups() now slices forwarded
weights=/subset=/offset= to each
group’s rows before fitting, not just for the row-count guard. An
external (non-column) weights vector previously failed
every group with a length mismatch, and an external subset
vector could be recycled onto the wrong rows of later groups; both now
align correctly per group.weights.
With weights the per-observation residual variance is
sigma^2 / w_i, so the level-1 variance reported is the mean
conditional residual variance mean(sigma^2 / w_i) rather
than a single sigma^2; unweighted models are
unchanged.plot_effect_decomposition() now uses the stratum random
effect (BLUP) itself as the intersectional component instead of (full
prediction - fixed prediction). With additional random effects such as
(1 | site) the latter wrongly absorbed those effects into
the stratum component.plot_prediction_deviation_panels() (ordinal, surprise mode)
is now the average per-observation surprise, mean(-log(p))
(log loss), instead of -log(mean(p)), which could change
stratum rankings.run_maihda_app()) now also auto-detects
a numeric 0/1 outcome and fits it as binomial under the default family,
matching the core API, instead of silently fitting a linear probability
model.compare_maihda_groups() now warns when groups end up
with different populated strata even under
shared_strata = TRUE, since their VPCs are then estimated
over different stratum support and are not strictly directly
comparable.plot_prediction_deviation_panels() now plots
Poisson/count models on the response (expected-count) scale with count
labels, rather than routing them through the Gaussian link-scale
branch.compare_maihda_groups(min_group_n = ...) now guards the
analytic sample size (the rows the model actually fits) rather than the
raw group row count, so a group with enough raw rows but a tiny usable
sample is skipped instead of being fit on a handful of
observations.predict_maihda(type = "strata") now respects
newdata: it returns only the strata present in
newdata (and errors on a stratum the model never saw, as
type = "individual" does) instead of always returning every
training stratum. With newdata = NULL it still returns all
strata.compare_maihda_groups() now counts populated strata for
the pre-fit guard on the analytic model frame, not the raw subgroup. A
group with two raw strata but only one stratum left after missing-row
removal is now cleanly skipped as VPC-undefined instead of failing
during fitting with “grouping factors must have > 1 sampled
level”.n_boot for bootstrap intervals must now be at least 10
(the minimum number of successful refits an interval requires); an
unusably small n_boot fails immediately with a clear
message instead of only erroring after the bootstrap runs.calculate_pvc() and compare_maihda() now
detect differing prior weights: previously two models fit
to the same rows/outcome/strata but with different weights compared as
if equivalent (PCV returned silently, no warning), even though weights
change the variance estimates. calculate_pvc() now errors
and compare_maihda() warns; an unweighted fit and an
explicit unit-weight fit are still treated as equal.message() and stored on the model as
$response_recoding. Previously the mapping followed
alphabetical (character) or declared (factor) level order silently, with
no signal at all when family = "binomial" was passed
explicitly, so the modeled event could be inverted unnoticed. The 0/1
assignment rule is unchanged (it matches base glm).plot(type = "predicted"), "risk_vs_effect",
"obs_vs_shrunken", "effect_decomp") now honour
prior weights, collapsing per-stratum predictions with a
prior-weight-weighted mean (and weighting the reference lines by the
summed weights). This makes the plots consistent with the weighted
Gaussian VPC for survey/weighted fits; unweighted models are unaffected,
and aggregated-binomial (cbind) fits, whose
weights(type = "prior") are the trials, are left unweighted
to avoid double-counting.run_maihda_app()) no longer aborts the
whole analysis when the baseline (null) model has zero or negative
between-stratum variance (a singular / no-between-variation fit), which
makes calculate_pvc() error by design. The fitted model,
VPC/ICC, summaries, and plots are now shown as usual and the PCV is
reported as unavailable (with the underlying reason) instead of failing
the entire workflow.compare_maihda_groups() now accepts a bare family
function (e.g. family = stats::gaussian) – one of
the documented forms (“as in fit_maihda”), alongside a
family name ("gaussian") and a family object
(gaussian()). The per-group fits already handled it; only
the family metadata recorded on the result did not, so the call fit
every group and then failed with “object of type ‘closure’ is not
subsettable”. The family function is now resolved to a family object up
front, exactly as fit_maihda() does.brms fits now record MCMC convergence diagnostics
(maximum Rhat and the number of divergent transitions) alongside the
engine, surfaced in the “Fit diagnostics” block of
print()/summary() so a non-converged or
divergent Bayesian fit is no longer silent.summary() and calculate_pvc() print the number
of successful bootstrap draws and the Monte Carlo standard error so the
precision of an interval can be judged.compare_maihda_groups() to compare intersectional
inequality (VPC/ICC and between-/within-stratum variance) across levels
of a higher-level grouping variable such as country, region, or survey
wave. It fits a stratified MAIHDA model per group, by default using
shared/global strata so VPCs are directly comparable, with optional
per-group bootstrap confidence intervals.plot_group_comparison() to visualize the result
either as a VPC-by-group forest plot or as stacked variance-composition
bars.summary()) and PVC (calculate_pvc()): failed
refit() iterations were silently recorded as 0
instead of being dropped, biasing intervals toward zero and suppressing
the high-failure-rate warning. Failed iterations are now excluded
correctly.log(1 + 1/mu) (Stryhn et al. 2006); the previous
1/mu linearization biased the VPC downward for low-count
outcomes.plot_prediction_deviation_panels() no longer draws
zero-width “95% CI” bars when the underlying model does not supply
standard errors (e.g. lme4::merMod); intervals are omitted
instead of collapsed.plot_prediction_deviation_panels() function for
visualizing predicted values and identifying deviant cases.plot_risk_vs_effect() function to create a
quadrant scatterplot comparing overall marginal predicted risk against
pure intersectional effects.plot_effect_decomposition() function to visually
decompose the total deviation from the overall mean into additive and
intersectional components.plot() and the interactive dashboard.autobin
parameter) for numeric grouping variables with more than 10 unique
values in make_strata().run_maihda_app()) to include the new visualizations and a
toggle for auto-binning continuous strata variables.fit_maihda() will
now automatically detect binomial outcomes and switch to the appropriate
family."logit" links
and \(1\) for "probit"
links) internally when summarizing models or bootstrapping the variance
partition coefficient, avoiding deflated VPC/ICC metrics.stepwise_pcv() function to sequentially estimate
proportional change in variance (PCV) by adding predictors
one-by-one.run_maihda_app()) for visual data exploration, model
fitting, and performance visualization.maihda_sim_data dataset to resolve R CMD check
warnings.tests/testthat.R was modified
to correctly use test_check("MAIHDA") instead of
shinytest2.importFrom(stats, as.formula) for the
stepwise_pcv function to prevent undefined warnings.introduction.Rmd vignette: added standard CRAN
installation instructions, and improved text clarity.make_strata() function for creating
intersectional stratafit_maihda() function for fitting multilevel
models with lme4 (default) or brms enginessummary() function for variance partition and
stratum estimatespredict_maihda() function for individual and
stratum-level predictionsplot() function with three plot types:
compare_maihda() function for comparing models
with bootstrap confidence intervalsmake_strata() to properly handle missing
values (NA) in input variables: