In experiments, we control the conditions under which observations are made. Ideally, this leads to balanced datasets and clear inferences about the effects of those experimental conditions. In observational data, factor levels are observed rather than controlled, and in the analysis we control for those factors and covariates. It is possible that some factors and covariates lie in the causal path for other predictors. Observational studies can be designed in ways to mitigate some of these issues; but often we are left with a mess. Using EMMs does not solve the inherent problems in messy, undesigned studies; but they do give us ways to compensate for imbalance in the data, and allow us to estimate meaningful effects after carefully considering the ways in which they can be confounded.
####### {#nutrex}
As an illustration, consider the nutrition dataset provided with the package. 
These data are used as an example in Milliken and Johnson (1992), Analysis of 
Messy Data, and contain the results of an observational study on nutrition 
education. Low-income mothers are classified by race, age category, and whether 
or not they received food stamps (the group factor); and the response variable
is a gain score (post minus pre scores) after completing a nutrition training 
program. First, let's fit a model than includes all main effects and 2-way 
interactions, and obtain its “type II” ANOVA:
nutr.lm <- lm(gain ~ (age + group + race)^2, data = nutrition) 
car::Anova(nutr.lm)
## Note: model has aliased coefficients
##       sums of squares computed by model comparison
## Anova Table (Type II tests)
## 
## Response: gain
##             Sum Sq Df F value    Pr(>F)
## age          82.37  3  0.9614    0.4145
## group       658.13  1 23.0441 6.105e-06
## race         11.17  2  0.1956    0.8227
## age:group    91.58  3  1.0688    0.3663
## age:race     87.30  3  1.0189    0.3880
## group:race  113.70  2  1.9906    0.1424
## Residuals  2627.47 92
There is definitely a
group effect and a hint of and interaction with race. Here are the EMMs for
those two factors:
emmeans(nutr.lm, ~ group * race)
##  group      race     emmean   SE df lower.CL upper.CL
##  FoodStamps Black      4.71 2.37 92  0.00497     9.41
##  NoAid      Black     -2.19 2.49 92 -7.13690     2.76
##  FoodStamps Hispanic nonEst   NA NA       NA       NA
##  NoAid      Hispanic nonEst   NA NA       NA       NA
##  FoodStamps White      3.61 1.16 92  1.31252     5.90
##  NoAid      White      2.26 2.39 92 -2.48897     7.00
## 
## Results are averaged over the levels of: age 
## Confidence level used: 0.95
####### {#nonestex}
Hmmmm. The EMMs
when race is “Hispanic” are not given; instead they are flagged as 
non-estimable. What does that mean? Well, when using a model to make
predictions, it is impossible to do that beyond the linear space of the data
used to fit the model. And we have no data for three of the age groups in the
Hispanic population:
with(nutrition, table(race, age))
##           age
## race        1  2  3  4
##   Black     2  7 10  2
##   Hispanic  0  0  3  0
##   White     5 16 51 11
We can't make predictions for all the cases we are averaging over in the above EMMs, and that is why some of them are non-estimable. The bottom line is that we simply cannot include Hispanics in the mix when comparing factor effects. That's a limitation of this study that cannot be overcome without collecting additional data. Our choices for further analysis are to focus only on Black and White populations; or to focus only on age group 3. For example (the latter):
summary(emmeans(nutr.lm, pairwise ~ group | race, at = list(age = "3")), 
    by = NULL)
## $emmeans
##  group      race     emmean   SE df lower.CL upper.CL
##  FoodStamps Black      7.50 2.67 92     2.19   12.807
##  NoAid      Black     -3.67 2.18 92    -8.00    0.666
##  FoodStamps Hispanic   0.00 5.34 92   -10.61   10.614
##  NoAid      Hispanic   2.50 3.78 92    -5.01   10.005
##  FoodStamps White      5.42 0.96 92     3.51    7.326
##  NoAid      White     -0.20 1.19 92    -2.57    2.173
## 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast           race     estimate   SE df t.ratio p.value
##  FoodStamps - NoAid Black       11.17 3.45 92  3.237  0.0017 
##  FoodStamps - NoAid Hispanic    -2.50 6.55 92 -0.382  0.7034 
##  FoodStamps - NoAid White        5.62 1.53 92  3.666  0.0004
(We used trickery with providing a by variable, and then taking it
away, to make the output more compact.) Evidently, the training program has been
beneficial to the Black and White groups in that age category. There is no 
conclusion for the Hispanic group – for which we have very little data.
The framing data in the mediation
package has the results of an experiment conducted by Brader et al. (2008) where
subjects were given the opportunity to send a message to Congress regarding
immigration. However, before being offered this, some subjects (treat = 1)
were first shown a news story that portrays Latinos in a negative way. Besides
the binary response (whether or not they elected to send a message), the
experimenters also measured emo, the subjects' emotional state after the
treatment was applied. There are various demographic variables as well. Let's a
logistic regression model, after changing the labels for educ to shorter
strings. 
framing <- mediation::framing 
## Registered S3 methods overwritten by 'lme4':
##   method                          from
##   cooks.distance.influence.merMod car 
##   influence.merMod                car 
##   dfbeta.influence.merMod         car 
##   dfbetas.influence.merMod        car
levels(framing$educ) <- c("NA","Ref","< HS", "HS", "> HS","Coll +") 
framing.glm <- glm(cong_mesg ~ age + income + educ + emo + gender * factor(treat), 
    family = binomial, data = framing)
The conventional way to handle covariates like emo is to set them at their
means and use those means for purposes of predictions and EMMs. These adjusted 
means are shown in the following plot. 
emmip(framing.glm, treat ~ educ | gender, type = "response") 
This plot gives the impression that the effect of treat is reversed between 
male and female subjects; and also that the effect of education is not monotone.
Both of these are counter-intuitive.
However, note that the covariate emo is measured post-treatment. That 
suggests that in fact treat (and perhaps other factors) could affect the value
of emo; and if that is true (as is in fact established by mediation analysis 
techniques), we should not pretend that emo can be set independently of 
treat as was done to obtain the EMMs shown above. Instead, let emo depend on
treat and the other predictors – easily done using cov.reduce – and we 
obtain an entirely different impression:
emmip(framing.glm, treat ~ educ | gender, type = "response", 
    cov.reduce = emo ~ treat*gender + age + educ + income)
The reference grid underlying this plot has different emo values for each 
factor combination. The plot suggests that, after taking emotional response into
account, male (but not female) subjects exposed to the negative news story are 
more likely to send the message than are females or those not seeing the 
negative news story. Also, the effect of educ is now nearly monotone.
By the way, the results in this plot are the same is what you would obtain by refitting the model with an adjusted covariate
emo.adj <- resid(lm(emo ~ treat*gender + age + educ + income, data = framing))
… and then using ordinary covariate-adjusted means at the means of emo.adj. 
This is a technique that is often recommended.
If there is more than one mediating covariate, their settings may be defined in
sequence; for example, if x1, x2, and x3 are all mediating covariates, we
might use
emmeans(..., cov.reduce = list(x1 ~ trt, x2 ~ trt + x1, x3 ~ trt + x1 + x2))
(or possibly with some interactions included as well).
A mediating covariate is one that is
in the causal path; likewise, it is possible to have a mediating factor. For
mediating factors, the moral equivalent of the cov.reduce technique described
above is to use weighted averages in lieu of equally-weighted ones in
computing EMMs. The weights used in these averages should depend on the
frequencies of mediating factor(s). Usually, the "cells" weighting scheme
described later in this section is the right approach. In complex situations, it
may be necessary to compute EMMs in stages.
As described in the “basics” vignette, EMMs are usually 
defined as equally-weighted means of reference-grid predictions. However, 
there are several built-in alternative weighting schemes that are available by 
specifying a character value for weights in a call to emmeans() or related 
function. The options are "equal" (the default), "proportional", "outer",
"cells", and "flat".
The "proportional" (or "prop" for short) method weights proportionally to
the frequencies (or model weights) of each factor combination that is averaged
over. The "outer" method uses the outer product of the marginal frequencies of
each factor that is being averaged over. To explain the distinction, suppose the
EMMs for A involve averaging over two factors B and C. With "prop", we
use the frequencies for each combination of B and C; whereas for "outer",
first obtain the marginal frequencies for B and for C and weight
proportionally to the product of these for each combination of B and C. The
latter weights are like the “expected” counts used in a chi-square test for
independence. Put another way, outer weighting is the same as proportional
weighting applied one factor at a time; the following two would yield the same
results: 
emmeans(model, "A", weights = "outer") 
emmeans(emmeans(model, c("A", "B"), weights = "prop"),  weights = "prop") 
Using "cells" weights gives each prediction the same weight as occurs in the
model; applied to a reference grid for a model with all interactions,
"cells"-weighted EMMs are the same as the ordinary marginal means of the data.
With "flat" weights, equal weights are used, except zero weight is applied to
any factor combination having no data. Usually, "cells" or "flat" weighting
will not produce non-estimable results, because we exclude empty cells. (That
said, if covariates are linearly dependent with factors, we may still encounter
non-estimable cases.)
Here is a comparison of predictions for nutr.lm defined above,
using different weighting schemes:
sapply(c("equal", "prop", "outer", "cells", "flat"), function(w)
    predict(emmeans(nutr.lm, ~ race, weights = w)))
##         equal     prop    outer     cells      flat
## [1,] 1.258929 1.926554 2.546674 0.3809524 0.6865079
## [2,]       NA       NA       NA 1.6666667 1.2500000
## [3,] 2.932008 2.522821 3.142940 2.7951807 1.6103407
For group * race EMMs, the results for "prop" and "flat" are the same
because only one factor (age) is averaged over.
A factor A is nested in another factor B if the levels of A have a
different meaning in one level of B than in another. Often, nested factors are
random effects—for example, subjects in an experiment may be randomly assigned
to treatments, in which case subjects are nested in treatments—and if we model
them as random effects, these random nested effects are not among the fixed
effects and are not an issue to emmeans. But sometimes we have fixed nested
factors.
Here is an example of a fictional study of five fictional treatments for some disease in cows. Two of the treatments are administered by injection, and the other three are administered orally. There are varying numbers of observations for each drug. The data and model follow:
cows <- data.frame (
    route = factor(rep(c("injection", "oral"), c(5, 9))),
    drug = factor(rep(c("Bovineumab", "Charloisazepam", 
              "Angustatin", "Herefordmycin", "Mollycoddle"), c(3,2,  4,2,3))),
    resp = c(34, 35, 34,   44, 43,      36, 33, 36, 32,   26, 25,   25, 24, 24)
)
cows.lm <- lm(resp ~ route + drug, data = cows)
The ref_grid function finds a nested structure in this model:
cows.rg <- ref_grid(cows.lm)
cows.rg
## 'emmGrid' object with variables:
##     route = injection, oral
##     drug = Angustatin, Bovineumab, Charloisazepam, Herefordmycin, Mollycoddle
## Nesting structure:  drug %in% route
When there is nesting, emmeans computes averages separately in each group\ldots
route.emm <- emmeans(cows.rg, "route")
route.emm
##  route     emmean    SE df lower.CL upper.CL
##  injection   38.9 0.591  9     37.6     40.3
##  oral        28.0 0.449  9     27.0     29.0
## 
## Results are averaged over the levels of: drug 
## Confidence level used: 0.95
… and insists on carrying along any grouping factors that a factor is nested in:
drug.emm <- emmeans(cows.rg, "drug")
drug.emm
##  drug           route     emmean    SE df lower.CL upper.CL
##  Bovineumab     injection   34.3 0.747  9     32.6     36.0
##  Charloisazepam injection   43.5 0.915  9     41.4     45.6
##  Angustatin     oral        34.2 0.647  9     32.8     35.7
##  Herefordmycin  oral        25.5 0.915  9     23.4     27.6
##  Mollycoddle    oral        24.3 0.747  9     22.6     26.0
## 
## Confidence level used: 0.95
Here are the associated pairwise comparisons:
pairs(route.emm, reverse = TRUE)
##  contrast         estimate    SE df t.ratio p.value
##  oral - injection    -10.9 0.742  9 -14.671 <.0001 
## 
## Results are averaged over the levels of: drug
pairs(drug.emm, by = "route", reverse = TRUE)
## route = injection:
##  contrast                    estimate    SE df t.ratio p.value
##  Charloisazepam - Bovineumab     9.17 1.182  9   7.757 <.0001 
## 
## route = oral:
##  contrast                    estimate    SE df t.ratio p.value
##  Herefordmycin - Angustatin     -8.75 1.121  9  -7.805 0.0001 
##  Mollycoddle - Angustatin       -9.92 0.989  9 -10.030 <.0001 
##  Mollycoddle - Herefordmycin    -1.17 1.182  9  -0.987 0.6026 
## 
## P value adjustment: tukey method for comparing a family of 3 estimates
In the latter result, the contrast itself becomes a nested factor in the
returned emmGrid object. That would not be the case if there had been no by
variable.
ref_grid() and emmeans() tries to discover
and accommodate nested structures in the fixed effects. It does this in two
ways: first, by identifying factors whose levels appear in combination with only
one level of another factor; and second, by examining the terms attribute of
the fixed effects. In the latter approach, if an interaction A:B appears
in the model but A is not present as a main effect, then A is deemed to
be nested in B. Note that this can create a trap: some users take shortcuts
by omitting some fixed effects, knowing that this won't affect the fitted
values. But such shortcuts do affect the interpretation of model parameters,
ANOVA tables, etc., and I advise against ever taking such shortcuts. 
Here are some ways you may notice mistakenly-identified nesting:
str() listing of the emmGrid object shows a nesting componentemmeans() summary unexpectedly includes one or more factors
that you didn't specifyby factors don't seem to behave right, or
give the same results with different specificationsTo override the auto-detection of nested effects, use the nesting argument
in ref_grid() or emmeans(). Specifying nesting = NULL will ignore
all nesting. Incorrectly-discovered nesting can be overcome by specifying 
something akin to nesting = "A %in% B, C %in% (A * B)" or, equivalently,
nesting = list(A = "B",  C = c("A", "B")).