mantar provides users with several methods for handling
missing data in the context of network analysis. The vignette is
organized as follows: it begins with details on installation, followed
by a description of the main functions and data sets provided by the
package. Next, it discusses the available functionality in more depth by
outlining the key arguments and their effects. Finally, a complete
example analysis based on a real-world data set is presented.
The current stable version (0.2.0) is available on CRAN and can be installed using the usual approach:
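A minimal example of the standard CRAN installation:

```r
# Install the current stable release (0.2.0) from CRAN
install.packages("mantar")
```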
You can install the development version of mantar from
GitHub. To do so, you
need the remotes package.
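A sketch of the GitHub route (the repository path below is a placeholder assumption; substitute the actual account hosting mantar):

```r
# Install remotes if needed, then the development version from GitHub.
# "owner" is a placeholder -- replace with the actual GitHub account.
install.packages("remotes")
remotes::install_github("owner/mantar@develop")
```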
The extension @develop ensures that you get the latest
development version of the package, which may include new features and
bug fixes not yet available in the stable release on CRAN. Excluding this
extension installs the same version as the one on CRAN.
After installation, the easiest way to get an overview of the available
functions and capabilities is to call help(package = "mantar") to open
the package help file. You can also read the rest of this vignette for
an introduction and some examples.
This section provides an overview of the main functions and data sets
included in the mantar package.
As described above, the package offers two approaches for estimating network structures: neighborhood selection via neighborhood_net() and regularization via regularization_net().

For data sets with missing values, two promising missing-data approaches are implemented:

- A two-step approach based on the Expectation-Maximization (EM) algorithm, implemented via the lavaan package. It performs well when the sample size is very large relative to the amount of missingness and the complexity of the network.
- Stacked multiple imputation (MI), implemented via the mice package. The imputed data sets are stacked into a single data set, and a correlation matrix is estimated from this combined data.

Both methods produce a correlation matrix that is then used for network estimation. It is also possible to compute the correlation matrix using pairwise or listwise deletion. However, these methods are generally not recommended, except in specific cases (e.g., when data are missing completely at random and the proportion of missingness is very small).

By default, correlations are computed as Pearson correlations. However, with complete data, listwise deletion, or the stacked MI approach, users may choose to treat variables as ordered categorical, in which case polychoric and polyserial correlations are computed where appropriate. This option is particularly advisable when variables have a low number of categories or exhibit noticeable non-normality. At the same time, estimating polychoric and polyserial correlations requires a sufficiently large number of observations relative to the number of variables to ensure stable and reliable estimates.
In addition to network estimation, the package also supports stepwise regression search based on information criteria for a single dependent variable. This regression search is available for both complete and incomplete data and relies on the same two-step EM or stacked MI procedures to handle missing values as the network analysis. While both methods to handle missingness are expected to perform well in this context, no specific simulation study has been conducted to compare their effectiveness for single regression modeling, and thus their relative strengths remain an open question.
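The vignette does not name mantar's regression-search function in this excerpt, so the general idea of information-criterion-based stepwise selection can be sketched with base R's step() on complete data (illustrative only, not mantar's own procedure):

```r
# Stepwise selection for a single dependent variable, guided by an
# information criterion: k = log(n) gives BIC-style penalties,
# while the default k = 2 corresponds to AIC.
fit_full <- lm(mpg ~ ., data = mtcars)
n <- nrow(mtcars)
fit_bic <- step(fit_full, k = log(n), trace = 0)
```

mantar extends this idea to incomplete data via the same two-step EM or stacked MI machinery used for network estimation.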
The package includes dummy data sets that resemble a typical psychological data set, where the number of observations is considerably larger than the number of variables. Although the variables have descriptive names, these are included solely to make the examples more engaging - the data themselves are fully synthetic.
Three data sets without missing values are included:

- mantar_dummy_full_cont: Fully observed data (no missing values)
- mantar_dummy_full_cat: Fully observed data with ordered categorical variables
- mantar_dummy_full_mix: Fully observed data with a mix of continuous and ordered categorical variables
Additionally, three data sets with missing values are provided:

- mantar_dummy_mis_cont: Data with approximately 30% missing values in each continuous variable
- mantar_dummy_mis_cat: Data with approximately 25% missing values in each ordered categorical variable
- mantar_dummy_mis_mix: Data with approximately 25% missing values in each variable, with a mix of continuous and ordered categorical variables
These data sets are intended for examples and testing only.
# Load some example data sets
data(mantar_dummy_full_cont)
data(mantar_dummy_full_cat)
data(mantar_dummy_mis_cont)
# Preview the first few rows of these data sets
head(mantar_dummy_full_cont)
#> EmoReactivity TendWorry StressSens SelfAware Moodiness Cautious
#> 1 -0.08824641 -0.2659269 -1.2036137 -2.3499259 0.6693700 0.04102854
#> 2 -0.44657803 -0.4588384 -0.2431794 -0.1656722 -0.3361568 0.88919849
#> 3 -1.06934325 -1.5050242 -0.8986388 -1.0857552 0.2249633 0.77060142
#> 4 0.58282029 -0.5036316 -1.6020000 1.0820676 -0.1858346 -0.03462852
#> 5 0.58791759 0.5972580 -0.5882332 1.7461103 0.7160714 1.58280444
#> 6 0.10224725 0.1494428 -1.0877812 -1.7886107 1.3522197 -0.25494638
#> ThoughtFuture RespCriticism
#> 1 0.6484939 -0.77992262
#> 2 0.2949630 -0.91747608
#> 3 -1.3519007 0.56000763
#> 4 -0.4702988 0.34653985
#> 5 0.9503597 0.82981174
#> 6 -0.8938618 -0.01593388
head(mantar_dummy_full_cat)
#> EmoReactivity TendWorry StressSens SelfAware Moodiness Cautious ThoughtFuture
#> 1 3 3 2 1 4 3 4
#> 2 3 3 3 3 3 4 4
#> 3 2 2 3 2 4 4 2
#> 4 4 3 2 5 3 3 3
#> 5 4 4 3 5 4 5 4
#> 6 4 4 2 2 5 3 3
#> RespCriticism
#> 1 3
#> 2 3
#> 3 4
#> 4 4
#> 5 4
#> 6 3
head(mantar_dummy_mis_cont)
#> EmoReactivity TendWorry StressSens SelfAware Moodiness Cautious
#> 1 -1.7551632 -0.4376210 -0.5774722 0.10562820 0.6614044 NA
#> 2 -1.7551688 -0.7039623 0.9070330 0.03418623 0.6140406 0.83879818
#> 3 2.0493638 NA NA NA -0.8872971 0.04830719
#> 4 0.1056282 NA NA -1.24779117 -0.7298623 -0.62263184
#> 5 -0.6338512 0.4361078 -0.5564631 -0.01032403 NA -0.09690612
#> 6 0.1054382 0.6935808 2.6557231 NA NA -0.04358574
#> ThoughtFuture RespCriticism
#> 1 0.7710993 0.37233355
#> 2 -1.5588119 -0.55079199
#> 3 NA -0.90103222
#> 4 -0.7100126 0.80773402
#> 5 1.0583312 0.20820252
#> 6            NA -0.03915726

The mantar package provides two primary functions for
network estimation: neighborhood_net() and
regularization_net(). This section introduces their key
arguments and demonstrates their usage with practical examples. We begin
by estimating a network using neighborhood_net() with a
complete data set (i.e., without missing values). Next, we show how to
estimate a network when the data set contains missing data. Finally, we
provide a brief example of network estimation using regularization
techniques via the regularization_net() function.
The neighborhood_net() function estimates a network
structure based on neighborhood selection using information criteria for
model selection in node-wise regressions. The function can either be
provided with raw data (data frame or matrix) or a correlation matrix
along with sample sizes for each variable. The examples will use raw
data, as this is the more complex case. The following arguments are
particularly relevant for controlling the network estimation process
(with fully observed data):
The ic_type argument controls the penalty applied during
model selection for node-wise regressions. It defines the penalty per
parameter (i.e., the number of predictors plus the intercept), thereby
influencing the sparsity of the resulting model. The available options
are:
- ic_type = "bic" (default): corresponds to the Bayesian Information Criterion (BIC)
- ic_type = "aic": corresponds to the Akaike Information Criterion (AIC)
- ic_type = "aicc": corresponds to the corrected Akaike Information Criterion (AICc)

The pcor_merge_rule argument determines how partial correlations are estimated based on the regression results between two nodes:

- "and" (default): a partial correlation is estimated only if both regression weights (from node A to B and from B to A) are non-zero.
- "or": a partial correlation is estimated if at least one of the two regression weights is non-zero.

Although both options are available, current simulation evidence
suggests that the "and" rule yields more accurate partial
correlation estimates than the "or" rule. Therefore,
changing this default is not recommended unless you
have a specific reason.
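For intuition, the information criteria mentioned above differ only in the penalty added per estimated parameter. A generic sketch using the standard textbook formulas (mantar's internal computation may differ in constants):

```r
# Information criterion as a function of the log-likelihood (logL),
# number of parameters (k), and sample size (n)
ic_value <- function(logL, k, n, type = c("bic", "aic", "aicc")) {
  type <- match.arg(type)
  penalty <- switch(type,
    bic  = k * log(n),                            # heavier for large n
    aic  = 2 * k,                                 # constant per parameter
    aicc = 2 * k + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
  )
  -2 * logL + penalty
}

# With n = 100 and k = 5, BIC penalizes more heavily than AIC
ic_value(-200, k = 5, n = 100, type = "bic")  # 400 + 5 * log(100)
ic_value(-200, k = 5, n = 100, type = "aic")  # 400 + 10
```

Because log(n) exceeds 2 for n > 7, BIC-based selection tends to yield sparser networks than AIC-based selection in typical sample sizes.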
The ordered argument specifies how variables are treated when estimating correlations from raw data.

- ordered = TRUE: all variables are treated as ordered categorical
- ordered = FALSE: all variables are treated as continuous
- a logical vector: each variable is treated according to its own entry (e.g., ordered = c(TRUE, FALSE, FALSE, TRUE))

Based on these specifications, the function applies the appropriate correlation type for each pair of variables:

- both FALSE: Pearson correlation
- one TRUE and one FALSE: polyserial correlation
- both TRUE: polychoric correlation

After discussing the key arguments, we can now illustrate how to
estimate a network structure using the neighborhood_net()
function with a data set without missing values.
# Estimate network from the full data set using BIC, the 'and' rule, and treating the data as continuous
result_full_cont <- neighborhood_net(data = mantar_dummy_full_cont,
ic_type = "bic",
pcor_merge_rule = "and",
ordered = FALSE)
#> No missing values in data. Sample size for each variable is equal to the number of rows in the data.
# View estimated partial correlations
result_full_cont
#> EmoReactivity TendWorry StressSens SelfAware Moodiness Cautious
#> EmoReactivity 0.0000000 0.2617524 0.130019 0.0000000 0.0000000 0.0000000
#> TendWorry 0.2617524 0.0000000 0.000000 0.2431947 0.0000000 0.0000000
#> StressSens 0.1300190 0.0000000 0.000000 0.0000000 0.0000000 0.0000000
#> SelfAware 0.0000000 0.2431947 0.000000 0.0000000 0.0000000 0.0000000
#> Moodiness 0.0000000 0.0000000 0.000000 0.0000000 0.0000000 0.4377322
#> Cautious 0.0000000 0.0000000 0.000000 0.0000000 0.4377322 0.0000000
#> ThoughtFuture 0.0000000 0.2595917 0.000000 0.0000000 0.0000000 0.0000000
#> RespCriticism 0.0000000 0.0000000 0.000000 0.0000000 0.2762595 0.2523658
#> ThoughtFuture RespCriticism
#> EmoReactivity 0.0000000 0.0000000
#> TendWorry 0.2595917 0.0000000
#> StressSens 0.0000000 0.0000000
#> SelfAware 0.0000000 0.0000000
#> Moodiness 0.0000000 0.2762595
#> Cautious 0.0000000 0.2523658
#> ThoughtFuture 0.0000000 0.0000000
#> RespCriticism     0.0000000     0.0000000

We can also estimate a network structure when some variables are ordered categorical. In the following example, we treat all variables as ordered categorical.
# Estimate network from the full data set using BIC, the 'and' rule, and treating the
# data as ordered categorical
result_full_cat <- neighborhood_net(data = mantar_dummy_full_cat,
ic_type = "bic",
pcor_merge_rule = "and",
ordered = TRUE)
#> No missing values in data. Sample size for each variable is equal to the number of rows in the data.
# View estimated partial correlations
result_full_cat
#> EmoReactivity TendWorry StressSens SelfAware Moodiness Cautious
#> EmoReactivity 0.0000000 0.2742356 0.136029 0.0000000 0.0000000 0.0000000
#> TendWorry 0.2742356 0.0000000 0.000000 0.2679285 0.0000000 0.0000000
#> StressSens 0.1360290 0.0000000 0.000000 0.0000000 0.0000000 0.0000000
#> SelfAware 0.0000000 0.2679285 0.000000 0.0000000 0.0000000 0.0000000
#> Moodiness 0.0000000 0.0000000 0.000000 0.0000000 0.0000000 0.4398609
#> Cautious 0.0000000 0.0000000 0.000000 0.0000000 0.4398609 0.0000000
#> ThoughtFuture 0.0000000 0.2224662 0.000000 0.0000000 0.0000000 0.0000000
#> RespCriticism 0.0000000 0.0000000 0.000000 0.0000000 0.2752566 0.2687388
#> ThoughtFuture RespCriticism
#> EmoReactivity 0.0000000 0.0000000
#> TendWorry 0.2224662 0.0000000
#> StressSens 0.0000000 0.0000000
#> SelfAware 0.0000000 0.0000000
#> Moodiness 0.0000000 0.2752566
#> Cautious 0.0000000 0.2687388
#> ThoughtFuture 0.0000000 0.0000000
#> RespCriticism     0.0000000     0.0000000

In the case of missing data, the neighborhood_net()
function offers several additional arguments that control how sample
size and missingness are handled.
The n_calc argument specifies how the sample size is
calculated for each node-wise regression. This affects the penalty term
used in model selection.
The available options are:
- "individual" (default): Uses the number of non-missing observations for each individual variable. This is the recommended approach.
- "average": Uses the average number of non-missing observations across all variables.
- "max": Uses the maximum number of non-missing observations across all variables.
- "total": Uses the total number of observations in the data set (i.e., the number of rows).

The missing_handling argument specifies how the correlation matrix is estimated when the input data contains missing values. Two approaches are supported:
- "two-step-em": Applies a standard Expectation-Maximization (EM) algorithm to estimate the covariance matrix. This method is the default, as it is computationally efficient. However, it only performs well when the sample size is large relative to the number of edges in the network and the proportion of missingness.
- "stacked-mi": Applies multiple imputation to create several completed data sets, which are then stacked into a single data set. A correlation matrix is computed from this stacked data.

As described previously, deletion techniques (listwise and pairwise)
are also available, but their use is not recommended. When
"two-step-em" is selected, the correlation matrix is always
based on Pearson correlations, regardless of the ordered
argument. In contrast, when "stacked-mi" is used, the
ordered argument determines how variables are treated
(continuous vs. ordered categorical) during the correlation
estimation.
If "stacked-mi" is used, the nimp argument
controls the number of imputations (default: 20), while
imp_method specifies the imputation method (default:
"pmm" for predictive mean matching).
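The n_calc options described above correspond to simple summaries of the per-variable non-missing counts. A base R sketch of what each option amounts to (illustrative only, not mantar's internal code):

```r
# Toy data with missing values
d <- data.frame(
  x = c(1, NA, 3, 4),
  y = c(NA, NA, 2, 5),
  z = c(1, 2, 3, 4)
)

# Per-variable non-missing counts: 3, 2, 4
n_obs <- colSums(!is.na(d))

n_individual <- n_obs        # "individual": one n per variable
n_average    <- mean(n_obs)  # "average": 3
n_max        <- max(n_obs)   # "max": 4
n_total      <- nrow(d)      # "total": 4
```

Because the penalty term of the information criteria grows with the assumed sample size, larger choices such as "total" make the selection more conservative when much data is missing.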
We can now illustrate how to estimate a network structure using the
neighborhood_net() function with a data set that contains
missing values. All variables are continuous in this example.
# Estimate network for data set with missing values
result_mis_cont <- neighborhood_net(data = mantar_dummy_mis_cont,
n_calc = "individual",
missing_handling = "stacked-mi",
nimp = 20,
imp_method = "pmm",
pcor_merge_rule = "and")

# View estimated partial correlations
result_mis_cont
#> EmoReactivity TendWorry StressSens SelfAware Moodiness Cautious
#> EmoReactivity 0.0000000 0.0000000 0.1737479 0.0000000 0.0000000 0.0000000
#> TendWorry 0.0000000 0.0000000 0.0000000 0.2739825 0.0000000 0.1356341
#> StressSens 0.1737479 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#> SelfAware 0.0000000 0.2739825 0.0000000 0.0000000 0.0000000 0.0000000
#> Moodiness 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.4196675
#> Cautious 0.0000000 0.1356341 0.0000000 0.0000000 0.4196675 0.0000000
#> ThoughtFuture 0.1945899 0.2664316 0.0000000 0.0000000 0.0000000 0.0000000
#> RespCriticism 0.0000000 0.0000000 0.0000000 0.2873411 0.2640929 0.1867003
#> ThoughtFuture RespCriticism
#> EmoReactivity 0.1945899 0.0000000
#> TendWorry 0.2664316 0.0000000
#> StressSens 0.0000000 0.0000000
#> SelfAware 0.0000000 0.2873411
#> Moodiness 0.0000000 0.2640929
#> Cautious 0.0000000 0.1867003
#> ThoughtFuture 0.0000000 0.0000000
#> RespCriticism     0.0000000     0.0000000

Note: Network estimation with stacked multiple imputation may take
some time. During the imputation process, messages from the
mice package may be printed.
The regularization_net() function estimates a network
structure based on regularization techniques using information criteria
for model selection. Similar to neighborhood_net(), this
function can either be provided with raw data (data frame or matrix) or
a correlation matrix along with sample sizes for each variable. The
examples will use raw data, as this is the more complex case. The
following arguments are particularly relevant for controlling the
network estimation process (with fully observed data):
The penalty argument controls the type of regularization
used in the network estimation. The recommended options are using the
graphical lasso ("glasso") as a convex penalty or
"atan" as a non-convex penalty.
For glasso, the lambda_min_ratio and
n_lambda arguments control the range and number of penalty
parameters evaluated during model selection. The default values are
generally appropriate. For all non-convex penalties (e.g.,
"atan"), an additional tuning parameter can be specified via the
gamma argument. However, using a single default value for
gamma (the default differs between penalty types) by setting
vary = "lambda" is usually sufficient.
The last argument controlling the regularization process is
pen_diag which specifies whether the diagonal elements of
the covariance matrix should be penalized (TRUE) or not
(FALSE, default).
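To give a sense of how lambda_min_ratio and n_lambda shape the candidate penalties, here is a common way such grids are constructed in lasso-type software (a generic log-spaced sketch; mantar's internal grid may be built differently):

```r
# Log-spaced grid of n_lambda penalty values, running from lambda_max
# down to lambda_max * lambda_min_ratio
lambda_grid <- function(lambda_max, lambda_min_ratio = 0.1, n_lambda = 100) {
  exp(seq(log(lambda_max), log(lambda_max * lambda_min_ratio),
          length.out = n_lambda))
}

grid <- lambda_grid(lambda_max = 0.5, lambda_min_ratio = 0.1, n_lambda = 100)
range(grid)  # from 0.05 to 0.5
```

A smaller lambda_min_ratio extends the grid toward denser candidate networks, while a larger n_lambda evaluates the same range at a finer resolution.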
The ic_type argument determines the
information-criterion penalty used during model selection in the
regularization process. It specifies the penalty applied per freely
estimated parameter (i.e., each included edge or nonzero partial
correlation), thereby controlling the sparsity of the resulting model.
The available options are:
- ic_type = "bic": corresponds to the Bayesian Information Criterion (BIC)
- ic_type = "ebic": corresponds to the Extended Bayesian Information Criterion (EBIC)
- ic_type = "aic": corresponds to the Akaike Information Criterion (AIC)

The default depends on the selected regularization approach. For
non-convex penalties, the default is "bic", whereas for the
"glasso" penalty the default is "ebic". In the
latter case, an additional parameter extended_gamma must be
specified (default: 0.5).
The ordered argument specifies how variables are treated
when estimating correlations from raw data.
- ordered = TRUE: all variables are treated as ordered categorical
- ordered = FALSE: all variables are treated as continuous
- a logical vector: each variable is treated according to its own entry (e.g., ordered = c(TRUE, FALSE, FALSE, TRUE))

Based on these specifications, the function applies the appropriate correlation type for each pair of variables:

- both FALSE: Pearson correlation
- one TRUE and one FALSE: polyserial correlation
- both TRUE: polychoric correlation

After discussing the key arguments, we can now illustrate how to
estimate a network structure using the regularization_net()
function with a data set without missing values.
# Estimate network from full data set using BIC and the glasso penalty
result_full_cont <- regularization_net(data = mantar_dummy_full_cont,
penalty = "glasso",
vary = "lambda",
n_lambda = 100,
lambda_min_ratio = 0.1,
ic_type = "bic",
pcor_merge_rule = "and",
ordered = FALSE)
#> Warning in def_pen_mats(mat = mat, penalty = penalty, vary = vary, n_lambda =
#> n_lambda, : Varying 'lambda' only, n_gamma is set to 1.
# View estimated partial correlations
result_full_cont
#> EmoReactivity TendWorry StressSens SelfAware Moodiness
#> EmoReactivity 0.00000000 0.2094365 0.08045204 0.00000000 0.00000000
#> TendWorry 0.20943651 0.0000000 0.00000000 0.19091266 0.00000000
#> StressSens 0.08045204 0.0000000 0.00000000 0.00000000 0.00000000
#> SelfAware 0.00000000 0.1909127 0.00000000 0.00000000 0.00000000
#> Moodiness 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Cautious 0.07143541 0.0000000 0.00000000 0.00000000 0.39918941
#> ThoughtFuture 0.08279610 0.2160799 0.00000000 0.01126785 0.02095296
#> RespCriticism 0.01204568 0.0000000 -0.03899045 0.06692828 0.25264236
#> Cautious ThoughtFuture RespCriticism
#> EmoReactivity 0.07143541 0.08279610 0.01204568
#> TendWorry 0.00000000 0.21607993 0.00000000
#> StressSens 0.00000000 0.00000000 -0.03899045
#> SelfAware 0.00000000 0.01126785 0.06692828
#> Moodiness 0.39918941 0.02095296 0.25264236
#> Cautious 0.00000000 0.06583096 0.22841027
#> ThoughtFuture 0.06583096 0.00000000 0.00000000
#> RespCriticism  0.22841027 0.00000000    0.00000000

With missing data, the regularization_net() function
offers several additional arguments that control how sample size,
the computation of information criteria, and missingness are handled.
The n_calc argument specifies how the sample size is
calculated for the information criteria computation. Only one value is
needed here, as the regularization approach does not rely on node-wise
regressions. The default is "average", which uses the
average number of non-missing observations across all estimated
correlations - this includes the correlations of variables with
themselves. Ignoring these self-correlations (i.e., averaging the
non-missing counts across pairs of distinct variables only) is also
possible by setting count_diagonal to FALSE.
Within the information criteria computation, the likelihood for the
candidate models has to be computed. The likelihood
argument controls how this is done:
- "mat_based" (default): The likelihood is computed from the sample correlation matrix.
- "obs_based": The likelihood is computed from the observed data. This option is only available when the raw input data contains no ordered categorical variables. In these cases, the observed-data log-likelihood is recommended, as it represents the sample data better than the sample correlation matrix does.

These options are also available with full data; however, they return exactly the same results in that case.
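As intuition for the matrix-based variant, the Gaussian log-likelihood of a candidate model matrix Sigma can be written directly in terms of a sample correlation matrix S. A sketch of the standard formula (not mantar's internal code):

```r
# Multivariate-normal log-likelihood of a model correlation matrix Sigma,
# summarized through the sample correlation matrix S of n observations:
# logL = -(n/2) * (p*log(2*pi) + log|Sigma| + tr(Sigma^{-1} S))
loglik_mat <- function(Sigma, S, n) {
  p <- ncol(S)
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  -(n / 2) * (p * log(2 * pi) + logdet + sum(diag(solve(Sigma) %*% S)))
}

S <- cor(mtcars[, c("mpg", "hp", "wt", "qsec")])
n <- nrow(mtcars)

# The saturated model (Sigma = S) attains the highest likelihood;
# an independence model (identity matrix) fits worse
loglik_mat(S, S, n)
loglik_mat(diag(4), S, n)
```

The observed-data variant instead sums each row's contribution over its non-missing entries, which is why it is only defined for continuous (Pearson-based) input.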
The missing_handling argument specifies how the
correlation matrix is estimated when the input data contains missing
values. Two approaches are supported:
- "two-step-em": Applies a standard Expectation-Maximization (EM) algorithm to estimate the covariance matrix. This method is the default, as it is computationally efficient. However, it only performs well when the sample size is large relative to the number of edges in the network and the proportion of missingness.
- "stacked-mi": Applies multiple imputation to create several completed data sets, which are then stacked into a single data set. A correlation matrix is computed from this stacked data.

As described previously, deletion techniques (listwise and pairwise)
are also available, but their use is not recommended. When
"two-step-em" is selected, the correlation matrix is always
based on Pearson correlations, regardless of the ordered
argument. In contrast, when "stacked-mi" is used, the
ordered argument determines how variables are treated
(continuous vs. ordered categorical) during the correlation
estimation.
If "stacked-mi" is used, the nimp argument
controls the number of imputations (default: 20), while
imp_method specifies the imputation method (default:
"pmm" for predictive mean matching).
We can now illustrate how to estimate a network structure using the
regularization_net() function with a data set that contains
missing values. All variables are continuous in this example.
# Estimate network for data set with missing values
result_mis_cont <- regularization_net(data = mantar_dummy_mis_cont,
likelihood = "obs_based",
penalty = "glasso",
vary = "lambda",
n_lambda = 100,
lambda_min_ratio = 0.1,
ic_type = "ebic",
extended_gamma = 0.5,
n_calc = "average",
missing_handling = "two-step-em",
pcor_merge_rule = "and",
ordered = FALSE)
#> Warning in def_pen_mats(mat = mat, penalty = penalty, vary = vary, n_lambda =
#> n_lambda, : Varying 'lambda' only, n_gamma is set to 1.
# View estimated partial correlations
result_mis_cont
#> EmoReactivity TendWorry StressSens SelfAware Moodiness
#> EmoReactivity 0.00000000 0.00000000 0.0866789 0.03103689 0.0000000
#> TendWorry 0.00000000 0.00000000 0.0000000 0.19973839 0.0000000
#> StressSens 0.08667890 0.00000000 0.0000000 0.00000000 0.0000000
#> SelfAware 0.03103689 0.19973839 0.0000000 0.00000000 0.0000000
#> Moodiness 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000
#> Cautious 0.00000000 0.07321694 0.0000000 0.00000000 0.3645097
#> ThoughtFuture 0.12858049 0.18479199 0.0000000 0.03402553 0.0000000
#> RespCriticism 0.00000000 0.00000000 0.0000000 0.22080386 0.2346446
#> Cautious ThoughtFuture RespCriticism
#> EmoReactivity 0.00000000 0.12858049 0.0000000
#> TendWorry 0.07321694 0.18479199 0.0000000
#> StressSens 0.00000000 0.00000000 0.0000000
#> SelfAware 0.00000000 0.03402553 0.2208039
#> Moodiness 0.36450971 0.00000000 0.2346446
#> Cautious 0.00000000 0.00000000 0.1504089
#> ThoughtFuture 0.00000000 0.00000000 0.0000000
#> RespCriticism  0.15040891 0.00000000    0.00000000

Finally, we consider a real-world example that is also described as
an application in Nehler and Schultze
(2024). This example is based on the data from the
cross-sectional study reported in Vervaet et al.
(2021). The original data are available in an OSF project and can be
downloaded to a temporary directory and loaded into the R environment.
url <- "https://osf.io/download/6s9p4/"
zipfile <- file.path(tempdir(), "vervaet.zip")
exdir <- file.path(tempdir(), "vervaet")
dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
download.file(url, destfile = zipfile, mode = "wb")
unzip(zipfile, exdir = exdir)
load(file.path(exdir, "Supplementary materials", "Dataset.RData"))

In this example, we analyze data from 2302 individuals to examine the cross-sectional network structure of 32 scores related to eating disorders (ED) and associated factors (e.g., depressive symptoms, anxiety), with the goal of identifying transdiagnostic vulnerabilities. We now perform a check for missingness in the data set.
colMeans(is.na(Data))
#> Dft Bul Bod Ine Per Dis Awa
#> 0.05473501 0.05430061 0.05473501 0.05430061 0.05516942 0.05473501 0.05473501
#> Fea Asm Imp Soc BDI Anx Res
#> 0.05603823 0.44222415 0.44092094 0.44178975 0.25282363 0.24761077 0.52780191
#> Nov Har Red Pes Sed Coa Set
#> 0.02128584 0.02128584 0.02128584 0.02128584 0.02128584 0.02128584 0.02128584
#> Dir Aut Lim Foc Inh Mis Sta
#> 0.29756733 0.29756733 0.29930495 0.29843614 0.29843614 0.44439618 0.44483058
#> Exp Cri Qua Pref
#> 0.44439618 0.44439618 0.44396177 0.44439618

Missingness proportions range from as low as 2% to as high as 53%. Overall, the average missingness rate was 22%.
The first decision concerns the choice of a suitable network estimation method. Simulation studies indicate that glasso regularization combined with EBIC model selection performs well when the ratio of the number of observations to the number of variables is relatively small (Isvoranu and Epskamp 2023; Nehler and Schultze 2025). In our case, with N = 2302 observations and p = 32 variables, this ratio is relatively large. Under such conditions, both non-convex penalties and neighborhood selection tend to perform well.
The literature further suggests that neighborhood selection may be
advantageous when the amount of missingness differs substantially
between variables (Nehler and Schultze
2024), which is the case in this data set. This motivates our
choice to proceed with the neighborhood_net() function.
Next, we examine the measurement level of the variables, as this also influences the choice of an appropriate calculation method and determines which missing-data handling strategies are feasible. An initial overview can be obtained by inspecting the summary of the data set.
summary(Data)
#> Dft Bul Bod Ine Per
#> Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.00
#> 1st Qu.:16.00 1st Qu.: 7.00 1st Qu.:18.00 1st Qu.:14.0 1st Qu.: 8.00
#> Median :24.00 Median :17.00 Median :28.00 Median :30.0 Median :16.00
#> Mean :24.52 Mean :16.86 Mean :29.22 Mean :27.8 Mean :15.56
#> 3rd Qu.:36.00 3rd Qu.:27.00 3rd Qu.:41.00 3rd Qu.:41.0 3rd Qu.:23.00
#> Max. :42.00 Max. :42.00 Max. :54.00 Max. :60.0 Max. :36.00
#> NA's :126 NA's :125 NA's :126 NA's :125 NA's :127
#> Dis Awa Fea Asm Imp
#> Min. : 0.00 Min. : 1.0 Min. : 0.00 Min. : 0.00 Min. : 0.00
#> 1st Qu.: 7.00 1st Qu.:11.0 1st Qu.: 7.00 1st Qu.:22.00 1st Qu.:26.00
#> Median :19.00 Median :28.0 Median :20.00 Median :27.00 Median :31.00
#> Mean :17.34 Mean :25.8 Mean :18.96 Mean :26.93 Mean :31.25
#> 3rd Qu.:27.00 3rd Qu.:39.0 3rd Qu.:29.00 3rd Qu.:32.00 3rd Qu.:36.00
#> Max. :42.00 Max. :58.0 Max. :48.00 Max. :47.00 Max. :63.00
#> NA's :126 NA's :126 NA's :129 NA's :1018 NA's :1015
#> Soc BDI Anx Res
#> Min. : 0.00 Min. : 0.00 Min. : 4.00 Min. : 8.00
#> 1st Qu.:24.00 1st Qu.:18.00 1st Qu.:52.00 1st Qu.: 56.00
#> Median :29.00 Median :28.00 Median :60.00 Median : 65.00
#> Mean :28.51 Mean :27.97 Mean :58.79 Mean : 64.35
#> 3rd Qu.:34.00 3rd Qu.:38.00 3rd Qu.:68.00 3rd Qu.: 72.00
#> Max. :45.00 Max. :60.00 Max. :80.00 Max. :100.00
#> NA's :1017 NA's :582 NA's :570 NA's :1215
#> Nov Har Red Pes
#> Min. : 1.00 Min. : 2.00 Min. : 1.00 Min. :0.000
#> 1st Qu.:13.00 1st Qu.:19.00 1st Qu.:14.00 1st Qu.:4.000
#> Median :18.00 Median :25.00 Median :16.00 Median :6.000
#> Mean :17.98 Mean :23.87 Mean :16.29 Mean :5.345
#> 3rd Qu.:23.00 3rd Qu.:30.00 3rd Qu.:19.00 3rd Qu.:7.000
#> Max. :88.00 Max. :35.00 Max. :24.00 Max. :8.000
#> NA's :49 NA's :49 NA's :49 NA's :49
#> Sed Coa Set Dir Aut
#> Min. : 1.00 Min. : 2.00 Min. : 0.0 Min. : 69.0 Min. : 17.0
#> 1st Qu.:16.00 1st Qu.:28.00 1st Qu.: 6.0 1st Qu.:133.0 1st Qu.:110.0
#> Median :22.00 Median :33.00 Median :10.0 Median :181.0 Median :145.0
#> Mean :22.27 Mean :31.81 Mean :10.6 Mean :185.9 Mean :151.5
#> 3rd Qu.:28.00 3rd Qu.:37.00 3rd Qu.:14.0 3rd Qu.:234.0 3rd Qu.:188.0
#> Max. :44.00 Max. :42.00 Max. :31.0 Max. :377.0 Max. :330.0
#> NA's :49 NA's :49 NA's :49 NA's :685 NA's :685
#> Lim Foc Inh Mis
#> Min. : 0 Min. : 0.00 Min. : 0.0 Min. : 2.00
#> 1st Qu.: 56 1st Qu.: 69.00 1st Qu.: 66.0 1st Qu.: 22.00
#> Median : 69 Median : 88.00 Median : 85.0 Median : 29.00
#> Mean : 71 Mean : 89.36 Mean : 84.4 Mean : 28.36
#> 3rd Qu.: 86 3rd Qu.:108.00 3rd Qu.:103.0 3rd Qu.: 35.00
#> Max. :152 Max. :770.00 Max. :150.0 Max. :222.00
#> NA's :689 NA's :687 NA's :687 NA's :1023
#> Sta Exp Cri Qua
#> Min. : 4.00 Min. : 3.00 Min. : 2.000 Min. : 3.00
#> 1st Qu.:20.00 1st Qu.: 7.00 1st Qu.: 5.000 1st Qu.:11.00
#> Median :24.00 Median :10.00 Median : 8.000 Median :13.00
#> Mean :23.75 Mean :10.99 Mean : 8.955 Mean :13.27
#> 3rd Qu.:28.00 3rd Qu.:15.00 3rd Qu.:12.000 3rd Qu.:16.00
#> Max. :35.00 Max. :25.00 Max. :20.000 Max. :20.00
#> NA's :1024 NA's :1023 NA's :1023 NA's :1022
#> Pref
#> Min. : 7.00
#> 1st Qu.:19.00
#> Median :23.00
#> Mean :22.07
#> 3rd Qu.:26.00
#> Max. :30.00
#> NA's   :1023

The variables are ordered categorical, but the number of categories is sufficiently large to treat them as continuous, as demonstrated by Johal and Rhemtulla (2023). Treating the variables as continuous gives us full flexibility in choosing among the available missing-data handling methods.
Nehler and Schultze (2024) showed that when the amount of available information relative to the number of parameters to be estimated (i.e., edges) is low, the stacked multiple imputation approach tends to perform better, while in other situations it performs similarly to the two-step EM algorithm. In our case, although the number of nodes is relatively high, we also have a large number of observations and only a moderate to small amount of missingness overall. Therefore, both methods are feasible, but we opt for the two-step EM approach due to its substantially lower computational demand.
Estimation thus proceeds as follows:
final_result <- neighborhood_net(data = Data,
n_calc = "individual",
missing_handling = "two-step-em",
pcor_merge_rule = "and",
ordered = FALSE)

The estimated partial correlation matrix can be accessed.
final_result$pcor
#> Dft Bul Bod Ine Per Dis
#> Dft 0.00000000 0.13068027 0.45563143 0.00000000 0.00000000 0.07300335
#> Bul 0.13068027 0.00000000 0.12558958 0.00000000 0.08011546 0.11043318
#> Bod 0.45563143 0.12558958 0.00000000 0.23022300 0.09698480 0.00000000
#> Ine 0.00000000 0.00000000 0.23022300 0.00000000 0.11802965 0.16908257
#> Per 0.00000000 0.08011546 0.09698480 0.11802965 0.00000000 0.09005559
#> Dis 0.07300335 0.11043318 0.00000000 0.16908257 0.09005559 0.00000000
#> Awa 0.12477483 0.28481105 0.00000000 0.22909674 0.08493158 0.28554073
#> Fea 0.11757230 0.00000000 -0.08789193 0.12255710 0.07037740 0.19875108
#> Asm 0.21048279 -0.11489155 0.11759054 0.00000000 0.19982194 -0.20616671
#> Imp 0.00000000 0.00000000 -0.08332489 0.10468153 0.00000000 0.00000000
#> Soc -0.15520417 0.00000000 0.00000000 0.26198392 0.00000000 0.32930950
#> BDI 0.00000000 0.00000000 0.00000000 0.00000000 -0.08415238 0.00000000
#> Anx 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Res 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 -0.07881709
#> Nov 0.00000000 0.00000000 0.07743995 0.00000000 0.00000000 0.00000000
#> Har 0.00000000 0.00000000 0.00000000 0.00000000 -0.05974071 0.00000000
#> Red 0.00000000 0.07960554 0.00000000 0.00000000 0.00000000 -0.24272710
#> Pes 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Sed 0.00000000 -0.07549202 0.00000000 0.00000000 0.00000000 0.12724328
#> Coa 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Set 0.00000000 -0.06385982 0.00000000 0.00000000 0.00000000 0.00000000
#> Dir 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Aut 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Lim -0.07919214 0.15629045 0.00000000 0.00000000 0.00000000 -0.10471845
#> Foc 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Inh 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Mis 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Sta 0.08379928 0.00000000 0.00000000 0.00000000 0.15138007 0.00000000
#> Exp 0.00000000 0.00000000 0.00000000 0.00000000 0.12996724 0.00000000
#> Cri 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Qua 0.00000000 0.00000000 0.00000000 0.06632995 0.00000000 0.00000000
#> Pref 0.00000000 -0.07592853 0.00000000 0.00000000 0.00000000 0.00000000
#> Awa Fea Asm Imp Soc BDI
#> Dft 0.12477483 0.11757230 0.2104828 0.00000000 -0.15520417 0.00000000
#> Bul 0.28481105 0.00000000 -0.1148916 0.00000000 0.00000000 0.00000000
#> Bod 0.00000000 -0.08789193 0.1175905 -0.08332489 0.00000000 0.00000000
#> Ine 0.22909674 0.12255710 0.0000000 0.10468153 0.26198392 0.00000000
#> Per 0.08493158 0.07037740 0.1998219 0.00000000 0.00000000 -0.08415238
#> Dis 0.28554073 0.19875108 -0.2061667 0.00000000 0.32930950 0.00000000
#> Awa 0.00000000 0.17938994 0.0000000 0.19209951 0.00000000 0.00000000
#> Fea 0.17938994 0.00000000 0.0000000 0.08247848 0.00000000 0.00000000
#> Asm 0.00000000 0.00000000 0.0000000 0.34437512 0.36568271 0.00000000
#> Imp 0.19209951 0.08247848 0.3443751 0.00000000 0.00000000 0.00000000
#> Soc 0.00000000 0.00000000 0.3656827 0.00000000 0.00000000 0.00000000
#> BDI 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Anx 0.00000000 0.00000000 0.0000000 0.05856714 0.00000000 0.42327065
#> Res 0.00000000 -0.06814741 0.0000000 0.00000000 -0.13070557 -0.18787960
#> Nov 0.00000000 0.00000000 0.0000000 0.10686486 0.00000000 0.00000000
#> Har 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Red 0.00000000 0.06918806 0.0000000 0.00000000 -0.07993343 0.00000000
#> Pes 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Sed 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Coa 0.07198617 0.00000000 0.0000000 -0.11343873 0.00000000 0.00000000
#> Set 0.00000000 0.00000000 0.0000000 0.08836548 0.00000000 0.00000000
#> Dir 0.00000000 -0.10993735 0.0000000 0.00000000 0.00000000 0.12538155
#> Aut 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.10389335
#> Lim 0.00000000 0.00000000 0.0000000 0.10342733 0.00000000 0.00000000
#> Foc 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Inh 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Mis 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Sta 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Exp 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Cri 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.07484521
#> Qua 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Pref 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Anx Res Nov Har Red Pes
#> Dft 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Bul 0.00000000 0.00000000 0.00000000 0.00000000 0.07960554 0.00000000
#> Bod 0.00000000 0.00000000 0.07743995 0.00000000 0.00000000 0.00000000
#> Ine 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Per 0.00000000 0.00000000 0.00000000 -0.05974071 0.00000000 0.00000000
#> Dis 0.00000000 -0.07881709 0.00000000 0.00000000 -0.24272710 0.00000000
#> Awa 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Fea 0.00000000 -0.06814741 0.00000000 0.00000000 0.06918806 0.00000000
#> Asm 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Imp 0.05856714 0.00000000 0.10686486 0.00000000 0.00000000 0.00000000
#> Soc 0.00000000 -0.13070557 0.00000000 0.00000000 -0.07993343 0.00000000
#> BDI 0.42327065 -0.18787960 0.00000000 0.00000000 0.00000000 0.00000000
#> Anx 0.00000000 -0.18732742 0.00000000 0.19069760 0.00000000 0.09573182
#> Res -0.18732742 0.00000000 0.00000000 -0.17706062 -0.08412776 0.00000000
#> Nov 0.00000000 0.00000000 0.00000000 -0.30581939 0.11275207 -0.16187842
#> Har 0.19069760 -0.17706062 -0.30581939 0.00000000 0.08514112 -0.07669957
#> Red 0.00000000 -0.08412776 0.11275207 0.08514112 0.00000000 0.00000000
#> Pes 0.09573182 0.00000000 -0.16187842 -0.07669957 0.00000000 0.00000000
#> Sed -0.13178762 0.25696100 -0.14454895 -0.18976565 0.00000000 0.09485402
#> Coa 0.00000000 0.00000000 0.00000000 0.00000000 0.36699126 0.00000000
#> Set 0.00000000 0.18370973 0.10299532 -0.08998313 0.11135028 0.13041938
#> Dir 0.00000000 0.00000000 0.00000000 0.00000000 -0.13362608 0.00000000
#> Aut 0.00000000 0.00000000 0.00000000 0.18458193 0.00000000 -0.11323679
#> Lim 0.00000000 0.00000000 0.22961496 0.00000000 0.00000000 -0.23866316
#> Foc 0.00000000 0.00000000 -0.11516138 0.00000000 0.18458310 -0.09529881
#> Inh 0.13015338 0.00000000 -0.08720893 0.00000000 -0.12771036 0.43824666
#> Mis 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Sta 0.00000000 0.09983737 0.00000000 0.00000000 0.00000000 0.16074045
#> Exp 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Cri 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Qua 0.00000000 0.00000000 0.00000000 0.00000000 0.06389421 0.00000000
#> Pref 0.00000000 0.00000000 -0.18414631 0.00000000 0.00000000 0.00000000
#> Sed Coa Set Dir Aut Lim
#> Dft 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 -0.07919214
#> Bul -0.07549202 0.00000000 -0.06385982 0.0000000 0.00000000 0.15629045
#> Bod 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Ine 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Per 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Dis 0.12724328 0.00000000 0.00000000 0.0000000 0.00000000 -0.10471845
#> Awa 0.00000000 0.07198617 0.00000000 0.0000000 0.00000000 0.00000000
#> Fea 0.00000000 0.00000000 0.00000000 -0.1099374 0.00000000 0.00000000
#> Asm 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Imp 0.00000000 -0.11343873 0.08836548 0.0000000 0.00000000 0.10342733
#> Soc 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> BDI 0.00000000 0.00000000 0.00000000 0.1253816 0.10389335 0.00000000
#> Anx -0.13178762 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Res 0.25696100 0.00000000 0.18370973 0.0000000 0.00000000 0.00000000
#> Nov -0.14454895 0.00000000 0.10299532 0.0000000 0.00000000 0.22961496
#> Har -0.18976565 0.00000000 -0.08998313 0.0000000 0.18458193 0.00000000
#> Red 0.00000000 0.36699126 0.11135028 -0.1336261 0.00000000 0.00000000
#> Pes 0.09485402 0.00000000 0.13041938 0.0000000 -0.11323679 -0.23866316
#> Sed 0.00000000 0.17672331 -0.08861919 -0.1205797 -0.07968325 0.00000000
#> Coa 0.17672331 0.00000000 0.00000000 -0.1504409 0.09906808 -0.25251327
#> Set -0.08861919 0.00000000 0.00000000 0.0677524 0.09519513 0.00000000
#> Dir -0.12057968 -0.15044087 0.06775240 0.0000000 0.41439065 0.00000000
#> Aut -0.07968325 0.09906808 0.09519513 0.4143906 0.00000000 0.15192789
#> Lim 0.00000000 -0.25251327 0.00000000 0.0000000 0.15192789 0.00000000
#> Foc 0.00000000 0.27304492 0.09614857 0.2318596 0.09625451 0.00000000
#> Inh 0.00000000 0.00000000 0.00000000 0.0000000 0.25461910 0.36629158
#> Mis 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Sta 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Exp 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Cri 0.00000000 0.00000000 0.00000000 0.1146817 0.00000000 0.00000000
#> Qua -0.09992706 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Pref 0.11323746 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Foc Inh Mis Sta Exp Cri
#> Dft 0.00000000 0.00000000 0.0000000 0.08379928 0.0000000 0.00000000
#> Bul 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Bod 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Ine 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Per 0.00000000 0.00000000 0.0000000 0.15138007 0.1299672 0.00000000
#> Dis 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Awa 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Fea 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Asm 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Imp 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Soc 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> BDI 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.07484521
#> Anx 0.00000000 0.13015338 0.0000000 0.00000000 0.0000000 0.00000000
#> Res 0.00000000 0.00000000 0.0000000 0.09983737 0.0000000 0.00000000
#> Nov -0.11516138 -0.08720893 0.0000000 0.00000000 0.0000000 0.00000000
#> Har 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Red 0.18458310 -0.12771036 0.0000000 0.00000000 0.0000000 0.00000000
#> Pes -0.09529881 0.43824666 0.0000000 0.16074045 0.0000000 0.00000000
#> Sed 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Coa 0.27304492 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Set 0.09614857 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Dir 0.23185962 0.00000000 0.0000000 0.00000000 0.0000000 0.11468170
#> Aut 0.09625451 0.25461910 0.0000000 0.00000000 0.0000000 0.00000000
#> Lim 0.00000000 0.36629158 0.0000000 0.00000000 0.0000000 0.00000000
#> Foc 0.00000000 0.24599931 0.0000000 0.00000000 0.0000000 0.00000000
#> Inh 0.24599931 0.00000000 0.0000000 0.11984313 0.0000000 0.00000000
#> Mis 0.00000000 0.00000000 0.0000000 0.37057477 0.0000000 0.15359247
#> Sta 0.00000000 0.11984313 0.3705748 0.00000000 0.1373552 0.00000000
#> Exp 0.00000000 0.00000000 0.0000000 0.13735518 0.0000000 0.64551915
#> Cri 0.00000000 0.00000000 0.1535925 0.00000000 0.6455192 0.00000000
#> Qua 0.00000000 0.00000000 0.2050507 0.21005917 0.0000000 0.00000000
#> Pref 0.00000000 0.07492378 0.0000000 0.31996271 0.0000000 0.00000000
#> Qua Pref
#> Dft 0.00000000 0.00000000
#> Bul 0.00000000 -0.07592853
#> Bod 0.00000000 0.00000000
#> Ine 0.06632995 0.00000000
#> Per 0.00000000 0.00000000
#> Dis 0.00000000 0.00000000
#> Awa 0.00000000 0.00000000
#> Fea 0.00000000 0.00000000
#> Asm 0.00000000 0.00000000
#> Imp 0.00000000 0.00000000
#> Soc 0.00000000 0.00000000
#> BDI 0.00000000 0.00000000
#> Anx 0.00000000 0.00000000
#> Res 0.00000000 0.00000000
#> Nov 0.00000000 -0.18414631
#> Har 0.00000000 0.00000000
#> Red 0.06389421 0.00000000
#> Pes 0.00000000 0.00000000
#> Sed -0.09992706 0.11323746
#> Coa 0.00000000 0.00000000
#> Set 0.00000000 0.00000000
#> Dir 0.00000000 0.00000000
#> Aut 0.00000000 0.00000000
#> Lim 0.00000000 0.00000000
#> Foc 0.00000000 0.00000000
#> Inh 0.00000000 0.07492378
#> Mis 0.20505073 0.00000000
#> Sta 0.21005917 0.31996271
#> Exp 0.00000000 0.00000000
#> Cri 0.00000000 0.00000000
#> Qua 0.00000000 0.13692672
#> Pref 0.13692672 0.00000000

This partial correlation matrix can be used for reporting purposes, but it can also serve as input for further analyses in other packages (e.g., centrality analysis, community detection).
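As one example, node strength centrality can be computed directly from any partial correlation matrix with base R. The small symmetric matrix below is a stand-in for `final_result$pcor`:

```r
# Stand-in for an estimated partial correlation matrix such as final_result$pcor
pcor <- matrix(c(0.0,  0.2,  0.0,
                 0.2,  0.0, -0.1,
                 0.0, -0.1,  0.0),
               nrow = 3,
               dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Strength centrality: sum of absolute edge weights per node
strength <- rowSums(abs(pcor))
strength
#>   A   B   C
#> 0.2 0.3 0.1
```

Dedicated packages such as qgraph offer the same quantity (and many others) with standard errors and plotting support; the point here is only that the matrix returned by mantar is a plain numeric matrix that downstream tools can consume directly.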
Beyond that, two methods are available for inspecting the results of
networks estimated with the mantar package, regardless of
the estimation method used or whether missing values were present
(although the output structure may differ slightly). The first option is
to obtain a summary of the results.
summary(final_result)
#> The density of the estimated network is 0.276
#>
#> Network was estimated using neighborhood selection on data with missing values.
#> Missing data were handled using 'two-step-em'.
#> The information criterion was BIC and the 'and' rule was used for edge inclusion.
#>
#> The sample sizes used for the nodewise regressions were as follows:
#> Dft Bul Bod Ine Per Dis Awa Fea Asm Imp Soc BDI Anx Res Nov Har
#> 2176 2177 2176 2177 2175 2176 2176 2173 1284 1287 1285 1720 1732 1087 2253 2253
#> Red Pes Sed Coa Set Dir Aut Lim Foc Inh Mis Sta Exp Cri Qua Pref
#> 2253 2253 2253 2253 2253 1617 1617 1613 1615 1615 1279 1278 1279 1279 1280 1279

This output mainly provides information about the estimation process, much of which reflects the arguments we specified earlier, but it also includes two particularly informative elements. First, it reports the density of the estimated network (i.e., the proportion of non-zero edges). Second, it provides the effective sample sizes used for each node-wise regression.
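The reported density can also be reproduced by hand from the partial correlation matrix. The sketch below uses a generic symmetric matrix in place of `final_result$pcor` to show the calculation:

```r
# Stand-in symmetric partial correlation matrix with zero diagonal
pcor <- matrix(c(0.0, 0.3, 0.0,
                 0.3, 0.0, 0.1,
                 0.0, 0.1, 0.0),
               nrow = 3)
p <- nrow(pcor)

# Density: share of non-zero edges among all p * (p - 1) / 2 possible edges
density <- sum(pcor[upper.tri(pcor)] != 0) / (p * (p - 1) / 2)
density
#> [1] 0.6666667
```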
Most of the time, the goal is not to present the partial correlation
matrix itself, but rather to visualize the resulting network structure.
This can be achieved by creating a network plot, which in
mantar builds on the functionality of the
qgraph package. The plot
can be customized using the various options provided by
qgraph. A common customization is to color nodes according
to predefined clusters and to display full variable names in a legend.
The names and grouping structure used here follow the original analysis code of the study reported in
Vervaet et al. (2021).
Groups <- c(rep("EDI-II", 11), rep("BDI", 1), rep("STAI", 1), rep("RS-NL", 1),
            rep("TCI", 7), rep("YSQ", 5), rep("FMPS", 6))

# Create names for legend
Names <- c("Drive for Thinness", "Bulimia", "Body Dissatisfaction", "Ineffectiveness",
           "Perfectionism", "Interpersonal Distrust", "Interoceptive Awareness",
           "Maturity Fears", "Asceticism", "Impulse Regulation", "Social Insecurity",
           "Depression", "Anxiety", "Resilience", "Novelty Seeking", "Harm Avoidance",
           "Reward Dependence", "Persistence", "Self-Directedness", "Cooperativeness",
           "Self-Transcendence", "Disconnection and Rejection",
           "Impaired Autonomy & Performance", "Impaired Limits", "Other-Directedness",
           "Overvigilance & Inhibition", "Concern over Mistakes", "Personal Standards",
           "Parental Expectations", "Parental Criticism", "Doubting of Actions",
           "Order and Organisation")

Lab_Colors <- c(rep("white", 11),
                rep("white", 1),
                rep("black", 1),
                rep("white", 1),
                rep("black", 7),
                rep("black", 5),
                rep("white", 6))
plot(final_result,
     layout = "spring",
     nodeNames = Names,
     groups = Groups,
     label.color = Lab_Colors,
     vsize = 5,
     legend.cex = 0.15,
     label.cex = 1.25,
     negCol = "#7A0403FF",
     posCol = "#00204DFF")

This example demonstrated how to estimate a psychological network structure using the mantar package while appropriately handling missing data, and outlined the key considerations involved in choosing between the available estimation options. It also showed how the resulting network can be further analyzed and visualized using the methods provided in the package.