mantar provides users with several methods for handling
missing data in the context of network analysis. The vignette is
organized as follows: it begins with details on installation, followed
by a description of the main functions and data sets provided by the
package. Next, it discusses the available functionality in more depth by
outlining the key arguments and their effects. Finally, a complete
example analysis based on a real-world data set is presented.
The current stable version (0.2.0) is available on CRAN and can be installed using the usual approach:
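A minimal example of the standard CRAN installation:

```r
# Install the current stable release (0.2.0) from CRAN
install.packages("mantar")
```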
You can install the development version of mantar from
GitHub. To do so, you
need the remotes package.
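A sketch of the GitHub route (the repository path below is a placeholder assumption; substitute the actual account hosting mantar):

```r
# Install remotes if needed, then the development version from GitHub.
# "owner" is a placeholder -- replace with the actual GitHub account.
install.packages("remotes")
remotes::install_github("owner/mantar@develop")
```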
The extension @develop ensures that you get the latest
development version of the package, which may include new features and
bug fixes not yet available in the stable release on CRAN. Excluding this
extension installs the same version as the one on CRAN.
After installation, the easiest way to get an overview of the available
functions and capabilities is to call help(package = "mantar") to open
the package help file. You can also read the rest of this vignette for
an introduction and some examples.
This section provides an overview of the main functions and data sets
included in the mantar package.
As described above, the package offers two approaches for estimating network structures: neighborhood selection via neighborhood_net() and regularization via regularization_net().

For data sets with missing values, two promising missing-data approaches are implemented:

- A two-step approach based on the Expectation-Maximization (EM) algorithm, implemented via the lavaan package. It performs well when the sample size is very large relative to the amount of missingness and the complexity of the network.
- Stacked multiple imputation (MI), implemented via the mice package. The imputed data sets are stacked into a single data set, and a correlation matrix is estimated from this combined data.

Both methods produce a correlation matrix that is then used for network estimation. It is also possible to compute the correlation matrix using pairwise or listwise deletion. However, these methods are generally not recommended, except in specific cases (e.g., when data are missing completely at random and the proportion of missingness is very small).

By default, correlations are computed as Pearson correlations. However, with complete data, listwise deletion, or the stacked MI approach, users may choose to treat variables as ordered categorical, in which case polychoric and polyserial correlations are computed where appropriate. This option is particularly advisable when variables have a low number of categories or exhibit noticeable non-normality. At the same time, estimating polychoric and polyserial correlations requires a sufficiently large number of observations relative to the number of variables to ensure stable and reliable estimates.
In addition to network estimation, the package also supports stepwise regression search based on information criteria for a single dependent variable. This regression search is available for both complete and incomplete data and relies on the same two-step EM or stacked MI procedures to handle missing values as the network analysis. While both methods to handle missingness are expected to perform well in this context, no specific simulation study has been conducted to compare their effectiveness for single regression modeling, and thus their relative strengths remain an open question.
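The vignette does not name mantar's regression-search function in this excerpt, so the general idea of information-criterion-based stepwise selection can be sketched with base R's step() on complete data (illustrative only, not mantar's own procedure):

```r
# Stepwise selection for a single dependent variable, guided by an
# information criterion: k = log(n) gives BIC-style penalties,
# while the default k = 2 corresponds to AIC.
fit_full <- lm(mpg ~ ., data = mtcars)
n <- nrow(mtcars)
fit_bic <- step(fit_full, k = log(n), trace = 0)
```

mantar extends this idea to incomplete data via the same two-step EM or stacked MI machinery used for network estimation.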
The package includes dummy data sets that resemble a typical psychological data set, where the number of observations is considerably larger than the number of variables. Although the variables have descriptive names, these are included solely to make the examples more engaging - the data themselves are fully synthetic.
Three data sets without missing values are included:

- mantar_dummy_full_cont: Fully observed data (no missing values)
- mantar_dummy_full_cat: Fully observed data with ordered categorical variables
- mantar_dummy_full_mix: Fully observed data with a mix of continuous and ordered categorical variables
Additionally, three data sets with missing values are provided:

- mantar_dummy_mis_cont: Data with approximately 30% missing values in each continuous variable
- mantar_dummy_mis_cat: Data with approximately 25% missing values in each ordered categorical variable
- mantar_dummy_mis_mix: Data with approximately 25% missing values in each variable, with a mix of continuous and ordered categorical variables
These data sets are intended for examples and testing only.
# Load some example data sets
data(mantar_dummy_full_cont)
data(mantar_dummy_full_cat)
data(mantar_dummy_mis_cont)
# Preview the first few rows of these data sets
head(mantar_dummy_full_cont)
#> EmoReactivity TendWorry StressSens SelfAware Moodiness Cautious
#> 1 -0.08824641 -0.2659269 -1.2036137 -2.3499259 0.6693700 0.04102854
#> 2 -0.44657803 -0.4588384 -0.2431794 -0.1656722 -0.3361568 0.88919849
#> 3 -1.06934325 -1.5050242 -0.8986388 -1.0857552 0.2249633 0.77060142
#> 4 0.58282029 -0.5036316 -1.6020000 1.0820676 -0.1858346 -0.03462852
#> 5 0.58791759 0.5972580 -0.5882332 1.7461103 0.7160714 1.58280444
#> 6 0.10224725 0.1494428 -1.0877812 -1.7886107 1.3522197 -0.25494638
#> ThoughtFuture RespCriticism
#> 1 0.6484939 -0.77992262
#> 2 0.2949630 -0.91747608
#> 3 -1.3519007 0.56000763
#> 4 -0.4702988 0.34653985
#> 5 0.9503597 0.82981174
#> 6 -0.8938618 -0.01593388
head(mantar_dummy_full_cat)
#> EmoReactivity TendWorry StressSens SelfAware Moodiness Cautious ThoughtFuture
#> 1 3 3 2 1 4 3 4
#> 2 3 3 3 3 3 4 4
#> 3 2 2 3 2 4 4 2
#> 4 4 3 2 5 3 3 3
#> 5 4 4 3 5 4 5 4
#> 6 4 4 2 2 5 3 3
#> RespCriticism
#> 1 3
#> 2 3
#> 3 4
#> 4 4
#> 5 4
#> 6 3
head(mantar_dummy_mis_cont)
#> EmoReactivity TendWorry StressSens SelfAware Moodiness Cautious
#> 1 -1.7551632 -0.4376210 -0.5774722 0.10562820 0.6614044 NA
#> 2 -1.7551688 -0.7039623 0.9070330 0.03418623 0.6140406 0.83879818
#> 3 2.0493638 NA NA NA -0.8872971 0.04830719
#> 4 0.1056282 NA NA -1.24779117 -0.7298623 -0.62263184
#> 5 -0.6338512 0.4361078 -0.5564631 -0.01032403 NA -0.09690612
#> 6 0.1054382 0.6935808 2.6557231 NA NA -0.04358574
#> ThoughtFuture RespCriticism
#> 1 0.7710993 0.37233355
#> 2 -1.5588119 -0.55079199
#> 3 NA -0.90103222
#> 4 -0.7100126 0.80773402
#> 5 1.0583312 0.20820252
#> 6            NA -0.03915726

The mantar package provides two primary functions for
network estimation: neighborhood_net() and
regularization_net(). This section introduces their key
arguments and demonstrates their usage with practical examples. We begin
by estimating a network using neighborhood_net() with a
complete data set (i.e., without missing values). Next, we show how to
estimate a network when the data set contains missing data. Finally, we
provide a brief example of network estimation using regularization
techniques via the regularization_net() function.
The neighborhood_net() function estimates a network
structure based on neighborhood selection using information criteria for
model selection in node-wise regressions. The function can either be
provided with raw data (data frame or matrix) or a correlation matrix
along with sample sizes for each variable. The examples will use raw
data, as this is the more complex case. The following arguments are
particularly relevant for controlling the network estimation process
(with fully observed data):
The ic_type argument controls the penalty applied during
model selection for node-wise regressions. It defines the penalty per
parameter (i.e., the number of predictors plus the intercept), thereby
influencing the sparsity of the resulting model. The available options
are:
- ic_type = "bic" (default): corresponds to the Bayesian Information Criterion (BIC)
- ic_type = "aic": corresponds to the Akaike Information Criterion (AIC)
- ic_type = "aicc": corresponds to the corrected Akaike Information Criterion (AICc)

The pcor_merge_rule argument determines how partial correlations are estimated based on the regression results between two nodes:

- "and" (default): a partial correlation is estimated only if both regression weights (from node A to B and from B to A) are non-zero.
- "or": a partial correlation is estimated if at least one of the two regression weights is non-zero.

Although both options are available, current simulation evidence
suggests that the "and" rule yields more accurate partial
correlation estimates than the "or" rule. Therefore,
changing this default is not recommended unless you
have a specific reason.
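For intuition, the information criteria mentioned above differ only in the penalty added per estimated parameter. A generic sketch using the standard textbook formulas (mantar's internal computation may differ in constants):

```r
# Information criterion as a function of the log-likelihood (logL),
# number of parameters (k), and sample size (n)
ic_value <- function(logL, k, n, type = c("bic", "aic", "aicc")) {
  type <- match.arg(type)
  penalty <- switch(type,
    bic  = k * log(n),                            # heavier for large n
    aic  = 2 * k,                                 # constant per parameter
    aicc = 2 * k + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
  )
  -2 * logL + penalty
}

# With n = 100 and k = 5, BIC penalizes more heavily than AIC
ic_value(-200, k = 5, n = 100, type = "bic")  # 400 + 5 * log(100)
ic_value(-200, k = 5, n = 100, type = "aic")  # 400 + 10
```

Because log(n) exceeds 2 for n > 7, BIC-based selection tends to yield sparser networks than AIC-based selection in typical sample sizes.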
The ordered argument specifies how variables are treated when estimating correlations from raw data.

- ordered = TRUE: all variables are treated as ordered categorical
- ordered = FALSE: all variables are treated as continuous
- a logical vector: each variable is treated according to its own entry (e.g., ordered = c(TRUE, FALSE, FALSE, TRUE))

Based on these specifications, the function applies the appropriate correlation type for each pair of variables:

- both FALSE: Pearson correlation
- one TRUE and one FALSE: polyserial correlation
- both TRUE: polychoric correlation

After discussing the key arguments, we can now illustrate how to
estimate a network structure using the neighborhood_net()
function with a data set without missing values.
# Estimate network from the full data set using BIC, the 'and' rule, and treating the data as continuous
result_full_cont <- neighborhood_net(data = mantar_dummy_full_cont,
ic_type = "bic",
pcor_merge_rule = "and",
ordered = FALSE)
#> No missing values in data. Sample size for each variable is equal to the number of rows in the data.
# View estimated partial correlations
result_full_cont
#> EmoReactivity TendWorry StressSens SelfAware Moodiness Cautious
#> EmoReactivity 0.0000000 0.2617524 0.130019 0.0000000 0.0000000 0.0000000
#> TendWorry 0.2617524 0.0000000 0.000000 0.2431947 0.0000000 0.0000000
#> StressSens 0.1300190 0.0000000 0.000000 0.0000000 0.0000000 0.0000000
#> SelfAware 0.0000000 0.2431947 0.000000 0.0000000 0.0000000 0.0000000
#> Moodiness 0.0000000 0.0000000 0.000000 0.0000000 0.0000000 0.4377322
#> Cautious 0.0000000 0.0000000 0.000000 0.0000000 0.4377322 0.0000000
#> ThoughtFuture 0.0000000 0.2595917 0.000000 0.0000000 0.0000000 0.0000000
#> RespCriticism 0.0000000 0.0000000 0.000000 0.0000000 0.2762595 0.2523658
#> ThoughtFuture RespCriticism
#> EmoReactivity 0.0000000 0.0000000
#> TendWorry 0.2595917 0.0000000
#> StressSens 0.0000000 0.0000000
#> SelfAware 0.0000000 0.0000000
#> Moodiness 0.0000000 0.2762595
#> Cautious 0.0000000 0.2523658
#> ThoughtFuture 0.0000000 0.0000000
#> RespCriticism     0.0000000     0.0000000

We can also estimate a network structure when some variables are ordered categorical. In the following example, we treat all variables as ordered categorical.
# Estimate network from the full data set using BIC, the 'and' rule, and treating the
# data as ordered categorical
result_full_cat <- neighborhood_net(data = mantar_dummy_full_cat,
ic_type = "bic",
pcor_merge_rule = "and",
ordered = TRUE)
#> No missing values in data. Sample size for each variable is equal to the number of rows in the data.
# View estimated partial correlations
result_full_cat
#> EmoReactivity TendWorry StressSens SelfAware Moodiness Cautious
#> EmoReactivity 0.0000000 0.2742356 0.136029 0.0000000 0.0000000 0.0000000
#> TendWorry 0.2742356 0.0000000 0.000000 0.2679285 0.0000000 0.0000000
#> StressSens 0.1360290 0.0000000 0.000000 0.0000000 0.0000000 0.0000000
#> SelfAware 0.0000000 0.2679285 0.000000 0.0000000 0.0000000 0.0000000
#> Moodiness 0.0000000 0.0000000 0.000000 0.0000000 0.0000000 0.4398609
#> Cautious 0.0000000 0.0000000 0.000000 0.0000000 0.4398609 0.0000000
#> ThoughtFuture 0.0000000 0.2224662 0.000000 0.0000000 0.0000000 0.0000000
#> RespCriticism 0.0000000 0.0000000 0.000000 0.0000000 0.2752566 0.2687388
#> ThoughtFuture RespCriticism
#> EmoReactivity 0.0000000 0.0000000
#> TendWorry 0.2224662 0.0000000
#> StressSens 0.0000000 0.0000000
#> SelfAware 0.0000000 0.0000000
#> Moodiness 0.0000000 0.2752566
#> Cautious 0.0000000 0.2687388
#> ThoughtFuture 0.0000000 0.0000000
#> RespCriticism     0.0000000     0.0000000

In the case of missing data, the neighborhood_net()
function offers several additional arguments that control how sample
size and missingness are handled.
The n_calc argument specifies how the sample size is
calculated for each node-wise regression. This affects the penalty term
used in model selection.
The available options are:
- "individual" (default): Uses the number of non-missing observations for each individual variable. This is the recommended approach.
- "average": Uses the average number of non-missing observations across all variables.
- "max": Uses the maximum number of non-missing observations across all variables.
- "total": Uses the total number of observations in the data set (i.e., the number of rows).

The missing_handling argument specifies how the correlation matrix is estimated when the input data contains missing values. Two approaches are supported:
- "two-step-em": Applies a standard Expectation-Maximization (EM) algorithm to estimate the covariance matrix. This method is the default, as it is computationally efficient. However, it only performs well when the sample size is large relative to the number of edges in the network and the proportion of missingness.
- "stacked-mi": Applies multiple imputation to create several completed data sets, which are then stacked into a single data set. A correlation matrix is computed from this stacked data.

As described previously, deletion techniques (listwise and pairwise)
are also available, but their use is not recommended. When
"two-step-em" is selected, the correlation matrix is always
based on Pearson correlations, regardless of the ordered
argument. In contrast, when "stacked-mi" is used, the
ordered argument determines how variables are treated
(continuous vs. ordered categorical) during the correlation
estimation.
If "stacked-mi" is used, the nimp argument
controls the number of imputations (default: 20), while
imp_method specifies the imputation method (default:
"pmm" for predictive mean matching).
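The n_calc options described above correspond to simple summaries of the per-variable non-missing counts. A base R sketch of what each option amounts to (illustrative only, not mantar's internal code):

```r
# Toy data with missing values
d <- data.frame(
  x = c(1, NA, 3, 4),
  y = c(NA, NA, 2, 5),
  z = c(1, 2, 3, 4)
)

# Per-variable non-missing counts: 3, 2, 4
n_obs <- colSums(!is.na(d))

n_individual <- n_obs        # "individual": one n per variable
n_average    <- mean(n_obs)  # "average": 3
n_max        <- max(n_obs)   # "max": 4
n_total      <- nrow(d)      # "total": 4
```

Because the penalty term of the information criteria grows with the assumed sample size, larger choices such as "total" make the selection more conservative when much data is missing.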
We can now illustrate how to estimate a network structure using the
neighborhood_net() function with a data set that contains
missing values. All variables are continuous in this example.
# Estimate network for data set with missing values
result_mis_cont <- neighborhood_net(data = mantar_dummy_mis_cont,
n_calc = "individual",
missing_handling = "stacked-mi",
nimp = 20,
imp_method = "pmm",
pcor_merge_rule = "and")

# View estimated partial correlations
result_mis_cont
#> EmoReactivity TendWorry StressSens SelfAware Moodiness Cautious
#> EmoReactivity 0.0000000 0.0000000 0.1737479 0.0000000 0.0000000 0.0000000
#> TendWorry 0.0000000 0.0000000 0.0000000 0.2739825 0.0000000 0.1356341
#> StressSens 0.1737479 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#> SelfAware 0.0000000 0.2739825 0.0000000 0.0000000 0.0000000 0.0000000
#> Moodiness 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.4196675
#> Cautious 0.0000000 0.1356341 0.0000000 0.0000000 0.4196675 0.0000000
#> ThoughtFuture 0.1945899 0.2664316 0.0000000 0.0000000 0.0000000 0.0000000
#> RespCriticism 0.0000000 0.0000000 0.0000000 0.2873411 0.2640929 0.1867003
#> ThoughtFuture RespCriticism
#> EmoReactivity 0.1945899 0.0000000
#> TendWorry 0.2664316 0.0000000
#> StressSens 0.0000000 0.0000000
#> SelfAware 0.0000000 0.2873411
#> Moodiness 0.0000000 0.2640929
#> Cautious 0.0000000 0.1867003
#> ThoughtFuture 0.0000000 0.0000000
#> RespCriticism     0.0000000     0.0000000

Note: Network estimation with stacked multiple imputation may take
some time. During the imputation process, messages from the
mice package may be printed.
The regularization_net() function estimates a network
structure based on regularization techniques using information criteria
for model selection. Similar to neighborhood_net(), this
function can either be provided with raw data (data frame or matrix) or
a correlation matrix along with sample sizes for each variable. The
examples will use raw data, as this is the more complex case. The
following arguments are particularly relevant for controlling the
network estimation process (with fully observed data):
The penalty argument controls the type of regularization
used in the network estimation. The recommended options are using the
graphical lasso ("glasso") as a convex penalty or
"atan" as a non-convex penalty.
For glasso, the lambda_min_ratio and
n_lambda arguments control the range and number of penalty
parameters evaluated during model selection. The default values are
generally appropriate. For all non-convex penalties (e.g.,
"atan"), an additional tuning parameter can be specified via the
gamma argument. However, using a single default value for
gamma (the default differs between penalty types) by setting
vary = "lambda" is usually sufficient.
The last argument controlling the regularization process is
pen_diag which specifies whether the diagonal elements of
the covariance matrix should be penalized (TRUE) or not
(FALSE, default).
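To give a sense of how lambda_min_ratio and n_lambda shape the candidate penalties, here is a common way such grids are constructed in lasso-type software (a generic log-spaced sketch; mantar's internal grid may be built differently):

```r
# Log-spaced grid of n_lambda penalty values, running from lambda_max
# down to lambda_max * lambda_min_ratio
lambda_grid <- function(lambda_max, lambda_min_ratio = 0.1, n_lambda = 100) {
  exp(seq(log(lambda_max), log(lambda_max * lambda_min_ratio),
          length.out = n_lambda))
}

grid <- lambda_grid(lambda_max = 0.5, lambda_min_ratio = 0.1, n_lambda = 100)
range(grid)  # from 0.05 to 0.5
```

A smaller lambda_min_ratio extends the grid toward denser candidate networks, while a larger n_lambda evaluates the same range at a finer resolution.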
The ic_type argument determines the
information-criterion penalty used during model selection in the
regularization process. It specifies the penalty applied per freely
estimated parameter (i.e., each included edge or nonzero partial
correlation), thereby controlling the sparsity of the resulting model.
The available options are:
- ic_type = "bic": corresponds to the Bayesian Information Criterion (BIC)
- ic_type = "ebic": corresponds to the Extended Bayesian Information Criterion (EBIC)
- ic_type = "aic": corresponds to the Akaike Information Criterion (AIC)

The default depends on the selected regularization approach. For
non-convex penalties, the default is "bic", whereas for the
"glasso" penalty the default is "ebic". In the
latter case, an additional parameter extended_gamma must be
specified (default: 0.5).
The ordered argument specifies how variables are treated
when estimating correlations from raw data.
- ordered = TRUE: all variables are treated as ordered categorical
- ordered = FALSE: all variables are treated as continuous
- a logical vector: each variable is treated according to its own entry (e.g., ordered = c(TRUE, FALSE, FALSE, TRUE))

Based on these specifications, the function applies the appropriate correlation type for each pair of variables:

- both FALSE: Pearson correlation
- one TRUE and one FALSE: polyserial correlation
- both TRUE: polychoric correlation

After discussing the key arguments, we can now illustrate how to
estimate a network structure using the regularization_net()
function with a data set without missing values.
# Estimate network from full data set using BIC and the glasso penalty
result_full_cont <- regularization_net(data = mantar_dummy_full_cont,
penalty = "glasso",
vary = "lambda",
n_lambda = 100,
lambda_min_ratio = 0.1,
ic_type = "bic",
pcor_merge_rule = "and",
ordered = FALSE)
#> Warning in def_pen_mats(mat = mat, penalty = penalty, vary = vary, n_lambda =
#> n_lambda, : Varying 'lambda' only, n_gamma is set to 1.
# View estimated partial correlations
result_full_cont
#> EmoReactivity TendWorry StressSens SelfAware Moodiness
#> EmoReactivity 0.00000000 0.2094365 0.08045204 0.00000000 0.00000000
#> TendWorry 0.20943651 0.0000000 0.00000000 0.19091266 0.00000000
#> StressSens 0.08045204 0.0000000 0.00000000 0.00000000 0.00000000
#> SelfAware 0.00000000 0.1909127 0.00000000 0.00000000 0.00000000
#> Moodiness 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Cautious 0.07143541 0.0000000 0.00000000 0.00000000 0.39918941
#> ThoughtFuture 0.08279610 0.2160799 0.00000000 0.01126785 0.02095296
#> RespCriticism 0.01204568 0.0000000 -0.03899045 0.06692828 0.25264236
#> Cautious ThoughtFuture RespCriticism
#> EmoReactivity 0.07143541 0.08279610 0.01204568
#> TendWorry 0.00000000 0.21607993 0.00000000
#> StressSens 0.00000000 0.00000000 -0.03899045
#> SelfAware 0.00000000 0.01126785 0.06692828
#> Moodiness 0.39918941 0.02095296 0.25264236
#> Cautious 0.00000000 0.06583096 0.22841027
#> ThoughtFuture 0.06583096 0.00000000 0.00000000
#> RespCriticism  0.22841027 0.00000000    0.00000000

With missing data, the regularization_net() function
offers several additional arguments that control how sample size,
the computation of information criteria, and missingness are handled.
The n_calc argument specifies how the sample size is
calculated for the information criteria computation. Only one value is
needed here, as the regularization approach does not rely on node-wise
regressions. The default is "average", which uses the
average number of non-missing observations across all estimated
correlations - this includes the correlations of variables with
themselves. Ignoring these self-correlations (i.e., averaging the
non-missing counts across pairs of distinct variables only) is also
possible by setting count_diagonal to FALSE.
Within the information criteria computation, the likelihood for the
candidate models has to be computed. The likelihood
argument controls how this is done:
- "mat_based" (default): The likelihood is computed from the sample correlation matrix.
- "obs_based": The likelihood is computed from the observed data. This option is only available when the raw input data contains no ordered categorical variables. In these cases, the observed-data log-likelihood is recommended, as it represents the sample data better than the sample correlation matrix does.

These options are also available with full data; however, they return exactly the same results in that case.
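As intuition for the matrix-based variant, the Gaussian log-likelihood of a candidate model matrix Sigma can be written directly in terms of a sample correlation matrix S. A sketch of the standard formula (not mantar's internal code):

```r
# Multivariate-normal log-likelihood of a model correlation matrix Sigma,
# summarized through the sample correlation matrix S of n observations:
# logL = -(n/2) * (p*log(2*pi) + log|Sigma| + tr(Sigma^{-1} S))
loglik_mat <- function(Sigma, S, n) {
  p <- ncol(S)
  logdet <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  -(n / 2) * (p * log(2 * pi) + logdet + sum(diag(solve(Sigma) %*% S)))
}

S <- cor(mtcars[, c("mpg", "hp", "wt", "qsec")])
n <- nrow(mtcars)

# The saturated model (Sigma = S) attains the highest likelihood;
# an independence model (identity matrix) fits worse
loglik_mat(S, S, n)
loglik_mat(diag(4), S, n)
```

The observed-data variant instead sums each row's contribution over its non-missing entries, which is why it is only defined for continuous (Pearson-based) input.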
The missing_handling argument specifies how the
correlation matrix is estimated when the input data contains missing
values. Two approaches are supported:
- "two-step-em": Applies a standard Expectation-Maximization (EM) algorithm to estimate the covariance matrix. This method is the default, as it is computationally efficient. However, it only performs well when the sample size is large relative to the number of edges in the network and the proportion of missingness.
- "stacked-mi": Applies multiple imputation to create several completed data sets, which are then stacked into a single data set. A correlation matrix is computed from this stacked data.

As described previously, deletion techniques (listwise and pairwise)
are also available, but their use is not recommended. When
"two-step-em" is selected, the correlation matrix is always
based on Pearson correlations, regardless of the ordered
argument. In contrast, when "stacked-mi" is used, the
ordered argument determines how variables are treated
(continuous vs. ordered categorical) during the correlation
estimation.
If "stacked-mi" is used, the nimp argument
controls the number of imputations (default: 20), while
imp_method specifies the imputation method (default:
"pmm" for predictive mean matching).
We can now illustrate how to estimate a network structure using the
regularization_net() function with a data set that contains
missing values. All variables are continuous in this example.
# Estimate network for data set with missing values
result_mis_cont <- regularization_net(data = mantar_dummy_mis_cont,
likelihood = "obs_based",
penalty = "glasso",
vary = "lambda",
n_lambda = 100,
lambda_min_ratio = 0.1,
ic_type = "ebic",
extended_gamma = 0.5,
n_calc = "average",
missing_handling = "two-step-em",
pcor_merge_rule = "and",
ordered = FALSE)
#> Warning in def_pen_mats(mat = mat, penalty = penalty, vary = vary, n_lambda =
#> n_lambda, : Varying 'lambda' only, n_gamma is set to 1.
# View estimated partial correlations
result_mis_cont
#> EmoReactivity TendWorry StressSens SelfAware Moodiness
#> EmoReactivity 0.00000000 0.00000000 0.0866789 0.03103689 0.0000000
#> TendWorry 0.00000000 0.00000000 0.0000000 0.19973839 0.0000000
#> StressSens 0.08667890 0.00000000 0.0000000 0.00000000 0.0000000
#> SelfAware 0.03103689 0.19973839 0.0000000 0.00000000 0.0000000
#> Moodiness 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000
#> Cautious 0.00000000 0.07321694 0.0000000 0.00000000 0.3645097
#> ThoughtFuture 0.12858049 0.18479199 0.0000000 0.03402553 0.0000000
#> RespCriticism 0.00000000 0.00000000 0.0000000 0.22080386 0.2346446
#> Cautious ThoughtFuture RespCriticism
#> EmoReactivity 0.00000000 0.12858049 0.0000000
#> TendWorry 0.07321694 0.18479199 0.0000000
#> StressSens 0.00000000 0.00000000 0.0000000
#> SelfAware 0.00000000 0.03402553 0.2208039
#> Moodiness 0.36450971 0.00000000 0.2346446
#> Cautious 0.00000000 0.00000000 0.1504089
#> ThoughtFuture 0.00000000 0.00000000 0.0000000
#> RespCriticism  0.15040891 0.00000000    0.00000000

Finally, we consider a real-world example that is also described as
an application in Nehler and Schultze
(2024). This example is based on the data from the
cross-sectional study reported in Vervaet et al.
(2021). The original data are available in an OSF project and can be
downloaded to a temporary directory and loaded into the R environment.
url <- "https://osf.io/download/6s9p4/"
zipfile <- file.path(tempdir(), "vervaet.zip")
exdir <- file.path(tempdir(), "vervaet")
dir.create(exdir, recursive = TRUE, showWarnings = FALSE)
download.file(url, destfile = zipfile, mode = "wb")
unzip(zipfile, exdir = exdir)
load(file.path(exdir, "Supplementary materials", "Dataset.RData"))

In this example, we analyze data from 2302 individuals to examine the cross-sectional network structure of 32 scores related to eating disorders (ED) and associated factors (e.g., depressive symptoms, anxiety), with the goal of identifying transdiagnostic vulnerabilities. We now perform a check for missingness in the data set.
colMeans(is.na(Data))
#> Dft Bul Bod Ine Per Dis Awa
#> 0.05473501 0.05430061 0.05473501 0.05430061 0.05516942 0.05473501 0.05473501
#> Fea Asm Imp Soc BDI Anx Res
#> 0.05603823 0.44222415 0.44092094 0.44178975 0.25282363 0.24761077 0.52780191
#> Nov Har Red Pes Sed Coa Set
#> 0.02128584 0.02128584 0.02128584 0.02128584 0.02128584 0.02128584 0.02128584
#> Dir Aut Lim Foc Inh Mis Sta
#> 0.29756733 0.29756733 0.29930495 0.29843614 0.29843614 0.44439618 0.44483058
#> Exp Cri Qua Pref
#> 0.44439618 0.44439618 0.44396177 0.44439618

Missingness proportions range from as low as 2% to as high as 53%. Overall, the average missingness rate was 22%.
The first decision concerns the choice of a suitable network estimation method. Simulation studies indicate that glasso regularization combined with EBIC model selection performs well when the ratio of the number of observations to the number of variables is relatively small (Isvoranu and Epskamp 2023; Nehler and Schultze 2025). In our case, with N = 2302 observations and p = 32 variables, this ratio is relatively large. Under such conditions, both non-convex penalties and neighborhood selection tend to perform well.
The literature further suggests that neighborhood selection may be
advantageous when the amount of missingness differs substantially
between variables (Nehler and Schultze
2024), which is the case in this data set. This motivates our
choice to proceed with the neighborhood_net() function.
Next, we examine the measurement level of the variables, as this also influences the choice of an appropriate calculation method and determines which missing-data handling strategies are feasible. An initial overview can be obtained by inspecting the summary of the data set.
summary(Data)
#> Dft Bul Bod Ine Per
#> Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.00
#> 1st Qu.:16.00 1st Qu.: 7.00 1st Qu.:18.00 1st Qu.:14.0 1st Qu.: 8.00
#> Median :24.00 Median :17.00 Median :28.00 Median :30.0 Median :16.00
#> Mean :24.52 Mean :16.86 Mean :29.22 Mean :27.8 Mean :15.56
#> 3rd Qu.:36.00 3rd Qu.:27.00 3rd Qu.:41.00 3rd Qu.:41.0 3rd Qu.:23.00
#> Max. :42.00 Max. :42.00 Max. :54.00 Max. :60.0 Max. :36.00
#> NA's :126 NA's :125 NA's :126 NA's :125 NA's :127
#> Dis Awa Fea Asm Imp
#> Min. : 0.00 Min. : 1.0 Min. : 0.00 Min. : 0.00 Min. : 0.00
#> 1st Qu.: 7.00 1st Qu.:11.0 1st Qu.: 7.00 1st Qu.:22.00 1st Qu.:26.00
#> Median :19.00 Median :28.0 Median :20.00 Median :27.00 Median :31.00
#> Mean :17.34 Mean :25.8 Mean :18.96 Mean :26.93 Mean :31.25
#> 3rd Qu.:27.00 3rd Qu.:39.0 3rd Qu.:29.00 3rd Qu.:32.00 3rd Qu.:36.00
#> Max. :42.00 Max. :58.0 Max. :48.00 Max. :47.00 Max. :63.00
#> NA's :126 NA's :126 NA's :129 NA's :1018 NA's :1015
#> Soc BDI Anx Res
#> Min. : 0.00 Min. : 0.00 Min. : 4.00 Min. : 8.00
#> 1st Qu.:24.00 1st Qu.:18.00 1st Qu.:52.00 1st Qu.: 56.00
#> Median :29.00 Median :28.00 Median :60.00 Median : 65.00
#> Mean :28.51 Mean :27.97 Mean :58.79 Mean : 64.35
#> 3rd Qu.:34.00 3rd Qu.:38.00 3rd Qu.:68.00 3rd Qu.: 72.00
#> Max. :45.00 Max. :60.00 Max. :80.00 Max. :100.00
#> NA's :1017 NA's :582 NA's :570 NA's :1215
#> Nov Har Red Pes
#> Min. : 1.00 Min. : 2.00 Min. : 1.00 Min. :0.000
#> 1st Qu.:13.00 1st Qu.:19.00 1st Qu.:14.00 1st Qu.:4.000
#> Median :18.00 Median :25.00 Median :16.00 Median :6.000
#> Mean :17.98 Mean :23.87 Mean :16.29 Mean :5.345
#> 3rd Qu.:23.00 3rd Qu.:30.00 3rd Qu.:19.00 3rd Qu.:7.000
#> Max. :88.00 Max. :35.00 Max. :24.00 Max. :8.000
#> NA's :49 NA's :49 NA's :49 NA's :49
#> Sed Coa Set Dir Aut
#> Min. : 1.00 Min. : 2.00 Min. : 0.0 Min. : 69.0 Min. : 17.0
#> 1st Qu.:16.00 1st Qu.:28.00 1st Qu.: 6.0 1st Qu.:133.0 1st Qu.:110.0
#> Median :22.00 Median :33.00 Median :10.0 Median :181.0 Median :145.0
#> Mean :22.27 Mean :31.81 Mean :10.6 Mean :185.9 Mean :151.5
#> 3rd Qu.:28.00 3rd Qu.:37.00 3rd Qu.:14.0 3rd Qu.:234.0 3rd Qu.:188.0
#> Max. :44.00 Max. :42.00 Max. :31.0 Max. :377.0 Max. :330.0
#> NA's :49 NA's :49 NA's :49 NA's :685 NA's :685
#> Lim Foc Inh Mis
#> Min. : 0 Min. : 0.00 Min. : 0.0 Min. : 2.00
#> 1st Qu.: 56 1st Qu.: 69.00 1st Qu.: 66.0 1st Qu.: 22.00
#> Median : 69 Median : 88.00 Median : 85.0 Median : 29.00
#> Mean : 71 Mean : 89.36 Mean : 84.4 Mean : 28.36
#> 3rd Qu.: 86 3rd Qu.:108.00 3rd Qu.:103.0 3rd Qu.: 35.00
#> Max. :152 Max. :770.00 Max. :150.0 Max. :222.00
#> NA's :689 NA's :687 NA's :687 NA's :1023
#> Sta Exp Cri Qua
#> Min. : 4.00 Min. : 3.00 Min. : 2.000 Min. : 3.00
#> 1st Qu.:20.00 1st Qu.: 7.00 1st Qu.: 5.000 1st Qu.:11.00
#> Median :24.00 Median :10.00 Median : 8.000 Median :13.00
#> Mean :23.75 Mean :10.99 Mean : 8.955 Mean :13.27
#> 3rd Qu.:28.00 3rd Qu.:15.00 3rd Qu.:12.000 3rd Qu.:16.00
#> Max. :35.00 Max. :25.00 Max. :20.000 Max. :20.00
#> NA's :1024 NA's :1023 NA's :1023 NA's :1022
#> Pref
#> Min. : 7.00
#> 1st Qu.:19.00
#> Median :23.00
#> Mean :22.07
#> 3rd Qu.:26.00
#> Max. :30.00
#> NA's   :1023

The variables are ordered categorical, but the number of categories is sufficiently large to treat them as continuous, as demonstrated by Johal and Rhemtulla (2023). Treating the variables as continuous gives us full flexibility in choosing among the available missing-data handling methods.
Nehler and Schultze (2024) showed that when the amount of available information relative to the number of parameters to be estimated (i.e., edges) is low, the stacked multiple imputation approach tends to perform better, while in other situations it performs similarly to the two-step EM algorithm. In our case, although the number of nodes is relatively high, we also have a large number of observations and only a moderate to small amount of missingness overall. Therefore, both methods are feasible, but we opt for the two-step EM approach due to its substantially lower computational demand.
Estimation thus proceeds as follows:
final_result <- neighborhood_net(data = Data,
n_calc = "individual",
missing_handling = "two-step-em",
pcor_merge_rule = "and",
ordered = FALSE)

The estimated partial correlation matrix can be accessed.
final_result$pcor
#> Dft Bul Bod Ine Per Dis
#> Dft 0.00000000 0.13068027 0.45563143 0.00000000 0.00000000 0.07300335
#> Bul 0.13068027 0.00000000 0.12558958 0.00000000 0.08011546 0.11043318
#> Bod 0.45563143 0.12558958 0.00000000 0.23022300 0.09698480 0.00000000
#> Ine 0.00000000 0.00000000 0.23022300 0.00000000 0.11802965 0.16908257
#> Per 0.00000000 0.08011546 0.09698480 0.11802965 0.00000000 0.09005559
#> Dis 0.07300335 0.11043318 0.00000000 0.16908257 0.09005559 0.00000000
#> Awa 0.12477483 0.28481105 0.00000000 0.22909674 0.08493158 0.28554073
#> Fea 0.11757230 0.00000000 -0.08789193 0.12255710 0.07037740 0.19875108
#> Asm 0.21048279 -0.11489155 0.11759054 0.00000000 0.19982194 -0.20616671
#> Imp 0.00000000 0.00000000 -0.08332489 0.10468153 0.00000000 0.00000000
#> Soc -0.15520417 0.00000000 0.00000000 0.26198392 0.00000000 0.32930950
#> BDI 0.00000000 0.00000000 0.00000000 0.00000000 -0.08415238 0.00000000
#> Anx 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Res 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 -0.07881709
#> Nov 0.00000000 0.00000000 0.07743995 0.00000000 0.00000000 0.00000000
#> Har 0.00000000 0.00000000 0.00000000 0.00000000 -0.05974071 0.00000000
#> Red 0.00000000 0.07960554 0.00000000 0.00000000 0.00000000 -0.24272710
#> Pes 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Sed 0.00000000 -0.07549202 0.00000000 0.00000000 0.00000000 0.12724328
#> Coa 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Set 0.00000000 -0.06385982 0.00000000 0.00000000 0.00000000 0.00000000
#> Dir 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Aut 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Lim -0.07919214 0.15629045 0.00000000 0.00000000 0.00000000 -0.10471845
#> Foc 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Inh 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Mis 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Sta 0.08379928 0.00000000 0.00000000 0.00000000 0.15138007 0.00000000
#> Exp 0.00000000 0.00000000 0.00000000 0.00000000 0.12996724 0.00000000
#> Cri 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Qua 0.00000000 0.00000000 0.00000000 0.06632995 0.00000000 0.00000000
#> Pref 0.00000000 -0.07592853 0.00000000 0.00000000 0.00000000 0.00000000
#> Awa Fea Asm Imp Soc BDI
#> Dft 0.12477483 0.11757230 0.2104828 0.00000000 -0.15520417 0.00000000
#> Bul 0.28481105 0.00000000 -0.1148916 0.00000000 0.00000000 0.00000000
#> Bod 0.00000000 -0.08789193 0.1175905 -0.08332489 0.00000000 0.00000000
#> Ine 0.22909674 0.12255710 0.0000000 0.10468153 0.26198392 0.00000000
#> Per 0.08493158 0.07037740 0.1998219 0.00000000 0.00000000 -0.08415238
#> Dis 0.28554073 0.19875108 -0.2061667 0.00000000 0.32930950 0.00000000
#> Awa 0.00000000 0.17938994 0.0000000 0.19209951 0.00000000 0.00000000
#> Fea 0.17938994 0.00000000 0.0000000 0.08247848 0.00000000 0.00000000
#> Asm 0.00000000 0.00000000 0.0000000 0.34437512 0.36568271 0.00000000
#> Imp 0.19209951 0.08247848 0.3443751 0.00000000 0.00000000 0.00000000
#> Soc 0.00000000 0.00000000 0.3656827 0.00000000 0.00000000 0.00000000
#> BDI 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Anx 0.00000000 0.00000000 0.0000000 0.05856714 0.00000000 0.42327065
#> Res 0.00000000 -0.06814741 0.0000000 0.00000000 -0.13070557 -0.18787960
#> Nov 0.00000000 0.00000000 0.0000000 0.10686486 0.00000000 0.00000000
#> Har 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Red 0.00000000 0.06918806 0.0000000 0.00000000 -0.07993343 0.00000000
#> Pes 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Sed 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Coa 0.07198617 0.00000000 0.0000000 -0.11343873 0.00000000 0.00000000
#> Set 0.00000000 0.00000000 0.0000000 0.08836548 0.00000000 0.00000000
#> Dir 0.00000000 -0.10993735 0.0000000 0.00000000 0.00000000 0.12538155
#> Aut 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.10389335
#> Lim 0.00000000 0.00000000 0.0000000 0.10342733 0.00000000 0.00000000
#> Foc 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Inh 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Mis 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Sta 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Exp 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Cri 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.07484521
#> Qua 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Pref 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000 0.00000000
#> Anx Res Nov Har Red Pes
#> Dft 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Bul 0.00000000 0.00000000 0.00000000 0.00000000 0.07960554 0.00000000
#> Bod 0.00000000 0.00000000 0.07743995 0.00000000 0.00000000 0.00000000
#> Ine 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Per 0.00000000 0.00000000 0.00000000 -0.05974071 0.00000000 0.00000000
#> Dis 0.00000000 -0.07881709 0.00000000 0.00000000 -0.24272710 0.00000000
#> Awa 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Fea 0.00000000 -0.06814741 0.00000000 0.00000000 0.06918806 0.00000000
#> Asm 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Imp 0.05856714 0.00000000 0.10686486 0.00000000 0.00000000 0.00000000
#> Soc 0.00000000 -0.13070557 0.00000000 0.00000000 -0.07993343 0.00000000
#> BDI 0.42327065 -0.18787960 0.00000000 0.00000000 0.00000000 0.00000000
#> Anx 0.00000000 -0.18732742 0.00000000 0.19069760 0.00000000 0.09573182
#> Res -0.18732742 0.00000000 0.00000000 -0.17706062 -0.08412776 0.00000000
#> Nov 0.00000000 0.00000000 0.00000000 -0.30581939 0.11275207 -0.16187842
#> Har 0.19069760 -0.17706062 -0.30581939 0.00000000 0.08514112 -0.07669957
#> Red 0.00000000 -0.08412776 0.11275207 0.08514112 0.00000000 0.00000000
#> Pes 0.09573182 0.00000000 -0.16187842 -0.07669957 0.00000000 0.00000000
#> Sed -0.13178762 0.25696100 -0.14454895 -0.18976565 0.00000000 0.09485402
#> Coa 0.00000000 0.00000000 0.00000000 0.00000000 0.36699126 0.00000000
#> Set 0.00000000 0.18370973 0.10299532 -0.08998313 0.11135028 0.13041938
#> Dir 0.00000000 0.00000000 0.00000000 0.00000000 -0.13362608 0.00000000
#> Aut 0.00000000 0.00000000 0.00000000 0.18458193 0.00000000 -0.11323679
#> Lim 0.00000000 0.00000000 0.22961496 0.00000000 0.00000000 -0.23866316
#> Foc 0.00000000 0.00000000 -0.11516138 0.00000000 0.18458310 -0.09529881
#> Inh 0.13015338 0.00000000 -0.08720893 0.00000000 -0.12771036 0.43824666
#> Mis 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Sta 0.00000000 0.09983737 0.00000000 0.00000000 0.00000000 0.16074045
#> Exp 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Cri 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
#> Qua 0.00000000 0.00000000 0.00000000 0.00000000 0.06389421 0.00000000
#> Pref 0.00000000 0.00000000 -0.18414631 0.00000000 0.00000000 0.00000000
#> Sed Coa Set Dir Aut Lim
#> Dft 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 -0.07919214
#> Bul -0.07549202 0.00000000 -0.06385982 0.0000000 0.00000000 0.15629045
#> Bod 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Ine 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Per 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Dis 0.12724328 0.00000000 0.00000000 0.0000000 0.00000000 -0.10471845
#> Awa 0.00000000 0.07198617 0.00000000 0.0000000 0.00000000 0.00000000
#> Fea 0.00000000 0.00000000 0.00000000 -0.1099374 0.00000000 0.00000000
#> Asm 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Imp 0.00000000 -0.11343873 0.08836548 0.0000000 0.00000000 0.10342733
#> Soc 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> BDI 0.00000000 0.00000000 0.00000000 0.1253816 0.10389335 0.00000000
#> Anx -0.13178762 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Res 0.25696100 0.00000000 0.18370973 0.0000000 0.00000000 0.00000000
#> Nov -0.14454895 0.00000000 0.10299532 0.0000000 0.00000000 0.22961496
#> Har -0.18976565 0.00000000 -0.08998313 0.0000000 0.18458193 0.00000000
#> Red 0.00000000 0.36699126 0.11135028 -0.1336261 0.00000000 0.00000000
#> Pes 0.09485402 0.00000000 0.13041938 0.0000000 -0.11323679 -0.23866316
#> Sed 0.00000000 0.17672331 -0.08861919 -0.1205797 -0.07968325 0.00000000
#> Coa 0.17672331 0.00000000 0.00000000 -0.1504409 0.09906808 -0.25251327
#> Set -0.08861919 0.00000000 0.00000000 0.0677524 0.09519513 0.00000000
#> Dir -0.12057968 -0.15044087 0.06775240 0.0000000 0.41439065 0.00000000
#> Aut -0.07968325 0.09906808 0.09519513 0.4143906 0.00000000 0.15192789
#> Lim 0.00000000 -0.25251327 0.00000000 0.0000000 0.15192789 0.00000000
#> Foc 0.00000000 0.27304492 0.09614857 0.2318596 0.09625451 0.00000000
#> Inh 0.00000000 0.00000000 0.00000000 0.0000000 0.25461910 0.36629158
#> Mis 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Sta 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Exp 0.00000000 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Cri 0.00000000 0.00000000 0.00000000 0.1146817 0.00000000 0.00000000
#> Qua -0.09992706 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Pref 0.11323746 0.00000000 0.00000000 0.0000000 0.00000000 0.00000000
#> Foc Inh Mis Sta Exp Cri
#> Dft 0.00000000 0.00000000 0.0000000 0.08379928 0.0000000 0.00000000
#> Bul 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Bod 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Ine 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Per 0.00000000 0.00000000 0.0000000 0.15138007 0.1299672 0.00000000
#> Dis 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Awa 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Fea 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Asm 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Imp 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Soc 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> BDI 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.07484521
#> Anx 0.00000000 0.13015338 0.0000000 0.00000000 0.0000000 0.00000000
#> Res 0.00000000 0.00000000 0.0000000 0.09983737 0.0000000 0.00000000
#> Nov -0.11516138 -0.08720893 0.0000000 0.00000000 0.0000000 0.00000000
#> Har 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Red 0.18458310 -0.12771036 0.0000000 0.00000000 0.0000000 0.00000000
#> Pes -0.09529881 0.43824666 0.0000000 0.16074045 0.0000000 0.00000000
#> Sed 0.00000000 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Coa 0.27304492 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Set 0.09614857 0.00000000 0.0000000 0.00000000 0.0000000 0.00000000
#> Dir 0.23185962 0.00000000 0.0000000 0.00000000 0.0000000 0.11468170
#> Aut 0.09625451 0.25461910 0.0000000 0.00000000 0.0000000 0.00000000
#> Lim 0.00000000 0.36629158 0.0000000 0.00000000 0.0000000 0.00000000
#> Foc 0.00000000 0.24599931 0.0000000 0.00000000 0.0000000 0.00000000
#> Inh 0.24599931 0.00000000 0.0000000 0.11984313 0.0000000 0.00000000
#> Mis 0.00000000 0.00000000 0.0000000 0.37057477 0.0000000 0.15359247
#> Sta 0.00000000 0.11984313 0.3705748 0.00000000 0.1373552 0.00000000
#> Exp 0.00000000 0.00000000 0.0000000 0.13735518 0.0000000 0.64551915
#> Cri 0.00000000 0.00000000 0.1535925 0.00000000 0.6455192 0.00000000
#> Qua 0.00000000 0.00000000 0.2050507 0.21005917 0.0000000 0.00000000
#> Pref 0.00000000 0.07492378 0.0000000 0.31996271 0.0000000 0.00000000
#> Qua Pref
#> Dft 0.00000000 0.00000000
#> Bul 0.00000000 -0.07592853
#> Bod 0.00000000 0.00000000
#> Ine 0.06632995 0.00000000
#> Per 0.00000000 0.00000000
#> Dis 0.00000000 0.00000000
#> Awa 0.00000000 0.00000000
#> Fea 0.00000000 0.00000000
#> Asm 0.00000000 0.00000000
#> Imp 0.00000000 0.00000000
#> Soc 0.00000000 0.00000000
#> BDI 0.00000000 0.00000000
#> Anx 0.00000000 0.00000000
#> Res 0.00000000 0.00000000
#> Nov 0.00000000 -0.18414631
#> Har 0.00000000 0.00000000
#> Red 0.06389421 0.00000000
#> Pes 0.00000000 0.00000000
#> Sed -0.09992706 0.11323746
#> Coa 0.00000000 0.00000000
#> Set 0.00000000 0.00000000
#> Dir 0.00000000 0.00000000
#> Aut 0.00000000 0.00000000
#> Lim 0.00000000 0.00000000
#> Foc 0.00000000 0.00000000
#> Inh 0.00000000 0.07492378
#> Mis 0.20505073 0.00000000
#> Sta 0.21005917 0.31996271
#> Exp 0.00000000 0.00000000
#> Cri 0.00000000 0.00000000
#> Qua 0.00000000 0.13692672
#> Pref 0.13692672 0.00000000

This partial correlation matrix can be used for reporting purposes, but it can also serve as input for further analyses in other packages (e.g., centrality analysis, community detection).
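As one example, node strength centrality can be computed directly from any partial correlation matrix with base R. The small symmetric matrix below is a stand-in for `final_result$pcor`:

```r
# Stand-in for an estimated partial correlation matrix such as final_result$pcor
pcor <- matrix(c(0.0,  0.2,  0.0,
                 0.2,  0.0, -0.1,
                 0.0, -0.1,  0.0),
               nrow = 3,
               dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Strength centrality: sum of absolute edge weights per node
strength <- rowSums(abs(pcor))
strength
#>   A   B   C
#> 0.2 0.3 0.1
```

Dedicated packages such as qgraph offer the same quantity (and many others) with standard errors and plotting support; the point here is only that the matrix returned by mantar is a plain numeric matrix that downstream tools can consume directly.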
Beyond that, two methods are available for inspecting the results of
networks estimated with the mantar package, regardless of
the estimation method used or whether missing values were present
(although the output structure may differ slightly). The first option is
to obtain a summary of the results.
summary(final_result)
#> The density of the estimated network is 0.276
#>
#> Network was estimated using neighborhood selection on data with missing values.
#> Missing data were handled using 'two-step-em'.
#> The information criterion was BIC and the 'and' rule was used for edge inclusion.
#>
#> The sample sizes used for the nodewise regressions were as follows:
#> Dft Bul Bod Ine Per Dis Awa Fea Asm Imp Soc BDI Anx Res Nov Har
#> 2176 2177 2176 2177 2175 2176 2176 2173 1284 1287 1285 1720 1732 1087 2253 2253
#> Red Pes Sed Coa Set Dir Aut Lim Foc Inh Mis Sta Exp Cri Qua Pref
#> 2253 2253 2253 2253 2253 1617 1617 1613 1615 1615 1279 1278 1279 1279 1280 1279

This output mainly provides information about the estimation process, much of which reflects the arguments we specified earlier, but it also includes two particularly informative elements. First, it reports the density of the estimated network (i.e., the proportion of non-zero edges). Second, it provides the effective sample sizes used for each node-wise regression.
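The reported density can also be reproduced by hand from the partial correlation matrix. The sketch below uses a generic symmetric matrix in place of `final_result$pcor` to show the calculation:

```r
# Stand-in symmetric partial correlation matrix with zero diagonal
pcor <- matrix(c(0.0, 0.3, 0.0,
                 0.3, 0.0, 0.1,
                 0.0, 0.1, 0.0),
               nrow = 3)
p <- nrow(pcor)

# Density: share of non-zero edges among all p * (p - 1) / 2 possible edges
density <- sum(pcor[upper.tri(pcor)] != 0) / (p * (p - 1) / 2)
density
#> [1] 0.6666667
```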
Most of the time, the goal is not to present the partial correlation
matrix itself, but rather to visualize the resulting network structure.
This can be achieved by creating a network plot, which in
mantar builds on the functionality of the
qgraph package. The plot
can be customized using the various options provided by
qgraph. A common customization is to color nodes according
to predefined clusters and to display full variable names in a legend.
The names and grouping structure used here follow the original analysis code of the study reported in
Vervaet et al. (2021).
Groups <- c(rep("EDI-II", 11), rep("BDI", 1), rep("STAI", 1), rep("RS-NL", 1),
            rep("TCI", 7), rep("YSQ", 5), rep("FMPS", 6))

# Create names for legend
Names <- c("Drive for Thinness", "Bulimia", "Body Dissatisfaction", "Ineffectiveness",
           "Perfectionism", "Interpersonal Distrust", "Interoceptive Awareness",
           "Maturity Fears", "Asceticism", "Impulse Regulation", "Social Insecurity",
           "Depression", "Anxiety", "Resilience", "Novelty Seeking", "Harm Avoidance",
           "Reward Dependence", "Persistence", "Self-Directedness", "Cooperativeness",
           "Self-Transcendence", "Disconnection and Rejection",
           "Impaired Autonomy & Performance", "Impaired Limits", "Other-Directedness",
           "Overvigilance & Inhibition", "Concern over Mistakes", "Personal Standards",
           "Parental Expectations", "Parental Criticism", "Doubting of Actions",
           "Order and Organisation")

Lab_Colors <- c(rep("white", 11),
                rep("white", 1),
                rep("black", 1),
                rep("white", 1),
                rep("black", 7),
                rep("black", 5),
                rep("white", 6))
plot(final_result,
     layout = "spring",
     nodeNames = Names,
     groups = Groups,
     label.color = Lab_Colors,
     vsize = 5,
     legend.cex = 0.15,
     label.cex = 1.25,
     negCol = "#7A0403FF",
     posCol = "#00204DFF")

This example demonstrated how to estimate a psychological network structure using the mantar package while appropriately handling missing data, and outlined the key considerations involved in choosing between the available estimation options. It also showed how the resulting network can be further analyzed and visualized using the methods provided in the package.