| Type: | Package |
| Title: | Robust Categorical Data Analysis |
| Version: | 0.2 |
| Date: | 2026-02-10 |
| Maintainer: | Max Welz <max.welz@uzh.ch> |
| Description: | Robust categorical data analysis based on the theory of C-estimation developed in Welz (2024) <doi:10.48550/arXiv.2403.11954>. For now, the package only implements robust estimation of polychoric correlation as proposed in Welz, Mair and Alfons (2026) <doi:10.1017/psy.2025.10066> and robust estimation of polyserial correlation (Welz, 2026 <doi:10.1017/psy.2026.10091>) with methods for printing and plotting. We will implement further models in future releases. In addition, the package is still experimental, so input arguments and class structure may change in future releases. |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| Encoding: | UTF-8 |
| Depends: | ggplot2 |
| Imports: | Rcpp (≥ 1.0.10), stats, mvtnorm, stringr, parallel, Matrix, numDeriv, pracma, utils |
| Suggests: | testthat (≥ 3.0.0) |
| LinkingTo: | Rcpp |
| RoxygenNote: | 7.3.2 |
| NeedsCompilation: | yes |
| Packaged: | 2026-02-10 10:45:25 UTC; mwelz |
| Author: | Max Welz |
| Repository: | CRAN |
| Date/Publication: | 2026-02-10 11:50:02 UTC |
Neutral initialization of starting values for polychoric correlation
Description
Initializes starting values for numerical optimization in a neutral way. The optimization problem itself is convex, so the initialization should not matter much.
Usage
initialize_param(x, y)
Arguments
x |
Vector of integer-valued responses to first rating variable, or contingency table (a |
y |
Vector of integer-valued responses to second rating variable; only required if |
Value
A vector of initial values for the polychoric correlation coefficient, the X-threshold parameters, and the Y-threshold parameters
Examples
## example data
set.seed(123)
x <- sample(c(1,2,3), size = 100, replace = TRUE)
y <- sample(c(1,2,3), size = 100, replace = TRUE)
initialize_param(x, y)
Plot method for classes "robpolycor" and "polycor".
Description
Plot method for classes "robpolycor" and "polycor".
Usage
## S3 method for class 'robpolycor'
plot(x, cutoff = 3, ...)
Arguments
x |
Object of class |
cutoff |
Cutoff beyond which the color scale for Pearson residuals is truncated. |
... |
Additional parameters to be passed down. |
Value
An object of class "ggplot".
Examples
## example data
set.seed(123)
x <- sample(c(1,2,3), size = 100, replace = TRUE)
y <- sample(c(1,2,3), size = 100, replace = TRUE)
fit <- polycor(x,y)
plot(fit)
Robust estimation of polychoric correlation
Description
Implements to robust estimator of Welz, Mair and Alfons (2024, doi:10.48550/arXiv.2407.18835) for the polychoric correlation model, based on the general theory of C-estimation proposed by Welz (2024, doi:10.48550/arXiv.2403.11954).
Usage
polycor(
x,
y = NULL,
c = 0.6,
variance = TRUE,
constrained = "ifneeded",
method = NULL,
maxcor = 0.999,
tol_thresholds = 0.01,
init = initialize_param(x, y)
)
Arguments
x |
Vector of integer-valued responses to first item, or contingency table (a |
y |
Vector of integer-valued responses to second item; only required if |
c |
Tuning constant that governs robustness; must be in |
variance |
Shall an estimated asymptotic covariance matrix be returned? Default is |
constrained |
Shall strict monotonicity of thresholds be explicitly enforced by linear constraints? This can be a logical ( |
method |
Numerical optimization method, see |
maxcor |
Maximum absolute correlation (to ensure numerical stability). Default is 0.999. |
tol_thresholds |
Minimum distance between consecutive thresholds (to enforce strict monotonicity); only relevant in case of constrained optimization. Default is 0.01. |
init |
Initialization of numerical optimization. Default is neutral. |
Value
An object of class "robpolycor", which is a list with the following components.
thetahatA vector of estimates for the polychoric correlation coefficient (
rho) as well as thresholds forx(nameda1,a2,...,a_{Kx-1}) andy(namedb1,b2,...,b_{Ky-1}).stderrA vector of standard errors for each estimate in
thetahat.vcovEstimated asymptotic covariance matrix of
thetahat. The matrix\Sigmain the paper (asymptotic covariance matrix of\sqrt{N} \hat{\theta}) can be obtained viavcov * N, whereNis the sample size.chisq,pval,dfCurrently
NULL, will in a future release be the test statistic, p-value, and degrees of freedom of a test for bivariate normality.objectiveValue of minimized loss function.
optimObject of class
optim.
Examples
## example data
set.seed(123)
x <- sample(c(1,2,3), size = 100, replace = TRUE)
y <- sample(c(1,2,3), size = 100, replace = TRUE)
polycor(x,y) # robust
polycor_mle(x,y) # non-robust MLE
Maximum likelihood estimation of polychoric correlation coefficient
Description
Implements the maximum likelihood estimator of Olsson (1979, Psychometrika, doi:10.1007/BF02296207) for the polychoric correlation model.
Usage
polycor_mle(
x,
y = NULL,
variance = TRUE,
constrained = "ifneeded",
twostep = FALSE,
method = NULL,
maxcor = 0.999,
tol_thresholds = 0.01,
init = initialize_param(x, y)
)
Arguments
x |
Vector of integer-valued responses to first item, or contingency table (a |
y |
Vector of integer-valued responses to second item; only required if |
variance |
Shall an estimated asymptotic covariance matrix be returned? Default is |
constrained |
Shall strict monotonicity of thresholds be explicitly enforced by linear constraints? This can be a logical ( |
twostep |
Shall two-step estimation of Olsson (1979) <doi:10.1007/BF02296207> be performed? Default is |
method |
Numerical optimization method, see |
maxcor |
Maximum absolute correlation (to ensure numerical stability). Default is 0.999. |
tol_thresholds |
Minimum distance between consecutive thresholds (to enforce strict monotonicity); only relevant in case of constrained optimization. Default is 0.01. |
init |
Initialization of numerical optimization. Default is neutral. |
Value
An object of class "robpolycor". See polycor() for details.
Examples
## example data
set.seed(123)
x <- sample(c(1,2,3), size = 100, replace = TRUE)
y <- sample(c(1,2,3), size = 100, replace = TRUE)
polycor(x,y) # robust
polycor_mle(x,y) # non-robust MLE
Robust estimation of polychoric correlation matrix
Description
A useful wrapper of polycor to robustly estimate a polychoric correlation matrix by calculating all unique pairwise polychoric correlation coefficients.
Usage
polycormat(
data,
c = 0.6,
parallel = FALSE,
num_cores = 1L,
return_polycor = TRUE,
variance = TRUE,
constrained = "ifneeded",
method = NULL,
maxcor = 0.999,
tol_thresholds = 0.01
)
Arguments
data |
Data matrix or |
c |
Tuning constant that governs robustness; must be in |
parallel |
Logical. Shall parallelization be used? Default is |
num_cores |
Number of cores to be used, only relevant if |
return_polycor |
Logical. Shall the individual |
variance |
Shall an estimated asymptotic covariance matrix be returned? Default is |
constrained |
Shall strict monotonicity of thresholds be explicitly enforced by linear constraints? This can be a logical ( |
method |
Numerical optimization method, see |
maxcor |
Maximum absolute correlation (to ensure numerical stability). Default is 0.999. |
tol_thresholds |
Minimum distance between consecutive thresholds (to enforce strict monotonicity); only relevant in case of constrained optimization. Default is 0.01. |
Value
If return_polycor = TRUE, returns a list with a polychoric correlation matrix and list of "polycor" objects. If return_polycor = FALSE, then only a correlation matrix is returned.
Examples
## example data
set.seed(123)
data <- matrix(sample(c(1,2,3), size = 3*100, replace = TRUE), nrow = 100)
polycormat(data) # robust
polycormat_mle(data) # non-robust MLE
Maximum likelihood estimation of polychoric correlation matrix
Description
A useful wrapper of polycor_mle to estimate a polychoric correlation matrix via maximum likelihood by calculating all unique pairwise polychoric correlation coefficients.
Usage
polycormat_mle(
data,
parallel = FALSE,
num_cores = 1L,
return_polycor = TRUE,
variance = TRUE,
constrained = "ifneeded",
method = NULL,
maxcor = 0.999,
tol_thresholds = 0.01
)
Arguments
data |
Data matrix or |
parallel |
Logical. Shall parallelization be used? Default is |
num_cores |
Number of cores to be used, only relevant if |
return_polycor |
Logical. Shall the individual |
variance |
Shall an estimated asymptotic covariance matrix be returned? Default is |
constrained |
Shall strict monotonicity of thresholds be explicitly enforced by linear constraints? This can be a logical ( |
method |
Numerical optimization method, see |
maxcor |
Maximum absolute correlation (to ensure numerical stability). Default is 0.999. |
tol_thresholds |
Minimum distance between consecutive thresholds (to enforce strict monotonicity); only relevant in case of constrained optimization. Default is 0.01. |
Value
If return_polycor = TRUE, returns a list with a polychoric correlation matrix and list of "polycor" objects. If return_polycor = FALSE, then only a correlation matrix is returned.
Examples
## example data
set.seed(123)
data <- matrix(sample(c(1,2,3), size = 3*100, replace = TRUE), nrow = 100)
polycormat(data) # robust
polycormat_mle(data) # non-robust MLE
Robust estimation of polyserial correlation
Description
Implements the robust estimator of Welz (2025, doi:10.48550/arXiv.2510.15632) for the polyserial correlation model.
Usage
polyserial(
x,
y,
alpha = 0.5,
num_y = max(y),
constrained = "ifneeded",
method = NULL,
variance = TRUE,
weights = TRUE,
init = polyserial_initialize_param(x = x, num_y = num_y, robust = TRUE),
maxcor = 0.999,
tol_thresholds = 0.01,
tol_sigma2 = 0.01
)
Arguments
x |
Vector of numeric values. |
y |
Vector of integer-valued ordinal values. |
alpha |
Tuning constant that governs robustness-efficiency tradeoff; must be in |
num_y |
Number of response categories in y; defaults to max(y) |
constrained |
Shall parameter restructions be enforced by linear constraints? This can be a logical ( |
method |
Numerical optimization method, see |
variance |
Shall an estimated asymptotic covariance matrix be returned? Default is |
weights |
Shall weights be returned? Default is |
init |
Initialization of numerical optimization. Default is neutral. |
maxcor |
Maximum absolute correlation (to ensure numerical stability). Default is 0.999. |
tol_thresholds |
Minimum distance between consecutive thresholds (to enforce strict monotonicity); only relevant in case of constrained optimization. Default is 0.01. |
tol_sigma2 |
Minimum value of sigma2 parameter (population variance of X); only relevant in case of constrained optimization. Default is 0.01. |
Value
An object of class "polyserial", which is a list with the following components.
thetahatA vector of estimates for the polyserial correlation coefficient (
rho), population mean of X (mu), population variance of Y (sigma2), as well as thresholds fory(namedtau1,tau2,...,tau_{r-1}).stderrA vector of standard errors for each estimate in
thetahat.vcovEstimated asymptotic covariance matrix of
thetahat. The matrix\Sigmain the paper (asymptotic covariance matrix of\sqrt{N} \hat{\theta}) can be obtained viavcov * N, whereNis the sample size.pointpolyserialEstimated polyserial correlation coefficient, calculated with provided scoring of Y
weightsList of rescaled and raw outlyingness weights for each observation as well as maximum possible raw weight that was used for rescaling (
sup).objectiveValue of minimized loss function.
optimObject of class
optim.inputsList of provided inputs.
Examples
## example data
set.seed(123)
x <- rnorm(n = 100)
y <- sample(c(1,2), size = 100, replace = TRUE)
polyserial(x,y)
Efficiency of minimum DPD estimators of polyserial model
Description
Calculate population asymptotic variance-covariance matrix associated with a parameter vector theta, assuming that the polyserial model is correctly specified and that theta is the true model parameter. May take a few moments to compute because a relatively large number of integrals need to be numerically solved.
Usage
polyserial_efficiency(theta, alpha)
Arguments
theta |
Parameter vector of polyserial model; assumed to be the true one. First element is polyserial correlatio coefficient, second is the population mean of X, third is population variance of X, and the remaining elements are the thresholds associated with the ordnal Y (must be in increasing order) |
alpha |
Tuning constant governing robustness-efficiency tradeoff. Set to 0 for maximum likelihood. |
Value
A numeric matrix that is the population asymptotic variance-covariance matrix associated with a parameter vector theta and tuning constant alpha.
Examples
theta <- c(rho = 0, mu = 0, sigma2 = 1, tau1 = 0) # true parameter vector
polyserial_efficiency(theta, alpha = 0.5)
Neutral initialization of starting values for polyserial correlation
Description
Initializes starting values for numerical optimization in a neutral way.
Usage
polyserial_initialize_param(x, num_y, robust = FALSE)
Arguments
x |
Vector of numeric values |
num_y |
Number of response options of ordinal variable |
robust |
Should values of |
Value
A vector of initial values for the polyserial correlation coefficient, mu, sigma2, and Y-threshold parameters
Examples
## example data
set.seed(123)
x <- rnorm(100)
polyserial_initialize_param(x = x, num_y = 3)
Maximum likelihood estimation of polyserial correlation
Description
Implements maximum likelihood estimation of the polyserial correlation model.
Usage
polyserial_mle(
x,
y,
num_y = max(y),
constrained = "ifneeded",
method = NULL,
variance = TRUE,
init = polyserial_initialize_param(x = x, num_y = num_y, robust = FALSE),
maxcor = 0.999,
tol_thresholds = 0.01,
tol_sigma2 = 0.01
)
Arguments
x |
Vector of numeric values. |
y |
Vector of integer-valued ordinal values. |
num_y |
Number of response categories in y; defaults to max(y) |
constrained |
Shall parameter restructions be enforced by linear constraints? This can be a logical ( |
method |
Numerical optimization method, see |
variance |
Shall an estimated asymptotic covariance matrix be returned? Default is |
init |
Initialization of numerical optimization. Default is neutral. |
maxcor |
Maximum absolute correlation (to ensure numerical stability). Default is 0.999. |
tol_thresholds |
Minimum distance between consecutive thresholds (to enforce strict monotonicity); only relevant in case of constrained optimization. Default is 0.01. |
tol_sigma2 |
Minimum value of sigma2 parameter (population variance of X); only relevant in case of constrained optimization. Default is 0.01. |
Value
An object of class "polyserial", which is a list with the following components.
thetahatA vector of estimates for the polyserial correlation coefficient (
rho), population mean of X (mu), population variance of Y (sigma2), as well as thresholds fory(namedtau1,tau2,...,tau_{r-1}).stderrA vector of standard errors for each estimate in
thetahat.vcovEstimated asymptotic covariance matrix of
thetahat. The matrix\Sigmain the paper (asymptotic covariance matrix of\sqrt{N} \hat{\theta}) can be obtained viavcov * N, whereNis the sample size.pointpolyserialEstimated polyserial correlation coefficient, calculated with provided scoring of Y
objectiveValue of minimized loss function.
optimObject of class
optim.inputsList of provided inputs.
Examples
## example data
set.seed(123)
x <- rnorm(n = 100)
y <- sample(c(1,2), size = 100, replace = TRUE)
polyserial_mle(x,y)
Print method for classes "robpolycor" and "polycor".
Description
Print method for classes "robpolycor" and "polycor".
Usage
## S3 method for class 'robpolycor'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
x |
Object of class |
digits |
Number of digits to be printed. |
... |
Additional parameters to be passed down. |
Value
A print to the console.
Examples
set.seed(123)
x <- sample(c(1,2,3), size = 100, replace = TRUE)
y <- sample(c(1,2,3), size = 100, replace = TRUE)
fit <- polycor(x,y)
print(fit)
fit # equivalent
Print method for classes "robpolyserial" and "polyserial".
Description
Print method for classes "robpolyserial" and "polyserial".
Usage
## S3 method for class 'robpolyserial'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
x |
Object of class |
digits |
Number of digits to be printed. |
... |
Additional parameters to be passed down. |
Value
A print to the console.
Examples
set.seed(123)
x <- rnorm(100)
y <- sample(c(1,2), size = 100, replace = TRUE)
fit <- polyserial(x,y)
print(fit)
fit # equivalent
Obtain estimated asymptotic variance-covariance matrix
Description
Estimate asymptotic variance-covariance matrix of polychoric model.
Usage
## S3 method for class 'robpolycor'
vcov(object, ...)
Arguments
object |
Object of class |
... |
Additional parameters to be passed down. |
Details
Method for classes "robpolycor" and "polycor". Returns the estimated asymptotic variance-covariance matrix of a point estimate thetahat. The matrix \Sigma in the paper (asymptotic variance-covariance matrix of \sqrt{N} \hat{\theta}) can be obtained via multiplying the returned matrix by the sample size.
Value
A numeric matrix, being the estimated asymptotic covariance matrix for the model parameters
Examples
set.seed(123)
x <- sample(c(1,2,3), size = 100, replace = TRUE)
y <- sample(c(1,2,3), size = 100, replace = TRUE)
fit <- polycor(x,y)
vcov(fit)
Obtain estimated asymptotic variance-covariance matrix
Description
Estimate asymptotic variance-covariance matrix of polyserial model.
Usage
## S3 method for class 'robpolyserial'
vcov(object, ...)
Arguments
object |
Object of class |
... |
Additional parameters to be passed down. |
Details
Method for classes "robpolyserial" and "polyserial". Returns the estimated asymptotic variance-covariance matrix of a point estimate thetahat. The matrix \Sigma in the paper (asymptotic variance-covariance matrix of \sqrt{N} \hat{\theta}) can be obtained via multiplying the returned matrix by the sample size.
Value
A numeric matrix, being the estimated asymptotic covariance matrix for the model parameters
Examples
## example data
set.seed(123)
x <- rnorm(n = 100)
y <- sample(c(1,2), size = 100, replace = TRUE)
fit <- polyserial(x,y)
vcov(fit)