# conquer

**Con**volution-type smoothed **qu**antil**e** **r**egression

## Description

The `conquer` library performs fast and accurate convolution-type smoothed quantile regression ([Fernandes, Guerre and Horta, 2021](https://www.tandfonline.com/doi/full/10.1080/07350015.2019.1660177), [He et al., 2022](https://doi.org/10.1016/j.jeconom.2021.07.010), [Tan, Wang and Zhou, 2022](https://doi.org/10.1111/rssb.12485) for low/high-dimensional estimation and bootstrap inference.

In the low-dimensional setting, efficient gradient-based methods are employed for fitting both a single model and a regression process over a quantile range. Normal-based and (multiplier) bootstrap confidence intervals for all slope coefficients are constructed. In high dimensions, the conquer methods complemented with *&ell;<sub>1</sub>*-penalization and iteratively reweighted *&ell;<sub>1</sub>*-penalization are used to fit sparse models.


## Updates

**2022-03-08**:

We are adding more flexible penalties and inference methods into the package.


**2022-02-12 (Version 1.2.2)**:

Remove the unnecessary dependent packge `caret` for a cleaner installation.

**2021-10-24 (Version 1.2.1)**:

Major updates:

1. Add a function `conquer.process` for conquer process over a quantile range.

2. Add functions `conquer.reg`, `conquer.cv.reg` for high-dimensional conquer with Lasso, SCAD and MCP penalties. The first function is called with a prescribed *&lambda;*, and the second function calibrate *&lambda;* via cross-validation. The candidates of *&lambda;* can be user-specified, or automatically generated by simulating the pivotal quantity proposed in [Belloni and Chernozhukov, 2011](https://doi.org/10.1214/10-AOS827).

Minor updates:

1. Add logistic kernel for all the functions.

2. Modify initialization using asymmetric Huber regression.

3. Default number of tightening iterations is now 3.

4. Parameters for SCAD (default = 3.7) and MCP (default = 3) are added as arguments into the functions.

## Installation

`conquer` is available on [CRAN](https://CRAN.R-project.org/package=conquer), and it can be installed into `R` environment:

```r
install.packages("conquer")
```

## Common errors or warnings

A collection of error / warning messages and their solutions:

* Error: Compilation failed for package 'conquer' (with messages involving lgfortran, clang, etc.). **Solution**: This is a compilation error of Rcpp-based source packages. It happens when we recently submit a new version to CRAN, but it usually takes 3-5 days to build the binary package. Please use an older version or patiently wait for 3-5 days and then install the updated version.

* Error: smqr.cpp: 'quantile' is not a member of 'arma’. **Solution**: 'quantile' function was added into `RcppArmadillo` version 0.9.850.1.0 (2020-02-09), so reinstalling / updating the library `RcppArmadillo` will fix this issue.

* Error: unable to load shared object.. Symbol not found: _EXTPTR_PTR. **Solution**: This issue is common in some specific versions of `R` when we load Rcpp-based libraries. It is an error in R caused by a minor change about `EXTPTR_PTR`. Upgrading R to 4.0.2 will solve the problem.

* Error: function 'Rcpp_precious_remove' not provided by package 'Rcpp'. **Solution**: This happens when a package is compiled against a recent `Rcpp` release, but users load it using an older version of `Rcpp`. Reinstalling the package `Rcpp` will solve the problem.

## Functions

There are 4 functions in this library:

* `conquer`: convolution-type smoothed quantile regression
* `conquer.process`: convolution-type smoothed quantile regression process
* `conquer.reg`: convolution-type smoothed quantile regression with regularization
* `conquer.cv.reg`: cross-validated convolution-type smoothed quantile regression with regularization

## Examples

Let us illustrate conquer by a simple example. For sample size *n = 5000* and dimension *p = 500*, we generate data from a linear model *y<sub>i</sub> = &beta;<sub>0</sub> + <x<sub>i</sub>, &beta;> + &epsilon;<sub>i</sub>*, for *i = 1, 2, ... n*. Here we set *&beta;<sub>0</sub> = 1*, *&beta;* is a *p*-dimensional vector with every entry being *1*, *x<sub>i</sub>* follows *p*-dimensional standard multivariate normal distribution (available in the library `MASS`), and *&epsilon;<sub>i</sub>* is from *t<sub>2</sub>* distribution. 

```r
library(MASS)
library(quantreg)
library(conquer)
n = 5000
p = 500
beta = rep(1, p + 1)
set.seed(2021)
X = mvrnorm(n, rep(0, p), diag(p))
err = rt(n, 2)
Y = cbind(1, X) %*% beta + err
```

Then we run both quantile regression using package `quantreg`, with a Frisch-Newton approach after preprocessing ([Portnoy and Koenker, 1997](https://projecteuclid.org/euclid.ss/1030037960)), and conquer (with Gaussian kernel) on the generated data. The quantile level *&tau;* is fixed to be *0.5*. 

```r
tau = 0.5
start = Sys.time()
fit.qr = rq(Y ~ X, tau = tau, method = "pfn")
end = Sys.time()
time.qr = as.numeric(difftime(end, start, units = "secs"))
est.qr = norm(as.numeric(fit.qr$coefficients) - beta, "2")

start = Sys.time()
fit.conquer = conquer(X, Y, tau = tau)
end = Sys.time()
time.conquer = as.numeric(difftime(end, start, units = "secs"))
est.conquer = norm(fit.conquer$coeff - beta, "2")
```

It takes 7.4 seconds to run the standard quantile regression but only 0.2 seconds to run conquer. In the meanwhile, the estimation error is 0.5186 for quantile regression and 0.4864 for conquer. For readers’ reference, these runtimes are recorded on a Macbook Pro with 2.3 GHz 8-Core Intel Core i9 processor, and 16 GB 2667 MHz DDR4 memory. We refer to [He et al., 2022](https://doi.org/10.1016/j.jeconom.2021.07.010) for a more extensive numerical study.

## Getting help

Help on the functions can be accessed by typing `?`, followed by function name at the `R` command prompt. 

For example, `?conquer` will present a detailed documentation with inputs, outputs and examples of the function `conquer`.

## License

GPL-3.0

## System requirements 

C++11

## Authors

Xuming He <xmhe@umich.edu>, Xiaoou Pan <xip024@ucsd.edu>, Kean Ming Tan <keanming@umich.edu> and Wen-Xin Zhou <wez243@ucsd.edu>

## Maintainer

Xiaoou Pan <xip024@ucsd.edu>

## References

Barzilai, J. and Borwein, J. M. (1988). Two-point step size gradient methods. *IMA J. Numer. Anal.* **8** 141-148. [Paper](https://doi.org/10.1093/imanum/8.1.141)

Belloni, A. and Chernozhukov, V. (2011) *&ell;<sub>1</sub>*-penalized quantile regression in high-dimensional sparse models. *Ann. Statist.* **39** 82-130. [Paper](https://doi.org/10.1214/10-AOS827)

Fan, J., Liu, H., Sun, Q. and Zhang, T. (2018). I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error. *Ann. Statist.* **46** 814-841. [Paper](https://projecteuclid.org/euclid.aos/1522742437)

Fernandes, M., Guerre, E. and Horta, E. (2021). Smoothing quantile regressions. *J. Bus. Econ. Statist.* **39** 338-357, [Paper](https://www.tandfonline.com/doi/full/10.1080/07350015.2019.1660177)

He, X., Pan, X., Tan, K. M., and Zhou, W.-X. (2022). Smoothed quantile regression with large-scale inference. *J. Econometrics*, to appear. [Paper](https://doi.org/10.1016/j.jeconom.2021.07.010)

Koenker, R. (2005). Quantile Regression. Cambridge Univ. Press, Cambridge. [Book](https://www.cambridge.org/core/books/quantile-regression/C18AE7BCF3EC43C16937390D44A328B1)

Koenker, R. and Bassett, G. (1978). Regression quantiles. *Econometrica* **46** 33-50. [Paper](https://www.jstor.org/stable/1913643?seq=1#metadata_info_tab_contents)

Portnoy, S. and Koenker, R. (1997). The Gaussian hare and the Laplacian tortoise: Computability of squared-error versus absolute-error estimators. *Statist. Sci.* **12** 279–300. [Paper](https://projecteuclid.org/euclid.ss/1030037960)

Tan, K. M., Wang, L. and Zhou, W.-X. (2022). High-dimensional quantile regression: convolution smoothing and concave regularization. *J. Roy. Statist. Soc. Ser. B* **84(1)** 205-233. [Paper](https://doi.org/10.1111/rssb.12485)

