% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/crossval.R
\name{sgdgmf.cv}
\alias{sgdgmf.cv}
\title{Model selection via cross-validation for generalized matrix factorization models}
\usage{
sgdgmf.cv(
  Y,
  X = NULL,
  Z = NULL,
  family = gaussian(),
  ncomps = seq(from = 1, to = 10, by = 1),
  weights = NULL,
  offset = NULL,
  method = c("airwls", "newton", "sgd"),
  sampling = c("block", "coord", "rnd-block"),
  penalty = list(),
  control.init = list(),
  control.alg = list(),
  control.cv = list()
)
}
\arguments{
\item{Y}{matrix of responses (\eqn{n \times m})}

\item{X}{matrix of row fixed effects (\eqn{n \times p})}

\item{Z}{matrix of column fixed effects (\eqn{q \times m})}

\item{family}{a \code{glm} family (see \code{\link{family}} for more details)}

\item{ncomps}{candidate ranks of the latent matrix factorization to compare in cross-validation (default: 1 to 10)}

\item{weights}{an optional matrix of weights (\eqn{n \times m})}

\item{offset}{an optional matrix of offset values (\eqn{n \times m}) specifying a known component to be included in the linear predictor}

\item{method}{estimation method to minimize the negative penalized log-likelihood}

\item{sampling}{sub-sampling strategy to use if \code{method = "sgd"}}

\item{penalty}{list of penalty parameters (see \code{\link{set.penalty}} for more details)}

\item{control.init}{list of control parameters for the initialization (see \code{\link{set.control.init}} for more details)}

\item{control.alg}{list of control parameters for the optimization (see \code{\link{set.control.alg}} for more details)}

\item{control.cv}{list of control parameters for the cross-validation (see \code{\link{set.control.cv}} for more details)}
}
\value{
If \code{refit = FALSE} (see \code{\link{set.control.cv}}), the function returns a list containing \code{control.init},
\code{control.alg}, \code{control.cv} and \code{summary.cv}. The latter is a matrix
collecting the cross-validation results for each combination of fold and latent
dimension.

If \code{refit = TRUE} (see \code{\link{set.control.cv}}), the function returns an object of class \code{sgdgmf},
obtained by refitting the model on the whole data matrix using the latent dimension
selected via cross-validation. The returned object also contains the \code{summary.cv}
information along with the other standard output of the \code{\link{sgdgmf.fit}} function.
}
\description{
K-fold cross-validation for generalized matrix factorization (GMF) models.
}
\details{
The latent dimension is selected by minimizing the estimated out-of-sample error, which
can be measured in terms of average deviance, AIC or BIC computed on fold-specific
test sets. Within each fold, the test set is a fixed proportion of the entries
of the response matrix, which are held out from the estimation process.
To this end, the test-set entries are replaced by \code{NA} values when training the
model. The predicted (i.e. imputed) values are then used to compute the fold-specific
out-of-sample error.
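
The hold-out mechanism can be illustrated with a minimal base R sketch. This is only
an illustration under stated assumptions, not the internal code of the package: the
30 percent test proportion and the object names (\code{test}, \code{Ytrain}) are
arbitrary choices made for the example.

\preformatted{
set.seed(1)
Y = matrix(rpois(20, lambda = 3), nrow = 5, ncol = 4)

# Hold out a fixed proportion of the entries (here 30 percent) as the test set
test = sample(length(Y), size = round(0.3 * length(Y)))

# Hide the test-set entries with NA values to build the training matrix
Ytrain = Y
Ytrain[test] = NA

# A model trained on Ytrain is then evaluated by comparing its predictions
# for the hidden cells against the held-out values Y[test]
}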
}
\examples{
# Load the sgdGMF package
library(sgdGMF)

# Set the data dimensions
n = 100; m = 20; d = 5

# Generate data using Poisson, Binomial and Gamma models
data_pois = sim.gmf.data(n = n, m = m, ncomp = d, family = poisson())
data_bin = sim.gmf.data(n = n, m = m, ncomp = d, family = binomial())
data_gam = sim.gmf.data(n = n, m = m, ncomp = d, family = Gamma(link = "log"), dispersion = 0.25)

# Set RUN = TRUE to run the example; it may take some time. To speed up
# the computation, cross-validation can be run in parallel by passing
# control.cv = list(parallel = TRUE, nthreads = <number_of_workers>)
# as an argument of sgdgmf.cv()
RUN = FALSE
if (RUN) {
  # Run cross-validation: ranks 1 to 10 are compared for the Poisson model,
  # while a single candidate rank (3) is used for the Binomial and Gamma models
  gmf_pois = sgdgmf.cv(data_pois$Y, ncomps = 1:10, family = poisson())
  gmf_bin = sgdgmf.cv(data_bin$Y, ncomps = 3, family = binomial())
  gmf_gam = sgdgmf.cv(data_gam$Y, ncomps = 3, family = Gamma(link = "log"))

  # Get the fitted values on the response scale
  mu_hat_pois = fitted(gmf_pois, type = "response")
  mu_hat_bin = fitted(gmf_bin, type = "response")
  mu_hat_gam = fitted(gmf_gam, type = "response")

  # Compare the results
  oldpar = par(no.readonly = TRUE)
  par(mfrow = c(3,3), mar = c(1,1,3,1))
  image(data_pois$Y, axes = FALSE, main = expression(Y[Pois]))
  image(data_pois$mu, axes = FALSE, main = expression(mu[Pois]))
  image(mu_hat_pois, axes = FALSE, main = expression(hat(mu)[Pois]))
  image(data_bin$Y, axes = FALSE, main = expression(Y[Bin]))
  image(data_bin$mu, axes = FALSE, main = expression(mu[Bin]))
  image(mu_hat_bin, axes = FALSE, main = expression(hat(mu)[Bin]))
  image(data_gam$Y, axes = FALSE, main = expression(Y[Gam]))
  image(data_gam$mu, axes = FALSE, main = expression(mu[Gam]))
  image(mu_hat_gam, axes = FALSE, main = expression(hat(mu)[Gam]))
  par(oldpar)
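
  # Hedged sketch: to inspect the raw cross-validation results instead of the
  # refitted model, refitting can be disabled through control.cv (see
  # set.control.cv). The returned list then contains summary.cv, a matrix
  # collecting the results for each combination of fold and latent dimension.
  # The object name cv_pois and the candidate ranks 1:5 are arbitrary choices.
  cv_pois = sgdgmf.cv(data_pois$Y, ncomps = 1:5, family = poisson(),
                      control.cv = list(refit = FALSE))
  head(cv_pois$summary.cv)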
}

}
