% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mixEMM.r
\name{mixEMM}
\alias{mixEMM}
\title{A mixed-effects model for analyzing cluster-level non-ignorable missing data}
\usage{
mixEMM(Ym, Xm, Zm, gamma, maxIter = 100, tol = 0.001)
}
\arguments{
\item{Ym}{an N by p outcome matrix for N clusters (batches or experiments), where p is the number of samples within each cluster.
The first sample within each cluster is assumed to be a reference sample with its own error variance.
Missing values are coded as \code{NA}.}

\item{Xm}{a covariate array of dimension N by k by p, where k is the number of covariates.}

\item{Zm}{a design array for the random effects, of dimension N by h by p, where h is the number of variables with random effects.}

\item{gamma}{the parameter of the missing-data mechanism. The missingness of the outcome in cluster i
depends on the mean of the outcome: the probability that cluster i is missing is modelled as
exp(-gamma0 - gamma*mean(y_i)). If gamma = 0, the missingness is ignorable. The intercept gamma0 does not
affect the EM estimates and is mostly determined by the missing rate, so it is set to 0 here.
Ways to estimate or select gamma are described in Details (see also the Reference).}

\item{maxIter}{the maximum number of iterations of the EM algorithm.}

\item{tol}{the convergence tolerance for the absolute change in the observed-data log-likelihood between successive iterations.}
}
\value{
A list containing
\item{alpha.hat}{the estimated fixed effects.}
\item{alpha.se}{the standard errors of the estimated fixed effects.}
\item{sigma0.hat, sigma2.hat}{the estimated error variances of the first (reference) sample and of
 the other samples within each cluster, respectively.}
\item{D}{the estimated covariance matrix of the random effects.}
\item{RE}{the estimated random effects.}
\item{loglikelihood}{the observed-data log-likelihood values.}
}
\description{
This function fits a mixed-effects model for clustered data with cluster-level missing values in the outcome.
}
\details{
The model consists of two parts: the outcome model and the missing-data model. The outcome model
is a mixed-effects model,
\deqn{\mathbf{y}_{i} = \mathbf{X}_{i}\boldsymbol{\alpha}+\mathbf{Z}_{i}\mathbf{b}_{i}+\mathbf{e}_{i},}
where \eqn{\mathbf{y}_{i}} is the outcome vector for the i-th cluster, \eqn{\mathbf{X}_{i}} is the covariate matrix,
\eqn{\boldsymbol{\alpha}} is the vector of fixed effects, \eqn{\mathbf{Z}_{i}} is the design matrix for
the random effects \eqn{\mathbf{b}_{i}}, and \eqn{\mathbf{e}_{i}} is the error term. The random effects have
covariance matrix \code{D}, and the error variance of the first (reference) sample differs from that of the
other samples in the cluster.
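
As a rough illustration of this model and of the array layout described under Arguments, the hypothetical sketch below simulates \code{Ym} from \code{Xm} and \code{Zm}. The dimensions, the coefficient and variance values, and the normal distributions for \eqn{\mathbf{b}_{i}} and \eqn{\mathbf{e}_{i}} are assumptions chosen for illustration; \eqn{\mathbf{X}_{i}} is taken to be the transpose of the k by p slice \code{Xm[i, , ]}, and \code{crossprod(A, B)} computes \code{t(A)} times \code{B}.
\preformatted{
# Hypothetical sketch: simulate y_i = X_i alpha + Z_i b_i + e_i using
# arrays in the N by k by p (and N by h by p) layout.
# Normal random effects and errors are assumptions of this sketch.
set.seed(1)
N = 5; p = 4; k = 2; h = 1
Xm = array(1, dim = c(N, k, p))        # first covariate: intercept
Xm[, 2, ] = rnorm(N * p)               # second covariate: arbitrary values
Zm = Xm[, 1, , drop = FALSE]           # random intercept, as in the example
alpha = c(1, 0.5)                      # fixed effects
D = matrix(0.3, h, h)                  # random-effects covariance
sigma0 = 0.1; sigma2 = 0.5             # reference / other error variances
Ym = matrix(NA_real_, N, p)
for (i in seq_len(N)) {
  Xi = matrix(Xm[i, , ], nrow = k, ncol = p)   # k by p slice, so X_i = t(Xi)
  Zi = matrix(Zm[i, , ], nrow = h, ncol = p)   # h by p slice, so Z_i = t(Zi)
  bi = crossprod(chol(D), rnorm(h))            # b_i with covariance D
  ei = rnorm(p, sd = sqrt(c(sigma0, rep(sigma2, p - 1))))
  Ym[i, ] = crossprod(Xi, alpha) + crossprod(Zi, bi) + ei
}
}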

The non-ignorable batch-level (or cluster-level) abundance-dependent missing-data model (BADMM) can be written as
\deqn{\Pr\left(M_{i}=1 \mid \mathbf{y}_{i}\right) = \exp\left(-\gamma_{0} - \gamma \bar{\mathbf{y}}_{i}\right),}
where \eqn{M_{i}} is the missing indicator for the i-th cluster and \eqn{\bar{\mathbf{y}}_{i}} is the average of \eqn{\mathbf{y}_{i}}.
If \eqn{M_{i}=1}, the whole outcome vector \eqn{\mathbf{y}_{i}} of the i-th cluster is missing.
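
The hypothetical sketch below evaluates this probability and applies cluster-level missingness to a simulated outcome matrix. The helper \code{badmm_prob} is an illustrative name, not part of the package, and capping the probability at 1 is an assumption of the sketch (the formula is a valid probability only when \eqn{\gamma_{0} + \gamma \bar{\mathbf{y}}_{i} \geq 0}). With \eqn{\gamma > 0}, clusters with lower averages are more likely to be missing.
\preformatted{
# Hypothetical sketch of the BADMM: cluster i is missing entirely with
# probability exp(-gamma0 - gamma * mean(y_i)). Capping at 1 is an
# assumption of this sketch, not part of mixEMM.
set.seed(2)
badmm_prob = function(Y, gamma, gamma0 = 0) {
  pmin(1, exp(-gamma0 - gamma * rowMeans(Y)))
}
Y = matrix(rnorm(20 * 4, mean = 3), nrow = 20)   # 20 clusters, 4 samples
prob = badmm_prob(Y, gamma = 0.5, gamma0 = 0.2)
M = rbinom(nrow(Y), size = 1, prob = prob)         # missing indicators M_i
Ym = Y
Ym[M == 1, ] = NA                                  # the whole row is missing
}
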
The model is estimated with an expectation conditional maximization (ECM) algorithm. If \eqn{\gamma \neq 0},
the missingness depends on the outcome and the missing-data mechanism is missing not at random (MNAR);
if \eqn{\gamma = 0}, it is missing completely at random (MCAR) under this model. The parameter \eqn{\gamma} can
be estimated by borrowing information across outcomes and finding the common missing-data patterns
in the high-dimensional data, for example by estimating the relationship between
the observed average \eqn{\bar{\mathbf{y}}_{i}} and the missing rate, or it can be
selected from the log-likelihood profile (see the Reference).
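
One simple way to use the relationship between observed averages and missing rates is sketched below. Taking logarithms of the missing-data model, the log missing rate of an outcome is roughly linear in its mean with slope \eqn{-\gamma}, so regressing log missing rates on observed averages across many outcomes gives a rough estimate of \eqn{\gamma}. This heuristic and all of its numeric settings are hypothetical, and it is not necessarily the estimator used in the Reference; note also that the observed averages use non-missing clusters only and are therefore slightly biased upward when \eqn{\gamma > 0}.
\preformatted{
# Hypothetical heuristic: across many outcomes, regress the log missing
# rate on the observed average. Under Pr(M = 1) = exp(-gamma0 - gamma * ybar)
# the slope is roughly -gamma. Not necessarily the Reference's estimator.
set.seed(3)
n_out = 50; N = 200; p = 4
gamma = 0.5; gamma0 = 0.3
mu = runif(n_out, 1, 5)                        # outcome-specific means
obs_mean = miss_rate = numeric(n_out)
for (j in seq_len(n_out)) {
  Y = matrix(rnorm(N * p, mean = mu[j], sd = 0.6), nrow = N)
  prob = pmin(1, exp(-gamma0 - gamma * rowMeans(Y)))
  M = rbinom(N, size = 1, prob = prob)         # cluster-level missingness
  miss_rate[j] = mean(M)
  obs_mean[j] = mean(Y[M == 0, ])              # average over observed clusters
}
fit = lm(log(miss_rate) ~ obs_mean)
gamma_hat = -unname(coef(fit)[2])              # rough estimate of gamma
}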
}
\examples{
data(sim_dat)

Z = sim_dat$X[, 1, , drop = FALSE]
fit0 = mixEMM(Ym = sim_dat$Ym, Xm = sim_dat$X, Zm = Z, gamma = 0.14)
}
\references{
Chen, L. S., Wang, J., Wang, X., & Wang, P. (2017). A mixed-effects model for incomplete data from 
labeling-based quantitative proteomics experiments. The Annals of Applied Statistics, 11(1), 114-138. \doi{10.1214/16-AOAS994}
}
