There are three assumptions about the process by which data become missing [1]: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).
This process is treated as random, so missing data can be formalised from a probabilistic perspective.
The following unifies the formalisms in [1], [2], and [3].
The definitions of MCAR, MAR, and MNAR are based on the probability distribution of the missingness indicator \(M\).
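
As a sketch of how these definitions are usually written, assume the following notation (an assumption here, chosen to match the symbols in the table below): \(D\) is the full data matrix, \(M\) is an indicator matrix of the same dimensions marking which entries of \(D\) are missing, and \(D_{obs}\) and \(D_{mis}\) are the observed and missing parts of \(D\). Under that notation, the three assumptions can be stated as:

\[
\begin{aligned}
\text{MCAR:} \quad & P(M \mid D) = P(M) \\
\text{MAR:} \quad & P(M \mid D) = P(M \mid D_{obs}) \\
\text{MNAR:} \quad & P(M \mid D) \text{ does not simplify: } M \text{ depends on } D_{mis} \text{ even after conditioning on } D_{obs}
\end{aligned}
\]
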
The three assumptions are summarised informally below [1].
| Assumption | You can predict \(M\) with: | 
|---|---|
| MCAR | - | 
| MAR | \(D_{obs}\) | 
| MNAR | \(D_{obs}\) and \(D_{mis}\) | 
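
The table can be illustrated with a small simulation. The following is a minimal sketch, not taken from the cited papers: the variable names, the logistic link and the 0.3 missingness rate are all assumptions made for illustration. Here `x` plays the role of \(D_{obs}\) (always observed) and `y` is the variable whose values may go missing.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used here to turn a score into a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def simulate_missingness(n=100_000, seed=0):
    """Draw (x, y) and one missingness mask for y per assumption.

    x stands in for D_obs (always observed); y is the variable whose
    entries may go missing. True in a mask means "y is missing".
    All distributions and constants are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = x + rng.normal(size=n)

    masks = {
        # MCAR: a constant missingness probability, unrelated to the data.
        "MCAR": rng.random(n) < 0.3,
        # MAR: the probability depends only on the observed x.
        "MAR": rng.random(n) < sigmoid(x - 1.0),
        # MNAR: the probability depends on y, the value that goes missing.
        "MNAR": rng.random(n) < sigmoid(y - 1.0),
    }
    return x, y, masks


def corr(mask, values):
    """Pearson correlation between a boolean mask and a numeric array."""
    return float(np.corrcoef(mask.astype(float), values)[0, 1])


x, y, masks = simulate_missingness()
for name, m in masks.items():
    # y - x is the part of y that x cannot explain.
    print(f"{name}: corr(M, x) = {corr(m, x):+.3f}, "
          f"corr(M, y - x) = {corr(m, y - x):+.3f}")
```

In this setup, the MCAR mask should be roughly uncorrelated with both quantities, the MAR mask should correlate with `x` but not with `y - x`, and the MNAR mask should correlate with both. The last check is only possible because the simulation keeps the full `y`. With real data, `y` is unknown wherever the mask is true, so MNAR cannot be confirmed from the observed data alone.
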
[1] King G, Honaker J, Joseph A, Scheve K. Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation. American Political Science Review. 2001;95(1):49-69.
[2] Little RJA. A Test of Missing Completely at Random for Multivariate Data with Missing Values. Journal of the American Statistical Association. 1988;83(404):1198-202.
[3] Rubin DB. Inference and Missing Data. Biometrika. 1976;63(3):581-92.
[4] Ibrahim JG, Zhu H, Tang N. Model Selection Criteria for Missing-Data Problems Using the EM Algorithm. Journal of the American Statistical Association. 2008;103(484):1648-58. PMID: 19693282.