% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dig_contrasts.R
\name{dig_contrasts}
\alias{dig_contrasts}
\title{Search for paired contrast patterns}
\usage{
dig_contrasts(
  x,
  condition = where(is.logical),
  xvars = where(is.numeric),
  yvars = where(is.numeric),
  method = "t",
  alternative = "two.sided",
  min_length = 0L,
  max_length = Inf,
  min_support = 0,
  max_p_value = 0.05,
  threads = 1,
  ...
)
}
\arguments{
\item{x}{a matrix or data frame with data to search in.}

\item{condition}{a tidyselect expression (see
\href{https://tidyselect.r-lib.org/articles/syntax.html}{tidyselect syntax})
specifying the columns to use as condition predicates.}

\item{xvars}{a tidyselect expression (see
\href{https://tidyselect.r-lib.org/articles/syntax.html}{tidyselect syntax})
specifying the columns to use as the first variable of each contrast.}

\item{yvars}{a tidyselect expression (see
\href{https://tidyselect.r-lib.org/articles/syntax.html}{tidyselect syntax})
specifying the columns to use as the second variable of each contrast.}

\item{method}{a character string indicating which contrast to compute.
One of \code{"t"}, \code{"wilcox"}, or \code{"var"}. \code{"t"} (resp. \code{"wilcox"}) computes
a parametric (resp. non-parametric) test of equality of location, and
\code{"var"} performs the F-test of equality of variances.}

\item{alternative}{indicates the alternative hypothesis and must be one of
\code{"two.sided"}, \code{"greater"} or \code{"less"}. \code{"greater"} corresponds to
positive association, \code{"less"} to negative association.}

\item{min_length}{the minimum size (the minimum number of predicates) of the
condition to be generated (must be greater than or equal to 0). If 0, the
empty condition is generated first.}

\item{max_length}{the maximum size (the maximum number of predicates) of the
condition to be generated. If equal to \code{Inf}, the maximum length of conditions
is limited only by the number of available predicates.}

\item{min_support}{the minimum support of a condition to trigger the callback
function for it. The support of the condition is the relative frequency
of the condition in the dataset \code{x}. For logical data, it equals the
relative frequency of rows in which all condition predicates are TRUE.
For numerical (double) input, the support is computed as the mean (over all
rows) of the products of predicate values.}

\item{max_p_value}{the maximum p-value of a test for the pattern to be considered
significant. If the p-value of the test is greater than \code{max_p_value}, the
pattern is not included in the result.}

\item{threads}{the number of threads to use for parallel computation.}

\item{...}{further arguments passed to the underlying test function
(\code{\link[=t.test]{t.test()}}, \code{\link[=wilcox.test]{wilcox.test()}}, or \code{\link[=var.test]{var.test()}}, according to the
selected method).}
}
\value{
A tibble with found patterns in rows. The following columns are always
present:
\item{condition}{the condition of the pattern as a character string
in the form \code{{p1 & p2 & ... & pn}} where \code{p1}, \code{p2}, ..., \code{pn} are
\code{x}'s column names.}
\item{support}{the support of the condition, i.e., the relative
frequency of the condition in the dataset \code{x}.}
\item{xvar}{the name of the first variable in the contrast.}
\item{yvar}{the name of the second variable in the contrast.}
\item{p_value}{the p-value of the underlying test.}
\item{rows}{the number of rows in the sub-data corresponding to
the condition.}
\item{alternative}{a character string indicating the alternative
hypothesis.}
\item{method}{a character string indicating the method used for the
test.}
For the \code{"t"} method, the following additional columns are also
present (see also \code{\link[=t.test]{t.test()}}):
\item{estimate_x}{the estimated mean of variable \code{xvar}.}
\item{estimate_y}{the estimated mean of variable \code{yvar}.}
\item{t_statistic}{the t-statistic of the t test.}
\item{df}{the degrees of freedom of the t test.}
\item{conf_int_lo}{the lower bound of the confidence interval.}
\item{conf_int_hi}{the upper bound of the confidence interval.}
\item{stderr}{the standard error of the mean difference.}
For the \code{"wilcox"} method, the following additional columns are also
present (see also \code{\link[=wilcox.test]{wilcox.test()}}):
\item{estimate}{the estimate of the location parameter.}
\item{W_statistic}{the Wilcoxon rank sum statistic.}
\item{conf_int_lo}{the lower bound of the confidence interval.}
\item{conf_int_hi}{the upper bound of the confidence interval.}
For the \code{"var"} method, the following additional columns are also
present (see also \code{\link[=var.test]{var.test()}}):
\item{estimate}{the ratio of the sample variances of variables
\code{xvar} and \code{yvar}.}
\item{F_statistic}{the value of the F test statistic.}
\item{df1}{the numerator degrees of freedom.}
\item{df2}{the denominator degrees of freedom.}
\item{conf_int_lo}{the lower bound of the confidence interval for the
ratio of the population variances.}
\item{conf_int_hi}{the upper bound of the confidence interval for the
ratio of the population variances.}
}
\description{
\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#deprecated}{\figure{lifecycle-deprecated.svg}{options: alt='[Deprecated]'}}}{\strong{[Deprecated]}}

Contrast patterns are a generalization of association rules that allow
for the specification of a condition under which there is a significant
difference in some statistical feature between two numeric variables.

\describe{
\item{Scheme:}{\verb{theta(xvar) >> theta(yvar) | C}\cr\cr
The feature \code{theta} of the first variable \code{xvar} is significantly higher
than the feature \code{theta} of the second variable \code{yvar} under the
condition \code{C}.}
\item{Example:}{\verb{mean(daily_ice_cream_income) >> mean(daily_tea_income) | sunny}\cr\cr
The \emph{mean} of \emph{daily ice-cream income} is significantly higher than
the \emph{mean} of \emph{daily tea income} under the condition of \emph{sunny weather}.}
}

The contrast is computed using
a statistical test, which is specified by the \code{method} argument. The
function computes the contrast between all pairs of variables, where the
first variable is specified by the \code{xvars} argument and the second variable
is specified by the \code{yvars} argument. The contrast is computed in sub-data
corresponding to conditions generated from the \code{condition} columns. The
\code{dig_contrasts()} function supports crisp conditions only, i.e., the
condition columns must be logical.
}
\examples{
crispCO2 <- partition(CO2, Plant:Treatment)
dig_contrasts(crispCO2,
              condition = where(is.logical),
              xvars = conc,
              yvars = uptake,
              method = "t",
              min_support = 0.1)
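
# A further sketch (not part of the original example): the same search using
# the non-parametric test and a one-sided alternative. It assumes that conc
# and uptake are still numeric columns of crispCO2 after partitioning, as in
# the call above. Only arguments documented for dig_contrasts() are used.
dig_contrasts(crispCO2,
              condition = where(is.logical),
              xvars = conc,
              yvars = uptake,
              method = "wilcox",
              alternative = "greater",
              min_support = 0.1)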
}
\seealso{
\code{\link[=dig]{dig()}}, \code{\link[=dig_grid]{dig_grid()}}, \code{\link[stats:t.test]{stats::t.test()}}, \code{\link[stats:wilcox.test]{stats::wilcox.test()}}, \code{\link[stats:var.test]{stats::var.test()}}
}
\author{
Michal Burda
}
\keyword{internal}
