% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/01_UNIVARIATE_ANALYSIS.R
\name{univariate}
\alias{univariate}
\title{Univariate analysis}
\usage{
univariate(
  db,
  sc = c(NA, NaN, Inf),
  sc.method = "together",
  sc.threshold = 0.2
)
}
\arguments{
\item{db}{Data frame of risk factors supplied for univariate analysis.}

\item{sc}{Vector of special case elements. Default values are \code{c(NA, NaN, Inf)}.}

\item{sc.method}{Defines how special cases are treated: all together in one bin or in separate bins.
Possible values are \code{"together"} and \code{"separately"}.}

\item{sc.threshold}{Threshold for special cases, expressed as a share of the total number of
observations (e.g. \code{0.2} for 20\%). If \code{sc.method} is set to \code{"separately"},
the shares of the individual special cases are summed up.}
}
\value{
\code{univariate} returns a data frame with the univariate metrics listed in the description,
computed for risk factors of class numeric, character, factor and logical.
}
\description{
\code{univariate} returns univariate statistics for the risk factors supplied in the data frame \code{db}. \cr
For numeric risk factors, the univariate report includes:
\itemize{
\item rf: Risk factor name.
\item rf.type: Risk factor class. This metric is always equal to \code{numeric}.
\item bin.type: Bin type, either special cases or complete cases.
\item bin: Bin. If the \code{sc.method} argument is equal to \code{"together"}, then
\code{bin} and \code{bin.type} have the same value. If the \code{sc.method} argument
is equal to \code{"separately"}, then \code{bin} identifies each special case that
exists for the analyzed risk factor (e.g. \code{NA}, \code{NaN}, \code{Inf}).
\item pct: Percentage of observations in each \code{bin}.
\item cnt.unique: Number of unique values per \code{bin}.
\item min: Minimum value.
\item p1, p5, p25, p50, p75, p95, p99: Percentile values.
\item avg: Mean value.
\item avg.se: Standard error of the mean.
\item max: Maximum value.
\item neg: Number of negative values.
\item pos: Number of positive values.
\item cnt.outliers: Number of outliers, i.e. records below \code{Q25 - 1.5 * IQR} or above
\code{Q75 + 1.5 * IQR}, where \code{IQR = Q75 - Q25}.
\item sc.ind: Special case indicator. It takes the value 1 if the share of special cases exceeds
\code{sc.threshold}, otherwise 0.
}
For categorical risk factors, the univariate report includes:
\itemize{
\item rf: Risk factor name.
\item rf.type: Risk factor class. This metric is equal to one of: \code{character},
\code{factor} or \code{logical}.
\item bin.type: Bin type, either special cases or complete cases.
\item bin: Bin. If the \code{sc.method} argument is equal to \code{"together"}, then
\code{bin} and \code{bin.type} have the same value. If the \code{sc.method} argument
is equal to \code{"separately"}, then \code{bin} identifies each special case that
exists for the analyzed risk factor (e.g. \code{NA}, \code{NaN}, \code{Inf}).
\item pct: Percentage of observations in each \code{bin}.
\item cnt.unique: Number of unique values per \code{bin}.
\item sc.ind: Special case indicator. It takes the value 1 if the share of special cases exceeds
\code{sc.threshold}, otherwise 0.
}
}
\examples{
suppressMessages(library(PDtoolkit))
data(gcd)
gcd$age[100:120] <- NA
gcd$age.bin <- ndr.bin(x = gcd$age, y = gcd$qual, y.type = "bina")[[2]]
gcd$age.bin <- as.factor(gcd$age.bin)
gcd$maturity.bin <- ndr.bin(x = gcd$maturity, y = gcd$qual, y.type = "bina")[[2]]
gcd$amount.bin <- ndr.bin(x = gcd$amount, y = gcd$qual, y.type = "bina")[[2]]
gcd$all.miss1 <- NaN
gcd$all.miss2 <- NA
gcd$tf <- sample(c(TRUE, FALSE), nrow(gcd), replace = TRUE)
# create a date variable to confirm that the function does not process it
gcd$dates <- Sys.Date()
str(gcd)
univariate(db = gcd)
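# A possible follow-up, shown only as a sketch: report each special case in its
# own bin and use a stricter threshold. The value 0.01 and the object name
# res.sep are illustrative choices, not package defaults. The column names
# (rf, sc.ind) are taken from the report description above.
res.sep <- univariate(db = gcd,
                      sc = c(NA, NaN, Inf),
                      sc.method = "separately",
                      sc.threshold = 0.01)
# names of the risk factors flagged by the special case indicator;
# which() is used so that any missing indicator values are dropped
unique(res.sep$rf[which(res.sep$sc.ind == 1)])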
}
