% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/qlm_compare.R
\name{qlm_compare}
\alias{qlm_compare}
\title{Compare coded results for inter-rater reliability}
\usage{
qlm_compare(
  ...,
  by,
  level = NULL,
  tolerance = 0,
  ci = c("none", "analytic", "bootstrap"),
  bootstrap_n = 1000
)
}
\arguments{
\item{...}{Two or more data frames, \code{qlm_coded}, or \code{as_qlm_coded} objects
to compare. These represent different "raters" (e.g., different LLM runs,
different models, human coders, or human vs. LLM coding). Each object must
have a \code{.id} column and the variable specified in \code{by}. Objects should have
the same units (matching \code{.id} values). Plain data frames are automatically
converted to \code{as_qlm_coded} objects.}

\item{by}{Optional. Name of the variable(s) to compare across raters, quoted or
unquoted. Can be a single variable (\code{by = sentiment}) or a character vector
(\code{by = c("sentiment", "rating")}). If \code{NULL} (default), all coded
variables are compared.}

\item{level}{Optional. Measurement level(s) for the variable(s). Can be:
\itemize{
\item \code{NULL} (default): Auto-detect from codebook
\item Character scalar: Use same level for all variables
\item Named list: Specify level for each variable
}
Valid levels are \code{"nominal"}, \code{"ordinal"}, \code{"interval"}, or \code{"ratio"}.}

\item{tolerance}{Numeric. Tolerance within which numeric values are counted as
agreeing when computing percent agreement. Default is 0 (exact agreement required).}

\item{ci}{Confidence interval method:
\describe{
\item{\code{"none"}}{No confidence intervals (default)}
\item{\code{"analytic"}}{Analytic CIs where available (ICC, Pearson's r)}
\item{\code{"bootstrap"}}{Bootstrap CIs for all metrics via resampling}
}}

\item{bootstrap_n}{Number of bootstrap resamples when \code{ci = "bootstrap"}.
Default is 1000. Ignored when \code{ci} is \code{"none"} or \code{"analytic"}.}
}
\value{
A \code{qlm_comparison} object (a tibble/data frame) with the following columns:
\describe{
\item{\code{variable}}{Name of the compared variable}
\item{\code{level}}{Measurement level used}
\item{\code{measure}}{Name of the reliability metric}
\item{\code{value}}{Computed value of the metric}
\item{\code{rater1}, \code{rater2}, ...}{Names of the compared objects (one column per rater)}
\item{\code{ci_lower}}{Lower bound of confidence interval (only if \code{ci != "none"})}
\item{\code{ci_upper}}{Upper bound of confidence interval (only if \code{ci != "none"})}
}
The object has class \code{c("qlm_comparison", "tbl_df", "tbl", "data.frame")} and
attributes containing metadata (\code{raters}, \code{n}, \code{call}).

\strong{Metrics computed by measurement level:}
\itemize{
\item \strong{Nominal:} alpha_nominal, kappa (Cohen's/Fleiss'), percent_agreement
\item \strong{Ordinal:} alpha_ordinal, kappa_weighted (2 raters only), w (Kendall's W),
rho (Spearman's), percent_agreement
\item \strong{Interval/Ratio:} alpha_interval/alpha_ratio, icc, r (Pearson's),
percent_agreement
}

\strong{Confidence intervals:}
\itemize{
\item \code{ci = "analytic"}: Provides analytic CIs for ICC and Pearson's r only
\item \code{ci = "bootstrap"}: Provides bootstrap CIs for all metrics via resampling
}
}
\description{
Compares two or more data frames or \code{qlm_coded} objects to assess inter-rater
reliability or agreement. This function extracts the variable(s) specified in
\code{by} from each object and computes reliability statistics using the irr package.
}
\details{
The function merges the coded objects by their \code{.id} column and keeps only
units that are present in all objects. A unit with a missing value for any
rater is excluded from the analysis.

\strong{Measurement levels and statistics:}
\itemize{
\item \strong{Nominal}: For unordered categories. Computes Krippendorff's alpha,
Cohen's/Fleiss' kappa, and percent agreement.
\item \strong{Ordinal}: For ordered categories. Computes Krippendorff's alpha (ordinal),
weighted kappa (2 raters only), Kendall's W, Spearman's rho, and percent
agreement.
\item \strong{Interval}: For continuous data with meaningful intervals. Computes
Krippendorff's alpha (interval), ICC, Pearson's r, and percent agreement.
\item \strong{Ratio}: For continuous data with a true zero point. Computes the same
measures as interval level, but Krippendorff's alpha uses the ratio-level
formula, which accounts for proportional differences.
}

Kendall's W, ICC, and percent agreement are computed using all raters
simultaneously. For 3 or more raters, Spearman's rho and Pearson's r are
computed as the mean of all pairwise correlations between raters.
}
\examples{
# Load example coded objects
examples <- readRDS(system.file("extdata", "example_objects.rds", package = "quallmer"))

# Compare two coding runs
comparison <- qlm_compare(
  examples$example_coded_sentiment,
  examples$example_coded_mini,
  by = "sentiment",
  level = "nominal"
)
print(comparison)
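
# Illustration only (base R, hypothetical toy data): a sketch of the unit
# matching described in Details, where only units present for every rater
# and without missing values are kept. The objects rater_a, rater_b, and
# shared are made up for this sketch; this is not the package's internal code.
rater_a <- data.frame(.id = 1:5, score = c(1, 2, NA, 4, 5))
rater_b <- data.frame(.id = 2:6, score = c(2, 3, 4, 4, 6))
# Inner join on .id keeps only the units coded by both raters
shared <- merge(rater_a, rater_b, by = ".id", suffixes = c("_a", "_b"))
# Drop units where either rater has a missing value
shared <- shared[stats::complete.cases(shared), ]
shared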

# Compare a specific variable, letting the level be auto-detected from the codebook
qlm_compare(
  examples$example_coded_sentiment,
  examples$example_coded_mini,
  by = "sentiment"
)
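
# Illustration only (base R, hypothetical toy ratings): a sketch of how a
# mean of all pairwise correlations, as described in Details for 3 or more
# raters, could be computed. The names ratings, pairwise, and mean_rho are
# made up for this sketch; the package's own implementation may differ.
ratings <- cbind(
  r1 = c(1, 2, 3, 4, 5),
  r2 = c(2, 1, 4, 3, 5),
  r3 = c(1, 3, 2, 5, 4)
)
# Correlation matrix across raters (columns); use method = "pearson" for r
pairwise <- stats::cor(ratings, method = "spearman")
# Average the unique rater pairs (upper triangle, excluding the diagonal)
mean_rho <- mean(pairwise[upper.tri(pairwise)])
mean_rho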

}
\seealso{
\code{\link[=qlm_validate]{qlm_validate()}} for validation of coding against gold standards,
\code{\link[=qlm_code]{qlm_code()}} for LLM coding, \code{\link[=as_qlm_coded]{as_qlm_coded()}} for human coding.
}
