The available metrics in cvms

Introduction

cvms has a large set of metrics for model evaluation. In this document, we list the metrics and their formulas.

Some of the metrics in the package are computed with external packages. These are listed at the bottom.

Some of the metrics are disabled by default to avoid cluttering the output tibble. These can be enabled in the `metrics` argument, which takes a list of named booleans, like `list("Accuracy" = FALSE, "Weighted F1" = TRUE)`. Such a list can be generated with the helper functions `gaussian_metrics()`, `binomial_metrics()`, and `multinomial_metrics()`. If, for instance, we only wish to calculate the RMSE metric for our regression model, we can use either `list("all" = FALSE, "RMSE" = TRUE)` or `gaussian_metrics(all = FALSE, rmse = TRUE)`.
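As a small illustration, the two ways of selecting only RMSE could be written as below. The variable names are placeholders chosen for this sketch, and the call to the helper is guarded so that the snippet also runs where cvms is not installed.

```r
# Option 1: a plain named list of booleans.
# "all" = FALSE disables every metric, "RMSE" = TRUE re-enables RMSE.
rmse_only <- list("all" = FALSE, "RMSE" = TRUE)

# Option 2: the helper function named in the text.
# Guarded, so nothing fails when the package is unavailable.
if (requireNamespace("cvms", quietly = TRUE)) {
  rmse_only_helper <- cvms::gaussian_metrics(all = FALSE, rmse = TRUE)
}
```

Either object is intended to be passed to the `metrics` argument.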

Gaussian Metrics

The metrics used to evaluate regression tasks (like linear regression):

Symbol	Denotes	Formula
\(y\)	Targets
\(\hat{y}\)	Predictions
\(\bar{y}\)	Average target
\(n\)	Number of observations
\({\scriptstyle \operatorname{IQR}(x)}\)	Interquartile Range	\({\scriptstyle \operatorname{quantile}(x, 3/4) - \operatorname{quantile}(x, 1/4)}\)
\(\lvert x \rvert\)	Absolute value of \(x\)

Metric name	Abbreviation	Formula
Root Mean Square Error	RMSE	\(\sqrt{\frac{\sum_{i=1}^{n}(\hat{y}_{i}-y_{i})^2}{n}}\)
Mean Absolute Error	MAE	\(\frac{\sum_{i=1}^{n}\lvert\hat{y}_{i}-y_{i}\rvert}{n}\)
Root Mean Square Log Error	RMSLE	\(\sqrt{\frac{\sum_{i=1}^{n}(\ln{(\hat{y}_{i}+1)}-\ln{(y_{i}+1)})^2}{n}}\)
Mean Absolute Log Error	MALE	\(\frac{\sum_{i=1}^{n}\lvert\ln{(\hat{y}_{i}+1)}-\ln{(y_{i}+1)}\rvert}{n}\)
Relative Absolute Error	RAE	\(\frac{\sum_{i=1}^{n}\lvert\hat{y}_{i}-y_{i}\rvert}{\sum_{i=1}^{n}\lvert y_{i}-\bar{y}\rvert}\)
Relative Squared Error	RSE	\(\frac{\sum_{i=1}^{n}(\hat{y}_{i}-y_{i})^2}{\sum_{i=1}^{n}(y_{i} - \bar{y})^2}\)
Root Relative Squared Error	RRSE	\({\scriptstyle \sqrt{RSE} }\)
Mean Absolute Percentage Error	MAPE	\(\frac{1}{n}\sum_{i=1}^{n} \lvert \frac{\hat{y}_{i}-y_{i}}{y_{i}} \rvert\)
Normalized RMSE (by target range)	NRMSE(RNG)	\(\frac{RMSE}{\max{y}-\min{y}}\)
Normalized RMSE (by target IQR)	NRMSE(IQR)	\(\frac{RMSE}{\operatorname{IQR}(y)}\)
Normalized RMSE (by target STD)	NRMSE(STD)	\(\frac{RMSE}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2}}\)
Normalized RMSE (by target mean)	NRMSE(AVG)	\(\frac{RMSE}{\bar{y}}\)
Mean Square Error	MSE	\(\frac{\sum_{i=1}^{n}(\hat{y}_{i}-y_{i})^2}{n}\)
Total Absolute Error	TAE	\({\scriptstyle \sum_{i=1}^{n}\lvert\hat{y}_{i}-y_{i}\rvert}\)
Total Squared Error	TSE	\({\scriptstyle \sum_{i=1}^{n}(\hat{y}_{i}-y_{i})^2}\)
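
The formulas in the table can be checked by hand with a few lines of base R. This is a minimal sketch on made-up data, not the package's internal code. It assumes R's default quantile type for the IQR and uses `sd()`, which divides by n - 1 like the NRMSE(STD) formula.

```r
# Toy data, invented for illustration
y     <- c(3.0, 5.0, 2.5, 7.0, 4.5)  # targets
y_hat <- c(2.5, 5.0, 4.0, 8.0, 4.0)  # predictions
n     <- length(y)
err   <- y_hat - y

# Totals, means and roots
tae  <- sum(abs(err))
tse  <- sum(err^2)
mae  <- tae / n
mse  <- tse / n
rmse <- sqrt(mse)

# Log-based errors (require y and y_hat to be greater than -1)
log_err <- log(y_hat + 1) - log(y + 1)
rmsle   <- sqrt(mean(log_err^2))
male    <- mean(abs(log_err))

# Errors relative to always predicting the average target
rae  <- sum(abs(err)) / sum(abs(y - mean(y)))
rse  <- sum(err^2) / sum((y - mean(y))^2)
rrse <- sqrt(rse)

# Percentage error (undefined when a target is 0)
mape <- mean(abs(err / y))

# Normalized RMSE variants
nrmse_rng <- rmse / (max(y) - min(y))
nrmse_iqr <- rmse / unname(quantile(y, 3 / 4) - quantile(y, 1 / 4))
nrmse_std <- rmse / sd(y)
nrmse_avg <- rmse / mean(y)
```

Note that RAE and RSE equal 1 for a model that always predicts the average target, so values below 1 indicate an improvement over that baseline.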

Binomial Metrics

The metrics used to evaluate binary classification tasks:

Based on a confusion matrix, we first count the True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). Below, 1 is the positive class.

#>           Target
#> Prediction 0  1 
#>          0 TN FN
#>          1 FP TP
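
The four counts can be obtained directly from a pair of 0/1 vectors. The sketch below uses invented data and base R only. The variable names are assumptions made for this example.

```r
# Toy targets and predictions, with 1 as the positive class
targets     <- c(1, 0, 1, 1, 0, 0, 1, 0)
predictions <- c(1, 0, 0, 1, 1, 0, 1, 0)

# Count each cell of the confusion matrix
TP <- sum(predictions == 1 & targets == 1)
TN <- sum(predictions == 0 & targets == 0)
FP <- sum(predictions == 1 & targets == 0)
FN <- sum(predictions == 0 & targets == 1)

# The same counts, arranged as in the layout above
# (rows are predictions, columns are targets)
conf_mat <- table(Prediction = predictions, Target = targets)
```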

With these counts, we can calculate the following metrics. Note that the Kappa metric first normalizes the counts to proportions that sum to 1.

Metric name(s)	Abbreviation	Formula
Accuracy		\(\frac{TP + TN}{TP + TN + FP + FN}\)
Balanced Accuracy		\(\frac{Sensitivity + Specificity}{2}\)
Sensitivity, Recall, True Positive Rate		\(\frac{TP}{TP + FN}\)
Specificity, True Negative Rate		\(\frac{TN}{TN + FP}\)
Positive Predictive Value, Precision	Pos Pred Value	\(\frac{TP}{TP + FP}\)
Negative Predictive Value	Neg Pred Value	\(\frac{TN}{TN + FN}\)
F1 score		\(2 \cdot \frac{Pos Pred Value \cdot Sensitivity}{Pos Pred Value + Sensitivity}\)
Matthews Correlation Coefficient	MCC	\(\frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP) (TN + FN)}}\) Note: When the denominator is 0, we set it to 1 to avoid `NaN`.
Detection Rate		\(\frac{TP}{TP + FN + TN + FP}\)
Detection Prevalence		\(\frac{TP + FP}{TP + FN + TN + FP}\)
Prevalence		\(\frac{TP + FN}{TP + FN + TN + FP}\)
Threat Score		\(\frac{TP}{TP + FN + FP}\)
False Negative Rate		\({\scriptstyle 1 - Sensitivity}\)
False Positive Rate		\({\scriptstyle 1 - Specificity}\)
False Discovery Rate		\({\scriptstyle 1 - Pos Pred Value}\)
False Omission Rate		\({\scriptstyle 1 - Neg Pred Value}\)
Kappa		For Kappa, the counts (`TP`, `TN`, `FP`, `FN`) are normalized to proportions (summing to 1). Then: \({\scriptstyle p_{observed} = TP + TN}\) \({\scriptstyle p_{expected} = (TN + FP)(TN + FN) + (FN + TP)(FP + TP)}\) \(Kappa = \frac{p_{observed} - p_{expected}}{1 - p_{expected}}\)
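
To make the table concrete, the following sketch computes a selection of the metrics from a set of invented counts. It follows the formulas as written above, including the zero-denominator rule for MCC and the normalization step for Kappa. It is an illustration and not the package's implementation.

```r
# Invented counts for illustration
TP <- 40; TN <- 45; FP <- 5; FN <- 10
total <- TP + TN + FP + FN

accuracy          <- (TP + TN) / total
sensitivity       <- TP / (TP + FN)
specificity       <- TN / (TN + FP)
balanced_accuracy <- (sensitivity + specificity) / 2
pos_pred_value    <- TP / (TP + FP)
neg_pred_value    <- TN / (TN + FN)
f1 <- 2 * (pos_pred_value * sensitivity) / (pos_pred_value + sensitivity)

detection_rate       <- TP / total
detection_prevalence <- (TP + FP) / total
prevalence           <- (TP + FN) / total
threat_score         <- TP / (TP + FN + FP)

# MCC: a denominator of 0 is replaced by 1 to avoid NaN
mcc_denominator <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
if (mcc_denominator == 0) mcc_denominator <- 1
mcc <- (TP * TN - FP * FN) / mcc_denominator

# Kappa: normalize the counts so they sum to 1 first
p <- c(TP = TP, TN = TN, FP = FP, FN = FN) / total
p_observed <- p[["TP"]] + p[["TN"]]
p_expected <- (p[["TN"]] + p[["FP"]]) * (p[["TN"]] + p[["FN"]]) +
  (p[["FN"]] + p[["TP"]]) * (p[["FP"]] + p[["TP"]])
kappa_value <- (p_observed - p_expected) / (1 - p_expected)
```

The remaining rates in the table (False Negative Rate, False Positive Rate, False Discovery Rate, False Omission Rate) follow as one minus the corresponding value computed here.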

Multinomial Metrics

We have four types of metrics for the multiclass classification evaluation:

Overall metrics simply look at whether a prediction is correct or not. Currently, cvms only has the Overall Accuracy.

The Macro/Average metrics are based on one-vs-all evaluations of each class. In a one-vs-all evaluation, we set all predictions and targets for the current class to 1 and all others to 0 (\({\scriptstyle y_{o,c} = 1 \text{ if } y_{o} = c \text{ else } 0}\) and \({\scriptstyle \hat{y}_{o,c} = 1 \text{ if } \hat{y}_{o} = c \text{ else } 0}\)) and perform a binomial evaluation. Once done for all classes, we average the results. Note that this is sometimes referred to as one-vs-rest, as it is the current class against the rest of the classes.
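
The one-vs-all procedure can be sketched in base R as follows, using Sensitivity as the example metric. The data and the helper `one_vs_all_sensitivity()` are hypothetical and exist only for this illustration.

```r
# Toy multiclass data, invented for illustration
targets     <- c("a", "b", "c", "a", "b", "c", "a", "a")
predictions <- c("a", "b", "a", "a", "c", "c", "b", "a")
classes     <- sort(unique(targets))

# Hypothetical helper: binarize for one class, then evaluate binomially
one_vs_all_sensitivity <- function(current_class, targets, predictions) {
  y     <- as.integer(targets == current_class)      # 1 if target is the class
  y_hat <- as.integer(predictions == current_class)  # 1 if prediction is the class
  TP <- sum(y_hat == 1 & y == 1)
  FN <- sum(y_hat == 0 & y == 1)
  TP / (TP + FN)
}

# One binomial evaluation per class
per_class <- vapply(
  classes, one_vs_all_sensitivity, numeric(1),
  targets = targets, predictions = predictions
)

# The macro metric is the unweighted mean over classes
macro_sensitivity <- mean(per_class)
```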

With a few exceptions (AUC and MCC), the metrics in the multinomial outputs that share their name with the binomial metrics are macro metrics. AUC and MCC instead have specific multiclass variants.

The Weighted metrics are averages, similar to the macro metrics, but weighted by the Support for each class.
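
A minimal sketch of the difference between the macro and the weighted average is shown below. The per-class values are placeholder numbers, and Support is simply the count of each class in the targets.

```r
# Toy targets, invented for illustration
targets <- c("a", "b", "c", "a", "b", "c", "a", "a")

# Placeholder per-class results from one-vs-all evaluations
metric_per_class <- c(a = 0.75, b = 0.5, c = 0.5)

# Support: how often each class occurs in the target column
support <- sapply(
  names(metric_per_class),
  function(current_class) sum(targets == current_class)
)

# Macro: every class counts equally
macro_metric <- mean(metric_per_class)

# Weighted: classes count in proportion to their Support
weighted_metric <- sum(metric_per_class * support) / sum(support)
```

Because class `a` is the most frequent and has the highest value in this toy example, the weighted average is larger than the macro average.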

Metric name	Abbreviation	Formula
Overall Accuracy		\(\frac{Correct}{Correct + Incorrect}\)
Macro Metric		\({\scriptstyle \frac{1}{\lvert C \rvert}\sum_{c}^{C} metric_{c}}\)
Support		\({\scriptstyle support_c = \lvert \{ o \in O : o=c \} \rvert \quad \forall c \in C}\) I.e., a count of the class in the target column. \(C\): the set of classes; \(O\): the observations. \(\lvert x \rvert\) denotes length of \(x\).
Weighted metric		\(\frac{\sum_{c}^{C} metric_{c} \cdot support_{c}}{\sum_{c}^{C} support_{c}}\)
Multiclass MCC	MCC	\({\scriptstyle \frac{N \operatorname{Tr}(C)-\sum_{k l} \tilde{C}_{k} \hat{C}_{l}}{\sqrt{N^{2}-\sum_{k l} \tilde{C}_{k}\left(\hat{C}^{\mathrm{T}}\right)_{l}} \sqrt{N^{2}-\sum_{k l}\left(\tilde{C}^{\mathrm{T}}\right)_{k} \hat{C}_{l}}} }\) \(N\): number of samples; \(C\): a \(K \times K\) confusion matrix; \(\operatorname{Tr}(C)\): number of correct predictions; \(\tilde{C}_{k}\): \(k\)th row of \(C\); \(\hat{C}_{l}\): \(l\)th column of \(C\); \(C^{\mathrm{T}}\): \(C\) transposed. Note: When the computation is `NaN`, we return `0`. Code was ported from scikit-learn. Reference: Gorodkin, J. (2004). Comparing two K-category assignments by a K-category correlation coefficient. Computational Biology and Chemistry, 28(5-6), 367-374.
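
The multiclass MCC formula can be evaluated from the row and column sums of the confusion matrix, since each sum over row-column products reduces to a dot product of those marginal sums. The sketch below is a base R illustration on invented data. It is not the ported scikit-learn code itself.

```r
# Toy multiclass data, invented for illustration
targets     <- c("a", "b", "c", "a", "b", "c", "a", "a")
predictions <- c("a", "b", "a", "a", "c", "c", "b", "a")
classes     <- sort(unique(c(targets, predictions)))

# K x K confusion matrix with a shared set of levels,
# so that rows and columns line up even if a class is never predicted
C <- table(
  Prediction = factor(predictions, levels = classes),
  Target     = factor(targets, levels = classes)
)

N         <- sum(C)        # number of samples
n_correct <- sum(diag(C))  # Tr(C)
row_sums  <- rowSums(C)
col_sums  <- colSums(C)

numerator   <- N * n_correct - sum(row_sums * col_sums)
denominator <- sqrt(N^2 - sum(col_sums^2)) * sqrt(N^2 - sum(row_sums^2))

mcc <- numerator / denominator
if (is.nan(mcc)) mcc <- 0  # return 0 when the computation is NaN
```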

External metrics

These metrics are calculated by other packages:

Metric name	Abbreviation	Package::Function	Used in
Akaike Information Criterion	AIC	stats::AIC	Shared
Corrected Akaike Information Criterion	AICc	MuMIn::AICc	Shared
Bayesian Information Criterion	BIC	stats::BIC	Shared
Marginal R-squared	r2m	MuMIn::r.squaredGLMM	Gaussian
Conditional R-squared	r2c	MuMIn::r.squaredGLMM	Gaussian
ROC curve	ROC	pROC::roc	Binomial
Area Under the Curve	AUC	pROC::roc	Binomial
Multiclass ROC curve	ROC	pROC::multiclass.roc	Multinomial
Multiclass Area Under the Curve	AUC	pROC::multiclass.roc	Multinomial
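
As an example of calling two of the listed functions directly, the sketch below fits a simple linear model on the built-in `mtcars` data and compares `stats::AIC()` and `stats::BIC()` with their textbook definitions. The model and the variable names are assumptions made for illustration only.

```r
# A simple linear regression on a built-in dataset
fit <- lm(mpg ~ wt, data = mtcars)

# The functions listed in the table
aic <- stats::AIC(fit)
bic <- stats::BIC(fit)

# The same quantities from the log-likelihood
log_lik <- stats::logLik(fit)
k <- attr(log_lik, "df")  # number of estimated parameters
n <- stats::nobs(fit)     # number of observations

aic_manual <- -2 * as.numeric(log_lik) + 2 * k
bic_manual <- -2 * as.numeric(log_lik) + log(n) * k
```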