The Coxmos R package includes the following functionalities for High-Dimensional datasets:
Model fitting and optimization for High-Dimensional Data: The user can either select the parameters (such as the penalty value, number of components in the latent space…) for survival models in high-dimensional datasets or use the package’s tools to estimate the best values for these parameters.
Comparing Survival Models: After obtaining multiple survival models, the user can compare them to determine which one gives the best results. The package includes several functions for comparing the models.
Interpreting Results: After selecting the best model or models, the user can interpret the results. The package includes several functions for understanding the impact of the original variables on survival prediction.
Predicting Survival for New Patients: Finally, if a new dataset of patients is available, the user can use the estimated model to make predictions for the new patients. In addition, Coxmos allows for the comparison of variables to identify which ones contribute most to the prediction.
Coxmos stable version can be installed from CRAN repository:
C:1BfP43083d9d5ab9-pipeline.R
The development version can be installed from GitHub:
install.packages("devtools")
devtools::install_github("BiostatOmics/Coxmos", build_vignettes = TRUE)
C:1BfP43083d9d5ab9-pipeline.R
To run the analyses in this vignette, first load
Coxmos
:
C:1BfP43083d9d5ab9-pipeline.R
To generate plots, we will make use of the RColorConesa
R package. After installing it:
C:1BfP43083d9d5ab9-pipeline.R
The Coxmos pipeline requires two objects as input. One should be a
matrix of predictors X
, and the other a matrix with two
columns called time and event for
survival information Y
.
The data example contains protein levels as predictors and can be loaded as follows:
### load dataset
data("X_proteomic")
data("Y_proteomic")
X <- X_proteomic
Y <- Y_proteomic
rm(X_proteomic, Y_proteomic)
C:1BfP43083d9d5ab9-pipeline.R
X and Y are two
data.frame
objects. X is related to the
explanatory variables. Rows are observations and columns are the
different variables of interest. For Y matrix, rows are
the observations with the same row names as X, and it
has to have two columns. The first one is called time
and
the second one, event
. time
refers to the
follow-up time until the event or until the last control if the
observation is a right censored observation. The event could be
TRUE/FALSE
or 1/0
for those observations that
have reached the event or are censored. An example could be the shown
below:
7529 | 7531 | 7534 | 1978 | 7158 | |
---|---|---|---|---|---|
TCGA-A2-A0SV-01A | 0.025273 | -0.030661 | 0.458350 | 1.369000 | -0.40781 |
TCGA-A2-A0YT-01A | 0.348210 | 0.347870 | 0.891570 | -0.188540 | -0.13896 |
TCGA-BH-A1F0-01A | 0.132340 | -0.348210 | 0.078923 | 0.227890 | -0.49063 |
TCGA-B6-A0IK-01A | 0.352670 | 0.376670 | -0.285750 | -0.380980 | -1.43080 |
TCGA-E2-A1LE-01A | 0.110140 | 0.094990 | -0.159960 | -0.098148 | 0.40160 |
time | event | |
---|---|---|
TCGA-A2-A0SV-01A | 825 | TRUE |
TCGA-A2-A0YT-01A | 723 | TRUE |
TCGA-BH-A1F0-01A | 785 | TRUE |
TCGA-B6-A0IK-01A | 571 | TRUE |
TCGA-E2-A1LE-01A | 879 | TRUE |
C:1BfP43083d9d5ab9-pipeline.R
The dimensions of each object are:
X |
---|
150 |
369 |
Y |
---|
150 |
2 |
C:1BfP43083d9d5ab9-pipeline.R
Before performing any survival analysis, it is relevant to understand the structure of the time-to-event data. A key preliminary step is examining how events (e.g., deaths, failures) and censored observations (e.g., lost to follow-up, study termination) are distributed across the dataset, as this exploration helps identify potential biases, assess data quality, and guide appropriate modeling choices.
The plot_events()
function in this package provides a
straightforward way to visualize this distribution. It displays event
occurrences and censoring times, allowing users to:
This visualization serves as an exploratory tool, ensuring that subsequent survival analysis rests on reliable data patterns. The function’s output can highlight the need for extended follow-up or careful interpretation of results in contexts with heavy censoring.
The plot_events()
function requires the Y
matrix and optionally accepts the following arguments among others:
categories
: the names of each category (a character
vector of length two).y.text
: the label for the y-axis.roundTo
: the smallest numeric unit to which break times
will be rounded. For example, if set to 0.25, a value of 1.32 will be
rounded to 1.25.max.breaks
: the number of breaks for the
histogram.ggp_density.event <- plot_events(Y = Y,
categories = c("Censored","Event"),
y.text = "Number of observations",
roundTo = 0.5,
max.breaks = 15,
txt.x.angle = 90)
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
Before fitting survival models, it is important to ensure that there is sufficient statistical power to avoid overfitting. The Events Per Variable (EPV) metric supports this evaluation by indicating how many outcome events are available per predictor in the model.
According to literature, the higher values the better. As a recommendation, values greater than 10 should be used to obtain models with good predictions, but this value should not be lower than 5.
A low EPV value (typically <10) increases the risk of:
The getEPV()
function calculates and reports the EPV for
a given dataset. This helps to:
Additionally, working with low EPV values often indicates the presence of high-dimensional datasets, where the number of predictors is large relative to the number of events. In such cases, it is advisable to use survival models specifically designed to handle high-dimensional settings, such as Coxmos methods.
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
In this case, the EPV has a value of 0.195122, which indicates that the number of predictors is high relative to the number of observed events. This makes it unreliable to apply classical survival algorithms, as they are prone to overfitting under these conditions.
When working with datasets (i.e., X
matrices) that
include qualitative or categorical variables, proper encoding is
essential prior to applying dimensionality reduction techniques such as
Principal Component Analysis (PCA) or Partial Least Squares (PLS). Most
of these algorithms require purely numerical input, so categorical
variables (e.g., sex = {Male, Female}
,
tumor_stage = {I, II, III}
) must be transformed into a
suitable format that retains their informational content without
introducing artificial ordinality.
The factorToBinary()
function automates the
transformation of categorical variables into dummy variables using
one-hot encoding. This process addresses several important issues:
Avoiding misinterpretation: Prevents algorithms
from mistakenly interpreting categorical values as ordinal or numeric
(e.g., assigning sex Male = 1
, Female = 2
,
etc.).
Compatibility: Ensures that categorical features are represented numerically, which is required by linear algebra-based techniques such as PCA or PLS.
Dimensionality control: Allows flexibility in the encoding format:
K
dummy variables (one for each category) for full
representation.K - 1
dummy variables to avoid perfect
multicollinearity in regression-based models.Easy integration: The resulting matrix preserves
original column names with appended suffixes (e.g.,
sex_Male
, tumor_stage_II
), enabling
straightforward traceability in downstream processes.
It accepts the following arguments:
X
: numeric matrix or data frame. Only qualitative
variables (class factor) will be transformed into binary
variables.
all
: logical. If TRUE
, one binary
variable will be generated for each level of the factor (k
dummies). If FALSE
, only k - 1
variables will
be returned, using the first level as the reference category (default:
TRUE
).
sep
: character. Symbol used to concatenate the
original variable name and the level name in the new column names. For
example, if the variable is "sex"
and
sep = "_"
, the resulting dummy variables will be named
"sex_male"
, "sex_female"
, etc.
This function ensures compatibility with numerical algorithms while keeping variable names interpretable and traceable.
C:1BfP43083d9d5ab9-pipeline.R
Proper validation of survival models requires thoughtful partitioning
of the dataset into training and test subsets. The
getTrainTest()
function provides a consistent and
event-aware approach to splitting survival data, ensuring that the
characteristics of time-to-event outcomes are preserved during
validation.
This function performs event-balanced splits by default, maintaining a similar proportion of events and censored cases across both training and test sets. This is essential to avoid biased performance estimates and to ensure that the test set contains a sufficient number of events for evaluation.
In datasets composed of multiple data blocks (e.g., omics and
clinical data), getTrainTest()
synchronizes the
partitioning across all blocks, preserving sample alignment and ensuring
integrity during downstream analysis. Whether working with simple
matrices or multiblock structures, the function guarantees that each
block is split consistently.
It also supports flexible validation schemes. By setting the
times
argument, users can perform a single data split
(times = 1
) or repeated random splits for cross-validation
(times > 1
), allowing for evaluation of model stability
and robustness across different partitions. A fixed random seed ensures
reproducibility of the results.
split_data <- getTrainTest(X, Y, p = 0.7)
X_train <- split_data$X_train #106x369
Y_train <- split_data$Y_train
X_test <- split_data$X_test #44x369
Y_test <- split_data$Y_test
C:1BfP43083d9d5ab9-pipeline.R
This splitting strategy is especially relevant in survival analysis,
where event sparsity and censoring can lead to unreliable validation if
not properly addressed. By incorporating stratified sampling and support
for complex data structures, getTrainTest()
serves as a
reliable bridge between preprocessing and model development. It enables
performance assessment that reflects real-world behavior while
preserving the temporal dynamics of time-to-event data.
After loading the data, users may wish to perform a survival analysis to examine the relationship between explanatory variables and time-to-event outcomes. However, the traditional Cox proportional hazards (Cox-PH) regression model is only applicable to low-dimensional datasets. To address this limitation, we provide a set of functions that combine sparse Partial Least Squares (sPLS) techniques with Cox regression to enable the analysis of high-dimensional datasets.
Coxmos includes the following methodologies for handling both low- and high-dimensional data:
More information on each approach can be found in the help documentation for the corresponding function.
cox()
,
coxSW()
, and coxEN()
.splsicox()
,
splsdrcox()
, and splsdacox()
.However, an easier and unified way to run all models is provided via
the main function coxmos()
, which allows execution of any
method by specifying the desired algorithm through the
method
parameter.
To perform a survival analysis using our example dataset, we will focus on the methods designed for high-dimensional data, particularly those based on sPLS and Cox Elastic Net techniques.
We have already mentioned that we need to work with a
high-dimensional methodology. However, if we try to run a classical
analysis with one of our functions, such as a Cox analysis using the
entire matrix X, we will get an error due to the specification of the
MIN_EPV
parameter.
This parameter (set by default to 5) establishes a minimum ratio of variables that should be included in a survival model according to the number of patients who experience the event. The formula is \(\frac{\text{### events}}{\text{### variables in survival model}}\).
If we perform a classical cox survival analysis we will obtain an error.
cox_model <- coxmos(method = "cox",
X = X_train, Y = Y_train,
x.center = T, x.scale = F,
MIN_EPV = 5)
C:1BfP43083d9d5ab9-pipeline.R
Despite specifying an EPV of 5, we should have a maximum of 10 variables in our final survival model in relation to the number of events found in our training set.
Once the problem has been demonstrated, we will proceed to perform the methods that allow us to work with high dimensional datasets. To do this, it will not only be necessary to launch the methods themselves that calculate the survival model, but also a cross-validation to estimate the optimal values for each methodology.
In addition, Coxmos also includes a stepwise
selection algorithm adapted for survival models, known as
coxSW
. This method allows users to automatically select the
optimal subset of variables based on a predefined EPV threshold.
This strategy is particularly useful in small to medium-sized
datasets, where controlling model complexity is essential to avoid
overfitting. The algorithm performs an internal forward-backward
selection process, iteratively adding or removing variables to find a
combination that respects the EPV constraint, set by default to 5 but
customizable via the MIN_EPV
argument.
coxSW_model <- coxmos(method = "coxSW",
X = X_train, Y = Y_train,
x.center = T, x.scale = F,
MIN_EPV = 5)
#> Final Model Reached!
C:1BfP43083d9d5ab9-pipeline.R
coxSW_model
#> The method used is coxSW.
#> Survival model:
#> coef exp(coef) se(coef) robust se z Pr(>|z|)
#> var_840 -1.6945403 0.1836837 0.3666049 0.4247067 -3.989907 6.609913e-05
#> var_115004 1.7095074 5.5262383 0.9298226 0.8679379 1.969620 4.888199e-02
#> var_6464 1.6712799 5.3189712 0.6128604 0.6706816 2.491913 1.270573e-02
#> var_8408 1.5996913 4.9515035 0.5239328 0.5036659 3.176096 1.492715e-03
#> var_7849 0.8613763 2.3664154 0.5004561 0.4826959 1.784511 7.434063e-02
#> var_6790 2.5574714 12.9031485 0.5460080 0.5552463 4.606013 4.104631e-06
#> var_8660 -1.8350215 0.1596101 0.4834143 0.4794525 -3.827327 1.295425e-04
#> var_5159 1.1925233 3.2953859 0.3061701 0.2230829 5.345652 9.009238e-08
#> var_581 2.0133191 7.4881302 0.7725757 0.8125503 2.477778 1.322034e-02
#> var_2908 -0.8365087 0.4332204 0.3306509 0.2869672 -2.914997 3.556914e-03
C:1BfP43083d9d5ab9-pipeline.R
Although this method is robust and simple to implement, it becomes impractical for large datasets due to its high computational cost. Therefore, it is recommended primarily for focused studies with a moderate number of predictors.
In order to perform survival analysis with our high-dimensional data, we have implemented a series of methods that utilize techniques such as Cox Elastic Net, to select a lower number of features applying a penalty or sPLS methodology in order to reduce the dimensionality of the input data.
To evaluate the performance of these methods, we have implemented
cross-validation, which allows us to estimate the optimal parameters for
future predictions based on prediction metrics such as:
AIC, C-INDEX, I.BRIER
and Area Under the Curve (AUC). By default AUC metric
is used with the "cenROC"
evaluator as it has provided the
best results in our tests. However, multiple AUC evaluators could be
used:
"risksetROC", "survivalROC", "cenROC", "nsROC", "smoothROCtime_C"
and "smoothROCtime_I"
.
Furthermore, a mix of multiple metrics could be used to obtain the optimal model. The user has to establish different weights for each metric and all of them will be considered to compute the optimal model (the total weight must sum 1).
In addition, we have included options for normalizing data, filtering
variables, and setting the minimum EPV
, as well as specific
parameters for each method, such as the alpha value for Cox Elastic Net
and the number of components for sPLS models. Overall, our
cross-validation methodology allows us to effectively analyze
high-dimensional survival data and optimize our model parameters.
Cox Elastic Net is based on the glmnet R package for survival analysis. However, the structure of the object and the way the analysis is performed has been updated using our cross-validation methodology to estimate the optimal parameters for future predictions.
During the cross-validation, we will iterate over the different alpha
values between 0
and 1
(lasso and ridge
regression). We can also select the maximum number of variables we want
to select. In this case, we are not going to fix a specific value and
the algorithm will update the value to a new one that meets the
requirements for the EPV restriction.
NOTE: When a penalty value of 0 is selected, full ridge penalty is used. In this case, any EPV or max.variables restriction is used and all variables could be selected. For high dimensional datasets, values between (0, 1] are recommended.
cv.coxen_res <- cv.coxmos(method = "coxEN",
X = X_train, Y = Y_train,
EN.alpha.list = c(0.1, 0.5, 0.9),
n_run = 2, k_folds = 5)
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
Cross-validation will be omitted for time consumption, but once the best model is computed to user could see how many variables and which penalty has been selected as the best. Once the cross-validation study is performed, the user must compute a survival model specifying the parameters. The user could select the values by the cross-validation object, but in this case we are going to select them by hand.
coxen_model <- coxmos(method = "coxEN",
X = X_train, Y = Y_train,
EN.alpha = 0.5, #cv.coxen_res$opt.EN.alpha,
max.variables = 8) #cv.coxen_res$opt.nvar
C:1BfP43083d9d5ab9-pipeline.R
coxen_model
#> The method used is coxEN.
#> Survival model:
#> coef exp(coef) se(coef) robust se z Pr(>|z|)
#> var_840 -0.70459296 0.4943097 0.3700628 0.3623640 -1.94443433 0.05184310
#> var_3897 0.14405821 1.1549513 0.1349672 0.1356156 1.06225378 0.28812049
#> var_83481 0.10505359 1.1107701 0.1288665 0.1232818 0.85214192 0.39413533
#> var_2194 -0.15920751 0.8528194 0.1443478 0.1327259 -1.19952114 0.23032537
#> var_2625 -0.24789765 0.7804398 0.1259469 0.1144574 -2.16585117 0.03032256
#> var_3620 -0.42535528 0.6535376 0.3617911 0.2841891 -1.49673321 0.13446269
#> var_4629 -0.25749311 0.7729870 0.1644677 0.1563013 -1.64741554 0.09947266
#> var_7535 -0.01031489 0.9897381 0.2983250 0.2676666 -0.03853632 0.96926008
C:1BfP43083d9d5ab9-pipeline.R
After obtaining the model, if we print it we could see the cox model
with all the variables that have been selected. However, some of the
selected variables have not been significant. If we wish to calculate a
model where all selected variables are significant, we should update the
remove_non_significant
parameter to TRUE
.
coxen_model <- coxmos(method = "coxEN",
X = X_train, Y = Y_train,
EN.alpha = 0.5, #cv.coxen_res$opt.EN.alpha
max.variables = 8, #cv.coxen_res$opt.nvar
remove_non_significant = T, alpha = 0.05)
C:1BfP43083d9d5ab9-pipeline.R
coxen_model
#> The method used is coxEN.
#> A total of 5 variables have been removed due to non-significance filter inside cox model.
#> Survival model:
#> coef exp(coef) se(coef) robust se z Pr(>|z|)
#> var_840 -1.0335236 0.3557512 0.2982663 0.2818888 -3.666423 0.0002459674
#> var_2625 -0.2526926 0.7767066 0.1155368 0.1051668 -2.402779 0.0162710096
#> var_4629 -0.3361206 0.7145369 0.1372981 0.1454396 -2.311066 0.0208291919
C:1BfP43083d9d5ab9-pipeline.R
After this change, a total of 3 variables have been selected. And a
total of 5 have been removed due to non-significance. To check which
variables have been removed, we can check this using the parameter
nsv
(no-significant variables).
C:1BfP43083d9d5ab9-pipeline.R
Similarly, cross-validation will be conducted for the sPLS-based models, beginning with the sPLS-ICOX approach. In this case, the internal weighting mechanism of the sPLS model for computing the components of the \(X\) matrix is based on univariate Cox models. For variable selection, a penalty is applied to the individual Cox models, selecting variables with p-values lower than \(1 - \text{penalty}\). After computing the weight vector, this step allows to reduce the dimensionality of the dataset, enabling the subsequent fitting of a Cox model using the latent variables (or principal components) derived from the sPLS model.
cv.splsicox_res <- cv.coxmos(method = "splsicox",
X = X_train, Y = Y_train,
max.ncomp = 2, penalty.list = c(0.5, 0.9),
n_run = 2, k_folds = 5)
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
If you want to see the evolution of your metric, you can access to the plots inside the cross-validation object.
C:1BfP43083d9d5ab9-pipeline.R
We will generate a sPLS-ICOX model with the optimal number of principal components and its penalty based on the results obtained from the cross-validation.
splsicox_model <- coxmos(method = "splsicox",
X = X_train, Y = Y_train,
n.comp = 2, #cv.splsicox_res$opt.comp,
penalty = 0.9) #cv.splsicox_res$opt.penalty
splsicox_model
#> The method used is sPLS-ICOX.
#> Survival model:
#> coef exp(coef) se(coef) robust se z Pr(>|z|)
#> comp_1 0.7882958 2.199644 0.1669181 0.1989155 3.962968 7.402366e-05
#> comp_2 1.2413073 3.460134 0.3495130 0.3246738 3.823245 1.317069e-04
C:1BfP43083d9d5ab9-pipeline.R
The sPLS-DRCOX method presents a different strategy for reducing the dataset compared to sPLS-ICOX. In this case, deviance residuals (DR) are used as the response vector in an sPLS model to obtain the latent variables. Instead of computing univariate Cox models, this method optimizes the weight vector to explain the DR of each observation.
Coxmos also implements an alternative variable selection
strategy based on a heuristic search, following the approach used by the
mixOmics algorithms, which is employed in sPLS-DRCOX. In this
case, variable selection is guided by a vector indicating the number of
variables to be tested. Multiple values can be specified to iterate over
different numbers of variables. However, if the value is set to
NULL
, the number of variables is automatically determined
by the heuristic search algorithm.
With this method, the user can specify the minimum and maximum number of variables and the number of cutpoints (how many variables to test between the minimum and the maximum number of variables) to be tested. After the first iteration, the algorithm will select the optimal number of variables and will further investigate better results between the existing cut points and the optimal value selected.
cv.splsdrcox_dynamic_res <- cv.coxmos(method = "splsdrcox",
X = X_train, Y = Y_train,
max.ncomp = 2, vector = NULL,
MIN_NVAR = 1, MAX_NVAR = 50,
n.cut_points = 10,
n_run = 2, k_folds = 5)
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
After the cross-validation, the user can select the exact number of components and variables to use in their model.
splsdrcox_dynamic_model <- coxmos(method = "splsdrcox",
X = X_train, Y = Y_train,
n.comp = 2, #cv.splsdrcox_dynamic_res$opt.comp,
vector = 50, #cv.splsdrcox_dynamic_res$opt.nvar,
n.cut_points = 10)
splsdrcox_dynamic_model
#> The method used is sPLS-DRCOX-Dynamic.
#> Survival model:
#> coef exp(coef) se(coef) robust se z Pr(>|z|)
#> comp_1 0.7112537 2.036543 0.1197552 0.1176486 6.045580 1.488738e-09
#> comp_2 0.7039326 2.021688 0.1343166 0.1203101 5.850986 4.886660e-09
C:1BfP43083d9d5ab9-pipeline.R
Finally, a Cox-based sPLS-DA model is applied. This algorithm is the simplest among the proposed methods and involves the least statistical complexity, although, depending on the dataset, it can outperform the other approaches. In this case, sPLS-DA is used to classify patients based on whether or not they experienced the study event, disregarding the time to event. Subsequently, a Cox model is fitted using the latent variables obtained from sPLS-DA, along with the complete \(Y\) matrix containing times and event/censoring information. This method also incorporates the same variable selection technique as sPLS-DRCOX.
cv.splsdacox_dynamic_res <- cv.coxmos(method = "splsdacox",
X = X_train, Y = Y_train,
max.ncomp = 2, vector = NULL,
MIN_NVAR = 1, MAX_NVAR = 50,
n.cut_points = 10,
n_run = 2, k_folds = 5)
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
Once the best model is obtained through cross-validation, it must be calculated using the desired parameters.
splsdacox_dynamic_model <- coxmos(method = "splsdacox",
X = X_train, Y = Y_train,
n.comp = 2, #cv.splsdacox_dynamic_res$opt.comp,
vector = 50) #cv.splsdacox_dynamic_res$opt.nvar,
splsdacox_dynamic_model
#> The method used is sPLS-DACOX-Dynamic.
#> Survival model:
#> coef exp(coef) se(coef) robust se z Pr(>|z|)
#> comp_1 0.4294835 1.536464 0.10812331 0.09318388 4.608990 4.046303e-06
#> comp_2 0.2886224 1.334588 0.08920555 0.08365058 3.450334 5.598941e-04
C:1BfP43083d9d5ab9-pipeline.R
Next, we will analyze the results obtained from the multiple models to see which one provides the best predictions based on our data. To do this, we will use the test set that has not been used to train any model. However, in case that test data are not available, the train data can also be used.
Initially, we will compare the results for each of the methods
according to the evaluator we desire. Additionaly, we must provide a
list of the different models as well as the X
and
Y
evaluation set.
When evaluating survival model results, we must indicate at which
temporal points we want to perform the evaluation. As we already
specified a NULL
value for the times
parameter
in the cross-validation, we are going to let the algorithm compute them
again.
lst_models <- list("COX-EN" = coxen_model,
"sPLS-ICOX" = splsicox_model,
"sPLS-DRCOX-Dynamic" = splsdrcox_dynamic_model,
"sPLS-DACOX-Dynamic" = splsdacox_dynamic_model)
eval_results <- eval_Coxmos_models(lst_models = lst_models,
X_test = X_test, Y_test = Y_test,
times = NULL)
C:1BfP43083d9d5ab9-pipeline.R
We can print the results obtained in the console where we can see, for each of the selected methods, the training time and the time it took to be evaluated, as well as their AIC, C-Index and AUC metrics and at which time points it was evaluated.
eval_results
#> Evaluation performed for methods: COX-EN, sPLS-ICOX, sPLS-DRCOX-Dynamic, sPLS-DACOX-Dynamic.
#> COX-EN:
#> training.time: 0.0065
#> evaluating.time: 0.0042
#> AIC: 353.447
#> C.Index: 0.6906
#> time: 377, 554.786, 732.572, 910.358, 1088.143, 1265.929, 1443.715, 1621.5, 1799.286, 1977.072, 2154.858, 2332.643, 2510.429, 2688.215, 2866
#> AUC: 0.55871
#> brier_time: 116, 186, 322, 377, 385, 454, 477, 489, 506, 518, 522, 606, 614, 616, 678, 703, 723, 747, 759, 825, 847, 904, 1015, 1048, 1150, 1174, 1189, 1233, 1275, 1324, 1508, 1528, 1611, 1694, 1742, 1793, 1935, 1972, 2632, 2854, 2866, 3001, 3472, 3669
#> I. Brier: 0.17834
#>
#> sPLS-ICOX:
#> training.time: 0.0551
#> evaluating.time: 0.004
#> AIC: 330.3254
#> C.Index: 0.7997
#> time: 377, 554.786, 732.572, 910.358, 1088.143, 1265.929, 1443.715, 1621.5, 1799.286, 1977.072, 2154.858, 2332.643, 2510.429, 2688.215, 2866
#> AUC: 0.6103
#> brier_time: 116, 186, 322, 377, 385, 454, 477, 489, 506, 518, 522, 606, 614, 616, 678, 703, 723, 747, 759, 825, 847, 904, 1015, 1048, 1150, 1174, 1189, 1233, 1275, 1324, 1508, 1528, 1611, 1694, 1742, 1793, 1935, 1972, 2632, 2854, 2866, 3001, 3472, 3669
#> I. Brier: 0.17912
#>
#> sPLS-DRCOX-Dynamic:
#> training.time: 0.0064
#> evaluating.time: 0.0042
#> AIC: 316.4159
#> C.Index: 0.8154
#> time: 377, 554.786, 732.572, 910.358, 1088.143, 1265.929, 1443.715, 1621.5, 1799.286, 1977.072, 2154.858, 2332.643, 2510.429, 2688.215, 2866
#> AUC: 0.63535
#> brier_time: 116, 186, 322, 377, 385, 454, 477, 489, 506, 518, 522, 606, 614, 616, 678, 703, 723, 747, 759, 825, 847, 904, 1015, 1048, 1150, 1174, 1189, 1233, 1275, 1324, 1508, 1528, 1611, 1694, 1742, 1793, 1935, 1972, 2632, 2854, 2866, 3001, 3472, 3669
#> I. Brier: 0.15933
#>
#> sPLS-DACOX-Dynamic:
#> training.time: 0.0066
#> evaluating.time: 0.0039
#> AIC: 344.9342
#> C.Index: 0.7197
#> time: 377, 554.786, 732.572, 910.358, 1088.143, 1265.929, 1443.715, 1621.5, 1799.286, 1977.072, 2154.858, 2332.643, 2510.429, 2688.215, 2866
#> AUC: 0.53489
#> brier_time: 116, 186, 322, 377, 385, 454, 477, 489, 506, 518, 522, 606, 614, 616, 678, 703, 723, 747, 759, 825, 847, 904, 1015, 1048, 1150, 1174, 1189, 1233, 1275, 1324, 1508, 1528, 1611, 1694, 1742, 1793, 1935, 1972, 2632, 2854, 2866, 3001, 3472, 3669
#> I. Brier: 0.18955
#>
C:1BfP43083d9d5ab9-pipeline.R
However, we can also obtain graphical results where we can compare
each method over time, as well as their average scores using the
function plot_evaluation()
or
plot_evaluation.list()
if multiple evaluators have been
tested. The user can choose to plot the AUC for time prediction points
or the Brier Score.
lst_eval_results_auc <- plot_evaluation(eval_results, evaluation = "AUC", pred.attr = "mean")
lst_eval_results_brier <- plot_evaluation(eval_results, evaluation = "IBS", pred.attr = "mean")
C:1BfP43083d9d5ab9-pipeline.R
After generating the evaluation plot with Coxmos, an R object is returned containing two sublists. The first sublist contains time-dependent evaluation results for each method, along with a variant that includes the average performance values. The second sublist enables comparison of the mean performance across methods using statistical tests such as the T-test, Wilcoxon test, ANOVA, or Kruskal–Wallis test.
C:1BfP43083d9d5ab9-pipeline.R
Another possible comparison is related to the computation times for cross-validation, model creation, and evaluation. In this case, cross-validations and methods are loaded.
lst_models_time <- list(#cv.coxen_res,
coxen_model,
#cv.splsicox_res,
splsicox_model,
#cv.splsdrcox_dynamic_res,
splsdrcox_dynamic_model,
#cv.splsdacox_dynamic_res,
splsdacox_dynamic_model,
eval_results)
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
A critical diagnostic step when using Cox models is evaluating whether the Proportional Hazards (PH) assumption holds for each predictor or component. This assumption underpins the validity of Cox regression results, and violations can lead to misleading inferences.
So the function plot_proportionalHazard()
tries to
answer the question: “Does my Cox model violate the fundamental
proportional hazards assumption, and if so, which specific variables or
components show time-dependent effects that could invalidate my
results?”
The plot_proportionalHazard()
function provides a visual
and statistical assessment of this assumption. It leverages Schoenfeld
residuals to test whether the hazard ratio associated with each variable
remains constant over time. If patterns in the Schoenfeld residuals are
systematic or show statistically significant deviations, it indicates
that the effect of a variable changes over time during the follow-up
period.
This is particularly important when working with dimension-reduced components, such as those generated from sPLS models. Because these components aggregate the effects of multiple variables, subtle time-dependent behavior from individual features may be masked, and only detectable through residual diagnostics.
The function can be applied to a single model using
plot_proportionalHazard()
, or to a list of models with
plot_proportionalHazard.list()
. The output plot shows, for
each component or variable, the residual pattern and associated p-value
from the proportional hazards test.
C:1BfP43083d9d5ab9-pipeline.R
Variables or components with a significant p-value (typically p < 0.05) indicate a violation of the PH assumption.
C:1BfP43083d9d5ab9-pipeline.R
This visualization helps determine whether model refinement is needed. For instance, if the assumption is violated, one might:
In this example, we observed a significant violation of the proportional hazards assumption associated with the first component. As a result, we would typically need to either stratify the model, remove the variable, or include an interaction term with time to address this issue. However, since this vignette is a simplified illustrative example, we will proceed with the model as is.
Following the cross-validation, we have selected the sPLS-DRCOX methodology as the most suitable model for our data. We will now study and interpret its results based on the study variables or latent variables used.
The plot_forest()
function provides a concise and
interpretable summary of the effects estimated by a Cox model. It
visualizes Hazard Ratios (HRs) and their confidence intervals for each
predictor or component, allowing for rapid identification of
statistically significant associations and the directionality of their
effects.
Forest plots are especially useful in models derived from Cox, but
plot_forest()
function was updated to work with all
Coxmos approaches, where each component represents a latent
combination of variables. The function automatically handles these
structures, labeling components appropriately and ensuring that effect
sizes can be meaningfully compared across components or even between
models.
For each variable or component, the plot displays:
Interpretation is straightforward: components with HR > 1 are associated with increased risk (right-sided bars), while those with HR < 1 are associated with reduced risk (left-sided bars). If the confidence interval crosses HR = 1, the effect is considered non-significant.
This visualization bridges statistical output and clinical interpretation, helping to prioritize features or components that have both statistical and biological relevance.
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
Another type of visualization implemented for all supported models, whether classical or sPLS-based, is the distribution of predicted values by event status. These plots are useful for assessing how well the model separates patients who experienced the event from those who did not, based on the model’s output.
The function plot_cox.event()
(or
plot_cox.event.list()
for multiple models) enables the
generation of density and histogram plots using predictions derived from
the fitted Cox model. Internally, the function leverages the prediction
types provided by the coxph
function in the
survival
package. The main types include:
lp
(linear predictors): the log-risk scores for each
observation based on the model’s coefficients. These reflect the
relative risk associated with an individual’s covariate profile.risk
: the relative risk of experiencing the event,
obtained by exponentiating the linear predictor.expected
: the expected number of events for each
subject over time, given their covariate profile.terms
: the contribution of each variable to the linear
predictor, shown separately.survival
: the estimated probability of survival at a
given time point.By selecting the appropriate prediction type, the function produces plots that show the distribution of predicted values stratified by event status. These plots allow for visual evaluation of the model’s discriminative ability, indicate that the model is effectively distinguishing between censored and event observations.
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
How much do my original variables (for classical Cox) or components (sPLS-COX) contribute to the full model?
Understanding the internal structure of sPLS-COX models is essential
for translating their predictive power into biological insight. The
eval_Coxmos_model_per_variable()
function enables an
evaluation of how individual variables or latent components contribute
to model performance.
A central goal of this analysis is to assess component-level predictive performance, both in training and test datasets. The function decomposes the model’s AUC by component, allowing users to determine which components carry the most prognostic information. Comparing individual component AUCs against the full model’s linear predictor helps identify whether some components are particularly dominant or redundant.
From a validation standpoint, the function supports the identification of overfitting risk. Components that perform well in training data but drop in test performance may lack generalizability. Conversely, components that consistently perform across datasets represent robust predictors worthy of further investigation.
The eval_Coxmos_model_per_variable()
function
provides:
In terms of interpretation, components with high AUC values can be considered the primary drivers of the model. If an individual component outperforms the full model (linear predictor), it may represent potential overfitting, and should be considered in model refinement.
variable_auc_results <- eval_Coxmos_model_per_variable(model = lst_models$`sPLS-DRCOX-Dynamic`,
X_test = lst_models$`sPLS-DRCOX-Dynamic`$X_input,
Y_test = lst_models$`sPLS-DRCOX-Dynamic`$Y_input)
variable_auc_plot_train <- plot_evaluation(variable_auc_results, evaluation = "AUC")
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
The plot_sPLS_Coxmos()
function provides a visualization
framework for exploring sPLS components in the context of survival
analysis. One of its capabilities is the visual detection of outliers,
which can significantly impact model interpretation and performance.
Through score plots, users can visually inspect how samples are distributed across the latent component space. Samples that appear distant from the main cluster may indicate potential outliers — observations with unique profiles that could disproportionately influence the model. Identifying such samples early allows for more informed decisions about data quality, potential stratification, or exclusion.
In addition to outlier detection, the score plots also highlight sample clustering patterns. They help determine whether patients group naturally according to event status and whether component axes capture meaningful clinical variation. A clear separation between event groups supports the relevance of the extracted components.
Loading plots complement this view by showing the variables that contribute most to each component. They can reveal which features are driving sample positioning in the score space and whether any variables have excessive influence.
The biplot integrates both sample and variable information, allowing users to examine relationships between extreme samples and specific variables.
The plot_sPLS_Coxmos()
function generates graphical
representations of sPLS components from any Coxmos sPLS-based survival
model. It supports three types of plots depending on the
mode
selected:
mode = "scores"
: displays sample projections onto the
latent components (score plot).mode = "loadings"
: shows variable contributions to the
selected components (loading plot).mode = "biplot"
: combines both sample and variable
information in a single view.ggp_scores <- plot_sPLS_Coxmos(model = lst_models$`sPLS-DRCOX-Dynamic`,
comp = c(1,2), mode = "scores")
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
ggp_loadings <- plot_sPLS_Coxmos(model = lst_models$`sPLS-DRCOX-Dynamic`,
comp = c(1,2), mode = "loadings",
top = 10)
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
ggp_biplot <- plot_sPLS_Coxmos(model = lst_models$`sPLS-DRCOX-Dynamic`,
comp = c(1,2), mode = "biplot",
top = 15,
only_top = T,
overlaps = 20)
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
Which original variables drive my sPLS model’s survival predictions, and what direction of effect (protective or risk) do they contribute to each component?
The plot_pseudobeta()
function allows for a detailed
inspection of how individual variables influence the components of a
sPLS-based survival model. This visualization provides a transparent and
biologically interpretable summary of variable-level
contributions, capturing both the magnitude and direction of
effect for each feature.
By decomposing each component into pseudobeta coefficients, it becomes possible to rank variables based on their impact on the linear predictor (LP) and to assess whether their effects are consistent across components. The directionality—whether a variable acts as a protective or risk factor—is visualized through the orientation of the bars, even when components aggregate multiple opposing effects.
This type of analysis is particularly valuable for biological interpretation, as it links abstract components back to measurable variables. It also reveals whether known biomarkers align with expected effects or whether new candidate variables emerge as strong contributors to survival risk.
The function includes several customizable options:
onlySig = TRUE
filters components based on statistical
significance (by default at alpha = 0.05
), reducing
variable noise contribution and emphasizing robust effects.top
defines the number of top variables to display
(e.g., top = 20
for overview; top = NULL
for
full detail).ggp.simulated_beta <- plot_pseudobeta(model = lst_models$`sPLS-DRCOX-Dynamic`,
error.bar = T, onlySig = T, alpha = 0.05,
zero.rm = T, auto.limits = T, top = 20,
show_percentage = T, size_percentage = 2)
C:1BfP43083d9d5ab9-pipeline.R
The sPLS-DRCOX-Dynamic
model was computed using a total
of two latent components. While both components were classified as
risk-associated based on their overall positive
coefficients in the linear predictor, individual variables within these
components may still exhibit protective effects,
depending on the sign and magnitude of their specific weights.
The following plot displays the pseudobeta coefficients for the top 20 original variables contributing to the model. These coefficients reflect the direction (risk [positive values] vs. protective [negative values]) and strength of each variable’s influence on the model’s survival prediction.
In this configuration, the top 20 variables account for approximately
63.99% of the total value of the linear predictor. To
obtain a complete view of all variable contributions the
top
parameter have to be set to NULL
.
C:1BfP43083d9d5ab9-pipeline.R
Kaplan-Meier (KM) curves are a fundamental tool in survival analysis. They estimate the probability of survival over time and allow visual comparison between different patient groups. When used in conjunction with a risk prediction model, such as the Cox model, KM curves help assess whether the predicted risk groups truly differ in their survival outcomes. Typically, the survival probability is plotted against time, and differences between curves indicate the effectiveness of the model in stratifying risk.
In this context, the getAutoKM()
and
getAutoKM.list()
functions provide an automated way to
generate KM plots from models. These functions can be used to analyze
the survival curves based on:
"LP"
),"COMP"
), or"VAR"
).To generate Kaplan-Meier curves using the full linear predictor, set
the type
parameter to "LP"
. This uses the
value of the linear predictor (a combination of component scores and Cox
coefficients) to divide patients into high- and low-risk groups. The
optimal cut-point is computed using the surv_cutpoint()
function from the survminer
R package. Other parameters,
such as comp
or top
, are ignored in this
configuration.
C:1BfP43083d9d5ab9-pipeline.R
This function returns a list containing both numerical information (cut-points, significance levels) and the corresponding KM plots.
To display the Kaplan-Meier curve associated with the full linear predictor:
C:1BfP43083d9d5ab9-pipeline.R
The curve illustrates how effectively the model separates patients into groups with statistically different survival probabilities. The log-rank test p-value indicates whether the observed differences are significant.
This approach provides a straightforward and interpretable method to validate the prognostic power of a model’s output. By transforming risk predictions into time-dependent survival probabilities, it bridges the gap between statistical modeling and clinical decision-making.
In sPLS-based models, each latent component contributes to the
model’s predictive power. Setting type = "COMP"
allows the
user to evaluate the prognostic value of each component separately
through KM curves. This helps identify whether a single component is
sufficient to stratify patients effectively, or if multiple components
are necessary.
The comp
parameter selects which components to include
in the analysis. If multiple components are specified, a separate KM
plot will be generated for each.
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
Finally, by setting type = "VAR"
, Kaplan-Meier curves
are generated for individual variables from the original dataset. This
approach is valuable when:
You want to examine how specific variables behave independently of the sPLS model.
You need to validate the clinical relevance of original features in survival stratification.
This method uses ori_data = TRUE
to determine cutpoints
from raw (unscaled) variable values, ensuring interpretability in
clinical settings. The top
parameter controls how many
top-ranked variables to include, and the function will automatically
compute log-rank p-values to filter only significant results if
only_sig = TRUE
.
LST_KM_RES_VAR <- getAutoKM(type = "VAR",
model = lst_models$`sPLS-DRCOX-Dynamic`,
top = 10,
ori_data = T, #original data selected
only_sig = T, alpha = 0.05)
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
The framework also supports prediction and interpretation for new patients, offering tools to visualize how a new observation compares to the training data in terms of risk and variable importance.
To illustrate this functionality, we select a censored observation from the test dataset:
C:1BfP43083d9d5ab9-pipeline.R
This observation corresponds to a censored patient with the last follow-up recorded at time 825:
knitr::kable(Y_test[rownames(new_pat),])
#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")
#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")
time | event | |
---|---|---|
TCGA-A2-A0SV-01A | 825 | TRUE |
C:1BfP43083d9d5ab9-pipeline.R
Additionally, the new observation can be added to the density and histogram plots that the model has computed. While this function can be useful in visualizing the results, it is important to note that no definitive conclusions can be drawn from these types of plots alone. They serve as an additional means of viewing the predictive values for the new patient (“lp”, “risk”, “expected”, “terms”, “survival”) in relation to the training dataset and its histogram or density plot, along with the events or censored patients.
pat_density <- plot_observation.eventDensity(observation = new_pat,
model = lst_models$`sPLS-DRCOX-Dynamic`,
type = "lp")
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
pat_histogram <- plot_observation.eventHistogram(observation = new_pat,
model = lst_models$`sPLS-DRCOX-Dynamic`,
type = "lp")
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
The function plot_observation.pseudobeta()
allows
comparison between the new patient’s variable profile and the
pseudo-beta coefficients derived from a trained model. This comparison
highlights variables contributing to increased or decreased predicted
risk, relative to the average values in the training set.
ggp.simulated_beta_newObs <- plot_observation.pseudobeta(model = lst_models$`sPLS-DRCOX-Dynamic`,
observation = new_pat,
error.bar = T, onlySig = T, alpha = 0.05,
zero.rm = T, auto.limits = T, show.betas = T, top = 20,
txt.x.angle = 90)
C:1BfP43083d9d5ab9-pipeline.R
The resulting figure shows:
This helps identify whether variables behave in a risk-aligned or protective direction. For example, a model variable with a negative pseudo-beta (protective) whose patient value is in the opposite direction (above the mean) contributes to a risk profile.
C:1BfP43083d9d5ab9-pipeline.R
The function eval_Coxmos_model_per_variable()
enables
the evaluation of how individual variables or latent components
contribute to model performance. This evaluation can also be applied to
new observations, such as a test dataset, by specifying the and
arguments.
variable_auc_results_test <- eval_Coxmos_model_per_variable(model = lst_models$`sPLS-DRCOX-Dynamic`,
X_test = X_test,
Y_test = Y_test)
variable_auc_plot_test <- plot_evaluation(variable_auc_results_test, evaluation = "AUC")
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
In this example, component 2 outperforms the full linear predictor (LP) model, which may suggest potential overfitting. This occurs when the model captures patterns that are overly specific to the training data and fail to generalize to new observations.
Note: This vignette uses a toy dataset, and the results should not be interpreted in a biological context.
Furthermore, LP plots, similar to those for a single observations, can be generated for multiple patients at the same time. For example, by selecting 5 patients.
C:1BfP43083d9d5ab9-pipeline.R
knitr::kable(Y_test[1:5,])
#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")
#> Warning: 'xfun::attr()' is deprecated.
#> Use 'xfun::attr2()' instead.
#> See help("Deprecated")
time | event | |
---|---|---|
TCGA-A2-A0SV-01A | 825 | TRUE |
TCGA-A2-A0YT-01A | 723 | TRUE |
TCGA-BH-A1FB-01A | 3669 | TRUE |
TCGA-D8-A1XC-01A | 377 | TRUE |
TCGA-BH-A1EX-01A | 1508 | TRUE |
C:1BfP43083d9d5ab9-pipeline.R
This section allows the visualization of the individual linear predictor (LP) values for each variable, along with the total LP calculated using all variables included in the survival model.
A potential application of this function is to analyze the same patient at different time points, for example, after receiving treatment, and track its evolution over time with respect to the original variables.
To generate the figure, use the
plot_multipleObservations.LP.list()
or
plot_multipleObservations.LP()
function.
lst_cox.comparison <- plot_multipleObservations.LP(model = lst_models$`sPLS-DRCOX-Dynamic`,
observations = lst_observations,
top = 10)
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
In the resulting plot, the LP contribution of each variable is displayed for each observation, as well as the total LP. This visualization helps identify which observations carry higher predicted risk compared to others.
To evaluate the model’s performance on unseen data, the cutoffs
derived from getAutoKM()
can be used to stratify the test
dataset using Kaplan-Meier analysis. This approach helps determine how
well the model separates patients into groups with different survival
probabilities.
This modality uses the final linear predictor (LP), which combines all variables or components weighted by their estimated coefficients, to classify patients into high- and low-risk groups. The cutoff value, usually determined using the training data, is applied to the test data to stratify patients.
Once the cutoff is retrieved using getCutoffAutoKM()
,
the getTestKM()
function applies it to the test set and
generates the survival curves. A log-rank test is performed to assess
whether the survival distributions between the groups are significantly
different, indicating that the model has learned meaningful risk
stratification.
lst_cutoff <- getCutoffAutoKM(LST_KM_RES_LP)
LST_KM_TEST_LP <- getTestKM(model = lst_models$`sPLS-DRCOX-Dynamic`,
X_test = X_test, Y_test = Y_test,
type = "LP",
BREAKTIME = NULL, n.breaks = 20,
cutoff = lst_cutoff)
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
This option evaluates the contribution of individual latent components from the model. Each component represents a compressed combination of input variables designed to maximize the model’s predictive power.
By applying a cutoff to each component separately, you can assess whether one or more components independently capture significant survival-related structure. This is particularly useful in multivariate models, where understanding the relevance of each component can guide dimensionality reduction or biological interpretation.
lst_cutoff <- getCutoffAutoKM(LST_KM_RES_COMP)
LST_KM_TEST_COMP <- getTestKM(model = lst_models$`sPLS-DRCOX-Dynamic`,
X_test = X_test, Y_test = Y_test,
type = "COMP",
BREAKTIME = NULL, n.breaks = 20,
cutoff = lst_cutoff)
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
This modality focuses on the original input variables, rather than derived components or LP. It enables a more granular analysis by evaluating whether each variable, considered individually, can stratify the test data into groups with different survival outcomes.
This is helpful when interpreting the biological or clinical
relevance of variables, as it highlights those that retain predictive
value outside the multivariate context. The ori_data = TRUE
flag ensures that the model works directly with the original variable
scales.
lst_cutoff <- getCutoffAutoKM(LST_KM_RES_VAR)
LST_KM_TEST_VAR <- getTestKM(model = lst_models$`sPLS-DRCOX-Dynamic`,
X_test = X_test, Y_test = Y_test,
type = "VAR", ori_data = T,
BREAKTIME = NULL, n.breaks = 20,
cutoff = lst_cutoff)
C:1BfP43083d9d5ab9-pipeline.R
C:1BfP43083d9d5ab9-pipeline.R
In this case, although the variable was significant in the training data, it shows similar behavior across both groups in the test data.