Cox Regression Under Quasi-Independent Double Truncation

library(survdt)

Inverse probability weighted estimators

The survdt package implements several methods for Cox regression with quasi-independent double truncation. We illustrate them through an application to the aids dataset, which holds records of AIDS incubation times from a retrospective sample of patients with transfusion-acquired HIV. The inclusion criteria were that the patient was diagnosed with AIDS between the discovery of the disease in 1982 and the end of the study in 1986, with time measured in months since HIV infection. The event time of interest is the AIDS incubation time incu, with the truncation times contained in ltrunc and rtrunc respectively. See ?survdt::aids for more details.

To fit a Cox model with nonparametric inverse probability weighting, use the coxph_indtrunc() function in the survdt package. The first input argument should be a model formula with a Survdt or Survdt2 response. Survdt() is used under time-independent covariates (see ?survdt::Survdt), while Survdt2 is used when there are time-varying covariates (see Time-varying covariates for an example).

aids_fit <- coxph_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp, 
                           data = aids)
aids_fit
#> 
#> IPW Cox model under quasi-independent truncation.
#> 
#>   Call: coxph_indtrunc(formula = Survdt(incu, ltrunc, rtrunc) ~ age_gp, 
#>             data = aids)
#> 
#>   Model was fit on 295 complete samples (0 dropped due to missingness).
#>   Min event time: 0
#>   Max event time: 89
#>   295 observed events in the score function, time range [0, 89].
#>   Inverse probability of non-truncation weights estimated nonparametrically.
#>   Time-varying weight component: W-asurv
#> 
#> Model estimates:
#>                  coef        se     zstat         pval
#> age_gp4-59 -0.6895787 0.5450223 -1.265230 2.057888e-01
#> age_gp<=4   2.1222687 0.3380197  6.278535 3.417777e-10
confint(aids_fit)
#>                 2.5%     97.5%
#> age_gp4-59 -1.757803 0.3786454
#> age_gp<=4   1.459762 2.7847752

The default setting applies robust inverse probability of non-truncation weights and fits the model by maximizing a weighted partial likelihood. Several choices of time-varying weights are available through the input argument ipw_type. They offer improved robustness to outliers and extreme inverse probability weights when compared to standard inverse probability weighting (ipw_type="W-1").

coxph_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp, 
               data = aids,
               ipw_type = "W-1")
#> 
#> IPW Cox model under quasi-independent truncation.
#> 
#>   Call: coxph_indtrunc(formula = Survdt(incu, ltrunc, rtrunc) ~ age_gp, 
#>             data = aids, ipw_type = "W-1")
#> 
#>   Model was fit on 295 complete samples (0 dropped due to missingness).
#>   Min event time: 0
#>   Max event time: 89
#>   295 observed events in the score function, time range [0, 89].
#>   Inverse probability of non-truncation weights estimated nonparametrically.
#>   Time-varying weight component: none
#> 
#> Model estimates:
#>                  coef        se     zstat         pval
#> age_gp4-59 -0.8849401 0.7114121 -1.243921 2.135288e-01
#> age_gp<=4   2.0283216 0.3329256  6.092416 1.112191e-09

In this case, the coefficient estimates change under different weightings but the z-statistics are fairly similar. In Sensitivity analysis for positivity violations, we find that different weight choices can have a greater impact in the aids data during sensitivity analysis. See ?survdt::coxph_indtrunc for more details on the different weighting options.

Lastly, stratified Cox models can be fit by specifying a strata_formula. See Stratified Cox models for an example.

Plotting survival and hazard estimates

Given a fitted Cox model, the hazard or survival function estimates can be plotted using the plot_coxsurv() function in the survdt package. The first input argument should be a model object from coxph_indtrunc(), while the second input argument should be a data frame with the covariate values to plot the conditional survival or hazard function for.

plot_coxsurv(aids_fit, 
             newdata = aids[c(1,40), ],
             target = "survival")
#> Obtaining the baseline hazard with standard errors by calling:
#>  coxph_indtrunc(formula = Survdt(incu, ltrunc, rtrunc) ~ age_gp, 
#>     data = aids, basehaz = TRUE, se = TRUE)


plot_coxsurv(aids_fit, 
             newdata = aids[c(1, 40, 200), ], 
             ci = FALSE, 
             target = "cumhaz")
#> Obtaining the baseline hazard by calling:
#>  coxph_indtrunc(formula = Survdt(incu, ltrunc, rtrunc) ~ age_gp, 
#>     data = aids, basehaz = TRUE, se = FALSE)

Note that plot_coxsurv() prints a message telling us it is computing the baseline hazard and its standard errors, since (by default) this was not done when fitting the model. Use basehaz=TRUE when running coxph_indtrunc() to remove the need to re-compute the baseline hazard for each plot.

Assumption checking and model diagnostics

The statistical methods implemented in coxph_indtrunc() require two key assumptions to ensure unbiased inference in the presence of double truncation. First, the truncation times should be independent of the event time and covariates within the observable data region, which is known as quasi-independence. Second, all possible event times should have a positive probability of being observed, i.e. not being truncated, which is known as the positivity assumption. The survdt package includes diagnostics for both of these assumptions. They are described in the following two sections. Residual diagnostics for the Cox model specification are available through the residuals() function (see survdt::coxph_indtrunc).

Testing quasi-independent truncation with covariates

The survdt package includes the function test_quasiindep_covariates() to test for violations of the quasi-independence assumption. This is accomplished through a two-stage test based on stratifying by covariate values. The first stage checks for dependence within strata, while the second stage checks across strata. The test can be repeated with different stratifications when there are many covariates involved. See ?survdt::test_quasiindep_covariates for more details.

The first input argument should be a model formula with a Survdt response and factor variables defining covariate strata on the right hand side. Non-factor variables will be automatically coerced to factors. Below we run the diagnostic test on the aids data, with age group defining the covariate strata.

qind_test <- test_quasiindep_covariates(Survdt(incu, ltrunc, rtrunc) ~ age_gp, 
                                        data = aids)
qind_test
#> 
#> Two-stage quasi-independent truncation test involving covariates.
#> 
#>   Unstratified NPMLE fit based on 295 complete samples (0 dropped due to missingness)
#>     with 71 unique event times.
#>   Min event time: 0
#>   Max event time: 89
#> 
#>   Stage 1 (within strata quasi-independence).
#>   Test statistic: chisq=5.4, df=3
#>   Significant evidence not found against quasi-independence (p=0.15)
#> 
#>   Stage 2 (across strata equality of selection probabilities).
#>   Test statistic: 0.24
#>     Time range:   [0, 89]
#>     Time weights: none
#>   Critical value: 0.31 at level 0.95
#>   Significant evidence not found against equality across strata (p=0.21).
#>   Results based on 2000 simulations from the estimated null distribution.
#> 
#> Summary of the 3 covariate groups used in the test:
#>     stratum size
#>   age_gp>59  141
#>  age_gp4-59  120
#>   age_gp<=4   34

Both stages of the test fail to find significant evidence of dependent truncation in the aids data, so the quasi-independence assumption required by coxph_indtrunc() is reasonably satisfied. The test results for the second stage can be plotted to visualize how strongly the truncation times could depend on the covariates. This is done by comparing the estimated selection (non-truncation) probabilities across strata.

plot(qind_test)

The estimated selection probabilities all lie within the 95% uniform confidence band around the unstratified estimates, as expected given the p-value from the stage 2 test printed above.

Sensitivity analysis for positivity violations

The survdt package includes the function positivity_sens_indtrunc() to conduct sensitivity analysis for the Cox model under potential violations of the positivity assumption. Given a sequence of potential values for the truncated baseline probability mass, it returns a corresponding sequence of adjusted coefficient estimates and associated sensitivity intervals which show how the fitted model could change due to non-positivity. See ?survdt::positivity_sens_indtrunc for more details.

The first input argument should be a model formula with Survdt response. The truncated masses are specified through trunc_mass, and should not be too close to one or the algorithm may not converge. Many of the other input arguments are shared with coxph_indtrunc().

First, we apply the sensitivity analysis with standard inverse probability weighting, which does not have a time-varying component.

positivity_sens_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp, 
                         data = aids, ipw_type = "W-1",
                         trunc_mass = seq(0, 0.11, by = 0.01))
#> Warning in positivity_sens_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp, : 
#>   Sensitivity analysis failed at trunc_mass 0.04 with error message 
#>   'Algorithm reached max_iter without convergence'
#>   
#> See ?positivity_sens_indtrunc for suggestions to improve convergence.
#> 
#> Sensitivity analysis for positivity violations in a Cox model under quasi-
#>  independent truncation.
#> 
#>   Call: positivity_sens_indtrunc(formula = Survdt(incu, ltrunc, rtrunc) ~ 
#>             age_gp, data = aids, trunc_mass = seq(0, 0.11, by = 0.01), 
#>             ipw_type = "W-1")
#> 
#>   Model was fit on 295 complete samples (0 dropped due to missingness).
#>   Min event time: 0
#>   Max event time: 89
#>   295 observed events in the score function, time range [0, 89].
#>   Inverse probability of non-truncation weights estimated nonparametrically.
#>   Time-varying weight component: none
#> 
#> Sensitivity analysis estimates:
#>            trunc_mass=0 trunc_mass=0.01 trunc_mass=0.02 trunc_mass=0.03
#> age_gp4-59   -0.8849401       -1.304184       -1.549481       -1.799851
#> age_gp<=4     2.0283216        2.011124        2.017726        2.027230
#>            trunc_mass=0.04 trunc_mass=0.05 trunc_mass=0.06 trunc_mass=0.07
#> age_gp4-59              NA              NA              NA              NA
#> age_gp<=4               NA              NA              NA              NA
#>            trunc_mass=0.08 trunc_mass=0.09 trunc_mass=0.1 trunc_mass=0.11
#> age_gp4-59              NA              NA             NA              NA
#> age_gp<=4               NA              NA             NA              NA

Note that the estimator failed to converge even at small truncated mass values. This can occur for many reasons (see ?survdt::positivity_sens_indtrunc for more information). In the aids data, it turns out that the estimated selection probability for the largest observed incubation times approaches zero, leading to extreme inverse probability weights. Therefore we apply time-varying weights that reduce the influence of event time outliers and extreme weights by using the option ipw_type = "W-asurv". This is the default weighting used in positivity_sens_indtrunc() and coxph_indtrunc().

sens <- positivity_sens_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp, 
                                 data = aids,
                                 trunc_mass = seq(0, 0.11, by = 0.01))
sens
#> 
#> Sensitivity analysis for positivity violations in a Cox model under quasi-
#>  independent truncation.
#> 
#>   Call: positivity_sens_indtrunc(formula = Survdt(incu, ltrunc, rtrunc) ~ 
#>             age_gp, data = aids, trunc_mass = seq(0, 0.11, by = 0.01))
#> 
#>   Model was fit on 295 complete samples (0 dropped due to missingness).
#>   Min event time: 0
#>   Max event time: 89
#>   295 observed events in the score function, time range [0, 89].
#>   Inverse probability of non-truncation weights estimated nonparametrically.
#>   Time-varying weight component: W-asurv
#> 
#> Sensitivity analysis estimates:
#>            trunc_mass=0 trunc_mass=0.01 trunc_mass=0.02 trunc_mass=0.03
#> age_gp4-59   -0.6895787      -0.8425212      -0.9318757       -1.014750
#> age_gp<=4     2.1222687       2.1291125       2.1384747        2.148357
#>            trunc_mass=0.04 trunc_mass=0.05 trunc_mass=0.06 trunc_mass=0.07
#> age_gp4-59       -1.097130       -1.181844       -1.270912       -1.366208
#> age_gp<=4         2.158563        2.169015        2.179676        2.190524
#>            trunc_mass=0.08 trunc_mass=0.09 trunc_mass=0.1 trunc_mass=0.11
#> age_gp4-59       -1.469787       -1.584160      -1.712650       -1.859948
#> age_gp<=4         2.201545        2.212731       2.224076        2.235577

These robust weights improve the numerical convergence of the sensitivity analysis estimators by a substantial amount. The estimates can also be plotted to better visualize the results.

plot(sens)

plot(sens, hazratio = TRUE)

Clearly the coefficient estimate for adults is highly sensitive to potential positivity violations, while the coefficient estimate for children is more stable.

Stratified Cox models

The survdt package supports stratified Cox models, where the baseline hazard is allowed to differ across strata. In the coxph_indtrunc() function, stratification factors are specified through the strata_formula input argument, which should be a one-sided formula with no response variable. We illustrate this with a simulated sample featuring non-proportional hazards, which is included in the survdt package as nonprop_sample. The data has a binary covariate group with a time-varying log hazard ratio of \(0.7\times 1(t>0.6)\) and a continuous covariate x with a constant log hazard ratio of \(-0.5\).

We fit a model stratified by group.

strat_fit <- coxph_indtrunc(Survdt(event_time, ltrunc, rtrunc) ~ x, 
                            data = nonprop_sample, 
                            strata_formula = ~group)
strat_fit
#> 
#> IPW Cox model under quasi-independent truncation.
#> 
#>   Call: coxph_indtrunc(formula = Survdt(event_time, ltrunc, rtrunc) ~ 
#>             x, data = nonprop_sample, strata_formula = ~group)
#> 
#>   Model was fit on 490 complete samples (0 dropped due to missingness).
#>   Min event time: 0.11
#>   Max event time: 1.08
#>   490 observed events in the score function, time range [0.11, 1.08].
#>   Inverse probability of non-truncation weights estimated nonparametrically.
#>   Time-varying weight component: W-asurv
#> 
#> Model estimates:
#>         coef        se     zstat        pval
#> x -0.7192016 0.2409035 -2.985434 0.002831761
#> 
#> Summary of the 2 strata:
#>  stratum size unique event times
#>   group1  260                 70
#>   group0  230                 80
confint(strat_fit)
#>        2.5%      97.5%
#> x -1.191364 -0.2470394

Plotting the cumulative hazard estimates for the two groups at the same x, we see that they are indeed approximately equal until time 0.6, after which the hazard for group 1 is about \(\exp(0.7)\approx 2\) times as large as the hazard for group 0.

plot_coxsurv(strat_fit, 
             newdata = data.frame(group = c(0, 1), x = c(1, 1)),
             target = "cumhaz")
#> Obtaining the baseline hazard with standard errors by calling:
#>  coxph_indtrunc(formula = Survdt(event_time, ltrunc, rtrunc) ~ 
#>     x, data = nonprop_sample, strata_formula = ~group, basehaz = TRUE, 
#>     se = TRUE)

Time-varying covariates

The survdt package supports time-varying covariates in the coxph_indtrunc() function. We illustrate this with a simulated sample featuring non-proportional hazards, which is included in the survdt package as nonprop_sample. The data has a binary covariate group with a time-varying log hazard ratio of \(0.7\times 1(t>0.6)\) and a continuous covariate x with a constant log hazard ratio of \(-0.5\).

We fit a model with the correct time-varying effect of group. First, we split each row in the data frame into two possible observation windows corresponding to \(t\leq 0.6\) and \(t>0.6\). The survdt package includes the function time_split() for this purpose. See ?survdt::time_split for more details.

split_dat <- time_split(nonprop_sample, 
                        cut = 0.6, 
                        stop_name = "event_time", 
                        start = "start",
                        event = "event",
                        id = "id")

Then we add a column group_tgr06 containing the time-varying covariate \(1(t>0.6)\times\)group, and fit the Cox model using a Survdt2 response.

split_dat$group_tgr06 <- (split_dat$event_time > 0.6) * split_dat$group
tvar_fit <- coxph_indtrunc(Survdt2(start, event_time, event, ltrunc, rtrunc, id) ~ 
                             group_tgr06 + x, 
                           data = split_dat)
tvar_fit
#> 
#> IPW Cox model under quasi-independent truncation.
#> 
#>   Call: coxph_indtrunc(formula = Survdt2(start, event_time, event, ltrunc, 
#>             rtrunc, id) ~ group_tgr06 + x, data = split_dat)
#> 
#>   Model was fit on 490 complete samples (0 dropped due to missingness).
#>   Min event time: 0.11
#>   Max event time: 1.08
#>   490 observed events in the score function, time range [0.11, 1.08].
#>   Inverse probability of non-truncation weights estimated nonparametrically.
#>   Time-varying weight component: W-asurv
#> 
#> Model estimates:
#>                   coef        se     zstat         pval
#> group_tgr06  0.9494582 0.2023921  4.691182 2.716319e-06
#> x           -0.7169686 0.2425353 -2.956142 3.115140e-03
confint(tvar_fit)
#>                   2.5%      97.5%
#> group_tgr06  0.5527769  1.3461395
#> x           -1.1923290 -0.2416082

The confidence intervals from the estimated Cox model contain the true values of 0.7 and -0.5.