The survdt package implements several methods for Cox
regression with quasi-independent double truncation. We illustrate them
through an application to the aids dataset, which holds
records of AIDS incubation times from a retrospective sample of patients
with transfusion-acquired HIV. The inclusion criteria were that the
patient was diagnosed with AIDS between the discovery of the disease in
1982 and the end of the study in 1986, with time measured in months
since HIV infection. The event time of interest is the AIDS incubation
time incu, with the truncation times contained in
ltrunc and rtrunc respectively. See
?survdt::aids for more details.
To fit a Cox model with nonparametric inverse probability weighting,
use the coxph_indtrunc() function in the
survdt package. The first input argument should be a model
formula with a Survdt or Survdt2 response.
Survdt() is used under time-independent covariates (see
?survdt::Survdt), while Survdt2 is used when
there are time-varying covariates (see Time-varying covariates for an
example).
aids_fit <- coxph_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp,
data = aids)
aids_fit
#>
#> IPW Cox model under quasi-independent truncation.
#>
#> Call: coxph_indtrunc(formula = Survdt(incu, ltrunc, rtrunc) ~ age_gp,
#> data = aids)
#>
#> Model was fit on 295 complete samples (0 dropped due to missingness).
#> Min event time: 0
#> Max event time: 89
#> 295 observed events in the score function, time range [0, 89].
#> Inverse probability of non-truncation weights estimated nonparametrically.
#> Time-varying weight component: W-asurv
#>
#> Model estimates:
#> coef se zstat pval
#> age_gp4-59 -0.6895787 0.5450223 -1.265230 2.057888e-01
#> age_gp<=4 2.1222687 0.3380197 6.278535 3.417777e-10
confint(aids_fit)
#> 2.5% 97.5%
#> age_gp4-59 -1.757803 0.3786454
#> age_gp<=4 1.459762 2.7847752The default setting applies robust inverse probability of
non-truncation weights and fits the model by maximizing a weighted
partial likelihood. Several choices of time-varying weights are
available through the input argument ipw_type. They offer
improved robustness to outliers and extreme inverse probability weights
when compared to standard inverse probability weighting
(ipw_type="W-1").
coxph_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp,
data = aids,
ipw_type = "W-1")
#>
#> IPW Cox model under quasi-independent truncation.
#>
#> Call: coxph_indtrunc(formula = Survdt(incu, ltrunc, rtrunc) ~ age_gp,
#> data = aids, ipw_type = "W-1")
#>
#> Model was fit on 295 complete samples (0 dropped due to missingness).
#> Min event time: 0
#> Max event time: 89
#> 295 observed events in the score function, time range [0, 89].
#> Inverse probability of non-truncation weights estimated nonparametrically.
#> Time-varying weight component: none
#>
#> Model estimates:
#> coef se zstat pval
#> age_gp4-59 -0.8849401 0.7114121 -1.243921 2.135288e-01
#> age_gp<=4 2.0283216 0.3329256 6.092416 1.112191e-09In this case, the coefficient estimates change under different
weightings but the z-statistics are fairly similar. In Sensitivity
analysis for positivity violations, we find that different weight
choices can have a greater impact in the aids data during
sensitivity analysis. See ?survdt::coxph_indtrunc for more
details on the different weighting options.
Lastly, stratified Cox models can be fit by specifying a
strata_formula. See Stratified Cox models for an
example.
Given a fitted Cox model, the hazard or survival function estimates
can be plotted using the plot_coxsurv() function in the
survdt package. The first input argument should be a model
object from coxph_indtrunc(), while the second input
argument should be a data frame with the covariate values to plot the
conditional survival or hazard function for.
plot_coxsurv(aids_fit,
newdata = aids[c(1,40), ],
target = "survival")
#> Obtaining the baseline hazard with standard errors by calling:
#> coxph_indtrunc(formula = Survdt(incu, ltrunc, rtrunc) ~ age_gp,
#> data = aids, basehaz = TRUE, se = TRUE)
plot_coxsurv(aids_fit,
newdata = aids[c(1, 40, 200), ],
ci = FALSE,
target = "cumhaz")
#> Obtaining the baseline hazard by calling:
#> coxph_indtrunc(formula = Survdt(incu, ltrunc, rtrunc) ~ age_gp,
#> data = aids, basehaz = TRUE, se = FALSE)Note that plot_coxsurv() prints a message telling us it
is computing the baseline hazard and its standard errors, since (by
default) this was not done when fitting the model. Use
basehaz=TRUE when running coxph_indtrunc() to
remove the need to re-compute the baseline hazard for each plot.
The statistical methods implemented in coxph_indtrunc()
require two key assumptions to ensure unbiased inference in the presence
of double truncation. First, the truncation times should be independent
of the event time and covariates within the observable data region,
which is known as quasi-independence. Second, all possible event times
should have a positive probability of being observed, i.e. not being
truncated, which is known as the positivity assumption. The
survdt package includes diagnostics for both of these
assumptions. They are described in the following two sections. Residual
diagnostics for the Cox model specification are available through the
residuals() function (see
survdt::coxph_indtrunc).
The survdt package includes the function
test_quasiindep_covariates() to test for violations of the
quasi-independence assumption. This is accomplished through a two-stage
test based on stratifying by covariate values. The first stage checks
for dependence within strata, while the second stage checks across
strata. The test can be repeated with different stratifications when
there are many covariates involved. See
?survdt::test_quasiindep_covariates for more details.
The first input argument should be a model formula with a
Survdt response and factor variables defining covariate
strata on the right hand side. Non-factor variables will be
automatically coerced to factors. Below we run the diagnostic test on
the aids data, with age group defining the covariate
strata.
qind_test <- test_quasiindep_covariates(Survdt(incu, ltrunc, rtrunc) ~ age_gp,
data = aids)
qind_test
#>
#> Two-stage quasi-independent truncation test involving covariates.
#>
#> Unstratified NPMLE fit based on 295 complete samples (0 dropped due to missingness)
#> with 71 unique event times.
#> Min event time: 0
#> Max event time: 89
#>
#> Stage 1 (within strata quasi-independence).
#> Test statistic: chisq=5.4, df=3
#> Significant evidence not found against quasi-independence (p=0.15)
#>
#> Stage 2 (across strata equality of selection probabilities).
#> Test statistic: 0.24
#> Time range: [0, 89]
#> Time weights: none
#> Critical value: 0.31 at level 0.95
#> Significant evidence not found against equality across strata (p=0.21).
#> Results based on 2000 simulations from the estimated null distribution.
#>
#> Summary of the 3 covariate groups used in the test:
#> stratum size
#> age_gp>59 141
#> age_gp4-59 120
#> age_gp<=4 34Both stages of the test fail to find significant evidence of
dependent truncation in the aids data, so the
quasi-independence assumption required by coxph_indtrunc()
is reasonably satisfied. The test results for the second stage can be
plotted to visualize how strongly the truncation times could depend on
the covariates. This is done by comparing the estimated selection
(non-truncation) probabilities across strata.
The estimated selection probabilities all lie within the 95% uniform confidence band around the unstratified estimates, as expected given the p-value from the stage 2 test printed above.
The survdt package includes the function
positivity_sens_indtrunc() to conduct sensitivity analysis
for the Cox model under potential violations of the positivity
assumption. Given a sequence of potential values for the truncated
baseline probability mass, it returns a corresponding sequence of
adjusted coefficient estimates and associated sensitivity intervals
which show how the fitted model could change due to non-positivity. See
?survdt::positivity_sens_indtrunc for more details.
The first input argument should be a model formula with
Survdt response. The truncated masses are specified through
trunc_mass, and should not be too close to one or the
algorithm may not converge. Many of the other input arguments are shared
with coxph_indtrunc().
First, we apply the sensitivity analysis with standard inverse probability weighting, which does not have a time-varying component.
positivity_sens_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp,
data = aids, ipw_type = "W-1",
trunc_mass = seq(0, 0.11, by = 0.01))
#> Warning in positivity_sens_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp, :
#> Sensitivity analysis failed at trunc_mass 0.04 with error message
#> 'Algorithm reached max_iter without convergence'
#>
#> See ?positivity_sens_indtrunc for suggestions to improve convergence.
#>
#> Sensitivity analysis for positivity violations in a Cox model under quasi-
#> independent truncation.
#>
#> Call: positivity_sens_indtrunc(formula = Survdt(incu, ltrunc, rtrunc) ~
#> age_gp, data = aids, trunc_mass = seq(0, 0.11, by = 0.01),
#> ipw_type = "W-1")
#>
#> Model was fit on 295 complete samples (0 dropped due to missingness).
#> Min event time: 0
#> Max event time: 89
#> 295 observed events in the score function, time range [0, 89].
#> Inverse probability of non-truncation weights estimated nonparametrically.
#> Time-varying weight component: none
#>
#> Sensitivity analysis estimates:
#> trunc_mass=0 trunc_mass=0.01 trunc_mass=0.02 trunc_mass=0.03
#> age_gp4-59 -0.8849401 -1.304184 -1.549481 -1.799851
#> age_gp<=4 2.0283216 2.011124 2.017726 2.027230
#> trunc_mass=0.04 trunc_mass=0.05 trunc_mass=0.06 trunc_mass=0.07
#> age_gp4-59 NA NA NA NA
#> age_gp<=4 NA NA NA NA
#> trunc_mass=0.08 trunc_mass=0.09 trunc_mass=0.1 trunc_mass=0.11
#> age_gp4-59 NA NA NA NA
#> age_gp<=4 NA NA NA NANote that the estimator failed to converge even at small truncated
mass values. This can occur for many reasons (see
?survdt::positivity_sens_indtrunc for more information). In
the aids data, it turns out that the estimated selection
probability for the largest observed incubation times approaches zero,
leading to extreme inverse probability weights. Therefore we apply
time-varying weights that reduce the influence of event time outliers
and extreme weights by using the option
ipw_type = "W-asurv". This is the default weighting used in
positivity_sens_indtrunc() and
coxph_indtrunc().
sens <- positivity_sens_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp,
data = aids,
trunc_mass = seq(0, 0.11, by = 0.01))
sens
#>
#> Sensitivity analysis for positivity violations in a Cox model under quasi-
#> independent truncation.
#>
#> Call: positivity_sens_indtrunc(formula = Survdt(incu, ltrunc, rtrunc) ~
#> age_gp, data = aids, trunc_mass = seq(0, 0.11, by = 0.01))
#>
#> Model was fit on 295 complete samples (0 dropped due to missingness).
#> Min event time: 0
#> Max event time: 89
#> 295 observed events in the score function, time range [0, 89].
#> Inverse probability of non-truncation weights estimated nonparametrically.
#> Time-varying weight component: W-asurv
#>
#> Sensitivity analysis estimates:
#> trunc_mass=0 trunc_mass=0.01 trunc_mass=0.02 trunc_mass=0.03
#> age_gp4-59 -0.6895787 -0.8425212 -0.9318757 -1.014750
#> age_gp<=4 2.1222687 2.1291125 2.1384747 2.148357
#> trunc_mass=0.04 trunc_mass=0.05 trunc_mass=0.06 trunc_mass=0.07
#> age_gp4-59 -1.097130 -1.181844 -1.270912 -1.366208
#> age_gp<=4 2.158563 2.169015 2.179676 2.190524
#> trunc_mass=0.08 trunc_mass=0.09 trunc_mass=0.1 trunc_mass=0.11
#> age_gp4-59 -1.469787 -1.584160 -1.712650 -1.859948
#> age_gp<=4 2.201545 2.212731 2.224076 2.235577These robust weights improve the numerical convergence of the sensitivity analysis estimators by a substantial amount. The estimates can also be plotted to better visualize the results.
Clearly the coefficient estimate for adults is highly sensitive to potential positivity violations, while the coefficient estimate for children is more stable.
The survdt package supports stratified Cox models, where
the baseline hazard is allowed to differ across strata. In the
coxph_indtrunc() function, stratification factors are
specified through the strata_formula input argument, which
should be a one-sided formula with no response variable. We illustrate
this with a simulated sample featuring non-proportional hazards, which
is included in the survdt package as
nonprop_sample. The data has a binary covariate
group with a time-varying log hazard ratio of \(0.7\times 1(t>0.6)\) and a continuous
covariate x with a constant log hazard ratio of \(-0.5\).
We fit a model stratified by group.
strat_fit <- coxph_indtrunc(Survdt(event_time, ltrunc, rtrunc) ~ x,
data = nonprop_sample,
strata_formula = ~group)
strat_fit
#>
#> IPW Cox model under quasi-independent truncation.
#>
#> Call: coxph_indtrunc(formula = Survdt(event_time, ltrunc, rtrunc) ~
#> x, data = nonprop_sample, strata_formula = ~group)
#>
#> Model was fit on 490 complete samples (0 dropped due to missingness).
#> Min event time: 0.11
#> Max event time: 1.08
#> 490 observed events in the score function, time range [0.11, 1.08].
#> Inverse probability of non-truncation weights estimated nonparametrically.
#> Time-varying weight component: W-asurv
#>
#> Model estimates:
#> coef se zstat pval
#> x -0.7192016 0.2409035 -2.985434 0.002831761
#>
#> Summary of the 2 strata:
#> stratum size unique event times
#> group1 260 70
#> group0 230 80
confint(strat_fit)
#> 2.5% 97.5%
#> x -1.191364 -0.2470394Plotting the cumulative hazard estimates for the two groups at the
same x, we see that they are indeed approximately equal
until time 0.6, after which the hazard for group 1 is about \(\exp(0.7)\approx 2\) times as large as the
hazard for group 0.
plot_coxsurv(strat_fit,
newdata = data.frame(group = c(0, 1), x = c(1, 1)),
target = "cumhaz")
#> Obtaining the baseline hazard with standard errors by calling:
#> coxph_indtrunc(formula = Survdt(event_time, ltrunc, rtrunc) ~
#> x, data = nonprop_sample, strata_formula = ~group, basehaz = TRUE,
#> se = TRUE)The survdt package supports time-varying covariates in
the coxph_indtrunc() function. We illustrate this with a
simulated sample featuring non-proportional hazards, which is included
in the survdt package as nonprop_sample. The
data has a binary covariate group with a time-varying log
hazard ratio of \(0.7\times
1(t>0.6)\) and a continuous covariate x with a
constant log hazard ratio of \(-0.5\).
We fit a model with the correct time-varying effect of
group. First, we split each row in the data frame into two
possible observation windows corresponding to \(t\leq 0.6\) and \(t>0.6\). The survdt package
includes the function time_split() for this purpose. See
?survdt::time_split for more details.
split_dat <- time_split(nonprop_sample,
cut = 0.6,
stop_name = "event_time",
start = "start",
event = "event",
id = "id")Then we add a column group_tgr06 containing the
time-varying covariate \(1(t>0.6)\times\)group, and
fit the Cox model using a Survdt2 response.
split_dat$group_tgr06 <- (split_dat$event_time > 0.6) * split_dat$group
tvar_fit <- coxph_indtrunc(Survdt2(start, event_time, event, ltrunc, rtrunc, id) ~
group_tgr06 + x,
data = split_dat)
tvar_fit
#>
#> IPW Cox model under quasi-independent truncation.
#>
#> Call: coxph_indtrunc(formula = Survdt2(start, event_time, event, ltrunc,
#> rtrunc, id) ~ group_tgr06 + x, data = split_dat)
#>
#> Model was fit on 490 complete samples (0 dropped due to missingness).
#> Min event time: 0.11
#> Max event time: 1.08
#> 490 observed events in the score function, time range [0.11, 1.08].
#> Inverse probability of non-truncation weights estimated nonparametrically.
#> Time-varying weight component: W-asurv
#>
#> Model estimates:
#> coef se zstat pval
#> group_tgr06 0.9494582 0.2023921 4.691182 2.716319e-06
#> x -0.7169686 0.2425353 -2.956142 3.115140e-03
confint(tvar_fit)
#> 2.5% 97.5%
#> group_tgr06 0.5527769 1.3461395
#> x -1.1923290 -0.2416082The confidence intervals from the estimated Cox model contain the true values of 0.7 and -0.5.