---
title: "Cox Regression Under Quasi-Independent Double Truncation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Cox Regression Under Quasi-Independent Double Truncation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 5,
  fig.height = 3,
  fig.align = "center"
)
```

```{r setup}
library(survdt)
```

## Inverse probability weighted estimators

The `survdt` package implements several methods for Cox regression with 
quasi-independent double truncation.
We illustrate them through an application to the `aids` dataset, which holds 
records of AIDS incubation times from a retrospective sample of patients with 
transfusion-acquired HIV.
The inclusion criteria were that the patient was diagnosed with AIDS between the
discovery of the disease in 1982 and the end of the study in 1986, with time 
measured in months since HIV infection.
The event time of interest is the AIDS incubation time `incu`, with the truncation 
times contained in `ltrunc` and `rtrunc` respectively. See `?survdt::aids` for more
details.

To fit a Cox model with nonparametric inverse probability weighting, use the
`coxph_indtrunc()` function in the `survdt` package.
The first input argument should be a model formula with a `Survdt` or `Survdt2`
response.
`Survdt()` is used under time-independent covariates (see `?survdt::Survdt`), 
while `Survdt2` is used when there are time-varying covariates (see 
[Time-varying covariates] for an example).
```{r}
aids_fit <- coxph_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp, 
                           data = aids)
aids_fit
confint(aids_fit)
```

The default setting applies robust inverse probability of non-truncation 
weights and fits the model by maximizing a weighted partial likelihood.
Several choices of time-varying weights are available through the input 
argument `ipw_type`.
They offer improved robustness to outliers and extreme inverse probability weights
when compared to standard inverse probability weighting (`ipw_type="W-1"`).

```{r}
coxph_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp, 
               data = aids,
               ipw_type = "W-1")
```

In this case, the coefficient estimates change under different weightings but the
z-statistics are fairly similar.
In [Sensitivity analysis for positivity violations], we find that different
weight choices can have a greater impact in the `aids` data during sensitivity
analysis.
See `?survdt::coxph_indtrunc` for more details on the different weighting options.

Lastly, stratified Cox models can be fit by specifying a `strata_formula`.
See [Stratified Cox models] for an example.

### Plotting survival and hazard estimates

Given a fitted Cox model, the hazard or survival function estimates can be
plotted using the `plot_coxsurv()` function in the `survdt` package.
The first input argument should be a model object from `coxph_indtrunc()`, while
the second input argument should be a data frame with the covariate values to 
plot the conditional survival or hazard function for.

```{r, fig.height = 3.5}
plot_coxsurv(aids_fit, 
             newdata = aids[c(1,40), ],
             target = "survival")

plot_coxsurv(aids_fit, 
             newdata = aids[c(1, 40, 200), ], 
             ci = FALSE, 
             target = "cumhaz")
```

Note that `plot_coxsurv()` prints a message telling us it is computing the 
baseline hazard and its standard errors, since (by default) this was not done 
when fitting the model.
Use `basehaz=TRUE` when running `coxph_indtrunc()` to remove the need to re-compute
the baseline hazard for each plot.

### Assumption checking and model diagnostics

The statistical methods implemented in `coxph_indtrunc()` require two key
assumptions to ensure unbiased inference in the presence of double truncation.
First, the truncation times should be independent of the event time and 
covariates within the observable data region, which is known as quasi-independence.
Second, all possible event times should have a positive probability of being
observed, i.e. not being truncated, which is known as the positivity assumption.
The `survdt` package includes diagnostics for both of these assumptions.
They are described in the following two sections.
Residual diagnostics for the Cox model specification are available through the
`residuals()` function (see `survdt::coxph_indtrunc`).

#### Testing quasi-independent truncation with covariates

The `survdt` package includes the function `test_quasiindep_covariates()` to test
for violations of the quasi-independence assumption.
This is accomplished through a two-stage test based on stratifying by covariate
values.
The first stage checks for dependence within strata, while the second stage checks
across strata.
The test can be repeated with different stratifications when there are many 
covariates involved.
See `?survdt::test_quasiindep_covariates` for more details.

The first input argument should be a model formula with a `Survdt` response and
factor variables defining covariate strata on the right hand side.
Non-factor variables will be automatically coerced to factors.
Below we run the diagnostic test on the `aids` data, with age group defining the
covariate strata.
```{r}
qind_test <- test_quasiindep_covariates(Survdt(incu, ltrunc, rtrunc) ~ age_gp, 
                                        data = aids)
qind_test
```

Both stages of the test fail to find significant evidence of dependent truncation
in the `aids` data, so the quasi-independence assumption required by 
`coxph_indtrunc()` is reasonably satisfied.
The test results for the second stage can be plotted to visualize how strongly
the truncation times could depend on the covariates.
This is done by comparing the estimated selection (non-truncation) probabilities
across strata. 

```{r, fig.height = 3.5}
plot(qind_test)
```

The estimated selection probabilities all lie within the 95% uniform confidence 
band around the unstratified estimates, as expected given the p-value from 
the stage 2 test printed above.

#### Sensitivity analysis for positivity violations

The `survdt` package includes the function `positivity_sens_indtrunc()` to conduct
sensitivity analysis for the Cox model under potential violations of the positivity
assumption.
Given a sequence of potential values for the truncated baseline probability mass, 
it returns a corresponding sequence of adjusted coefficient estimates and associated 
sensitivity intervals which show how the fitted model could change due to 
non-positivity.
See `?survdt::positivity_sens_indtrunc` for more details.

The first input argument should be a model formula with `Survdt` response.
The truncated masses are specified through `trunc_mass`, and should not be too
close to one or the algorithm may not converge.
Many of the other input arguments are shared with `coxph_indtrunc()`.

First, we apply the sensitivity analysis with standard inverse probability 
weighting, which does not have a time-varying component.
```{r error=TRUE, warning=TRUE}
positivity_sens_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp, 
                         data = aids, ipw_type = "W-1",
                         trunc_mass = seq(0, 0.11, by = 0.01))

```

Note that the estimator failed to converge even at small truncated mass values.
This can occur for many reasons (see `?survdt::positivity_sens_indtrunc` for
more information).
In the `aids` data, it turns out that the estimated selection probability for the
largest observed incubation times approaches zero, leading to extreme inverse
probability weights.
Therefore we apply time-varying weights that reduce the influence of event time
outliers and extreme weights by using the option `ipw_type = "W-asurv"`.
This is the default weighting used in `positivity_sens_indtrunc()` and 
`coxph_indtrunc()`.

```{r}
sens <- positivity_sens_indtrunc(Survdt(incu, ltrunc, rtrunc) ~ age_gp, 
                                 data = aids,
                                 trunc_mass = seq(0, 0.11, by = 0.01))
sens

```

These robust weights improve the numerical convergence of the sensitivity analysis
estimators by a substantial amount.
The estimates can also be plotted to better visualize the results.

```{r}
plot(sens)
plot(sens, hazratio = TRUE)
```

Clearly the coefficient estimate for adults is highly sensitive to potential
positivity violations, while the coefficient estimate for children is more 
stable.


### Stratified Cox models

The `survdt` package supports stratified Cox models, where the baseline hazard
is allowed to differ across strata.
In the `coxph_indtrunc()` function, stratification factors are specified through
the `strata_formula` input argument, which should be a one-sided formula with
no response variable.
We illustrate this with a simulated sample featuring non-proportional hazards,
which is included in the `survdt` package as `nonprop_sample`.
The data has a binary covariate `group` with a time-varying log hazard ratio of 
$0.7\times 1(t>0.6)$ and a continuous covariate `x` with a constant log hazard
ratio of $-0.5$.

We fit a model stratified by `group`.
```{r}
strat_fit <- coxph_indtrunc(Survdt(event_time, ltrunc, rtrunc) ~ x, 
                            data = nonprop_sample, 
                            strata_formula = ~group)
strat_fit
confint(strat_fit)
```

Plotting the cumulative hazard estimates for the two groups at the same `x`, 
we see that they are indeed approximately equal until time 0.6, after which the
hazard for group 1 is about $\exp(0.7)\approx 2$ times as large as the hazard
for group 0.

```{r, fig.height=3.5}
plot_coxsurv(strat_fit, 
             newdata = data.frame(group = c(0, 1), x = c(1, 1)),
             target = "cumhaz")
```


### Time-varying covariates

The `survdt` package supports time-varying covariates in the `coxph_indtrunc()`
function.
We illustrate this with a simulated sample featuring non-proportional hazards,
which is included in the `survdt` package as `nonprop_sample`.
The data has a binary covariate `group` with a time-varying log hazard ratio of 
$0.7\times 1(t>0.6)$ and a continuous covariate `x` with a constant log hazard
ratio of $-0.5$.

We fit a model with the correct time-varying effect of `group`.
First, we split each row in the data frame into two possible observation windows
corresponding to $t\leq 0.6$ and $t>0.6$.
The `survdt` package includes the function `time_split()` for this purpose.
See `?survdt::time_split` for more details.

```{r}
split_dat <- time_split(nonprop_sample, 
                        cut = 0.6, 
                        stop_name = "event_time", 
                        start = "start",
                        event = "event",
                        id = "id")
```

Then we add a column `group_tgr06` containing the time-varying covariate 
$1(t>0.6)\times$`group`, and fit the Cox model using a `Survdt2` response.
```{r}
split_dat$group_tgr06 <- (split_dat$event_time > 0.6) * split_dat$group
tvar_fit <- coxph_indtrunc(Survdt2(start, event_time, event, ltrunc, rtrunc, id) ~ 
                             group_tgr06 + x, 
                           data = split_dat)
tvar_fit
confint(tvar_fit)
```

The confidence intervals from the estimated Cox model contain the true values 
of 0.7 and -0.5.
