Likelihood functions described

Likelihood functions in this package

The following describes the negative log-likelihood (nll) functions available in this package for estimating the rate of background mortality and virulence. The negative of the log-likelihood is returned as this is required as input for the maximum likelihood estimation of parameters by the function mle2 in the package bbmle by Ben Bolker and the R Core Development team.

In each case:

Indices '1' and '2' denote background mortality and mortality due to infection, respectively.
d is a death indicator: '1' died during the experiment, '0' right-censored during or at the end of the experiment.
g is an infection-treatment indicator: '1' infected treatment, '0' uninfected treatment.
All models assume each member of an uninfected treatment is uninfected, but not all models assume each member of an infected treatment is infected.
Some models assume all members of an infected treatment are infected, but allow for variation in virulence among members of the population; this variation can be discretely or continuously distributed, and be directly observed or assumed to be present.
There is also a model allowing for recovery from infection.

The functions:

nll_basic

This function is for the 'basic' log-likelihood expression described above. It assumes all the individuals in the infected population are infected and they all experience the same pattern of mortality due to infection, i.e. virulence is homogeneous for the population.

\[\begin{align} S_{OBS.INF}(t) &= S_1(t) \cdot S_2(t) \\ \\ f_{OBS.INF}(t) &= f_1(t) \cdot S_2(t) + f_2(t) \cdot S_1(t) \\ \\ h_{OBS.INF}(t) &= f_{OBS.INF}(t) \;/\; S_{OBS.INF}(t) \\ \\ &= h_1(t) + h_2(t) \end{align}\]

Hence the observed rate of mortality in the infected population at time t is equal to the rate of background mortality at time t, h₁(t), plus the rate of mortality due to infection at time t, h₂(t), that is, the pathogen's virulence at time t.

The log-likelihood expression for the probability of dying at time t in the infected and uninfected treatments is,

\[\begin{equation} \log L=\sum_{i=1}^{n}\left\lbrace d \log \left[h_1(t_i) + g \cdot h_2(t_i)\right] +\log\left[S_1(t_i)\right] + g \cdot \log\left[S_2(t_i)\right] \right\rbrace \end{equation}\]
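The expression above can be evaluated directly once hazard and survival functions have been chosen. The following is a minimal sketch in Python, not the package's own R code. The Weibull form of the hazards and the names `weibull_hazard`, `weibull_log_survival` and `nll_basic_sketch` are assumptions made for illustration only.

```python
import math

def weibull_hazard(t, shape, scale):
    """Hazard h(t) of a Weibull distribution (assumed here for illustration)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def weibull_log_survival(t, shape, scale):
    """log S(t) = -H(t) for the same Weibull distribution."""
    return -((t / scale) ** shape)

def nll_basic_sketch(times, died, infected, bg, vir):
    """Negative of the log-likelihood expression above.

    times    : observation times t_i
    died     : d, 1 = died, 0 = right-censored
    infected : g, 1 = infected treatment, 0 = uninfected treatment
    bg, vir  : (shape, scale) pairs for background mortality and virulence
    """
    log_lik = 0.0
    for t, d, g in zip(times, died, infected):
        h1 = weibull_hazard(t, *bg)
        h2 = weibull_hazard(t, *vir)
        log_lik += (d * math.log(h1 + g * h2)
                    + weibull_log_survival(t, *bg)
                    + g * weibull_log_survival(t, *vir))
    return -log_lik
```

Setting g = 0 removes both virulence terms, so an uninfected host contributes only d log h₁(t) + log S₁(t).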


nll_basic_logscale

As for nll_basic, except the input values of the location and scale parameters are assumed to be on a log scale.


nll_controls

This function is for uninfected or control data only. It assumes the observed pattern of mortality is due only to background mortality,

\[\begin{align} S_{OBS.UNINF}(t) &= S_1(t) \\ \\ f_{OBS.UNINF}(t) &= f_1(t) \\ \\ h_{OBS.UNINF}(t) &= f_1(t) \;/\; S_1(t) \\ \\ &= h_1(t) \end{align}\]

The log-likelihood expression for the probability of dying at time t in the uninfected treatment is,

\[\begin{equation} \log L=\sum_{i=1}^{n}\left\lbrace d \log \left[ h_1(t_i)\right] +\log\left[ S_1(t_i) \right] \right\rbrace \end{equation}\]
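As a quick check of this expression, consider the special case of a constant background hazard, where the maximum likelihood estimate has a known closed form: the number of deaths divided by the total time at risk. The sketch below is written in Python with hypothetical data. The function name `nll_controls_constant` and the constant-hazard assumption are illustrative and not part of the package.

```python
import math

def nll_controls_constant(rate, times, died):
    """Negative log-likelihood for control data, assuming a constant hazard.

    With h_1(t) = rate and S_1(t) = exp(-rate * t), each host contributes
    d * log(rate) - rate * t to the log-likelihood.
    """
    log_lik = sum(d * math.log(rate) - rate * t for t, d in zip(times, died))
    return -log_lik

# Hypothetical data: three deaths and one host censored at day 10.
times = [3.0, 5.0, 8.0, 10.0]
died = [1, 1, 1, 0]

# For a constant hazard the maximum likelihood estimate is
# the number of deaths divided by the total time at risk.
rate_hat = sum(died) / sum(times)
```

Evaluating `nll_controls_constant` at values slightly above or below `rate_hat` gives a larger negative log-likelihood, which is what a numerical optimiser would exploit.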


nll_exposed_infected

Exposure to infection does not guarantee infection. This function assumes only a proportion p of the individuals exposed to infection experience both background mortality and an increased rate of mortality due to infection, while the remaining proportion (1 - p) experience only background mortality.

The expressions describing the observed patterns of mortality in the infected treatment are,

\[\begin{align} S_{OBS.INF}(t) &= p \cdot [S_1(t) \cdot S_2(t)] + (1-p) \cdot [S_1(t)] \\ \\ f_{OBS.INF}(t) &= p \cdot [f_1(t) \cdot S_2(t) + f_2(t) \cdot S_1(t)] + (1-p) \cdot f_1(t) \\ \\ h_{OBS.INF}(t) &= f_{OBS.INF}(t) \;/\; S_{OBS.INF}(t) \end{align}\]

where p is a constant to be estimated; 0 ≤ p ≤ 1. Here the pattern of mortality experienced by members of the treatment exposed to infection is no longer homogeneous as it varies according to whether an individual is infected or not.

The overall log-likelihood expression for the observed pattern of mortality in the infected and uninfected treatments is,

\[\begin{align} \log L=\sum_{i=1}^{n}match(treatment) \begin{cases} infected & \Rightarrow d \log \left[ h_{OBS.INF}(t_i)\right] +\log\left[ S_{OBS.INF}(t_i) \right] \\ \\ uninfected & \Rightarrow d \log \left[ h_1(t_i)\right] +\log\left[ S_1(t_i) \right] \\ \end{cases} \\ \\ \end{align}\]

This type of model is sometimes referred to as a 'cure' model [1]. This is not because infected hosts recover from infection, but rather because there will be proportionately fewer infected individuals in the 'infected' population over time due to their higher rate of mortality, i.e., the 'infected' population is progressively 'cured' of its infected members.
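The progressive 'cure' of the exposed population can be seen numerically. The sketch below is in Python and assumes constant rates of background mortality and virulence purely for illustration; the function name `exposed_infected` is hypothetical. It takes the observed hazard as the density divided by the survival function.

```python
import math

def exposed_infected(t, p, rate_bg, rate_vir):
    """Observed survival, density and hazard in a treatment exposed to infection.

    p        : proportion of exposed hosts that are infected
    rate_bg  : constant background hazard h_1 (assumed for illustration)
    rate_vir : constant hazard due to infection h_2 (assumed for illustration)
    """
    s1 = math.exp(-rate_bg * t)
    s2 = math.exp(-rate_vir * t)
    f1 = rate_bg * s1
    f2 = rate_vir * s2
    surv = p * s1 * s2 + (1 - p) * s1
    dens = p * (f1 * s2 + f2 * s1) + (1 - p) * f1
    return surv, dens, dens / surv

# Observed hazard early and late in a hypothetical experiment with p = 0.5.
hazard_start = exposed_infected(0.0, 0.5, 0.1, 0.2)[2]
hazard_late = exposed_infected(100.0, 0.5, 0.1, 0.2)[2]
```

With these values the observed hazard starts at h₁ + p·h₂ and declines towards h₁ as infected hosts die; p = 1 recovers the nll_basic case and p = 0 the control case.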


nll_frailty

This function assumes all the individuals in an infected treatment are infected, but there is unobserved variation in the pathogen's virulence. It is assumed there is an underlying rate of mortality due to infection, h_V(t), which is multiplied by a constant, e.g., \(\lambda\), where \(\lambda\) is distributed as a continuous random variable with a mean value of 1. 'Frail' individuals with values of \(\lambda\) > 1 tend to die earlier than those with \(\lambda\) < 1 [2,3].

In this package \(\lambda\) is assumed to follow either the gamma or inverse Gaussian distribution. In the case of the gamma distribution, the hazard function for mortality due to infection, h₂(t), is

\[\begin{equation} h_2(t) = \frac{h_V(t)}{1 + \theta H_V(t)} \end{equation}\]

where H_V(t) is the cumulative hazard function corresponding to the underlying pattern of virulence, h_V(t), at time t, and \(\theta\) is the variance of the distribution of \(\lambda\), a constant to be estimated.

The corresponding cumulative survival function for mortality due to infection, S₂(t), is

\[\begin{equation} S_2(t) = [1 + \theta H_V(t)]^{-1/\theta} \end{equation}\]

In the case of the inverse Gaussian distribution, the hazard function for mortality due to infection, h₂(t), is

\[\begin{equation} h_2(t) = \frac{h_V(t)}{[1 + 2 \theta H_V(t)]^{1/2}} \end{equation}\]

and the cumulative survival function, S₂(t),

\[\begin{equation} S_2(t) = \exp \left\{ \frac{1}{\theta} \cdot \left( 1 - [1 + 2 \theta H_V(t)]^{1/2} \right) \right\} \end{equation}\]

In both cases the overall log-likelihood expression for the observed mortality in infected and uninfected treatments is,

\[\begin{equation} \log L=\sum_{i=1}^{n}\left\lbrace d \log \left[ h_1(t_i) + g \cdot h_2(t_i) \right] +\log\left[ S_1(t_i) \right] + g \cdot \log[S_2(t_i)] \right\rbrace \end{equation}\]

with the expressions for h₂(t) and S₂(t) corresponding with the probability distribution describing the unobserved variation in virulence.
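The two pairs of expressions for h₂(t) and S₂(t) can be written as small helper functions. This is a sketch in Python rather than the package's code, and the function names are hypothetical. The inputs are the underlying hazard h_V(t) and cumulative hazard H_V(t) evaluated at the time of interest, and θ must be greater than zero.

```python
import math

def frailty_gamma(h_v, H_v, theta):
    """Return (h_2, S_2) when the frailty follows a gamma distribution."""
    h2 = h_v / (1 + theta * H_v)
    s2 = (1 + theta * H_v) ** (-1 / theta)
    return h2, s2

def frailty_inverse_gaussian(h_v, H_v, theta):
    """Return (h_2, S_2) when the frailty follows an inverse Gaussian distribution."""
    root = math.sqrt(1 + 2 * theta * H_v)
    h2 = h_v / root
    s2 = math.exp((1 - root) / theta)
    return h2, s2
```

In both cases h₂(t) is smaller than h_V(t) whenever H_V(t) > 0, reflecting the earlier loss of the frailest individuals, and h₂(t) approaches h_V(t) as θ approaches zero.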


nll_frailty_correlated

This function allows for separate, but positively correlated, frailty effects acting on background mortality and mortality due to infection, where the strength of this correlation is a variable to be estimated [4].

The observed rate of mortality at time t in an infected population, h_OBS.INF(t), equals

\[\begin{equation} h_{OBS.INF}(t) = h_1(t) + h_2(t) - \rho \sqrt{\theta_B} \sqrt{\theta_V} \frac{h_1(t)H_V(t) + h_2(t)H_B(t)}{1 + \theta_B H_B(t) + \theta_V H_V(t)} \end{equation}\]

where h₁(t) and h₂(t) are the population wide rates of mortality due to the background mortality and that due to infection at time t, respectively. H_B(t) and H_V(t) are the cumulative hazard functions for the underlying background mortality and the underlying virulence of the pathogen at time t, respectively; \(\theta_B\) and \(\theta_V\) are the variances of the unobserved variation in the background mortality and that due to infection, respectively, and constants to be estimated. \(\rho\) is the strength of the positive correlation between the two frailty effects (\(\rho\) ≥ 0). Estimates of \(\rho\) will tend towards zero as the difference in the variance of the two frailty effects increases.

The population wide rate of background mortality at time t, h₁(t), is

\[\begin{equation} h_1(t) = \frac{h_B(t)}{1 + \theta_B H_B(t)} \\ \end{equation}\]

where h_B(t) and H_B(t) are the hazard and cumulative hazard functions for the underlying rate of background mortality at time t, respectively. The population wide rate of mortality due to infection at time t, h₂(t), is

\[\begin{equation} h_2(t) = \frac{h_V(t)}{1 + \theta_V H_V(t)} \end{equation}\]

where h_V(t) and H_V(t) are the hazard and cumulative hazard functions for the underlying rate of mortality due to infection at time t, respectively.

The observed cumulative survival of infected hosts at time t, S_OBS.INF(t), is given by

\[\begin{equation} S_{OBS.INF}(t) = \left[S_1(t)^{-\theta_B} + S_2(t)^{-\theta_V} - 1 \right]^{- \rho / \left( \sqrt{\theta_B} \sqrt{\theta_V} \right)} \cdot S_1(t)^{1 - \rho \sqrt{\theta_B} / \sqrt{\theta_V}} \cdot S_2(t)^{1 - \rho \sqrt{\theta_V} / \sqrt{\theta_B}} \end{equation}\]

where

\[\begin{equation} S_1(t) = \left[ 1 + \theta_B H_B(t) \right]^{-1 / \theta_B} \\ \end{equation}\]

and

\[\begin{equation} S_2(t) = \left[ 1 + \theta_V H_V(t) \right]^{-1 / \theta_V} \\ \end{equation}\]

The expressions S₁(t) and S₂(t) correct for an error in the original paper where the cumulative hazard terms were multiplied by \(\sqrt{\theta_i}\), instead of \(\theta_i\); i = B, V [4].
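A useful consistency check on this model is that the observed hazard should equal the negative derivative of the log of the observed survival function. The Python sketch below performs the calculation under stated assumptions: constant underlying hazards, and exponents of S₁(t) and S₂(t) taken as 1 - ρ√θ_B/√θ_V and 1 - ρ√θ_V/√θ_B, the form for which the hazard and survival expressions agree. The function name and parameter values are hypothetical.

```python
import math

def correlated_frailty(t, rate_b, rate_v, theta_b, theta_v, rho):
    """Observed hazard and survival under correlated gamma frailties.

    rate_b, rate_v   : constant underlying hazards h_B and h_V (assumed)
    theta_b, theta_v : variances of the two frailty effects
    rho              : correlation between the two frailty effects
    """
    H_b = rate_b * t
    H_v = rate_v * t
    h1 = rate_b / (1 + theta_b * H_b)
    h2 = rate_v / (1 + theta_v * H_v)
    s1 = (1 + theta_b * H_b) ** (-1 / theta_b)
    s2 = (1 + theta_v * H_v) ** (-1 / theta_v)
    sd_b = math.sqrt(theta_b)
    sd_v = math.sqrt(theta_v)
    hazard = h1 + h2 - rho * sd_b * sd_v * (h1 * H_v + h2 * H_b) / (
        1 + theta_b * H_b + theta_v * H_v)
    survival = ((s1 ** (-theta_b) + s2 ** (-theta_v) - 1) ** (-rho / (sd_b * sd_v))
                * s1 ** (1 - rho * sd_b / sd_v)
                * s2 ** (1 - rho * sd_v / sd_b))
    return hazard, survival
```

With ρ = 0 the two frailty effects are independent and the expressions reduce to h₁(t) + h₂(t) and S₁(t)·S₂(t).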


nll_frailty_logscale

This function is identical to the function nll_frailty, except that the values of the location and scale parameters are on a log scale; this can help convergence during maximum likelihood estimation.


nll_frailty_shared

In this function it is assumed there is an underlying rate of background mortality, h₁(t), and an underlying rate of mortality due to infection, h₂(t), and both are multiplied by a constant, e.g., \(\lambda\),

\[\begin{equation} h(t) = \lambda \left[ h_1(t) + h_2(t) \right] \end{equation}\]

where \(\lambda\) is distributed as a continuous random variable drawn from a gamma distribution with a mean of 1 and variance \(\theta\) [4].

In this case, the observed rate of mortality in the infected treatment, h_OBS.INF(t), at time t will be,

\[\begin{equation} h_{OBS.INF}(t) = \frac{h_1(t) + h_2(t)}{1 + \theta [H_1(t) + H_2(t)]} \end{equation}\]

the observed cumulative survival at time t, S_OBS.INF(t), will be,

\[\begin{equation} S_{OBS.INF}(t) = \left(1 + \theta \left[H_1(t) + H_2(t) \right] \right)^{-1/\theta} \end{equation}\]

and the corresponding likelihood model,

\[\begin{equation} \log L=\sum_{i=1}^{n}\left\lbrace d \log \left[\frac{h_1(t_i) + gh_2(t_i)}{1 + \theta[H_1(t_i) + gH_2(t_i)]} \right] +\log \left[(1 + \theta[H_1(t_i) + gH_2(t_i)])^{-1/\theta} \right] \right\rbrace \end{equation}\]

where g is an infection indicator taking a value of '1' for an infected treatment and '0' for an uninfected or control treatment.

See the original paper [4] for a discussion of the shortcomings of this type of model.


nll_proportional_virulence

This function assumes a proportional hazards relationship for virulence among infected treatments within an experiment, such that,

\[\begin{equation} h_A(t) \;/\; h_B(t) = c \end{equation}\]

where h_A(t) and h_B(t) are the hazard functions describing pathogen virulence in infected treatments A and B at time t, respectively, and c is a constant.

A treatment of infected hosts is chosen as the reference population. The observed patterns of survival in this treatment are described in the same way as for the function nll_basic,

\[\begin{align} S_{OBS.REF}(t) &= S_1(t) \cdot S_{REF}(t) \\ \\ f_{OBS.REF}(t) &= f_1(t) \cdot S_{REF}(t) + f_{REF}(t) \cdot S_1(t) \\ \\ h_{OBS.REF}(t) &= f_{OBS.REF}(t) \;/\; S_{OBS.REF}(t) \\ \\ &= h_1(t) + h_{REF}(t) \end{align}\]

where the index REF indicates the reference treatment or population of infected hosts.

The patterns of survival and mortality in other infected treatments are estimated as a function of those for the reference population,

\[\begin{align} S_{OBS.ALT}(t) &= S_1(t) \cdot \left[ S_{REF}(t) \right]^\theta \\ \\ f_{OBS.ALT}(t) &= f_1(t) \cdot \left[ S_{REF}(t) \right]^\theta + \theta \cdot \left[ S_{REF}(t) \right]^{\theta - 1} \cdot f_{REF}(t) \cdot S_1(t) \\ \\ h_{OBS.ALT}(t) &= f_{OBS.ALT}(t) \;/\; S_{OBS.ALT}(t) \\ \\ &= h_1(t) + \theta \cdot h_{REF}(t) \end{align}\]

where the index ALT indicates an alternative infected treatment, and \(\theta\) is a constant scaling the pathogen's virulence relative to that in the reference population; \(\theta > 0\).

The overall log-likelihood expression for the observed pattern of mortality in the infected and uninfected treatments is,

\[\begin{align} \log L=\sum_{i=1}^{n}match(treatment) \begin{cases} uninf & \Rightarrow d \log \left[ h_1(t_i)\right] +\log \left[ S_1(t_i) \right] \\ \\ ref & \Rightarrow d \log \left[ h_1(t_i) + h_{REF}(t_i) \right] +\log \left[ S_1(t_i) \right] + \log \left[ S_{REF}(t_i) \right] \\ \\ alt & \Rightarrow d \log \left[ h_1(t_i) + \theta h_{REF}(t_i) \right] +\log \left[ S_1(t_i) \right] + \theta \log \left[ S_{REF}(t_i) \right] \\ \end{cases} \\ \\ \end{align}\]

where uninf, ref and alt refer to the uninfected, infected reference and alternative infected treatments, respectively.
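The relationship between the reference and alternative treatments can be checked numerically: raising S_REF(t) to the power θ should multiply the virulence hazard by θ. The Python sketch below assumes Weibull distributions for background mortality and reference virulence, purely for illustration; the function names and parameter values are hypothetical.

```python
import math

def weibull(t, shape, scale):
    """Return (hazard, survival, density) of a Weibull distribution at time t."""
    cum_hazard = (t / scale) ** shape
    hazard = (shape / scale) * (t / scale) ** (shape - 1)
    survival = math.exp(-cum_hazard)
    return hazard, survival, hazard * survival

def observed_alt(t, theta, bg, ref):
    """Observed survival, density and hazard in an alternative infected treatment.

    theta   : virulence relative to the reference treatment (theta > 0)
    bg, ref : (shape, scale) pairs for background mortality and reference virulence
    """
    h1, s1, f1 = weibull(t, *bg)
    h_ref, s_ref, f_ref = weibull(t, *ref)
    surv = s1 * s_ref ** theta
    dens = f1 * s_ref ** theta + theta * s_ref ** (theta - 1) * f_ref * s1
    return surv, dens, dens / surv
```

The third value returned agrees with h₁(t) + θ·h_REF(t), and θ = 1 reproduces the reference treatment.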


nll_recovery

This function allows for infected hosts that recover from infection. It assumes that all hosts in an infected treatment are initially infected and experience increased mortality due to infection. However, hosts can recover from infection, after which they experience only background mortality, equal to that of hosts in a matching uninfected or control treatment.

The timing of recovery from infection is assumed to follow a probability distribution and to be independent of the probability distributions for the timing of background mortality and mortality due to infection. Hence the pattern of events in an infected treatment at time t, S_INF.POP(t), can be expressed as the product of three independent probability distributions,

\[\begin{equation} S_{INF.POP}(t) = S_1(t) \cdot S_2(t) \cdot S_3(t) \end{equation}\]

where S₁(t) is the cumulative survival function for background mortality at time t, S₂(t) is the cumulative survival function for mortality due to infection at time t, and S₃(t) is the cumulative probability that an infection 'survives' until time t. Here the index 'INF.POP' is used rather than 'OBS.INF' as recovery from infection may not be an observed event.

Differentiating the above with respect to time and taking the negative gives the probability density function, f_INF.POP(t), for events occurring in the population at time t,

\[\begin{equation} \begin{aligned} f_{INF.POP}(t) &= f_1(t) \cdot S_2(t) \cdot S_3(t) \\ &+ f_2(t) \cdot S_1(t) \cdot S_3(t) \\ &+ f_3(t) \cdot S_1(t) \cdot S_2(t) \end{aligned} \end{equation}\]

where the sum of the first two expressions gives the probability an infected host is infected and dies at time t, from either background mortality or mortality due to infection, and corresponds with data collected on the time of death of infected hosts.

The third expression describes the probability an infected host is alive and recovers from infection at time t, and corresponds with data collected on the timing of recovery of infected hosts. Hence the expression can be estimated when the timing of recovery is known. This will not be the case if a host's recovery status is only determined after the host has died or been censored, as the data collected then correspond with the time at which hosts died or were censored after having recovered at some earlier, unobserved time. However, it is assumed recovered individuals experience the same background mortality as uninfected hosts. This information can be used to estimate the likelihood that a recovered individual which died or was censored at time t had recovered at an earlier time and subsequently survived until time t. For example, in an experiment recording survival daily, the probability a recovered individual dies on the second day, t₂, can be estimated from,

\[\begin{equation*} \begin{aligned} & \left[ f_3(t_2) \cdot S_1(t_2) \cdot S_2(t_2) \right] \cdot h_1(t_2) \; + \nonumber \\ & \left[ f_3(t_1) \cdot S_1(t_1) \cdot S_2(t_1) \right] \cdot \left[ S_1(t_2)\;/\;S_1(t_1) \right] \cdot h_1(t_2) \end{aligned} \end{equation*}\]

where the first line gives the probability an individual survives until and recovers on the second day, multiplied by the background rate of mortality on day 2. The second line gives the probability an individual recovered on the first day, survived background mortality from day 1 to day 2, S₁(t₂) / S₁(t₁), and died of background mortality on day 2. Multiplication by h₁(t₂) would be omitted in cases where a recovered individual was censored at the end of day 2. Hence observed data for the times when recovered individuals die or are censored can be used to estimate the unobserved distribution of recovery times.
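The two-day example can be worked through numerically and extended to later days by summing over every possible day of recovery. The Python sketch below assumes constant rates for background mortality, mortality due to infection and recovery, and daily recording; these assumptions, the function name and the parameter values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def recovered_host_likelihood(day, died, rate_bg, rate_vir, rate_rec):
    """Likelihood that a host found to have recovered dies (died=1) or is
    censored (died=0) on `day`, summed over each possible day of recovery.

    Constant rates are assumed for all three processes (illustration only).
    """
    def s1(t):
        return math.exp(-rate_bg * t)            # survives background mortality

    def s2(t):
        return math.exp(-rate_vir * t)           # survives mortality due to infection

    def f3(t):
        return rate_rec * math.exp(-rate_rec * t)  # recovers at time t

    total = 0.0
    for rec_day in range(1, day + 1):
        recovers = f3(rec_day) * s1(rec_day) * s2(rec_day)  # alive and recovers on rec_day
        survives = s1(day) / s1(rec_day)                    # background survival up to `day`
        total += recovers * survives
    # The background hazard h_1 is constant here, so it multiplies the whole sum;
    # it is omitted for hosts that were censored rather than dying.
    return total * rate_bg if died else total

# Day 2 example with hypothetical rates of 0.1, 0.3 and 0.2 per day.
day2_death = recovered_host_likelihood(2, 1, 0.1, 0.3, 0.2)
day2_censored = recovered_host_likelihood(2, 0, 0.1, 0.3, 0.2)
```

For day 2 the loop contains exactly the two terms of the expression above: recovery on day 2, and recovery on day 1 followed by survival of background mortality to day 2.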

The nll_recovery function assumes all the individuals in an infected treatment were initially infected and their recovery status is known at the time of their death or censoring, i.e., still infected vs. recovered. This information is used along with data on the timing of death or censoring in a matching uninfected or control treatment to calculate the likelihood of the events being described by the model.

NB This function requires the data to be specified in a specific format different to that of other functions.


nll_recovery_II

This is essentially the same model as nll_recovery, except it assumes there was no background mortality, such that S₁(t) = 1 and f₁(t) = 0 throughout the experiment.

In such cases the only observed dynamics will be the death of infected hosts due to infection, before they recover from infection, f₂(t)S₁(t)S₃(t), or f₂(t)S₃(t) as S₁(t) = 1.

As recovered hosts are assumed to only experience the same background mortality as individuals in a matching control treatment, any recovered individuals will all survive to be right-censored at the end of the experiment.


nll_two_inf_subpops_obs

This function assumes all the individuals in an infected treatment are infected, but there are data identifying two discrete subpopulations, e.g., A and B, in proportions p and (1 - p), respectively, which differ in their additional rate of mortality due to infection. Examples include hosts dying with or without visible signs of infection, or hosts with viral titres above or below a threshold value.

The observed patterns of mortality in the infected treatment as a whole will be,

\[\begin{align} S_{OBS.INF}(t) &= p \cdot [ S_1(t) \cdot S_{2A}(t)] + (1-p) \cdot [S_1(t) \cdot S_{2B}(t)] \\ \\ f_{OBS.INF}(t) &= p \cdot [f_1(t) \cdot S_{2A}(t) + f_{2A}(t) \cdot S_1(t)] + (1-p) \cdot [f_1(t) \cdot S_{2B}(t) + f_{2B}(t) \cdot S_1(t)] \\ \\ h_{OBS.INF}(t) &= f_{OBS.INF}(t) \;/\; S_{OBS.INF}(t) \end{align}\]

where the indices 2A and 2B denote the two discrete subpopulations of infected hosts.

Mortality in the two subpopulations is assumed to act independently. Consequently, the likelihood expression for each host can be evaluated separately according to its membership of subpopulation A or B,

\[\begin{align} \log L=\sum_{i=1}^{n}match(treatment) \begin{cases} uninfected & \Rightarrow d \log \left[ h_1(t_i)\right] +\log \left[ S_1(t_i) \right] \\ \\ infected \; A & \Rightarrow d \log \left[ h_1(t_i) + h_{2A}(t_i) \right] +\log \left[ S_1(t_i) \right] + \log \left[ S_{2A}(t_i) \right] \\ \\ infected \; B & \Rightarrow d \log \left[ h_1(t_i) + h_{2B}(t_i) \right] +\log \left[ S_1(t_i) \right] + \log \left[ S_{2B}(t_i) \right] \\ \end{cases} \\ \\ \end{align}\]

where each subpopulation shares the same background mortality as a matching control treatment.


nll_two_inf_subpops_unobs

This function is similar to the function nll_two_inf_subpops_obs, except the presence of two discrete subpopulations in proportions p and (1 - p) is assumed rather than based on evidence, e.g., data on visual signs of infection or viral titres were not recorded. In this case, p is a parameter to be estimated (0 ≤ p ≤ 1).

The observed patterns of mortality will be as described for nll_two_inf_subpops_obs, but the likelihood expressions for the two presumed subpopulations cannot be estimated separately as the data are not available to identify them,

\[\begin{align} \log L=\sum_{i=1}^{n}match(treatment) \begin{cases} uninfected & \Rightarrow d \log \left[ h_1(t_i)\right] +\log \left[ S_1(t_i) \right] \\ \\ infected & \Rightarrow d \log \left[ h_{OBS.INF}(t_i) \right] + \log \left[ S_{OBS.INF}(t_i) \right] \\ \end{cases} \end{align}\]

hence the patterns of mortality and the proportions of the two supposed subpopulations need to be inferred from the pattern of mortality for the infected population as a whole.
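The contribution of a single host from the infected treatment can be sketched as follows. This Python example assumes constant hazards for background mortality and for virulence in each presumed subpopulation, takes the observed hazard as f_OBS.INF(t) / S_OBS.INF(t), and takes the contribution to be d log h_OBS.INF(t) + log S_OBS.INF(t), since S_OBS.INF(t) already includes background survival. The function name and parameter values are hypothetical.

```python
import math

def two_subpops_unobs_term(t, d, p, rate_bg, rate_a, rate_b):
    """Log-likelihood contribution of one host from the infected treatment.

    t       : time of death or censoring
    d       : 1 = died, 0 = right-censored
    p       : presumed proportion of hosts in subpopulation A
    rate_bg : constant background hazard (assumed for illustration)
    rate_a, rate_b : constant virulence hazards in subpopulations A and B
    """
    s1 = math.exp(-rate_bg * t)
    s2a = math.exp(-rate_a * t)
    s2b = math.exp(-rate_b * t)
    f1 = rate_bg * s1
    f2a = rate_a * s2a
    f2b = rate_b * s2b
    surv = p * s1 * s2a + (1 - p) * s1 * s2b
    dens = p * (f1 * s2a + f2a * s1) + (1 - p) * (f1 * s2b + f2b * s1)
    return d * math.log(dens / surv) + math.log(surv)
```

When p = 1, or when the two virulence rates are equal, the term reduces to the nll_basic form, d log[h₁(t) + h₂(t)] + log S₁(t) + log S₂(t), which is one reason p can be difficult to estimate when the subpopulations differ little.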


References

1. Lambert PC. 2007 Modeling of the cure fraction in survival studies. Stata Journal 7, 351–375.

2. Aalen OO. 1988 Heterogeneity in survival analysis. Statistics in Medicine 7, 1121–1137. (doi:10.1002/sim.4780071105)

3. Hougaard P. 1984 Life table methods for heterogeneous populations - distributions describing the heterogeneity. Biometrika 71, 75–83. (doi:10.1093/biomet/71.1.75)

4. Zahl PH. 1997 Frailty modelling for the excess hazard. Statistics in Medicine 16, 1573–1585. (doi:10.1002/(sici)1097-0258(19970730)16:14<1573::aid-sim585>3.0.co;2-q)
