Chapter 9 Multivariate Survival Models

Sometimes survival data come in clusters, and multivariate, or frailty, models are appropriate to use.

Ever since the paper by Vaupel, Manton, and Stallard (1979), the concept of frailty has spread in even wider circles of the research community. Although their primary purpose was to show various consequences of admitting individual frailties (“individuals age faster than cohorts”, due to the selection effect), the effect was that people started to implement their frailty model in Cox regression models.

9.1 An Introductory Example

Let us assume that in a follow-up study, the cohort is not homogeneous but instead consists of two equally sized groups with differing hazard rates. Assume further that we have no indication of which group an individual belongs to, and that members of both groups follow an exponential life length distribution: \[\begin{equation*} \begin{split} h_1(t) &= \lambda_1 \\ h_2(t) &= \lambda_2 \\ \end{split} \qquad t > 0. \end{equation*}\] This implies that the corresponding survival functions \(S_1\) and \(S_2\) are \[\begin{equation*} \begin{split} S_1(t) &= e^{-\lambda_1 t} \\ S_2(t) &= e^{-\lambda_2 t} \\ \end{split} \qquad t > 0, \end{equation*}\] and a randomly chosen individual will follow the “population mortality” \(S\), which is a mixture of the two distributions: \[\begin{equation*} S(t) = \frac{1}{2} S_1(t) + \frac{1}{2} S_2(t), \quad t > 0. \end{equation*}\] Let us calculate the hazard function for this mixture. We start by finding the density function \(f\): \[\begin{equation*} f(t) = -\frac{dS(x)}{dx} = \frac{1}{2}\left(\lambda_1 e^{-\lambda_1 t} + \lambda_2 e^{-\lambda_2 t} \right), \quad t > 0. \end{equation*}\] Then, by the definition of \(h\) we get \[\begin{equation} h(t) = \frac{f(t)}{S(t)} = \omega(t) \lambda_1 + \big(1 - \omega(t)\big) \lambda_2, \quad t > 0, \tag{9.1} \end{equation}\] with \[\begin{equation*} \omega(t) = \frac{e^{-\lambda_1 t}}{e^{-\lambda_1 t} + e^{-\lambda_2 t}} \end{equation*}\] It is easy to see that \[\begin{equation*} \omega(t) \rightarrow \left\{ \begin{array}{ll} 0, & \lambda_1 > \lambda_2 \\ \frac{1}{2}, & \lambda_1 = \lambda_2 \\ 1, & \lambda_1 < \lambda_2 \end{array} \right. , \quad \mbox{as } t \rightarrow \infty, \end{equation*}\] implying that \[\begin{equation*} h(t) \rightarrow \min(\lambda_1, \lambda_2), \quad t \rightarrow \infty, \end{equation*}\] see Figure 9.1.

Population hazard function (solid line). The dashed lines are the hazard functions of each group, constant at 1 and 2.

FIGURE 9.1: Population hazard function (solid line). The dashed lines are the hazard functions of each group, constant at 1 and 2.

The important point here is that it is impossible to tell from data alone whether the population is homogeneous, with all individuals following the same hazard function given by equation (9.1), or if it in fact consists of two groups, each following a constant hazard rate. Therefore, individual frailty models like \(h_i(t) = Z_i h(t), \quad i = 1, \ldots, n\), where \(Z_i\) is the “frailty” for individual No. \(i\), and \(Z_1, \ldots, Z_n\) are independent and identically distributed (iid) are less useful.

A heuristic explanation to all this is the dynamics of the problem: We follow a population (cohort) over time, and the composition of it changes over time. The weaker individuals die first, and the proportion stronger will steadily grow as time goes by.

Another terminology is to distinguish between individual and population hazards. In Figure 9.1 the solid line is the population hazard, and the dashed lines represent the two kinds of individual hazards present. Of course, in a truly homogeneous population, these two concepts coincide.

9.2 Frailty Models

Frailty models in survival analysis correspond to hierarchical models in linear or generalized linear models. They are also called mixed effects models. A general theory, with emphasis on using R, of mixed effects models can be found in Pinheiro and Bates (2000).

9.2.1 The simple frailty model

Vaupel, Manton, and Stallard (1979) described an individual frailty model, \[\begin{equation*} h(t; \mathbf{x}, Z) = h_0(t) Z e^{\boldsymbol{\beta} \mathbf{x}}, \quad t > 0, \end{equation*}\] where \(Z\) is assumed to be drawn independently for each individual. Hazard rates for ``random survivors’’ are not proportional, but converging (to each other) if the frailty distribution has finite variance. Thus, the problem may be less pronounced in AFT than in PH regression. However, as indicated in the introductory example, with individual frailty the identification problems are large, and such models are best avoided.

9.2.2 The shared frailty model

Frailty models work best when there is a natural grouping of the data, so that observations from the same group are dependent, while two individual survival times from different groups can be regarded as independent. Such a model may be described as \[\begin{equation} h_i(t; \mathbf{x}) = h_{i0}(t) e^{\boldsymbol{\beta}\mathbf{x}}, \quad i = 1, \ldots, s;\; t > 0, \tag{9.2} \end{equation}\] which simply is a stratified Cox regression model. By assuming
\[\begin{equation} h_{i0}(t) = Z_i h_0(t), \quad i = 1, \ldots s;\; t > 0, \tag{9.3} \end{equation}\] the traditional multivariate frailty model emerges. Here it is assumed that \(Z_1, \ldots, Z_s\) are independent and identically distributed (iid), usually with a lognormal distribution. From equations (9.2) and (9.3) we get, with \(U_i = \log(Z_i)\), \[\begin{equation*} h_i(t; \mathbf{x}) = h_{0}(t) e^{\boldsymbol{\beta}\mathbf{x} + U_i}, \quad i = 1, \ldots, s;\; t > 0. \end{equation*}\] In this formulation, \(U_1, \ldots, U_s\) are iid normal with mean zero and unknown variance \(\sigma^2\). Another popular choice of distribution for the \(Z\):s is the gamma distribution.

In R, the package coxme (T. Therneau 2020) fits frailty models. We look at the fertility data set in the R package eha, see Table ??.

It seems natural to assume that the lengths of birth intervals vary with mother, so we try a frailty model with id (mother’s id) as the grouping variable. Also notice that the first interval for each woman is measured from marriage (only married women are included in this data set) to first birth, so we will start by removing them. They are characterized by parity being 0.

fe <- fert[fert$parity != 0, ]
library(coxme)
fit <- coxme(Surv(next.ivl, event) ~ age + ses + parity + 
             (1 | id), data = fe)
summary(fit, digits = 3)
Cox mixed-effects model fit by maximum likelihood
  Data: fe
  events, n = 8458, 10312
  Iterations= 15 65 
                    NULL Integrated    Fitted
Log-likelihood -71308.31  -69905.17 -68211.87

                  Chisq   df p  AIC   BIC
Integrated loglik  2806    6 0 2794  2752
 Penalized loglik  6193 1212 0 3770 -4764

Model:  Surv(next.ivl, event) ~ age + ses + parity + (1 | id) 
Fixed coefficients
              coef exp(coef) se(coef)      z       p
age        -0.0796     0.923  0.00417 -19.11 0.00000
sesunknown -0.0779     0.925  0.06067  -1.28 0.20000
sesupper    0.1009     1.106  0.15799   0.64 0.52000
seslower   -0.1703     0.843  0.04967  -3.43 0.00061
parity     -0.1141     0.892  0.01014 -11.26 0.00000

Random effects
 Group Variable  Std Dev Variance
 id    Intercept 0.752   0.565   

The estimates of the fixed effects have the same interpretation as in ordinary Cox regression. The question is if the results point to the significance of including frailty terms? In the last line of the output we get the estimate of the frailty variance, \(\sigma^2 = 0.565\), but no \(p\)-value for the test of the null hypothesis \(H_0: \sigma = 0\). One explanation to this is that ordinary asymptotic theory does not hold for parameter values at the boundary of the parameter space, the value \(\sigma = 0\) is on that boundary.

One way to get a feeling for the impact of the frailty effect is to fit the same model but without frailty, i.e., the term (1 | id), see Table 9.1.

source("R/unicode.R")
fit0 <- coxreg(Surv(next.ivl, event) ~ age + ses + parity, 
               data = fe)
cap = "Fertility data, no-frailty model."
lab = "nofrail9"
fit.out(fit0, caption = cap, label = lab)
TABLE 9.1: Fertility data, no-frailty model.
Covariate level W_mean Coef HR SE LR_p
age 33.088 -0.0802 0.9229 0.0027 0.0000
ses 0.0002
farmer 0.491 0 1
unknown 0.186 -0.0845 0.9190 0.0298
upper 0.018 0.1467 1.1580 0.0829
lower 0.305 -0.0833 0.9201 0.0256
parity 4.242 0.0126 1.0127 0.0069 0.0680
Max Log Likelihood -70391.1

We can compare the two “max log likelihoods,” in the frailty model the “Integrated” value −69905, and in the fixed effects case −70391. The difference is so large (486) that we safely can reject the hypothesis that the frailty model is not needed. As an “informal” test, you could take twice that difference and treat it as as a \(\chi^2\) statistic with 1 degree of freedom, calculate a \(p\)-value and take as real \(p\)-value one half of that (all this because ordinary asymptotic theory does not hold for parameter values on the boundary of the parameter space!). This gives an approximation of the true \(p\)-value that is not too bad.

As a final example, let us look back at old age mortality in the R package eha. This example also shows a dangerous situation that is too easy to overlook. It has nothing to do with frailty, but with a problem caused by missing data.

Take a look at the variables in oldmort (Table ??):

The variable m.id is mother’s id. Siblings will have the same value on that variable, and we can check whether we find a “sibling effect” in the sense that siblings tend to have a similar risk of dying.

Cox mixed-effects model fit by maximum likelihood
  Data: oldmort
  events, n = 888, 3340 (3155 observations deleted due to missingness)
  Iterations= 16 84 
                    NULL Integrated    Fitted
Log-likelihood -5702.662   -5693.43 -5646.252

                   Chisq    df          p   AIC     BIC
Integrated loglik  18.46  4.00 1.0012e-03 10.46   -8.69
 Penalized loglik 112.82 49.23 6.7155e-07 14.36 -221.39

Model:  Surv(enter, exit, event) ~ sex + civ + (1 | m.id) 
Fixed coefficients
                 coef exp(coef)   se(coef)     z       p
sexfemale  -0.2026325 0.8165783 0.07190559 -2.82 0.00480
civmarried -0.4653648 0.6279060 0.12144548 -3.83 0.00013
civwidow   -0.3305040 0.7185615 0.11964284 -2.76 0.00570

Random effects
 Group Variable  Std Dev    Variance  
 m.id  Intercept 0.23719204 0.05626007
Now, compare with the corresponding fixed effects model in Table 9.2.
TABLE 9.2: Old age mortality, fixed effects model.
Covariate level W_mean Coef HR SE LR_p
sex 0.0000
male 0.406 0 1
female 0.594 -0.2434 0.7840 0.0474
civ 0.0000
unmarried 0.080 0 1
married 0.530 -0.3972 0.6722 0.0812
widow 0.390 -0.2613 0.7700 0.0787
Max Log Likelihood -13558

Note that we now got a very much smaller value of the maximized log likelihood, −13558 compared to −5693! Something is wrong, and the big problem is that the two analyzes were performed on different data sets. How is that possible, we used oldmort on both occasions? The variable m.id has a lot of missing values, almost 50 per cent are missing (NA), and the standard treatment of NA:s in R is to simply remove each record that contains an NA on any of the variables in the analysis. So, in the first case, the frailty model, a lot of records are removed before analysis, but not in the second. To be able to compare the models we must remove all records with m.id = NA from the second analysis, see Table 9.3.

olm <- oldmort[!is.na(oldmort$m.id), ]
fit0 <- coxreg(Surv(enter, exit, event) ~ sex + civ, 
               data = olm)
TABLE 9.3: Old age mortality for persons with known mother.
Covariate level W_mean Coef HR SE LR_p
sex 0.0056
male 0.418 0 1
female 0.582 -0.1959 0.8221 0.0704
civ 0.0013
unmarried 0.076 0 1
married 0.555 -0.4430 0.6421 0.1181
widow 0.369 -0.3102 0.7333 0.1159
Max Log Likelihood -5693.8

This is another story! We now got very similar values of the maximized log likelihoods, −5693.8 compared to −5693.4! The conclusion is that in this case, there is no frailty effect whatsoever.

One lesson to learn from this example is that you have to be very cautious when a data set contains missing values. Some functions, like drop1, give a warning when a situation like this is detected, but especially when comparisons are made in more than one step, it is too easy to forget the dangers.

Also note the warning that is printed in the results of coxme: 3163 observations deleted due to missingness. This is a warning that should be taken seriously. \(\ \Box\)

9.2.3 Parametric frailty models

It is possible utilize the connection between Poisson regression and the piecewise constant proportional hazards model discussed in Chapters 7 and 8 to fit parametric frailty models. We look at the fertility data again. The standard analysis without frailty effects is shown in Table 9.4.

fit0 <- pchreg(Surv(next.ivl, event) ~ parity + ses,
              cuts = 0:13, data = fe)
TABLE 9.4: Fertility analysis, fixed effects.
Covariate level W_mean Coef HR SE LR_p
parity 4.242 -0.1306 0.8775 0.0048 0.0000
ses 0.0000
farmer 0.491 0 1
unknown 0.186 0.0949 1.0995 0.0291
upper 0.018 0.0866 1.0904 0.0829
lower 0.305 -0.1508 0.8600 0.0255
Max Log Likelihood -14068.4
Restricted mean 2.299

Note the use of the function survival::survSplit (comment # 1 in the code), the transformation to factor for the slices of time (years) created by survSplit, and the creation of an offset (# 3). This is described in detail in Chapter 8.

For testing the presence of a random effect over women, an alternative is to use the function glmmML in the package with the same name.

library(glmmML)
fe13 <- survSplit(fe, end = "next.ivl", event = "event", 
                  cut = 1:13, episode = "years", 
                  start = "start") # 1
fe13$years <- as.factor(fe13$years) # 2
fe13$offs <- log(fe13$next.ivl - fe13$start) # 3
fit1 <- glmmML(event ~ parity + ses + years + offset(offs), 
               cluster = id, family = poisson, method = "ghq", 
               data = fe13, n.points = 9)
out <- with(fit1, cbind(coefficients, coef.sd))
colnames(out) <- c("Coef", "se(Coef)")
round(out[1:5, ], 3)
              Coef se(Coef)
(Intercept) -3.729    0.090
parity      -0.274    0.006
sesunknown   0.086    0.065
sesupper    -0.033    0.171
seslower    -0.262    0.054

The estimated scale parameter in the mixing distribution is 0.851 with standard error 0.024, so the conclusion is very much the same as with the nonparametric (coxme) approach: The clustering effect of mother is very strong and must be taken into account in the analysis of birth intervals. The nonparametric approach is easier to use and recommended, but see Section 9.3.

9.3 Stratification

A simple way to eliminate the effect of clustering is to stratify on the clusters. In the birth intervals example, it would mean that intervals are only compared to other birth intervals from the same mother. The drawback with a stratified analysis is that it is not possible to estimate the effect of covariates that are constant within clusters. In the birth intervals case, it is probable that ses, socio-economic status, would vary little within families. On the other hand, the effect of birth order or mother’s age would be suitable to analyze in a stratified setting, see Table 9.5.
TABLE 9.5: Fertility data, stratified Cox regression.
Covariate level W_mean Coef HR SE LR_p
parity 4.242 -0.3287 0.7199 0.0078 0.0000
prev.ivl 2.423 -0.0536 0.9478 0.0157 0.0006
Max Log Likelihood -9557.9
Contrast this result with an unstratified analysis in Table 9.6.
TABLE 9.6: Fertility data, unstratified Cox regression.
Covariate level W_mean Coef HR SE LR_p
parity 4.242 -0.0838 0.9196 0.0048 0.0000
prev.ivl 2.423 -0.3192 0.7267 0.0108 0.0000
Max Log Likelihood -70390.9

Note how the effect of parity is diminished when aggregating comparison over all women, while the effect of the length of the previous interval is enlarged. Try to explain why this result is expected!

Under certain circumstances it is actually possible to estimate an effect of a covariate that is constant within strata, but only if it is interacted with a covariate that is not constant within strata. See the example about maternal and infant mortality in Chapter 10.

References

Pinheiro, J., and D. Bates. 2000. Mixed-Effects Models in S and S-Plus. New York: Springer-Verlag.
Therneau, T. 2020. Coxme: Mixed Effects Cox Models. http://CRAN.R-project.org/package=coxme.
Vaupel, J. G., K. G. Manton, and E. Stallard. 1979. “The Impact of Heterogeneity in Individual Frailty on the Dynamics of Mortality.” Demography 16: 439–54.