Chapter 6 More on Cox Regression

Vital concepts such as time-dependent covariates, communal covariates, the handling of ties, model checking, and sensitivity analysis are introduced in this chapter.

6.1 Time-Varying Covariates

Only a special kind of time-varying covariate can be handled in R by the packages eha and survival, namely piecewise constant functions. How this is done is best described by an example.

Example 6.1 (Civil status)

The covariate (factor) civil status (called civst in the R data frame) is an explanatory variable in a mortality study; it changes value from unmarried to married at marriage. How should this be coded in the data file? The solution is to create two records (for individuals who marry), each with a fixed value of civst:

Original record: \((t_0, t, d, x(s)), \; t_0 < s \le t\), married at time \(T\), \(t_0 < T < t\):

\[ \text{civst}(s) = \left\{\begin{array}{ll} \text{unmarried}, & s < T \\ \text{married}, & s \ge T. \end{array}\right. \]

It is replaced by two new records:

  1. First new record: \((t_0, T, 0, \text{unmarried})\), always censored.

  2. Second new record: \((T, t, d, \text{married})\).

With \(T = 30\), the data file will contain two records as shown in Table ??.

In this way, a time-varying covariate can always be handled by utilizing left truncation and right censoring. See also Figure 6.1 for an illustration.


FIGURE 6.1: A time-varying covariate (unmarried or married). Right censored and left truncated at age 30 (at marriage).

Note that this situation is formally equivalent to one with two individuals, one unmarried and one married: the first is right censored at exactly the same time as the second enters the study (left truncation). \(\ \Box\)
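
The record splitting can be sketched in base R as follows (a toy example; all values are assumed, only the column names follow the example):

## One individual, observed from age 20 to death at 45, married at 30.
per <- data.frame(id = 1, enter = 20, exit = 45, event = 1, T.marr = 30)
## Replace the record by two, each with a fixed value of civst:
rec1 <- transform(per, exit = T.marr, event = 0, civst = "unmarried")
rec2 <- transform(per, enter = T.marr, civst = "married")
rbind(rec1, rec2)[, c("id", "enter", "exit", "event", "civst")]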

A word of caution: marriage status may be interpreted as an internal covariate, i.e., the change of marriage status is individual and may depend on health status. For instance, maybe only healthy persons get married, so the risk is that health status acts as a confounder in the relation between marriage and survival. Generally, the use of time-dependent internal covariates is dangerous, and one should always consider possible confounding or reverse causality when allowing for them.

6.2 Communal Covariates

Communal (external) covariates are covariates that vary in time outside any individual and take a common value for all individuals at each specific calendar time. In the econometric literature, such a variable is often called exogenous. This can be viewed as a special case of a time-varying covariate, but without the danger of reverse causality discussed above.

Example 6.2 (Did food prices affect mortality in the 19th century?)

For some 19th century parishes in southern Sweden, yearly time series of food prices are available. In this example we use deviation from the log trend in annual rye prices, shown in Figure 6.2.


FIGURE 6.2: Log rye price deviations from trend, 19th century southern Sweden.

The idea behind this choice of transformation is to focus on deviations from the “normal” as a proxy for short-term variation in economic stress. We illustrate the idea with a subset of the built-in data set scania, and we show the information for individual No. 1.

   id enter   exit event birthdate  sex   ses
29  1    50 59.242     1  1781.454 male lower

Now, to attach the food prices as a time-varying covariate, use the function make.communal; scand below is the subset of scania mentioned above. We show what happens to individual No. 1.

scand <- make.communal(scand, logrye[, 2, drop = FALSE], 
                        start = 1801.75)
scand[scand$id == 1, ]
    enter   exit event birthdate id  sex   ses foodprices
1  50.000 50.296     0  1781.454  1 male lower      0.126
2  50.296 51.296     0  1781.454  1 male lower      0.423
3  51.296 52.296     0  1781.454  1 male lower     -0.019
4  52.296 53.296     0  1781.454  1 male lower     -0.156
5  53.296 54.296     0  1781.454  1 male lower     -0.177
6  54.296 55.296     0  1781.454  1 male lower     -0.085
7  55.296 56.296     0  1781.454  1 male lower     -0.007
8  56.296 57.296     0  1781.454  1 male lower      0.104
9  57.296 58.296     0  1781.454  1 male lower      0.118
10 58.296 59.242     1  1781.454  1 male lower     -0.197

A new variable, foodprices, is added, and each individual’s duration is split into intervals of length one year (except possibly the first and last). This is how foodprices is treated as a time-varying covariate. The analysis is then straightforward.

fit <- coxreg(Surv(enter, exit, event) ~ ses + sex + foodprices, 
                data = scand)
TABLE 6.1: Foodprices and old age mortality, 19th century Scania.

Covariate    level     W_mean    Coef      HR       SE       LR_p
ses                                                          0.6869
             upper      0.202    0         1
             lower      0.798   -0.0304    0.9701   0.0752
sex                                                          0.5004
             male       0.504    0         1
             female     0.496    0.0411    1.0419   0.0610
foodprices              0.002    0.3206    1.3779   0.1816   0.0775
Max Log Likelihood   -7184.1

Socio-economic status and sex do not matter much, but food prices do; the effect is almost significant at the 5 per cent level. See Table 6.1. \(\ \Box\)

6.3 Tied Event Times

Tied event times cannot, in theory, occur with continuous-time data, but it is impossible to measure durations and life lengths with infinite precision. Data are always more or less rounded, so tied event times occur frequently in practice. This may cause problems (biased estimates) if ties occur too frequently. There are a few ways to handle tied data; the so-called exact method considers all possible orderings of the tied event times in each risk set. It works as shown in the following example.

Example 6.3 (Likelihood contribution at tied event time)

Suppose \(R_i = \{1, 2, 3\}\), where individuals 1 and 2 have events; there are two possible orderings of the tied events:

\[\begin{eqnarray*} L_i(\boldsymbol{\beta}) & = & \frac{\psi(1)}{\psi(1) + \psi(2) + \psi(3)} \times \frac{\psi(2)}{\psi(2) + \psi(3)} \\ & + & \frac{\psi(2)}{\psi(1) + \psi(2) + \psi(3)} \times \frac{\psi(1)}{\psi(1) + \psi(3)} \\ & = & \frac{\psi(1)\psi(2)}{\psi(1) + \psi(2) + \psi(3)} \left\{\frac{1}{\psi(2) + \psi(3)} + \frac{1}{\psi(1) + \psi(3)}\right\} \end{eqnarray*}\] \(\ \Box\)

The main drawback of the exact method is that it easily becomes unacceptably slow, because of the huge number of orderings that may need to be considered. It is, however, available as an option (ties = "exact") in the function coxph in the survival package.

Fortunately, there are a few excellent approximations, most notably Efron’s, which is the default method in the eha and survival packages in R. Another common approximation, due to Breslow, is the default in some other statistical software and can also be chosen in the eha and survival packages. Finally, there is no cost involved in using these approximations when there are no ties at all: all the methods then give identical results.

With too many ties in the data, there is always the possibility to use discrete-time methods. One simple way is to use the option method = "ml" in the coxreg function in the R package eha.

Example 6.4 (Birth intervals)

We look at the lengths of birth intervals between the first and second births for married women in 19th century northern Sweden, a subset of the data set fert available in the eha package.
Four runs with coxreg are performed, one for each available way of treating ties.

library(eha)
first <- fert[fert$parity == 1, ]
## Default method (not necessary to specify method = "efron")
fit.e <- coxreg(Surv(next.ivl, event) ~ year + age, data = first, 
                method = "efron")
## Breslow
fit.b <- coxreg(Surv(next.ivl, event) ~ year + age, data = first, 
                method = "breslow")
## The hybrid mppl:
fit.mp <- coxreg(Surv(next.ivl, event) ~ year + age, data = first, 
                 method = "mppl", coxph = FALSE)
## True discrete:
fit.ml <- coxreg(Surv(next.ivl, event) ~ year + age, data = first, 
                 method = "ml", coxph = FALSE)

Then we compose a table of the four results; see Table ??.
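
One way to assemble such a comparison directly (a minimal sketch; the book’s own table is built with a helper function not shown here):

round(cbind(Efron = coef(fit.e), Breslow = coef(fit.b),
            MPPL = coef(fit.mp), ML = coef(fit.ml)), 4)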

There is almost no difference in results between the four methods. For the exact method mentioned above, the number of orderings to consider is \(\sim 7 \times 10^{192}\), which is impossible to handle.\(\ \Box\)

With the help of the function risksets in eha, it is easy to check the composition of risk sets in general and the frequency of ties in particular.

rs <- risksets(Surv(first$next.ivl, first$event))
tt <- table(rs$n.event)
tt

  1   2   3   4   5   6   7   9 
369 199 133  60  27  14   2   2 

This output says that 369 risk sets have only one event each (no ties), 199 risk sets have exactly two events each, etc.

The object rs is a list with seven components. Two of them are n.event, which counts the number of events in each risk set, and size, which gives the size (number of individuals under observation) of each risk set. Both these vectors are ordered after the risktimes, which is another component of the list. For the rest of the components, see the help page for risksets.

It is easy to produce the Nelson-Aalen plot with the output from risksets, see Figure 6.3.

par(las = 1)
plot(rs$risktimes, cumsum(rs$n.event / rs$size), type = "s", 
     xlab = "Duration (years)", ylab = "Cum. hazards")
abline(h = 0)

FIGURE 6.3: Nelson-Aalen plot with risksets.

One version of the corresponding survival function is shown in Figure 6.4.

par(las = 1)
sur <- exp(-cumsum(rs$n.event / rs$size))
plot(rs$risktimes, sur, type = "s", 
     xlab = "Duration (years)", ylab = "Surviving fraction")
abline(h = 0)

FIGURE 6.4: Survival plot with the aid of the function risksets.

6.4 Stratification

Stratification means that data are split into groups, called strata, and a separate partial likelihood function is created for each stratum, with common values of the regression parameters corresponding to the shared explanatory variables. In the estimation, these partial likelihoods are multiplied together, and the product is treated as a likelihood function. The restriction on the parameters is thus that they are the same across strata.
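
In symbols, with \(S\) strata and stratum-specific baseline hazards, the function maximized is

\[ L(\boldsymbol{\beta}) = \prod_{s=1}^{S} L_s(\boldsymbol{\beta}), \]

where \(L_s\) is the ordinary partial likelihood computed from the risk sets of stratum \(s\) alone; the vector \(\boldsymbol{\beta}\) is common to all strata, while the baseline hazards may differ freely.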

There are typically two reasons for stratification. First, if the proportionality assumption does not hold for a factor covariate, a way out is to stratify on it. Second, a factor may have too many levels, so that it is inappropriate to treat it as an ordinary factor. This argument is similar to the one for using a frailty model (Chapter 10). Stratification is also a useful tool with matched data, see Chapter 11.

When a factor does not produce proportional hazards between categories, stratify on the categories. For tests of the proportionality assumption, see Section 6.7.1.

Example 6.5 (Birth intervals)

For the birth interval data, we stratify on ses in the Cox regression:

library(eha)
source("R/fit.out.R")
fert1 <- fert[fert$parity == 1, ]
levels(fert1$parish) <- c("Jörn", "Norsjö", "Skellefteå")
fert1$parish <- relevel(fert1$parish, ref = "Skellefteå")
fert1$age.25 <- fert1$age - 25       # Center
fert1$year.1860 <- fert1$year - 1860 # Center
fit <- coxreg(Surv(next.ivl, event) ~ age.25 + year.1860 +
              prev.ivl + parish + strata(ses), 
              data = fert1)
fit.out(fit, caption = "Birth intervals, 19th century Skellefteå.", 
        label = "strafr6")
TABLE 6.2: Birth intervals, 19th century Skellefteå.

Covariate    level        W_mean    Coef      HR       SE       LR_p
age.25                     2.151   -0.0227    0.9776   0.0057   0.0001
year.1860                 -1.336   -0.0007    0.9993   0.0022   0.7504
prev.ivl                   1.309   -0.2245    0.7989   0.0292   0.0000
parish                                                          0.0093
             Skellefteå    0.937    0         1
             Jörn          0.010    0.6196    1.8581   0.2144
             Norsjö        0.053    0.1869    1.2055   0.1087
Max Log Likelihood   -9047.3

The results in Table 6.2 show that calendar time is unimportant, while mother’s age and the length of the previous interval (from marriage to first birth) both matter: the birth intensity decreases with both. Also note that the continuous covariates are centered at reasonable values within their observed ranges. This does not affect the rest of the table in this case, but with interactions involving either covariate it would.

Next we plot the fit, see Figure 6.5. The non-proportionality pattern is clearly visible. Note the line par(lwd = 1.5): it magnifies the line widths in the plot by 50 per cent compared to the default. The values on the \(y\) axis refer to a 25-year-old mother giving birth in Skellefteå in 1860.

par(lwd = 1.5, cex = 0.8)
plot(fit, ylab = "Cumulative hazards", xlab = "Years")

FIGURE 6.5: Cumulative hazards by socio-economic status, second birth interval.

6.5 Sampling of Risk Sets

Some of the work by Langholz and Borgan (1995) is implemented in eha. The idea behind sampling of risk sets is that with huge data sets, each risk set will contain a few (often only one) events but many survivors. From an efficiency point of view, not much is lost by using only a random fraction of the survivors in each risk set. The gain is, of course, computational speed and memory savings. How it is done in eha is shown by an example.

Example 6.6 (Sampling of risk sets, male mortality)

For the male mortality data set, we compare a full analysis with one where only four survivors are sampled in each risk set.

fit <- coxreg(Surv(enter, exit, event) ~ ses, 
              data = mort)
fit.4 <- coxreg(Surv(enter, exit, event) ~ ses, 
                max.survs = 4, data = mort)
f1 <- coefficients(summary(fit))[c(1, 3)]
f4 <- coefficients(summary(fit.4))[c(1, 3)]
out <- rbind(f1, f4)
colnames(out) <- c("Coef", "se(Coef)")
rownames(out) <- c("Original", "Sample")
round(out, 4)
            Coef se(Coef)
Original -0.4795   0.1207
Sample   -0.5442   0.1357

The results are comparable and a gain in execution time was noted. Of course, with the small data sets we work with here, the difference is of no practical importance.\(\ \Box\)

It is worth noting that the function risksets has an argument max.survs which, when set, samples survivors in each risk set in the data. The output can then be used as input to coxreg; see the relevant help pages for more information.
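
A minimal sketch of that workflow on the mort data; note that the name of the coxreg argument taking the sampled risk sets is an assumption here, so check the help pages:

rs4 <- risksets(Surv(mort$enter, mort$exit, mort$event),
                max.survs = 4)
## Pass the sampled risk sets on to coxreg; the argument name ('rs')
## is an assumption, see ?coxreg and ?risksets.
fit.rs <- coxreg(Surv(enter, exit, event) ~ ses, data = mort, rs = rs4)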

6.6 Residuals

Residuals are generally the key to judgment of model fit to data. It is easy to show how it works with linear regression.

Example 6.7 (Linear regression)

Figure 6.6 shows a scatter plot of bivariate data and a fitted straight line (left panel) and residuals versus fitted values (right panel).


FIGURE 6.6: Residuals in linear regression.

The residuals are the vertical distances between the line and the points.\(\ \Box\)

Unfortunately, this kind of simple and intuitive graph does not exist for the results of a Cox regression analysis. However, there are a few ideas for how to accomplish something similar. The main idea goes as follows: if \(T\) is a survival time and \(S(t)\) the corresponding survivor function (continuous time), then it is a mathematical fact that \(U = S(T)\) is uniformly distributed on the interval \((0, 1)\). It follows that \(-\log(U)\) is exponentially distributed with rate (parameter) 1.
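
A quick simulation sketch (an assumed exponential example, not from the text) illustrates the transform:

## If T ~ Exp(rate = 2), then S(t) = exp(-2t), so S(T) is uniform on
## (0, 1) and -log S(T) is standard exponential.
set.seed(42)
tt <- rexp(1000, rate = 2)
u <- exp(-2 * tt)
hist(u)         # approximately flat on (0, 1)
mean(-log(u))   # close to 1, the standard exponential mean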

Since \(S(t) = \exp\{-H(t)\}\), where \(H\) is the cumulative hazard function of \(T\), it follows that \(H(T) = -\log(U)\) is exponentially distributed with rate 1. This motivates the Cox-Snell residuals: given a sample \(T_1, \ldots, T_n\) of survival times,

\[\begin{equation} r_{Ci} = \hat{H}(T_i; \mathbf{x}_i) = \exp(\hat{\boldsymbol{\beta}} \mathbf{x}_i)\hat{H}_0(T_i),\quad i = 1, \ldots, n, \end{equation}\]

which should behave like a censored sample from an exponential distribution with rate 1 if the model is good. If, for some \(i\), \(T_i\) is censored, so is the residual.

6.6.1 Martingale residuals

A drawback with the Cox-Snell residuals is that they contain censored values. An attempt to correct the censored residuals leads to the so-called martingale residuals. The idea is simple and builds on the “lack-of-memory” property of the exponential distribution: the expected value is 1, and by adding 1 to the censored residuals, they become predictions (estimates) of the corresponding uncensored values. Then, finally, a twist is added: subtract 1 from all values and multiply by \(-1\), leading to the definition of the martingale residuals:

\[\begin{equation} r_{Mi} = \delta_i - r_{Ci}, \quad i = 1, \ldots, n, \end{equation}\] where \(\delta_i\) is the event indicator for the \(i\)th observation. These residuals may be interpreted as the observed minus the expected number of events for each individual. Direct plots of the martingale residuals tend to be less informative, especially with large data sets.

Example 6.8 (Plot of martingale residuals)

The data set kidney from the survival package is used. For more information about it, read its help page in R. For each patient, two survival times are recorded, but only one will be used here. That is accomplished with the function duplicated.

library(survival)
head(kidney)
  id time status age sex disease frail
1  1    8      1  28   1   Other   2.3
2  1   16      1  28   1   Other   2.3
3  2   23      1  48   2      GN   1.9
4  2   13      0  48   2      GN   1.9
5  3   22      1  32   1   Other   1.2
6  3   28      1  32   1   Other   1.2
k1 <- kidney[!duplicated(kidney$id), ]
fit <- coxreg(Surv(time, status) ~ disease + age + sex, 
              data = k1)

The extractor function residuals is used to extract the martingale residuals from the fit. See Figure 6.7.
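
The code producing Figure 6.7 is not shown in the text; a minimal sketch of how such a plot could be drawn (assumed code, not the author’s):

rr <- residuals(fit)   # martingale residuals from the coxreg fit
plot(rr, ylab = "Martingale residuals")
abline(h = 0)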


FIGURE 6.7: Martingale residuals from a fit of the kidney data.

With a large data set, the plot will be hard to interpret, see Figure 6.8, which shows the martingale residuals from a fit of the male mortality (mort) data in the package eha.


FIGURE 6.8: Martingale residuals from a fit of the male mortality data.

It is clear that the residuals are grouped in two clusters: the positive ones are connected to observations with an event, while the negative ones correspond to censored observations. It is hard to draw any conclusions from plots like this one. However, the residuals are used by functions that evaluate goodness-of-fit, for instance cox.zph in the survival package. \(\ \Box\)

6.7 Checking Model Assumptions

6.7.1 Proportionality

The proportionality assumption is extremely important to check in proportional hazards models. We have already seen how violations of the assumption can be dealt with (stratification).

We exemplify with the fert data set in the eha package.

Example 6.9 (Birth intervals, proportionality assumption)

In order to keep observations reasonably independent, we concentrate on one specific birth interval per woman: the interval between the second and third births. That is, our sample consists of all women with at least two births, and we follow each woman from the day of her second delivery until the third or, if that never happens, throughout her observable fertile period, i.e., to age 50 or to loss of follow-up. The latter results in a right-censored interval.

fert2 <- fert[fert$parity == 2, ]

Then we save the fit from a Cox regression performed by the function coxph in the survival package (important!),

fit <- survival::coxph(Surv(next.ivl, event) ~ ses + age + 
                           year + parish, 
             data = fert2)

and check the proportionality assumption. It is done with the function cox.zph in the survival package.

prop.full <- survival::cox.zph(fit)
prop.full
       chisq df      p
ses     9.76  3 0.0207
age     7.14  1 0.0075
year    2.34  1 0.1259
parish  1.39  2 0.4984
GLOBAL 18.73  7 0.0091

In the table above, look first at the last row, GLOBAL. It is the result of a test of the global null hypothesis that proportionality holds. The small \(p\)-value tells us that we have a big problem with the proportionality assumption. Looking further up, we see two possible problems: the variables age and ses. A categorical variable can be stratified upon, but age is a continuous variable, so for age we would first have to categorize.

Let us first investigate the effect of stratifying on ses.

fit1 <- coxph(Surv(next.ivl, event) ~ strata(ses) + 
               age + year + parish, data = fert2) 

and plot the result, see Figure 6.9.


FIGURE 6.9: Stratification on socio-economic status, third birth interval data.

The non-proportionality is visible: some curves cross. The test of proportionality for the stratified model shows that we still have a problem with age.

fit1.zph <- survival::cox.zph(fit1)
print(fit1.zph)
       chisq df      p
age     7.85  1 0.0051
year    1.61  1 0.2045
parish  1.47  2 0.4793
GLOBAL  9.19  4 0.0564

Since age is a continuous covariate, we may have to categorize it. First its distribution is checked with a histogram, see Figure 6.10.

hist(fert2$age, main = "", xlab = "age")

FIGURE 6.10: The distribution of mother’s age, third birth interval data.

The range of age may reasonably be split into four equal-length intervals with the cut function.

fert2$qage <- cut(fert2$age, 4)
fit2 <- survival::coxph(Surv(next.ivl, event) ~ strata(ses) + 
              qage + year + parish, data = fert2) 

and then the proportionality assumption is tested again.

fit2.zph <- survival::cox.zph(fit2)
fit2.zph
       chisq df     p
qage   7.589  3 0.055
year   0.758  1 0.384
parish 1.052  2 0.591
GLOBAL 8.536  6 0.201

The result is now reasonable, with the highest age group deviating slightly. This is not so strange: fertility becomes essentially zero in the upper forties, so it is natural that the proportionality assumption is violated there. One way of dealing with the problem, apart from stratification, would be to analyze the age groups separately. \(\ \Box\)

6.7.2 Log-linearity

In the Cox regression model, the effect of a covariate on the hazard is log-linear, that is, it affects the log hazard linearly. Sometimes a covariate needs to be transformed to fit into this pattern. The first question is then: should the covariate be transformed in the analysis? A graphical test can be done by plotting the martingale residuals from a null fit against the covariate; see Figure 6.11.
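
The simulated data behind Figures 6.11 and 6.12 are not shown in the text; a minimal sketch of a setup of this kind (all names and values assumed) generates a covariate x, lets the hazard depend on \(x^2\), fits a null model, and plots its martingale residuals against x:

library(survival)
set.seed(1)
x <- rnorm(500)
time <- rexp(500, rate = exp(0.5 * x^2))  # the true effect is in x^2
event <- rep(1, 500)
fit <- coxph(Surv(time, event) ~ 1)       # null fit, no covariates
plot(x, residuals(fit))                   # martingale residuals vs. x
lines(lowess(x, residuals(fit)))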


FIGURE 6.11: Plot of residuals against the explanatory variable \(x\).

The function lowess fits a “smooth” curve to a scatterplot. The curvature is clearly visible; let us see what happens if we make the plot with \(x^2\) instead of \(x\). (Note that we are now cheating!) See Figure 6.12.

par(las = 1)
plot(x^2, residuals(fit), axes = FALSE, xlab = expression(x^2))
axis(1)
ah <- seq(-4, 1, by = 1)
axis(2, at = ah, labels = um(ah)) # um(): label helper defined by the author
box()
lines(lowess(x^2, residuals(fit)))

FIGURE 6.12: Plot of residuals against the explanatory variable \(x^2\).

This plot indicates more clearly that an increasing \(x\) is associated with an increasing risk of the event occurring.

6.8 Fixed Study Period Survival

Sometimes survival up to some fixed time is of interest. For instance, in medicine five-year survival may be used as a measure of the success of a treatment. The measure is then the probability that a patient survives five years after treatment.

This is simple to implement: it is a binary experiment, where for each patient survival or death within the period is recorded (plus covariates, such as treatment). The only complicating factor is incomplete observations, i.e., right censoring; in that case it is recommended to use ordinary Cox regression anyway. Otherwise, logistic regression with a suitable link function works well. As mentioned earlier, the cloglog link is what gets you closest to the proportional hazards model, but usually it makes very little difference. See Kalbfleisch and Prentice (2002), pages 334–336, for slightly more detail.
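
A minimal sketch of the binary approach (the data set patients and its variables died, treatment, and age are hypothetical):

## 'died' = 1 if death within five years, 0 otherwise (hypothetical data).
fit5 <- glm(died ~ treatment + age,
            family = binomial(link = "cloglog"), data = patients)
summary(fit5)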

6.9 Left or Right Censored Data

Sometimes the available data are either left or right censored, i.e., for each individual \(i\) we only know whether the individual survived \(t_i\) or not; that is, the data are \[(t_i, \delta_i), \quad i = 1, \ldots, n, \] with \(\delta_i = 1\) if the individual survived (died after \(t_i\)) and \(\delta_i = 0\) if not (died before \(t_i\)). It is still possible to estimate the survivor function \(S\) nonparametrically. The likelihood function is \[\begin{equation*} L(S) = \prod_{i=1}^n \left\{S(t_i)\right\}^{\delta_i} \left\{ 1 - S(t_i)\right\}^{1 - \delta_i}. \end{equation*}\] How to maximize this under the restriction \(t_i < t_j \Rightarrow S(t_i) \ge S(t_j)\) with the EM algorithm is shown by Groeneboom and Wellner (1992).

6.10 The Weird Bootstrap

In small to medium-sized samples, the estimated parameters and their standard errors may not be very reliable, and in such cases the bootstrap technique can be useful. There are a couple of ways to implement it in Cox regression; here the weird bootstrap is presented.

The resampling procedure is done risk set by risk set. If a specific risk set of size \(n\) contains \(d\) events, \(d\) “new” events are drawn from the \(n\) members of the risk set, without replacement and with probabilities proportional to their scores. Then the regression parameters are estimated and saved as one bootstrap replicate. This procedure is repeated a suitable number of times.

The “weird” part of the procedure is that the resampling is performed independently across risk sets: an individual who is present in two risk sets may well have an event (for instance, die) in both. The risk sets are thus treated as independent entities in this respect. See Andersen et al. (1993) for more detail.

The weird bootstrap is implemented in the coxreg function in the eha package. As an example, child mortality in 19th century Sweden is analyzed with respect to the sex difference in mortality.

fit <- coxreg(Surv(enter, exit, event) ~ sex, boot = 300, 
              data = child)
b_hat <- fit$coefficients[1]
b_se <- sqrt(fit$var[1, 1])
b_sample <- fit$bootstrap[1, ]

This fitting procedure takes a fairly long time, around 19 minutes on a well-equipped laptop (2018). The bootstrap result is a matrix with one row per estimated parameter (one in this case) and one column per bootstrap replicate (300 in this case). The child data set is fairly large, with 26574 children and 5616 deaths. The number of risk sets is 2497, with sizes between 20000 and 27000, and up to 60 events in a single risk set.

Now it is up to the user to do whatever she usually does with bootstrap samples. For instance, a start could be to get a grip on the distribution of the bootstrap sample around the parameter estimate, see Figure 6.13.

par(las = 1)
plot(density(b_sample - b_hat, bw = 0.005), 
     xlab = "Deviation", main = "", axes = FALSE)
xh <- c(-0.05, 0.00, 0.05)
axis(1, at = xh, labels = um(xh)) # um(): label helper defined by the author
axis(2)
box()
abline(h = 0, v = 0)

FIGURE 6.13: Bootstrap sample distribution around the estimate.

The mean of the bootstrap sample is \(-0.084\) and its sample standard deviation is 0.019. We can compare with the output of the standard Cox regression, see Table 6.3.

TABLE 6.3: Child mortality, ordinary Cox regression.

Covariate    level     W_mean    Coef      HR       SE       LR_p
sex                                                          0.0018
             male       0.510    0         1
             female     0.490   -0.0833    0.9201   0.0267
Max Log Likelihood  -56509.9

The mean is very close to the actual estimate, but the bootstrap standard error is slightly smaller than the estimated standard error.

For more on the bootstrap, see Efron and Tibshirani (1993), Hinkley (1988), and Hall and Wilson (1991).

References

Andersen, P. K., Ø. Borgan, R. D. Gill, and N. Keiding. 1993. Statistical Models Based on Counting Processes. Berlin: Springer-Verlag.
Efron, B., and R. J. Tibshirani. 1993. An Introduction to the Bootstrap. New York: Chapman & Hall.
Groeneboom, P., and J. A. Wellner. 1992. Nonparametric Maximum Likelihood Estimators for Interval Censoring and Deconvolution. Boston: Birkhäuser.
Hall, P., and S. R. Wilson. 1991. “Two Guidelines for Bootstrap Hypothesis Testing.” Biometrics 47: 757–62.
Hinkley, D. V. 1988. “Bootstrap Methods (with Discussion).” J. Royal Statist. Soc. B 50: 321–37.
Kalbfleisch, J. D., and R. L. Prentice. 2002. The Statistical Analysis of Failure Time Data. 2nd ed. Hoboken, N.J.: Wiley.
Langholz, B., and Ø. Borgan. 1995. “Counter-Matching: A Stratified Nested Case-Control Sampling Method.” Biometrika 82: 69–79.