5.3 Tied event times

Tied event times can in theory not occur with continuous-time data, but it is impossible to measure duration and life lengths with infinite precision. Data are always more or less rounded, tied event times occur frequently in practice. This may cause problems (biased estimates) if occurring too frequently. There are a few ways to handle tied data, and the so-called exact method considers all possible permutations of the tied event times in each risk set. It works as shown in the following example.

Example 5.3 (Likelihood contribution at tied event time)

\(R_i = \{1, 2, 3\}\), 1 and 2 are events; two possible orderings:

\[\begin{eqnarray*} L_i(\boldsymbol{\beta}) & = & \frac{\psi(1)}{\psi(1) + \psi(2) + \psi(3)} \times \frac{\psi(2)}{\psi(2) + \psi(3)} \\ & + & \frac{\psi(2)}{\psi(1) + \psi(2) + \psi(3)} \times \frac{\psi(1)}{\psi(1) + \psi(3)} \\ & = & \frac{\psi(1)\psi(2)}{\psi(1) + \psi(2) + \psi(3)} \left\{\frac{1}{\psi(2) + \psi(3)} + \frac{1}{\psi(1) + \psi(3)}\right\} \end{eqnarray*}\] \(\Box\)

The main drawback with the exact method is that it easily becomes unacceptably slow, because of the huge number of permutations that may be necessary to consider. It is however available in the function coxph in the survival package as an option.

Fortunately, there are a few excellent approximations, most notably Efron’s, which is the default method in most survival packages in R. Another common approximation, due to Breslow, is the default in some statistical software and also possible to choose in the eha and survival packages in R. Finally, there is no cost involved in using these approximations in the case of no ties at all; they will all give identical results.

With too much ties in the data, there is always the possibility to use discrete time methods. One simple way of doing it is to use the option method = "ml" in the coxreg function in the R package eha.

Example 5.4 (Birth intervals.)

We look at length of birth intervals between first and second births for married women in 19th century northern Sweden, a subset of the data set ‘fert’, available in the eha package.
Four runs with coxreg are performed, with all the possible ways of treating ties.

library(eha)
first <- fert[fert$parity == 1, ]
## Default method (not necessary to specify method = "efron"
fit.e <- coxreg(Surv(next.ivl, event) ~ year + age, data = first, 
                method = "efron")
## Breslow
fit.b <- coxreg(Surv(next.ivl, event) ~ year + age, data = first, 
                method = "breslow")
## The hybrid mppl:
fit.mp <- coxreg(Surv(next.ivl, event) ~ year + age, data = first, 
                 method = "mppl")
## True discrete:
fit.ml <- coxreg(Surv(next.ivl, event) ~ year + age, data = first, 
                 method = "ml")

Then we compose a table of the four results, see Table 5.2.

TABLE 5.2: Comparison of four methods of handling ties, estimated coefficients.
	year	age
Efron	0.0017	-0.0390
Breslow	0.0017	-0.0390
MPPL	0.0017	-0.0391
ML	0.0017	-0.0391

There are almost no difference in results between the four methods. For the exact method mentioned above, the number of permutations is \(\sim 7 \times 10^{192}\), which is impossible to handle.\(\Box\)

With the help of the function risksets in eha, it is easy to check the composition of risk sets in general and the frequency of ties in particular.

rs <- risksets(Surv(first$next.ivl, first$event))
tt <- table(rs$n.event)
tt

## 
##   1   2   3   4   5   6   7   9 
## 369 199 133  60  27  14   2   2

This output says that 369 risk sets have only one event each (no ties), 199 risk sets have exactly two events each, etc.

The object rs is a list with seven components. Two of them are n.event, which counts the number of events in each risk set, and size, which gives the size (number of individuals under observation) of each risk set. Both these vectors are ordered after the risktimes, which is another component of the list. For the rest of the components, see the help page for risksets.

It is easy to produce the Nelson-Aalen plot with the output from risksets. The code is

plot(rs$risktimes, cumsum(rs$n.event / rs$size), type = "s", 
     xlab = "Duration (years)", ylab = "Cumulative hazards")
abline(h = 0)

and the result is shown in Figure 5.3.

FIGURE 5.3: Nelson-Aalen plot with the aid of the function risksets

One version of the corresponding survival can be constructed by

sur <- exp(-cumsum(rs$n.event / rs$size))
plot(rs$risktimes, sur, type = "s", 
     xlab = "Duration (years)", ylab = "Surviving fraction")
abline(h = 0)

and the result is shown in Figure 5.4

FIGURE 5.4: Survival plot with the aid of the function risksets.