5.3 Tied event times
Tied event times can in theory not occur with continuous-time data, but it is impossible to measure duration and life lengths with infinite precision. Data are always more or less rounded, tied event times occur frequently in practice. This may cause problems (biased estimates) if occurring too frequently. There are a few ways to handle tied data, and the so-called exact method considers all possible permutations of the tied event times in each risk set. It works as shown in the following example.
\(R_i = \{1, 2, 3\}\), 1 and 2 are events; two possible orderings:
\[\begin{eqnarray*} L_i(\boldsymbol{\beta}) & = & \frac{\psi(1)}{\psi(1) + \psi(2) + \psi(3)} \times \frac{\psi(2)}{\psi(2) + \psi(3)} \\ & + & \frac{\psi(2)}{\psi(1) + \psi(2) + \psi(3)} \times \frac{\psi(1)}{\psi(1) + \psi(3)} \\ & = & \frac{\psi(1)\psi(2)}{\psi(1) + \psi(2) + \psi(3)} \left\{\frac{1}{\psi(2) + \psi(3)} + \frac{1}{\psi(1) + \psi(3)}\right\} \end{eqnarray*}\] \(\Box\)
The main drawback with the exact method is that it easily becomes
unacceptably slow, because of the huge number of permutations that may be
necessary to consider. It is however available in the function coxph in the
survival package as an option.
Fortunately, there are a few excellent approximations, most notably
Efron’s, which is the default method in most
survival packages in
R. Another common approximation, due to Breslow,
is the default in some
statistical software and also possible to choose in the eha and
survival packages in R. Finally, there is no cost involved in using
these approximations in the case of no ties at all; they will all give
identical results.
With too much ties in the data, there is always the possibility to use
discrete time methods. One simple way of doing it is to use the option
method = "ml" in
the coxreg function in the R package eha.
We look at length of birth intervals
between first and second births for married women in 19th century
northern Sweden, a subset of the data set ‘fert’, available in the
eha package.
Four runs with coxreg are performed, with all the possible ways of
treating ties.
library(eha)
first <- fert[fert$parity == 1, ]
## Default method (not necessary to specify method = "efron"
fit.e <- coxreg(Surv(next.ivl, event) ~ year + age, data = first,
method = "efron")
## Breslow
fit.b <- coxreg(Surv(next.ivl, event) ~ year + age, data = first,
method = "breslow")
## The hybrid mppl:
fit.mp <- coxreg(Surv(next.ivl, event) ~ year + age, data = first,
method = "mppl")
## True discrete:
fit.ml <- coxreg(Surv(next.ivl, event) ~ year + age, data = first,
method = "ml")Then we compose a table of the four results, see Table 5.2.
| year | age | |
|---|---|---|
| Efron | 0.0017 | -0.0390 |
| Breslow | 0.0017 | -0.0390 |
| MPPL | 0.0017 | -0.0391 |
| ML | 0.0017 | -0.0391 |
There are almost no difference in results between the four methods. For the
exact method mentioned above, the number of permutations is
\(\sim 7 \times 10^{192}\), which is impossible to handle.\(\Box\)
With the help of the function risksets in eha, it is easy to
check the composition of risk sets in general and the frequency of ties in
particular.
##
## 1 2 3 4 5 6 7 9
## 369 199 133 60 27 14 2 2
This output says that 369 risk sets have only one event each (no ties), 199 risk sets have exactly two events each, etc.
The object rs is a list with seven components. Two of them are
n.event, which counts the number of events in each risk set, and size,
which gives the size (number of individuals under observation) of
each risk set. Both these vectors are ordered after the risktimes,
which is another component of the list. For the rest of the components, see
the help page for risksets.
It is easy to produce the Nelson-Aalen plot
with the output from risksets. The code is
plot(rs$risktimes, cumsum(rs$n.event / rs$size), type = "s",
xlab = "Duration (years)", ylab = "Cumulative hazards")
abline(h = 0)
and the result is shown in Figure 5.3.
FIGURE 5.3: Nelson-Aalen plot with the aid of the function risksets
One version of the corresponding survival can be constructed by
sur <- exp(-cumsum(rs$n.event / rs$size))
plot(rs$risktimes, sur, type = "s",
xlab = "Duration (years)", ylab = "Surviving fraction")
abline(h = 0)
and the result is shown in Figure 5.4
FIGURE 5.4: Survival plot with the aid of the function risksets.