3.2 The Log-Rank Test

The log-rank test is a \(k\)-sample test of equality of survival functions. We first look at the two-sample case, that is, \(k = 2\).

Suppose that we have the small data set illustrated in Figure 3.3. There are two samples, the letters (A, B, C, D, E) and the numbers (1, 2, 3, 4, 5).

FIGURE 3.3: Two-sample data, the letters (dashed) and the numbers (solid). Circles denote censored observations, plusses events.

The data in Figure 3.3 can be presented in tabular form; see Table 3.1.

TABLE 3.1: Example data for the log-rank test.
group time event
numbers 4.0 TRUE
numbers 2.0 FALSE
numbers 6.0 TRUE
numbers 1.0 TRUE
numbers 3.5 FALSE
letters 5.0 TRUE
letters 3.0 TRUE
letters 6.0 FALSE
letters 1.0 TRUE
letters 2.5 FALSE

We are interested in investigating whether letters and numbers have the same survival chances or not. Therefore, the hypothesis

\[\begin{equation*} H_0: \text{No difference in survival between numbers and letters} \end{equation*}\] is formulated. In order to test \(H_0\), we make five tables, one for each observed event time; see Table 3.2, where the first table, relating to failure time \(t_{(1)} = 1\), is shown.

TABLE 3.2: The 2x2 table for the first event time.
Deaths Survivals Total
numbers 1 4 5
letters 1 4 5
Total 2 8 10

Let us look at the table at failure time \(t_{(1)} = 1\), i.e., Table 3.2, from the viewpoint of the numbers.

  • The observed number of deaths among numbers: \(1\).
  • The expected number of deaths among numbers: \(2 \times 5 / 10 = 1\).

The expected number is calculated under \(H_0\), i.e., as if there is no difference between letters and numbers regarding mortality. It is further assumed that the two margins (Total) are given (fixed).

Then, given two deaths in total and five of the ten observations at risk coming from the numbers group, the expected number of deaths is calculated as above.

This procedure is repeated for each of the five tables, and the results are summarized in Table 3.3.

TABLE 3.3: Observed and expected number of deaths at event times.
Observed Expected Difference Variance
t(1) 1 1.0 0.0 0.44
t(2) 0 0.5 -0.5 0.25
t(3) 1 0.5 0.5 0.25
t(4) 0 0.3 -0.3 0.22
t(5) 1 0.5 0.5 0.25
Sum 3 2.8 0.2 1.41

Finally, the observed test statistic \(T\) is calculated as

\[\begin{equation*} T = \frac{0.2^2}{1.41} \approx 0.028 \end{equation*}\] Under the null hypothesis, this is an observed value from a \(\chi^2(1)\) distribution, and \(H_0\) should be rejected for large values of \(T\). At the 5% level of significance, the critical value is 3.84, far from our observed value of 0.028. The conclusion is therefore that there is no (statistically significant) difference in survival chances between letters and numbers. Note, however, that this result depends on asymptotic (large-sample) properties, and in this toy example those properties are not valid.
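For readers who want to check the bookkeeping behind Table 3.3, here is a short Python sketch (an illustration only, not the implementation used by any R package). Carrying exact fractions through the calculation gives \(T \approx 0.020\); the value 0.028 above comes from rounding the column sums to 0.2 and 1.41 before forming the ratio. Either way, \(T\) is far below the critical value.

```python
# Log-rank bookkeeping for the toy data in Table 3.1.
# For each distinct event time: form the 2x2 table, accumulate the observed
# number of deaths among "numbers", the expected number under H0, and the
# hypergeometric variance.

data = [  # (group, time, event)
    ("numbers", 4.0, True), ("numbers", 2.0, False), ("numbers", 6.0, True),
    ("numbers", 1.0, True), ("numbers", 3.5, False),
    ("letters", 5.0, True), ("letters", 3.0, True), ("letters", 6.0, False),
    ("letters", 1.0, True), ("letters", 2.5, False),
]

event_times = sorted({t for _, t, e in data if e})

O = E = V = 0.0
for t in event_times:
    n1 = sum(1 for g, ti, _ in data if g == "numbers" and ti >= t)  # numbers at risk
    n = sum(1 for _, ti, _ in data if ti >= t)                      # total at risk
    d1 = sum(1 for g, ti, e in data if g == "numbers" and ti == t and e)
    d = sum(1 for _, ti, e in data if ti == t and e)                # deaths at t
    O += d1
    E += d * n1 / n                                      # expected deaths, numbers
    V += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)  # hypergeometric variance

T = (O - E) ** 2 / V  # approximately chi-squared(1) under H0
print(O, round(E, 2), round(V, 2), round(T, 3))
```

With exact arithmetic, \(O = 3\), \(E = 17/6 \approx 2.83\), \(V = 17/12 \approx 1.42\), and \(T = 1/51 \approx 0.020\), in agreement with Table 3.3 up to rounding.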

For more detail about the underlying theory, see Appendix A.

In R, the log-rank test is performed by the coxph function in the package survival (there are other options).

Let us now look at a real data example, the old age mortality data set oldmort in eha. See Table 3.4 for a sample of five records with selected columns.

TABLE 3.4: Old age mortality.
id enter exit event sex civ
793001208 66.498 67.988 0 male married
793001208 67.988 72.820 0 male married
793001208 72.820 75.542 1 male widow
793001209 66.446 76.568 1 female married
793001210 66.446 67.936 0 female married

We are interested in comparing male and female mortality in the ages 60–85 with a log-rank test, and for that purpose we run a Cox regression analysis:

fit <- coxph(Surv(enter, exit, event) ~ sex, data = om)

The result is given by summary(fit):

Call:
coxph(formula = Surv(enter, exit, event) ~ sex, data = om)

  n= 6456, number of events= 1823 

              coef exp(coef) se(coef)      z Pr(>|z|)
sexfemale -0.20635   0.81354  0.04718 -4.374 1.22e-05

          exp(coef) exp(-coef) lower .95 upper .95
sexfemale    0.8135      1.229    0.7417    0.8924

Concordance= 0.532  (se = 0.007 )
Likelihood ratio test= 18.95  on 1 df,   p=1e-05
Wald test            = 19.13  on 1 df,   p=1e-05
Score (logrank) test = 19.2  on 1 df,   p=1e-05

Obviously, we got a lot of information here, more than we actually need; we have in fact performed a Cox regression slightly ahead of schedule! The result of the log-rank test is displayed on the last line of output. The \(p\)-value is \(1.179 \times 10^{-5}\), a very small number, so there is a highly significant difference in mortality between men and women. But how large is the difference? The answer is found at exp(coef) = 0.8135, which tells us that the female risk of dying is about 81% of the male risk, at each age between 60 and 85.
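The coefficient table can in fact be reproduced by hand from the printed coef and se(coef) alone, via the standard Wald construction. A small Python sketch (a hand check, not how survival computes it internally):

```python
import math

# Wald-type summaries from the printed coefficient and standard error
# for sexfemale in the coxph output above.
coef, se = -0.20635, 0.04718

hr = math.exp(coef)                   # exp(coef): the hazard ratio
lower = math.exp(coef - 1.96 * se)    # lower .95
upper = math.exp(coef + 1.96 * se)    # upper .95
z = coef / se                         # Wald z statistic
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

print(round(hr, 4), round(lower, 4), round(upper, 4), round(z, 3))
```

The hazard ratio 0.8135 and the confidence interval (0.7417, 0.8924) match the printed output, and the \(p\)-value agrees with the reported \(1.22 \times 10^{-5}\) up to rounding.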

Remember that this result depends on the proportional hazards assumption. We can graphically check it as follows.

sf <- survfit(Surv(enter, exit, event) ~ strata(sex), 
              data = om, start.time = 60)
plot(sf, xlab = "Age", fun = "cumhaz")

Note that the grouping factor (sex) is given through the function strata in the formula. The result is shown in Figure 3.4.

FIGURE 3.4: Old age mortality, women vs. men, cumulative hazards.

The proportionality assumption seems to be a good description from 60 to 85–90 years of age, but it seems more doubtful in the very high ages. One reason for this may be that the high-age estimates are based on few observations (most of the individuals in the sample died earlier), so random fluctuations have a large impact in the high ages.

3.2.1 Several samples

The result for the two-sample case is easily extended to the \(k\)-sample case. Instead of one \(2 \times 2\) table per observed event time, we get one \(k \times 2\) table per observed event time, and we have to calculate expected and observed numbers of events for \(k-1\) groups at each failure time. The resulting test statistic is still approximately \(\chi^2\) distributed, now with \(k-1\) degrees of freedom. This is illustrated with the same data set, oldmort, as above, but with the covariate civ, a factor with three levels (unmarried, married, widow), instead of sex. Furthermore, the investigation is limited to male mortality.

Call:
coxph(formula = Surv(enter, exit, event) ~ civ, data = om[om$sex == 
    "male", ])

  n= 2872, number of events= 811 

              coef exp(coef) se(coef)      z Pr(>|z|)
civmarried -0.5164    0.5967   0.1440 -3.587 0.000335
civwidow   -0.2636    0.7683   0.1496 -1.762 0.078110

           exp(coef) exp(-coef) lower .95 upper .95
civmarried    0.5967      1.676     0.450    0.7912
civwidow      0.7683      1.302     0.573    1.0301

Concordance= 0.536  (se = 0.009 )
Likelihood ratio test= 18.63  on 2 df,   p=9e-05
Wald test            = 19.63  on 2 df,   p=5e-05
Score (logrank) test = 19.86  on 2 df,   p=5e-05

The degrees of freedom for the score test is now 2, equal to the number of levels in civ minus one. Being unmarried seems to have a great impact on old age mortality. It is, however, recommended to check the proportionality assumption graphically; see Figure 3.5.
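Since the survival function of a \(\chi^2(2)\) distribution has the closed form \(P(X > x) = e^{-x/2}\), the printed \(p\)-values can be verified directly from the test statistics. A quick Python check:

```python
import math

# For a chi-squared variable with 2 degrees of freedom, P(X > x) = exp(-x/2),
# so the p-values in the coxph output above can be checked by hand.
for name, stat in [("LR", 18.63), ("Wald", 19.63), ("logrank", 19.86)]:
    p = math.exp(-stat / 2)  # valid for df = 2 only
    print(f"{name}: p = {p:.0e}")
```

This reproduces the printed values 9e-05, 5e-05, and 5e-05.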

FIGURE 3.5: Old age male mortality by civil status, cumulative hazards.

There is obviously nothing that indicates non-proportionality in this case either. Furthermore, the unmarried have higher mortality than both the married and the widowed, although the difference from the widowed is not statistically significant at the 5% level (\(p = 0.078\)).

We do not go deeper into this matter here, mainly because the log-rank test is a special case of Cox regression, which will be described in detail later in this chapter.