3.6 Interactions

The meaning of interaction between two explanatory variables is described by an example, walking through the three possible combinations of covariate types. We return to the oldmort data set, slightly modified, see Table 3.5, where the first six of a total to 6495 rows are shown.

TABLE 3.5: Old age mortality, Sundsvall 1860-1880
birthdate	enter	exit	sex	farmer	imr.birth
1775.286	84.714	85	female	no	14.20722
1775.092	84.908	85	female	yes	11.97183
1775.374	84.626	85	female	no	16.93548
1775.859	84.141	85	male	yes	16.93548
1775.111	84.889	85	female	no	12.70903
1775.262	84.738	85	female	no	14.23358

The sampling frame is a rectangle in the Lexis diagram, see Figure 3.7. It can be described as all persons born between January 1, 1775 and January 1, 1881, and present in the solid rectangle at some moment.

FIGURE 3.7: Old age mortality sampling frame, Lexis diagram.

Example 3.2 (Old age mortality)

There are four possible explanatory variables behind survival after age 60 in the example. \[\begin{eqnarray*} \mbox{farmer} & = & \left\{ \begin{array}{ll} \mbox{no} & \\ \mbox{yes} & \left(e^{\alpha}\right) \end{array} \right. \\ %\end{equation*} %\begin{equation*} \mbox{sex} & = & \left\{ \begin{array}{ll} \mbox{male} & \\ \mbox{female} & \left(e^{\beta}\right) \end{array} \right. \\ \mbox{birthdate } (x_1) & & \mspace{90mu} \left(e^{\gamma_1 x_1}\right) \\ \mbox{IMR at birth } (x_2) & & \mspace{90mu} \left(e^{\gamma_2 x_2}\right) \\ \end{eqnarray*}\] The two first are factors, and the last two are continuous covariates. We go through the three distinct combinations and illustrate the meaning of interaction in each. \(\Box\)

3.6.1 Two factors

Assume that we, in the oldmort example, only have the factors sex and farmer at our disposal as explanatory variables. We may fit a Cox regression model like this:

fit <- coxph(Surv(enter, exit, event) ~ sex + farmer, data = om)

The output from coxph (the model fit) is saved in the object fit, which is investigated, first by the the function drop1, which performs successive \(\chi^2\) tests of the significance of the estimated regression coefficients (the null hypothesis in each case is that the true coefficient is zero).

drop1(fit, test = "Chisq")

Single term deletions

Model:
Surv(enter, exit, event) ~ sex + farmer
       Df   AIC     LRT  Pr(>Chi)
<none>    25916                  
sex     1 25937 22.5960 1.999e-06
farmer  1 25919  5.0123   0.02517

The \(p\)-value corresponding to the coefficient for sex is very small, meaning that there is a strongly significant difference in mortality between women and men. The difference in mortality between farmers and non-farmers is also statistically significant, but with a larger \(p\)-value.

Next, we want to see the size of the difference in mortality between women and men, and between farmers and non-farmers, so the estimates are printed (rounded to three decimal places).

round(summary(fit)$coefficients, 3)

            coef exp(coef) se(coef)      z Pr(>|z|)
sexfemale -0.232     0.793    0.048 -4.778    0.000
farmeryes -0.127     0.880    0.058 -2.216    0.027

This result tells us that female mortality is 79.3 per cent of the male mortality, and that the farmer mortality is 88 per cent of the non-farmer mortality (from the column “exp(coef)”). Furthermore, this model is multiplicative (additive on the log scale), so we can conclude that the mortality of a female farmer is \(0.793 \times 0.880 \times 100\) = 69.8 per cent of that of a male non-farmer.

We can also illustrate the sex difference graphically, see Figure 3.8.

fit2 <- coxph(Surv(enter, exit, event) ~ strata(sex) + farmer, 
              data = om)
plot(survfit(fit2), fun = "cumhaz", xlim = c(60, 85), 
     lty = 1:2, xlab = "Age")
abline(h = 0)
legend("topleft", legend = c("Men", "Women"), lty = 1:2)

FIGURE 3.8: Cumulative hazards for men and women.

Obviously the proportionality assumption is well met: It can also be formally tested as follows:

fit3 <- coxph(Surv(enter, exit, event) ~ sex + farmer, 
              data = om)
(pczp <- cox.zph(fit3))

##        chisq df     p
## sex    2.777  1 0.096
## farmer 0.676  1 0.411
## GLOBAL 2.964  2 0.227

First look at the \(p\)-value corresponding to GLOBAL: it gives an overall verdict of the proportionality assumption. Values below 0.05 (say) are suspicious: It tells us that some variable is bad, and it is time to look at the individual \(p\)-values. There are no problems here.

As always, you should interpret these tests with care. With large data sets, you will frequently get small \(p\)-values, and still the graph shows a reasonable agreement with proportionality. Then do not bother about the proportionality.

An obvious question to ask: Is the sex difference in mortality the same among farmers as among non-farmers? In other words: Is there an interaction between sex and farmer/non-farmer regarding the effect on mortality?

In order to test for an interaction in R, the plus (\(+\)) sign in the formula is changed to a multiplication sign (\(*\)):

fit4 <- coxph(Surv(enter, exit, event) ~ sex * farmer, 
              data = om)
(x <- drop1(fit4, test = "Chisq"))

## Single term deletions
## 
## Model:
## Surv(enter, exit, event) ~ sex * farmer
##            Df   AIC LRT Pr(>Chi)
## <none>        25912             
## sex:farmer  1 25916 6.4  0.01141

The small \(p\)-value (1.14 per cent) indicates that there is a statistically significant (at the 5 per cent level) interaction effect. The size of the interaction effect is seen in the table of estimated coefficients (rounded to three decimal places):

round(summary(fit4)$coefficients[, 1:2], 3)

##                       coef exp(coef)
## sexfemale           -0.297     0.743
## farmeryes           -0.252     0.777
## sexfemale:farmeryes  0.291     1.338

Interpretation: The mortality of female farmers is \(100 \times 0.743 = 74.3\) per cent of the mortality of male farmers. Hovever, moving to the non-farmer group, this relation has to be multiplied by \(1.338\), the interaction effect. The result is that among non-farmers, the female mortality is 99.4 per cent of the male mortality.

In cases like this, it is often recommended to perform separate analyses for women and men, or, alternatively, for farmers and non-farmers. Do that as an exercise and interpret your results.

3.6.2 One factor and one continuous covariate

Now the covariates sex (factor) and birthdate (continuous) are in focus. First the model without interaction is fitted.

fit5 <- coxph(Surv(enter, exit, event) ~ sex + birthdate, data = om)
drop1(fit5, test = "Chisq")

## Single term deletions
## 
## Model:
## Surv(enter, exit, event) ~ sex + birthdate
##           Df   AIC     LRT Pr(>Chi)
## <none>       25919                 
## sex        1 25936 18.9812 1.32e-05
## birthdate  1 25919  2.2364   0.1348

Obviously sex is statistically significant, while imr.birth is not. We go directly to the estimated coefficients after adding an interaction term to the model:

fit6 <- coxph(Surv(enter, exit, event) ~ sex * birthdate, data = om)
round(summary(fit6)$coefficients[, 1:2], 5)

##                         coef  exp(coef)
## sexfemale            9.12285 9162.24554
## birthdate           -0.00327    0.99673
## sexfemale:birthdate -0.00519    0.99483

What is going on here? The main effect of sex is huge! The answer lies in the interpretation of the coefficients: (i) To evaluate the effect on mortality for an individual with given values of the covariates, you must involve the interaction coefficient. For instance, a female born on January 2, 1801 (birthdate = 1801.003) will have the relative risk

\[\exp(9.12285 - 0.00327 \times 1801.003 - 0.00519 \times 1801.003) = 0.00185\]

compared to a man with birthdate= 0, the reference values! Obviously, it is ridiculous to extrapolate the model 1800 years back in time, but this is nevertheless the correct way to interpret the coefficients. So, at birthdate = 0, women’s mortality is 9162 times higher than that of men! (ii) With interactions involved, it is strongly recommended to center continuous covariates. In our example, we could do that by subtracting 1810 (say) from birthdate. That makes sex = "male", birthdate = 1800 the new reference values. The result is

##                         coef exp(coef)
## sexfemale           -0.26429   0.76775
## birthdate           -0.00327   0.99673
## sexfemale:birthdate -0.00519   0.99483

This is more reasonable-looking: At the beginning of the year 1810, woman had a mortality of 76.8 per cent of that of men.

To summarize: The effects can be illustrated by two curves, one for men and one for women. The coefficients for sex, 0 and -0.26429, are the respective intercepts, and -0.00327 and -0.00846 are the slopes. See Figure 3.9 for a graphical illustration.

FIGURE 3.9: The effect of birthdate on relative mortality for men and women. Reference: Men, birthdate = 1810.

3.6.3 Two continuous covariates

In our example data, the covariates birthdate and imr.birth are continuous, and we get

fit8 <- coxph(Surv(enter, exit, event) ~ birthdate * imr.birth, data = om)
(res <- round(summary(fit8)$coefficients[, 1:2], 5))

##                         coef exp(coef)
## birthdate           -0.03024   0.97021
## imr.birth            0.02004   1.02024
## birthdate:imr.birth  0.00158   1.00158

The interpretation of the coefficients are: The reference point is birthdate = 0 (1810) and imr.birth = 0, and if birthdate = x and imr.birth = y, then the relative risk is
\[ \exp(-0.03024 x + 0.02004 y + 0.00158 x y),\] analogous to the calculation in the case with one continuous and one factor covariate.