4.4 Tabular lifetime data

Although it is possible to use Poisson regression in place of Cox regression, the most useful application of Poisson regression in survival analysis is with tabular data.

Example 4.2 (Mortality in ages 61–80, Sweden 2007.)

In eha, there is the data set swe07, which contains population size and number of deaths by age and sex in Sweden 2007. The age span is restricted to ages 61–80.

head(swe07)

##     pop deaths    sex age  log.pop
## 1 63483    286 female  61 11.05853
## 2 63770    309 female  62 11.06304
## 3 64182    317 female  63 11.06948
## 4 63097    366 female  64 11.05243
## 5 61671    387 female  65 11.02957
## 6 57793    419 female  66 10.96462

tail(swe07)

##      pop deaths  sex age  log.pop
## 35 31074    884 male  75 10.34413
## 36 29718    904 male  76 10.29951
## 37 29722   1062 male  77 10.29964
## 38 28296   1112 male  78 10.25048
## 39 27550   1219 male  79 10.22376
## 40 25448   1365 male  80 10.14439

The Poisson model is, where \(D_{ij}\) is number of deaths and \(P_{ij}\) population size for age \(i\) and sex \(j\), \(i = 61, \ldots, 80\); \(j = \text{female (0)}, \text{male (1)}\), and \(\lambda_{ij}\) is the corresponding mortality,

\[\begin{equation*} D_{ij} \sim \text{Poisson}(\lambda_{ij} P_{ij}), \quad i = 61, 62, \ldots, 80; \; j = 0, 1, \end{equation*}\]

with

\[\begin{equation*} \begin{split} \lambda_{61,j} P_{61,j} &= P_{61,j}\exp(\gamma + \beta * j) \\ &= \exp(\log(P_{61,j}) + \gamma + \beta * j), \quad j = 0, 1, \end{split} \end{equation*}\]

and

\[\begin{equation*} \begin{split} \lambda_{ij} P_{ij} &= P_{ij} \exp(\gamma + \alpha_i + \beta * j) \\ &= \exp(\log(P_{ij}) + \gamma + \alpha_i + \beta * j), \quad i = 62, 63, \ldots, 80; \; j = 0, 1. \end{split} \end{equation*}\]

This calculation shows that it is the log of the population sizes (\(\log(P_{ij}\)) that is the correct offset to use in the Poisson regression. First we want age to be a factor (no restrictions like linearity), then the R function glm (“generalized linear model”) is used to fit a Poisson regression model.

swe07$age <- factor(swe07$age)
fit <- glm(deaths ~ offset(log.pop) + sex + age, family = poisson, 
           data = swe07)
drop1(fit, test = "Chisq")

## Single term deletions
## 
## Model:
## deaths ~ offset(log.pop) + sex + age
##        Df Deviance     AIC    LRT  Pr(>Chi)    
## <none>        10.2   382.8                     
## sex     1   1428.0  1798.5 1417.7 < 2.2e-16 ***
## age    19   9764.6 10099.2 9754.4 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The function drop1 is used above to perform two LR tests. The results are that both variables are highly statistically significant, meaning that both are very important in describing the variation in mortality over sex and age. To know how, we present the parameter estimates.

summary(fit)

## 
## Call:
## glm(formula = deaths ~ offset(log.pop) + sex + age, family = poisson, 
##     data = swe07)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.94763  -0.40528   0.00104   0.32469   1.00890  
## 
## Coefficients:
##             Estimate Std. Error  z value Pr(>|z|)    
## (Intercept) -5.41436    0.03777 -143.358  < 2e-16 ***
## sexmale      0.46500    0.01246   37.311  < 2e-16 ***
## age62        0.04018    0.05171    0.777  0.43705    
## age63        0.15654    0.05019    3.119  0.00181 ** 
## age64        0.28218    0.04896    5.764 8.23e-09 ***
## age65        0.36374    0.04834    7.524 5.30e-14 ***
## age66        0.48712    0.04786   10.178  < 2e-16 ***
## age67        0.61210    0.04755   12.873  < 2e-16 ***
## age68        0.62987    0.04868   12.938  < 2e-16 ***
## age69        0.81418    0.04742   17.168  < 2e-16 ***
## age70        0.81781    0.04744   17.239  < 2e-16 ***
## age71        0.93264    0.04690   19.887  < 2e-16 ***
## age72        1.03066    0.04659   22.122  < 2e-16 ***
## age73        1.12376    0.04618   24.334  < 2e-16 ***
## age74        1.27287    0.04548   27.985  < 2e-16 ***
## age75        1.37810    0.04507   30.577  < 2e-16 ***
## age76        1.45245    0.04481   32.413  < 2e-16 ***
## age77        1.61461    0.04366   36.979  < 2e-16 ***
## age78        1.73189    0.04317   40.115  < 2e-16 ***
## age79        1.83958    0.04266   43.125  < 2e-16 ***
## age80        2.00326    0.04217   47.502  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 10876.873  on 39  degrees of freedom
## Residual deviance:    10.232  on 19  degrees of freedom
## AIC: 382.81
## 
## Number of Fisher Scoring iterations: 3

The results so tell us that males have a distinctly higher mortality than females, and that mortality steadily increases with age (no surprises!). The parameter estimates for age also opens up the possibility to simplify the model by assuming a linear or quadratic effect of age on mortality. We leave that possibility for later.

One question remains: Is the female advantage relatively the same over ages? To answer that question, we introduce an interaction:

fit1 <- glm(deaths ~ offset(log.pop) + sex * age, 
            family = poisson, data = swe07)
drop1(fit1, test = "Chisq")

## Single term deletions
## 
## Model:
## deaths ~ offset(log.pop) + sex * age
##         Df Deviance    AIC    LRT Pr(>Chi)
## <none>        0.000 410.58                
## sex:age 19   10.232 382.81 10.232    0.947

Note that the drop1 function only tests the interaction term. This is because, as we saw before, the main effects tend to be meaningless in the presence of interaction. However, here there is no sign of interaction, and we can conclude that in a survival model with sex as covariate, we have proportional hazards! This is most easily seen in graphs. The first plot is based on the estimated Poisson model without interaction. Some calculations are first needed.

beta <- coefficients(fit)[2]
alpha <- coefficients(fit)[-2] # Remove sex
alpha[2:length(alpha)] <- alpha[2:length(alpha)] + alpha[1]
lambda.females <- exp(alpha)
lambda.males <- exp(alpha + beta)

Then the plot of the hazard functions by

plot(61:80, lambda.males, ylim = c(0, 0.06), xlab = "Age", 
     type = "s")
lines(61:80, lambda.females, type = "s", lty = 2)
abline(h = 0)

Note the parameter ylim. It sets the scale on the \(y\) axis, and some trial and error may be necessary to get a good choice. The function line adds a curve to an already existing plot, and the function abline adds a straight line; the argument h = 0 results in a horizontal line. See the help pages for these functions for further information.

The result is shown in Figure 4.4.

FIGURE 4.4: Model based hazard functions for males (upper) and females (lower).

We can also make a plot based only on raw data. For that it is only necessary to calculate the occurrence-exposure rates. An occurrence-exposure rate is simply the ratio between the number of occurrences (of some event) divided by total exposure (waiting) time. In the present example, mean population size during one year is a very good approximation of the exact total exposure time (in years).

femal <- swe07[swe07$sex == "female", ]
mal <- swe07[swe07$sex == "male", ]
f.rate <- femal$deaths / femal$pop
m.rate <- mal$deaths / mal$pop

And finally the plot of the raw death rates is created as follows,

plot(61:80, m.rate, ylim = c(0, 0.06), xlab = "Age", 
     ylab = "Mortality", type = "s")
lines(61:80, f.rate, lty = 2, type = "s")
abline(h = 0)

with the result as shown in Figure 4.5.

FIGURE 4.5: Hazard functions for males (upper) and females (lower) based on raw data.

The differences between Figure 4.4 and Figure 4.5 are very small, showing that the Poisson model without interaction fits the Swedish old age mortality data very well. \(\Box\)