4.4 Tabular lifetime data
Although it is possible to use Poisson regression in place of Cox regression, the most useful application of Poisson regression in survival analysis is with tabular data.
In eha, there is the data set swe07, which contains population
size and number of deaths by age and sex in Sweden 2007. The age span is
restricted to ages 61–80.
## pop deaths sex age log.pop
## 1 63483 286 female 61 11.05853
## 2 63770 309 female 62 11.06304
## 3 64182 317 female 63 11.06948
## 4 63097 366 female 64 11.05243
## 5 61671 387 female 65 11.02957
## 6 57793 419 female 66 10.96462
## pop deaths sex age log.pop
## 35 31074 884 male 75 10.34413
## 36 29718 904 male 76 10.29951
## 37 29722 1062 male 77 10.29964
## 38 28296 1112 male 78 10.25048
## 39 27550 1219 male 79 10.22376
## 40 25448 1365 male 80 10.14439
The Poisson model is, where \(D_{ij}\) is number of deaths and \(P_{ij}\)
population size for age \(i\) and sex \(j\), \(i = 61, \ldots, 80\);
\(j = \text{female (0)}, \text{male (1)}\), and \(\lambda_{ij}\) is the corresponding
mortality,
\[\begin{equation*} D_{ij} \sim \text{Poisson}(\lambda_{ij} P_{ij}), \quad i = 61, 62, \ldots, 80; \; j = 0, 1, \end{equation*}\]
with
\[\begin{equation*} \begin{split} \lambda_{61,j} P_{61,j} &= P_{61,j}\exp(\gamma + \beta * j) \\ &= \exp(\log(P_{61,j}) + \gamma + \beta * j), \quad j = 0, 1, \end{split} \end{equation*}\]
and
\[\begin{equation*} \begin{split} \lambda_{ij} P_{ij} &= P_{ij} \exp(\gamma + \alpha_i + \beta * j) \\ &= \exp(\log(P_{ij}) + \gamma + \alpha_i + \beta * j), \quad i = 62, 63, \ldots, 80; \; j = 0, 1. \end{split} \end{equation*}\]
This calculation shows that it is the log of the population sizes
(\(\log(P_{ij}\)) that is the correct offset to use in the Poisson regression.
First we want age to be a factor (no restrictions like linearity),
then the R function glm (“generalized linear model”) is used to
fit a Poisson regression model.
swe07$age <- factor(swe07$age)
fit <- glm(deaths ~ offset(log.pop) + sex + age, family = poisson,
data = swe07)
drop1(fit, test = "Chisq")## Single term deletions
##
## Model:
## deaths ~ offset(log.pop) + sex + age
## Df Deviance AIC LRT Pr(>Chi)
## <none> 10.2 382.8
## sex 1 1428.0 1798.5 1417.7 < 2.2e-16 ***
## age 19 9764.6 10099.2 9754.4 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The function drop1 is used above to perform two LR tests. The results
are that both variables are highly statistically significant, meaning that
both are very important in describing the variation in mortality over sex
and age. To know how, we present the parameter estimates.
##
## Call:
## glm(formula = deaths ~ offset(log.pop) + sex + age, family = poisson,
## data = swe07)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.94763 -0.40528 0.00104 0.32469 1.00890
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.41436 0.03777 -143.358 < 2e-16 ***
## sexmale 0.46500 0.01246 37.311 < 2e-16 ***
## age62 0.04018 0.05171 0.777 0.43705
## age63 0.15654 0.05019 3.119 0.00181 **
## age64 0.28218 0.04896 5.764 8.23e-09 ***
## age65 0.36374 0.04834 7.524 5.30e-14 ***
## age66 0.48712 0.04786 10.178 < 2e-16 ***
## age67 0.61210 0.04755 12.873 < 2e-16 ***
## age68 0.62987 0.04868 12.938 < 2e-16 ***
## age69 0.81418 0.04742 17.168 < 2e-16 ***
## age70 0.81781 0.04744 17.239 < 2e-16 ***
## age71 0.93264 0.04690 19.887 < 2e-16 ***
## age72 1.03066 0.04659 22.122 < 2e-16 ***
## age73 1.12376 0.04618 24.334 < 2e-16 ***
## age74 1.27287 0.04548 27.985 < 2e-16 ***
## age75 1.37810 0.04507 30.577 < 2e-16 ***
## age76 1.45245 0.04481 32.413 < 2e-16 ***
## age77 1.61461 0.04366 36.979 < 2e-16 ***
## age78 1.73189 0.04317 40.115 < 2e-16 ***
## age79 1.83958 0.04266 43.125 < 2e-16 ***
## age80 2.00326 0.04217 47.502 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 10876.873 on 39 degrees of freedom
## Residual deviance: 10.232 on 19 degrees of freedom
## AIC: 382.81
##
## Number of Fisher Scoring iterations: 3
The results so tell us that males have a distinctly higher mortality than
females, and that mortality steadily increases with age (no
surprises!). The parameter estimates for age also opens up the
possibility to simplify the model by assuming a linear or quadratic effect
of age on mortality. We leave that possibility for later.
One question remains: Is the female advantage relatively the same over ages? To answer that question, we introduce an interaction:
fit1 <- glm(deaths ~ offset(log.pop) + sex * age,
family = poisson, data = swe07)
drop1(fit1, test = "Chisq")## Single term deletions
##
## Model:
## deaths ~ offset(log.pop) + sex * age
## Df Deviance AIC LRT Pr(>Chi)
## <none> 0.000 410.58
## sex:age 19 10.232 382.81 10.232 0.947
Note that the drop1 function only tests the interaction term. This is
because, as we saw before, the main effects tend to be meaningless in the
presence of interaction. However, here there is no sign of interaction, and
we can conclude that in a survival model with sex as covariate, we
have proportional hazards! This is most easily seen in
graphs. The first plot is based on the estimated Poisson model
without interaction. Some calculations are first needed.
beta <- coefficients(fit)[2]
alpha <- coefficients(fit)[-2] # Remove sex
alpha[2:length(alpha)] <- alpha[2:length(alpha)] + alpha[1]
lambda.females <- exp(alpha)
lambda.males <- exp(alpha + beta)Then the plot of the hazard functions by
plot(61:80, lambda.males, ylim = c(0, 0.06), xlab = "Age",
type = "s")
lines(61:80, lambda.females, type = "s", lty = 2)
abline(h = 0)Note the parameter ylim. It sets the scale on the \(y\) axis, and some
trial and error may be necessary to get a good choice.
The function line adds a curve to an already
existing plot, and the function abline
adds a straight line; the argument h = 0 results in a
horizontal line. See the help pages for these functions for further
information.
The result is shown in Figure 4.4.
FIGURE 4.4: Model based hazard functions for males (upper) and females (lower).
We can also make a plot based only on raw data. For that it is only necessary to calculate the occurrence-exposure rates. An occurrence-exposure rate is simply the ratio between the number of occurrences (of some event) divided by total exposure (waiting) time. In the present example, mean population size during one year is a very good approximation of the exact total exposure time (in years).
femal <- swe07[swe07$sex == "female", ]
mal <- swe07[swe07$sex == "male", ]
f.rate <- femal$deaths / femal$pop
m.rate <- mal$deaths / mal$popAnd finally the plot of the raw death rates is created as follows,
plot(61:80, m.rate, ylim = c(0, 0.06), xlab = "Age",
ylab = "Mortality", type = "s")
lines(61:80, f.rate, lty = 2, type = "s")
abline(h = 0)with the result as shown in Figure 4.5.
FIGURE 4.5: Hazard functions for males (upper) and females (lower) based on raw data.
The differences between Figure 4.4 and Figure 4.5 are very small, showing that the Poisson model without interaction fits the Swedish old age mortality data very well. \(\Box\)