4.2 The connection to Cox regression

There is an interesting connection between Cox regression and Poisson regression, which can be illustrated as follows.

dat <- data.frame(enter = rep(0, 4), exit = 1:4, 
                  event = rep(1, 4), x = c(0, 1, 0, 1))
dat

##   enter exit event x
## 1     0    1     1 0
## 2     0    2     1 1
## 3     0    3     1 0
## 4     0    4     1 1

We have generated a very simple data set, dat. It consists of four life lengths with a covariate $x$. A Cox regression, where the covariate $x$ is related to survival is given by

library(eha)
fit <- coxreg(Surv(enter, exit, event) ~ x, data = dat)
fit

## Call:
## coxreg(formula = Surv(enter, exit, event) ~ x, data = dat)
## 
## Covariate           Mean       Coef     Rel.Risk   S.E.    Wald p
## x                   0.600    -0.941     0.390     1.240     0.448 
## 
## Events                    4 
## Total time at risk            10 
## Max. log. likelihood       -2.87 
## LR test statistic         0.62 
## Degrees of freedom        1 
## Overall p-value           0.43248

fit$hazards

## [[1]]
##      [,1]      [,2]
## [1,]    1 0.2430560
## [2,]    2 0.3270380
## [3,]    3 0.5478007
## [4,]    4 1.0000000
## 
## attr(,"class")
## [1] "hazdata"

Note that fit$hazards is a list with one component per stratum. In this case there is only one stratum, so the list has one component. This component is a matrix with two columns; the first contains the observed distinct failure times, and the second the corresponding estimates of the hazard “atoms”.

Now, this can be replicated by Poisson regression! First the data set is summarized around each observed failure time. That is, a snapshot of the risk set at each failure time is created by the function toBinary.

library(eha)
datB <- toBinary(dat)
datB

##     event riskset risktime x orig.row
## 1       1       1        1 0        1
## 2       0       1        1 1        2
## 3       0       1        1 0        3
## 4       0       1        1 1        4
## 2.1     1       2        2 1        2
## 3.1     0       2        2 0        3
## 4.1     0       2        2 1        4
## 3.2     1       3        3 0        3
## 4.2     0       3        3 1        4
## 4.3     1       4        4 1        4

Note that three new “covariates” are created, riskset, risktime, and orig.row. In the Poisson regression to follow, the factor riskset must be included as a cluster. Formally, this is equivalent to including it as a factor covariate, but for data set with many distinct failure times, the number of levels will be too large to be run by glm. The function glmmboot in eha is better suited. However, with our small example, glm is fine:

fit2 <- glm(event ~ riskset + x, family = poisson, data = datB)
(co <- coefficients(fit2[1:4]))

## (Intercept)    riskset2    riskset3    riskset4           x 
##  -1.0227302   0.4456807   0.6931472   1.9633438  -0.9406136

co[2:4] <- co[2:4] + co[1]

The parameter estimate corresponding to $x$ is exactly the same as in the Cox regression. The baseline hazard function is estimated with the aid of the frailty estimates for the clustering based on riskset.

exp(co)

## (Intercept)    riskset2    riskset3    riskset4           x 
##   0.3596118   0.5615528   0.7192236   2.5615528   0.3903882

These numbers are the hazard atoms in the baseline hazard estimation, so they should be equal to the output from fit$hazards in the Cox regression. This is not the case, and this depends on the fact that coxreg estimates the baseline hazards at the mean of the covariate, i.e., at $x = 0.5$ in our toy example. The easiest way to make the numbers comparable is by subtracting 0.5 from $x$ in the Poisson regression.

datB$x <- datB$x - fit$means
fit2 <- glm(event ~ riskset + x, family = poisson, data = datB)
(co <- coefficients(fit2[1:4]))

## (Intercept)    riskset2    riskset3    riskset4           x 
##  -1.0227302   0.4456807   0.6931472   1.9633438  -0.9406136

co[2:4] <- co[2:4] + co[1]
exp(co)

## (Intercept)    riskset2    riskset3    riskset4           x 
##   0.3596118   0.5615528   0.7192236   2.5615528   0.3903882

This is exactly what coxreg produced.

This correspondence between Poisson and Cox regression was pointed out by Johansen (1983).

References

Johansen, S. 1983. “An Extension of Cox’s Regression Model.” International Statistical Review 51: 165–74.