4.2 The connection to Cox regression
There is an interesting connection between Cox regression and Poisson regression, which can be illustrated as follows.
## enter exit event x
## 1 0 1 1 0
## 2 0 2 1 1
## 3 0 3 1 0
## 4 0 4 1 1
We have generated a very simple data set, dat. It consists of four
life lengths with a covariate \(x\). A Cox regression, where the covariate
\(x\) is related to survival is given by
## Call:
## coxreg(formula = Surv(enter, exit, event) ~ x, data = dat)
##
## Covariate Mean Coef Rel.Risk S.E. Wald p
## x 0.600 -0.941 0.390 1.240 0.448
##
## Events 4
## Total time at risk 10
## Max. log. likelihood -2.87
## LR test statistic 0.62
## Degrees of freedom 1
## Overall p-value 0.43248
## [[1]]
## [,1] [,2]
## [1,] 1 0.2430560
## [2,] 2 0.3270380
## [3,] 3 0.5478007
## [4,] 4 1.0000000
##
## attr(,"class")
## [1] "hazdata"
Note that fit$hazards is a list with one component per
stratum. In this case there is only one stratum, so the list has one
component. This component is a matrix with two columns; the first contains
the observed distinct failure times, and the second the corresponding
estimates of the hazard “atoms”.
Now, this can be replicated by Poisson regression! First the data set
is summarized around each observed failure time. That is, a snapshot of the
risk set at each failure time is created by the function toBinary.
## event riskset risktime x orig.row
## 1 1 1 1 0 1
## 2 0 1 1 1 2
## 3 0 1 1 0 3
## 4 0 1 1 1 4
## 2.1 1 2 2 1 2
## 3.1 0 2 2 0 3
## 4.1 0 2 2 1 4
## 3.2 1 3 3 0 3
## 4.2 0 3 3 1 4
## 4.3 1 4 4 1 4
Note that three new “covariates” are created, riskset, risktime,
and orig.row. In the Poisson regression to follow, the factor
riskset must be included as a cluster. Formally, this is
equivalent to including it as a factor covariate, but for data set with
many distinct failure times, the number of levels will be too large to be
run by glm. The function glmmboot in eha is better suited. However, with
our small example, glm is fine:
## (Intercept) riskset2 riskset3 riskset4 x
## -1.0227302 0.4456807 0.6931472 1.9633438 -0.9406136
The parameter estimate corresponding to \(x\) is exactly the same as in the
Cox regression. The baseline hazard function is estimated with the aid of
the frailty estimates for the clustering based on riskset.
## (Intercept) riskset2 riskset3 riskset4 x
## 0.3596118 0.5615528 0.7192236 2.5615528 0.3903882
These numbers are the hazard atoms in the baseline hazard estimation, so
they should be equal to the output from fit$hazards in the Cox
regression. This is not the case, and this depends on the fact that
coxreg estimates the baseline hazards at the mean of the covariate,
i.e., at \(x = 0.5\) in our toy example. The easiest way to make the numbers
comparable is
by subtracting 0.5 from \(x\) in the Poisson regression.
datB$x <- datB$x - fit$means
fit2 <- glm(event ~ riskset + x, family = poisson, data = datB)
(co <- coefficients(fit2[1:4]))## (Intercept) riskset2 riskset3 riskset4 x
## -1.0227302 0.4456807 0.6931472 1.9633438 -0.9406136
## (Intercept) riskset2 riskset3 riskset4 x
## 0.3596118 0.5615528 0.7192236 2.5615528 0.3903882
This is exactly what coxreg produced.
This correspondence between Poisson and Cox regression was pointed out by Johansen (1983).
References
Johansen, S. 1983. “An Extension of Cox’s Regression Model.” International Statistical Review 51: 165–74.