10.4 Regression
It is also possible to include covariates in the estimating procedure.
\[\begin{equation*}
P_k(t) = 1 - \exp\left\{-\Gamma_k(t)\exp\left(X\beta^{(k)}\right)\right\}, \quad
t > 0, \quad k = 1, 2, 3.
\end{equation*}\]
These equations are estimated separately. In R, this can be done with the package
cmprsk (Fine and Gray 1999; Gray 2011).
People born in Skellefteå in northern Sweden around the year 1800 are followed over time from 15 years of age to out-migration or death. Right censoring occurs for those who are still present on January 1, 1870. At exit the cause is noted, see Figure 10.2.
FIGURE 10.2: Mortality and emigration, Skellefteå, 1800–1870.
The data at hand looks like this:
## Pnr Mpnr Fpnr birthdate enter exit event parity ab
## 1 801000758 775000541 771000493 1801.023 15 39.47241 2 3 leg
## 2 801000759 758000309 753000242 1801.028 15 50.00000 0 4 leg
## 3 801000760 770000458 765000452 1801.031 15 50.00000 0 10 leg
## 4 801000763 777000494 775000404 1801.039 15 26.10868 2 3 leg
## 5 801000768 780000572 774000453 1801.061 15 16.43292 2 2 leg
## 6 801000770 774000442 776000421 1801.072 15 48.59237 1 2 leg
A summary:
## birthdate enter exit event parity
## Min. :1801 Min. :15 Min. :15.00 Min. :0.0000 Min. : 1.000
## 1st Qu.:1835 1st Qu.:15 1st Qu.:20.47 1st Qu.:0.0000 1st Qu.: 2.000
## Median :1856 Median :15 Median :28.34 Median :0.0000 Median : 3.000
## Mean :1852 Mean :15 Mean :31.82 Mean :0.7138 Mean : 3.916
## 3rd Qu.:1872 3rd Qu.:15 3rd Qu.:44.82 3rd Qu.:2.0000 3rd Qu.: 6.000
## Max. :1886 Max. :15 Max. :50.00 Max. :4.0000 Max. :19.000
## ab
## leg :16423
## illeg: 850
##
##
##
##
##
## 0 1 2 3 4
## 10551 2102 4097 59 464
The variable event is coded as shown in Table 10.1.
| Code | Meaning | Frequency |
|---|---|---|
| 0 | Censored | 10543 |
| 1 | Dead | 2102 |
| 3 | Moved to other parish in Sweden | 4097 |
| 4 | Moved to other Scandinavian country | 59 |
| 5 | Moved outside tha Scandinavian countries | 464 |
The code for the function comp is displayed at the end of this
chapter. It is not necessary to understand before reading on. It may be
useful as a template for those who wants to analyze competing risks.
FIGURE 10.3: Cause-specific exit probabilities; mortality and migration from Skellefteå.
And with covariates and cmprsk; the syntax to get it done is a little
bit nonstandard. Here moving to other parish in Sweden is analysed
(failcode = 3).
library(cmprsk)
xx <- model.matrix(~ -1 + parity, zz)
fit <- crr(zz$exit, zz$event, xx, failcode = 3)
fit## convergence: TRUE
## coefficients:
## parity
## 0.03927
## standard errors:
## [1] 0.04489
## two-sided p-values:
## parity
## 0.38
Note that this is not Cox regression (but close!). The corresponding Cox regression is
Covariate Mean Coef Rel.Risk S.E. LR p
parity 3.890 0.045 1.046 0.048 0.363
Events 59
Total time at risk 290569
Max. log. likelihood -548.83
LR test statistic 0.83
Degrees of freedom 1
Overall p-value 0.363362
References
Fine, J. P., and R. J. Gray. 1999. “A Proportional Hazards Model for the Subdistribution of a Competing Risk.” Journal of the American Statistical Association 94: 496–509.
Gray, Bob. 2011. Cmprsk: Subdistribution Analysis of Competing Risks. http://CRAN.R-project.org/package=cmprsk.