Transforms a "survival" data frame into a data frame suitable for binary (logistic) regression

The result of the transformation can be used for survival analysis via binary regression, for example logistic regression. If the cloglog link is used, this corresponds to a discrete-time analogue of Cox's proportional hazards model.
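For reference, here is the standard grouped-time form of the cloglog model. The notation is ours and is not taken from the eha documentation: $h_j(z)$ is the probability of an event in risk set $j$ for an individual with covariate $z$ who is at risk there, $\alpha_j$ is a risk-set-specific constant (the intercept plus the `riskset` dummies in the `glm` fit in the examples), and $\beta$ is the regression coefficient:

$$
\log\bigl(-\log(1 - h_j(z))\bigr) = \alpha_j + z\beta
\quad\Longleftrightarrow\quad
1 - h_j(z) = \bigl(1 - h_j(0)\bigr)^{\exp(z\beta)} .
$$

The right-hand form shows the proportional hazards structure: the covariate acts multiplicatively through $\exp(z\beta)$, just as in Cox's model.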

toBinary(
  dat,
  surv = c("enter", "exit", "event"),
  strats,
  max.survs = NROW(dat)
)

Arguments

dat	A data frame with three variables representing the survival response. By default they are named `enter`, `exit`, and `event`.
surv	A character vector with the names of the three survival variables (entry time, exit time, event indicator), in that order.
strats	An optional stratification variable.
max.survs	Maximum number of survivors per risk set. If set to a (small) number, survivors are sampled from the risk sets so that each risk set keeps at most that many survivors.
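The sampling controlled by `max.survs` can be pictured with the small base-R sketch below. It is a hypothetical illustration, not eha's code: the helper name `sample_riskset()` is invented here, and it assumes that all rows with an event are kept while survivors are drawn at random without replacement. eha's actual sampling scheme may differ in detail.

```r
# Hypothetical helper (not part of eha): thin out a single risk set by
# keeping every event row and at most `max_survs` randomly chosen survivors.
sample_riskset <- function(rs, max_survs) {
  events    <- rs[rs$event == 1, , drop = FALSE]
  survivors <- rs[rs$event == 0, , drop = FALSE]
  if (nrow(survivors) > max_survs) {
    # Draw max_survs survivor rows without replacement
    keep <- sample.int(nrow(survivors), max_survs)
    survivors <- survivors[keep, , drop = FALSE]
  }
  rbind(events, survivors)
}

set.seed(1)
# One risk set with a single event and four survivors
rs <- data.frame(event = c(1, 0, 0, 0, 0), z = c(-1, 1, -1, 1, 1))
small <- sample_riskset(rs, max_survs = 2)
print(small)
```

Which survivors are kept depends on the random seed. The number of rows does not: the result always has the one event plus two survivors.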

Value

Returns a data frame expanded risk set by risk set. The three survival variables are replaced by a variable named event, which overwrites any existing variable with that name in the input. Three more variables are created: riskset, risktime, and orig.row.

event

Event indicator (1 = event, 0 = survivor) in the corresponding risk set.

riskset

Factor (with levels 1, 2, ...) indicating risk set.

risktime

The risk time (e.g. age) at which the corresponding risk set is formed.

orig.row

The row number for this item in the original data frame.
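The expansion described above can be sketched in base R as follows. This is a hypothetical reimplementation for illustration only. The function name `to_binary_sketch()` is invented, it ignores `strats` and `max.survs`, and it assumes that an individual is at risk at time t when enter < t <= exit. The real `toBinary` delegates the work to eha's `risksets`.

```r
# Hypothetical sketch (not eha's implementation) of the risk-set expansion.
to_binary_sketch <- function(dat, surv = c("enter", "exit", "event")) {
  enter <- dat[[surv[1]]]
  exit  <- dat[[surv[2]]]
  event <- dat[[surv[3]]]
  # Each distinct event time defines one risk set, in increasing order
  times  <- sort(unique(exit[event == 1]))
  covars <- dat[, setdiff(names(dat), surv), drop = FALSE]
  pieces <- lapply(seq_along(times), function(j) {
    t <- times[j]
    # Rows at risk at time t: entered before t and not yet exited
    at_risk <- which(enter < t & exit >= t)
    data.frame(
      event    = as.integer(exit[at_risk] == t & event[at_risk] == 1),
      riskset  = j,
      risktime = t,
      covars[at_risk, , drop = FALSE],
      orig.row = at_risk
    )
  })
  out <- do.call(rbind, pieces)
  out$riskset <- factor(out$riskset)
  rownames(out) <- NULL
  out
}

# Same toy data as in the Examples section
dat <- data.frame(enter = rep(0, 4), exit = 1:4, event = rep(1, 4),
                  z = rep(c(-1, 1), 2))
binSketch <- to_binary_sketch(dat)
print(binSketch)
```

On this input the sketch reproduces the event, riskset, risktime, z, and orig.row columns of `binDat` in the Examples section. Only the row names differ, because they are reset here.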

Details

toBinary calls risksets in the eha package.

Note

There must be exactly three survival variables. If you only have exit times and event indicators, add an entry-time variable containing all zeros.
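A minimal sketch of that workaround follows. The data frame `dat2` is made up for illustration, and the call to `toBinary` runs only when eha is installed.

```r
# Made-up data where only exit times and event indicators were recorded
dat2 <- data.frame(exit = c(2, 3, 5), event = c(1, 0, 1), z = c(0, 1, 1))

# Add an entry-time variable of zeros so there are three survival variables
dat2$enter <- 0

# Transform only if the eha package is available
if (requireNamespace("eha", quietly = TRUE)) {
  binDat2 <- eha::toBinary(dat2, surv = c("enter", "exit", "event"))
  print(binDat2)
}
```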

Author

Göran Broström

Examples


library(eha)
enter <- rep(0, 4)
exit <- 1:4
event <- rep(1, 4)
z <- rep(c(-1, 1), 2)
dat <- data.frame(enter, exit, event, z)
binDat <- toBinary(dat)
dat
#>   enter exit event  z
#> 1     0    1     1 -1
#> 2     0    2     1  1
#> 3     0    3     1 -1
#> 4     0    4     1  1
binDat
#>     event riskset risktime  z orig.row
#> 1       1       1        1 -1        1
#> 2       0       1        1  1        2
#> 3       0       1        1 -1        3
#> 4       0       1        1  1        4
#> 2.1     1       2        2  1        2
#> 3.1     0       2        2 -1        3
#> 4.1     0       2        2  1        4
#> 3.2     1       3        3 -1        3
#> 4.2     0       3        3  1        4
#> 4.3     1       4        4  1        4
coxreg(Surv(enter, exit, event) ~ z, method = "ml", data = dat)
#> Call:
#> coxreg(formula = Surv(enter, exit, event) ~ z, data = dat, method = "ml")
#> 
#> Covariate             Mean       Coef     Rel.Risk   S.E.    Wald p
#> z                     0.200    -0.634     0.531     0.639     0.321 
#> 
#> Events                    4 
#> Total time at risk            10 
#> Max. log. likelihood      -5.0071 
#> LR test statistic         1.08 
#> Degrees of freedom        1 
#> Overall p-value           0.299563
## Same estimate for z (up to rounding) as:
summary(glm(event ~ z + riskset, data = binDat, family = binomial(link = cloglog)))
#> 
#> Call:
#> glm(formula = event ~ z + riskset, family = binomial(link = cloglog), 
#>     data = binDat)
#> 
#> Deviance Residuals: 
#>        1         2         3         4       2.1       3.1       4.1       3.2  
#>  1.37207  -0.52778  -0.99448  -0.52778   1.85143  -1.18770  -0.63032   0.70448  
#>      4.2       4.3  
#> -0.92387   0.00019  
#> 
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)
#> (Intercept)  -1.3378     1.0363  -1.291    0.197
#> z            -0.6336     0.6330  -1.001    0.317
#> riskset2      0.3551     1.4858   0.239    0.811
#> riskset3      1.1198     1.4147   0.792    0.429
#> riskset4      4.8520   264.2825   0.018    0.985
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 13.460  on 9  degrees of freedom
#> Residual deviance: 10.014  on 5  degrees of freedom
#> AIC: 20.014
#> 
#> Number of Fisher Scoring iterations: 15
#>