Event History Analysis with R, Second Edition
The first edition of this book was published in 2012, nine years ago. Since then the field of event history and survival analysis has grown and developed rapidly, both in terms of available scientific data and of software for analyzing data, not least in the R environment. We have also seen a growing public need to analyze what is going on around the world, and the present (2021) state of the COVID-19 pandemic is a striking example of that.
So the time has come for a second edition of the book. The basic chapters on Cox regression and proportional hazards modeling are much the same as in the first edition, but they have been updated. There are two new chapters, Chapter 4, Explanatory Variables and Regression, and Chapter 7, Register-Based Survival Data Models. Compared to the first edition, the chapters have been reordered to make the logical flow clearer. On the other hand, Appendices C and D are somewhat shorter now, because many excellent online sources cover their topics, which are in any case somewhat peripheral here.
Since the publication of the first edition in 2012, the focus has gradually shifted towards the analysis of large and huge data sets, where Cox regression can favorably be replaced by parametric proportional hazards models with piecewise constant baseline hazard functions. With huge data sets containing very many events, tabulating the data (number of events and total exposure time per interval and covariate pattern) substantially reduces the effort needed to produce reliable results, with the same precision and power a full analysis would have had. This relies on data reduction by the mathematical sufficiency principle: all noise that is irrelevant (as defined by the model) is eliminated. A word of warning: this noise (the residuals) may be relevant in model evaluation.
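The sufficiency argument can be made concrete. Under a piecewise constant baseline hazard, the log-likelihood depends on the data only through the number of events and the total exposure time in each interval, so the tabulated data give exactly the same likelihood, and hence the same estimates, as the individual records. The sketch below is a toy illustration in Python without covariates, assuming intervals of the form (cut[i], cut[i+1]] with the last one open-ended; it is not the eha implementation, and the function names `tabulate`, `loglik_individual` and `loglik_tabulated` are hypothetical.

```python
import math


def intervals(cuts):
    """Yield (index, lower, upper) for the intervals (cuts[i], cuts[i+1]];
    the last interval is open-ended (upper = infinity)."""
    for i, lo in enumerate(cuts):
        hi = cuts[i + 1] if i + 1 < len(cuts) else math.inf
        yield i, lo, hi


def tabulate(times, events, cuts):
    """Reduce individual (time, event) records to per-interval
    (number of events, total exposure time) pairs."""
    d = [0] * len(cuts)
    exposure = [0.0] * len(cuts)
    for t, e in zip(times, events):
        for i, lo, hi in intervals(cuts):
            if t <= lo:  # subject left observation before this interval
                break
            exposure[i] += min(t, hi) - lo
            if e and t <= hi:  # the event falls in (lo, hi]
                d[i] += 1
    return list(zip(d, exposure))


def loglik_individual(times, events, cuts, haz):
    """Piecewise constant hazard log-likelihood, one subject at a time."""
    ll = 0.0
    for t, e in zip(times, events):
        for i, lo, hi in intervals(cuts):
            if t <= lo:
                break
            ll -= haz[i] * (min(t, hi) - lo)  # cumulative hazard in interval
            if e and t <= hi:
                ll += math.log(haz[i])  # hazard at the event time
    return ll


def loglik_tabulated(table, haz):
    """The same log-likelihood, computed from the tabulated pairs only."""
    return sum(d * math.log(h) - h * T for (d, T), h in zip(table, haz))


# Toy data (hypothetical): exit times, event indicators, interval cut points.
times = [1.5, 2.0, 3.5, 0.5, 4.0]
events = [1, 0, 1, 1, 0]
cuts = [0.0, 1.0, 3.0]

table = tabulate(times, events, cuts)
print(table)  # [(1, 4.5), (1, 5.5), (1, 1.5)]

haz = [0.3, 0.2, 0.5]
print(math.isclose(loglik_individual(times, events, cuts, haz),
                   loglik_tabulated(table, haz)))  # True

# Maximum likelihood estimate of the hazard in each interval: events / exposure.
mle = [d / T for d, T in table]
```

With categorical covariates the same reduction applies within each combination of interval and covariate pattern, which is where the large savings on huge data sets come from. The individual records are still needed for residual-based model checking, in line with the warning above.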
A second point of importance is how to present results from a regression analysis, and especially how to present estimated p-values. Two important issues in this area are (i) presenting only relevant ones, and (ii) making sure they are of the right kind, that is, based on the likelihood ratio. This important topic is addressed in the book and supported by new summary functions in the R package eha (G. Broström 2021).
The writing of this book has been done in parallel with the development of the R package eha. Almost all the data sets used in the examples in this book are available in eha, so you can easily play around with them on your own. Some data sets will also be published on the home page of the package, http://ehar.se/r/eha/.
I had, as usual, invaluable support from the publisher, CRC Press, Ltd. I especially want to thank Vaishali Singh and Rob Calver for their interest in my work and their encouragement in the project.
The first edition of the book was written in LaTeX with the support of the R package Sweave, but for the second edition we decided to do the writing in Rmarkdown, using R packages such as knitr. The main reason was that this allowed for the production of output in both HTML (for the website) and PDF (the printed book). I am indebted to Yihui Xie for his important work in this area, which made this possible.
In addition to the people who gave valuable input to the first edition of the book, Kristian Hindberg and Glenn Sandström have contributed suggestions that have improved the text of this second edition. Many thanks go to Elisabeth Engberg, director of the Centre for Demographic and Ageing Research (CEDAR), Umeå University, for letting me use the facilities that made this work possible.
Umeå, April 2021
Göran Broström
CEDAR, Umeå University