# Event History Analysis with R, Second Edition

*2021-09-02*

# Preface

The first edition of this book was published in 2012, nine years ago. Since then
the field of event history and survival analysis has grown and developed rapidly,
both in terms of available scientific data and
of software for analyzing the data, not least in the **R** environment.
We have also seen the public need for analyzing what is going on around the world, and the
present (2021) state of the COVID-19 pandemic is a striking example of that.

So the time for a second edition of the book is now. The basic chapters on Cox regression
and proportional hazards modeling are much the same as in the first edition, but they
have been updated. There are two new chapters, Chapter 4, *Explanatory Variables and Regression*,
and Chapter 7, *Register-Based Survival Data Models*. Compared to the first edition,
chapters have been reordered so that the logical flow is clearer. On the other hand,
Appendices C and D are somewhat shorter now, because there are many excellent sources
online today covering their topics, which are in any case somewhat peripheral to this book.

Since the publication of the first edition in 2012, focus has gradually shifted towards
the analysis of large and huge data sets, where Cox
regression can favorably be replaced by parametric proportional hazards models with
piecewise constant baseline hazard functions. Such models allow the data to be tabulated, and
for huge data sets with very many events this leads to a significant reduction of
the effort needed to produce reliable results, with the same precision and power that a full
analysis would have had. This result relies on reduction by the mathematical *sufficiency
principle*: all irrelevant (so defined by the model) noise is eliminated. A word of
warning is that this noise (*residuals*) may be relevant in *model evaluation*.
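
The sufficiency argument can be made concrete with a short sketch. The notation below is introduced here for illustration and is not taken from the book: cut points split the time axis into intervals $I_k$, the baseline hazard equals $\alpha_k$ on $I_k$, and all covariates are assumed to be categorical and fixed over time, so that individuals fall into groups $g$ with a common covariate vector $x_g$.

```latex
% Hazard for an individual in covariate group g at time t in interval I_k
h_g(t) = \alpha_k \exp(\beta^\top x_g), \qquad t \in I_k .

% Log-likelihood, with D_{gk} = number of events and
% E_{gk} = total exposure time observed in cell (g, k)
\ell(\alpha, \beta) = \sum_{g} \sum_{k}
  \left\{ D_{gk} \left( \log \alpha_k + \beta^\top x_g \right)
        - E_{gk} \, \alpha_k \exp(\beta^\top x_g) \right\} .
```

Under these assumptions the likelihood depends on the individual records only through the table of event counts $D_{gk}$ and exposure times $E_{gk}$, which is why the tabulated data give the same estimates as the full data set. Up to an additive constant, the expression is also a Poisson log-likelihood with $\log E_{gk}$ as offset.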

A second point of importance is how to present results from a regression analysis,
and especially how to present estimated *p*-values. Two important issues in this area
are (i) to present only *relevant* ones, and (ii) to make sure they are of the right kind,
that is, *likelihood ratio based*. This important topic is addressed in the book,
and supported by new summary functions in the **R** package `eha` (Broström 2021).

The writing of this book has been done in parallel with the development of the **R** package
`eha`. Almost all the data sets used in the examples in this book are available in `eha`,
so you can easily play around with them on your own. Some data sets will also be published
on the home page of the package, https://ehar.se/r/eha/.

I had, as usual, invaluable support from the publisher, CRC Press, Ltd. I especially want to thank Vaishali Singh and Rob Calver for their interest in my work and their encouragement in the project.

The first edition of the book was written in *LaTeX* with the support of the **R** package
`Sweave`, but with the second edition we decided to do the writing in R Markdown
using the **R** packages `bookdown` and `knitr`. The reason was mainly that it allowed for
the production of output in both *HTML* (for the website) and *PDF* (the printed book).
I am indebted to Yihui Xie for his important work in this area, which made this
approach possible.

In addition to the people who gave valuable input to the first edition of the book, Kristian Hindberg and Glenn Sandström have contributed suggestions that have improved the text in this second edition. Many thanks go to Elisabeth Engberg, director of the Centre for Demographic and Ageing Research (CEDAR), Umeå University, for letting me use the facilities that made this work possible.

Umeå, April 2021

Göran Broström, professor emeritus

CEDAR, Umeå University

### References

Broström, Göran. 2021. *Eha: Event History Analysis*. http://ehar.se/r/eha/.