1.5 Event history data
Event history data arise, as the name suggests, by following subjects over time and recording what happens and when. Usually the interest is concentrated on a few specific kinds of events. The main applications in this book are in demography and epidemiology, hence the events of primary interest are births, deaths, marriages and migration.
As a rather complex example, let us look at marital fertility in 19th-century Sweden (see Figure 1.4).
FIGURE 1.4: Marital fertility.
In a marital fertility study, women are typically followed from the time of their marriage until the marriage is dissolved or their fertility period is over, say at age 50, whichever comes first. The marriage may be dissolved by the death of the woman or of her husband, or by divorce. If the study is limited to a given geographical area, women may also be lost to follow-up through out-migration. This event gives rise to a right-censored observation.
During follow-up, the exact dates of childbirths are recorded. The analysis may focus on which factors, if any, affect the length of birth intervals. A data set may look like this:
```
  id parity age year next.ivl event prev.ivl     ses
1  1      0  24 1825    0.411     1       NA  farmer
2  1      1  25 1826   22.348     0    0.411  farmer
3  2      0  18 1821    0.304     1       NA unknown
4  2      1  19 1821    1.837     1    0.304 unknown
5  2      2  21 1823    2.546     1    1.837 unknown
6  2      3  23 1826    2.541     1    2.546 unknown
7  2      4  26 1828    2.431     1    2.541 unknown
8  2      5  28 1831    2.472     1    2.431 unknown
9  2      6  31 1833    3.173     0    2.472 unknown
```
These are the first nine rows, corresponding to the first two mothers in the data file. The variable id is the mother's id, a label that uniquely identifies each individual.
A birth interval has a start point and an end point in time. These points are the times of births, except for the first interval, where the start point is the time of marriage, and the last interval, which is open to the right. The last interval is, however, stopped at the time of marriage dissolution or when the mother turns 50, whichever comes first. The variable parity is zero for the first interval, between the date of marriage and the date of the first birth, one for the next interval, and so forth. The last (highest) parity is thus equal to the total number of births for a woman during her first marriage (disregarding twin births, etc.).
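The construction of interval records described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to build the data set: the function name `birth_intervals` is hypothetical, all times are exact ages in years, the exit age is assumed to be already determined (the earliest of dissolution, age 50, out-migration or end of follow-up), and twin births are not handled.

```python
from typing import Dict, List, Sequence


def birth_intervals(marriage_age: float,
                    birth_ages: Sequence[float],
                    exit_age: float) -> List[Dict[str, float]]:
    """Split one woman's marital history into birth-interval records.

    Hypothetical helper. All times are ages in years; exit_age is the age at
    which observation stops. Twin births are not handled.
    """
    records: List[Dict[str, float]] = []
    start = marriage_age  # the first interval (parity 0) starts at marriage
    for parity, birth in enumerate(sorted(birth_ages)):
        # Each birth closes the current interval with event = 1.
        records.append({"parity": parity, "age": start,
                        "next_ivl": birth - start, "event": 1})
        start = birth  # the next interval starts at this birth
    if exit_age < start:
        raise ValueError("exit_age must not precede the last birth")
    # The last interval is open to the right and is censored at exit_age.
    records.append({"parity": len(birth_ages), "age": start,
                    "next_ivl": exit_age - start, "event": 0})
    return records
```

For example, a hypothetical woman married at age 20, giving birth at ages 21.5 and 23 and followed to age 50, yields three records with parity 0, 1, 2, next_ivl 1.5, 1.5 and 27.0, and event 1, 1, 0: one record per birth plus one censored record.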
Here is a variable-by-variable description of the data set.
- id The mother’s unique id.
- parity Order of the previous birth (see above for details). Starts at zero.
- age Mother's age at the event defining the start of the interval.
- year Calendar year of the event (marriage or birth) defining the start of the interval.
- next.ivl measures the time in years from the start of the interval (marriage for parity zero, otherwise the birth at parity) to the birth at parity + 1, or, for the woman's last interval, to the time of right censoring.
- event is an indicator for the interval ending with a birth. It is always equal to 1, except for the last interval, which always has event equal to zero.
- prev.ivl is the length of the interval preceding this one. For the first interval of a woman, it is always NA (Not Available).
- ses Socio-economic status (based on occupation data).
Just to make it clear: the first woman has id 1. She is represented by two records, meaning that she gave birth to one child. She waited 0.411 years from marriage to her first birth and was then followed for 22.348 years without a second birth, so her second interval is right-censored. The second woman (id 2) is represented by seven records, implying that she gave birth to six children. And so on.
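The bookkeeping rules above can be checked mechanically. The Python sketch below transcribes the nine rows shown earlier (only id, parity, next.ivl, event and prev.ivl) and verifies that parity runs 0, 1, 2, ..., that event is 1 on every interval except each woman's last, and that prev.ivl is missing on the first interval and otherwise equals the previous next.ivl. It then counts births per woman as the sum of event. The names `ROWS` and `check_woman` are illustrative assumptions, not part of the data file.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Tuple[int, int, float, int, Optional[float]]

# (id, parity, next_ivl, event, prev_ivl) transcribed from the nine rows
# shown above; None stands for R's NA.
ROWS: List[Row] = [
    (1, 0, 0.411, 1, None),
    (1, 1, 22.348, 0, 0.411),
    (2, 0, 0.304, 1, None),
    (2, 1, 1.837, 1, 0.304),
    (2, 2, 2.546, 1, 1.837),
    (2, 3, 2.541, 1, 2.546),
    (2, 4, 2.431, 1, 2.541),
    (2, 5, 2.472, 1, 2.431),
    (2, 6, 3.173, 0, 2.472),
]


def check_woman(rows: List[Row]) -> int:
    """Validate one woman's records (sorted by parity); return her births."""
    for k, (_, parity, _, event, prev_ivl) in enumerate(rows):
        if parity != k:
            raise ValueError("parity should run 0, 1, 2, ...")
        is_last = k == len(rows) - 1
        if event != (0 if is_last else 1):
            raise ValueError("event must be 1 except on the last interval")
        expected_prev = None if k == 0 else rows[k - 1][2]
        if prev_ivl != expected_prev:
            raise ValueError("prev_ivl must equal the previous next_ivl")
    # Every closed interval ends with one birth, so births = sum of event.
    return sum(r[3] for r in rows)


by_woman: Dict[int, List[Row]] = defaultdict(list)
for row in ROWS:
    by_woman[row[0]].append(row)

births = {wid: check_woman(sorted(rs, key=lambda r: r[1]))
          for wid, rs in by_woman.items()}
print(births)  # {1: 1, 2: 6}
```

The counts agree with the reading above: one birth for woman 1 and six for woman 2, in each case one fewer than her number of records.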
Of course, in an analysis of birth intervals we are interested in causal effects: why are some intervals short while others are long? Dependence on the history can be modeled through the lengths of previous intervals (for the same mother), parity, survival of earlier births, and so on. Note that all relevant covariate information must refer to the past. More about that later.
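One way to respect the rule that covariates refer only to the past is to build history covariates by lagging within each mother; prev.ivl in the data set is such a covariate. The sketch below, with the hypothetical name `add_prev_ivl`, derives it from next.ivl. By construction, the value attached to the interval at parity k is the length of the interval at parity k - 1, which has ended before interval k starts.

```python
from typing import Dict, List, Optional


def add_prev_ivl(records: List[Dict]) -> List[Dict]:
    """Return copies of the records with a lagged 'prev_ivl' field.

    Hypothetical helper. Each record needs 'id', 'parity' and 'next_ivl'.
    For parity k the new field is next_ivl at parity k - 1 of the same
    mother; parity 0 gets None (NA). Input records are not modified.
    """
    out: List[Dict] = []
    last_ivl: Dict[int, float] = {}  # most recent interval length per mother
    for rec in sorted(records, key=lambda r: (r["id"], r["parity"])):
        new = dict(rec)
        # Only the preceding interval is used, never the current or a later one.
        prev: Optional[float] = (last_ivl.get(rec["id"])
                                 if rec["parity"] > 0 else None)
        new["prev_ivl"] = prev
        last_ivl[rec["id"]] = rec["next_ivl"]
        out.append(new)
    return out
```

Applied to the first two intervals of each mother in the table, in any input order, it reproduces the prev.ivl values shown there: NA and 0.411 for mother 1, NA and 0.304 for mother 2.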
The first interval of a woman differs from the others, since it starts with marriage. It therefore makes sense to analyze these intervals separately. The last interval of a woman is also special; it always ends with a right censoring, at the latest when the woman is 50 years of age. Think of the data for a woman as generated sequentially in time, starting on the day of her marriage. She is followed to the next birth as long as she is alive, the marriage is intact, and she is younger than 50 years of age. If she reaches 50, or the marriage is dissolved (most often by the death of one of the spouses), before a next birth occurs, the interval is censored at the last duration at which she was still under observation. Censoring can also occur through out-migration or at the end of follow-up, in this case November 5, 1901.\(\Box\)
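The stopping rule for a single interval amounts to taking the earliest of several competing end points. The Python sketch below is illustrative only: the function name `close_interval` and its arguments are hypothetical, times are ages in years, `None` means that the corresponding event is not observed, and the end of follow-up (November 5, 1901, in this example) is assumed to have been converted to the woman's age beforehand. A birth that coincides exactly with a censoring time is counted as a birth, which is an assumption of the sketch.

```python
import math
from typing import Optional, Tuple


def close_interval(start_age: float,
                   next_birth_age: Optional[float],
                   dissolution_age: Optional[float] = None,
                   migration_age: Optional[float] = None,
                   end_of_study_age: Optional[float] = None,
                   upper_age: float = 50.0) -> Tuple[float, int]:
    """Return (length, event) for a birth interval starting at start_age.

    The interval ends with a birth (event = 1) if the next birth comes no
    later than every censoring time; otherwise it is right-censored
    (event = 0) at the earliest of marriage dissolution, out-migration,
    end of follow-up and upper_age.
    """
    def as_time(t: Optional[float]) -> float:
        return math.inf if t is None else t  # unobserved: never happens

    censor = min(as_time(dissolution_age), as_time(migration_age),
                 as_time(end_of_study_age), upper_age)
    birth = as_time(next_birth_age)
    if min(birth, censor) < start_age:
        raise ValueError("end of interval precedes its start")
    if birth <= censor:
        return birth - start_age, 1
    return censor - start_age, 0
```

With no further birth, an interval starting at age 45 is censored at 50, giving (5.0, 0); a birth at 32.5 after a start at 30 gives (2.5, 1), unless the marriage was dissolved earlier, say at 31, which gives (1.0, 0).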
Another useful setup is the illness-death model (see Figure 1.5).
FIGURE 1.5: The illness-death model.
Individuals may move back and forth between the states Healthy and Diseased, and from each of these two states there is a path to Dead, which is an absorbing state: once it is entered, it is never left.\(\Box\)
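The state space of Figure 1.5 can be written down as a small transition map. In this Python sketch the names `TRANSITIONS`, `is_absorbing` and `is_valid_path` are illustrative assumptions; the code encodes only which direct moves the text allows, not any transition intensities.

```python
from typing import Dict, FrozenSet, Sequence

# Allowed direct transitions in the illness-death model as described in the
# text: Healthy <-> Diseased in both directions, and either one -> Dead.
TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "Healthy": frozenset({"Diseased", "Dead"}),
    "Diseased": frozenset({"Healthy", "Dead"}),
    "Dead": frozenset(),  # absorbing: no transition leaves this state
}


def is_absorbing(state: str) -> bool:
    """A state is absorbing if no transition leaves it."""
    return not TRANSITIONS[state]


def is_valid_path(path: Sequence[str]) -> bool:
    """Check that every consecutive pair of states is an allowed move."""
    return all(b in TRANSITIONS[a] for a, b in zip(path, path[1:]))
```

Under this map, a history such as Healthy, Diseased, Healthy, Dead is admissible, while any history that leaves Dead is not.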