Chapter 3 Proportional Hazards and Cox Regression

Proportional hazards is a property of survival models that is fundamental for the development of non-parametric regression models. After defining this property, we move slowly from the two-sample case, via the \(k\)-sample case, to the general regression model. On this route, we start with the log-rank test and end up with Cox regression (Cox 1972).

3.1 Proportional Hazards

The property of proportional hazards is fundamental in Cox regression. It is in fact the essence of Cox’s simple, yet ingenious idea. The definition is as follows:

Definition 3.1 (Proportional hazards) If \(h_1(t)\) and \(h_0(t)\) are hazard functions from two separate distributions, we say that they are proportional if

\[\begin{equation} h_1(t) = \psi h_0(t), \quad \text{for all } t \ge 0, \tag{3.1} \end{equation}\] for some positive constant \(\psi\). Further, if (3.1) holds, then the same property holds for the corresponding cumulative hazard functions \(H_1(t)\) and \(H_0(t)\):

\[\begin{equation} H_1(t) = \psi H_0(t), \quad \text{for all } t \ge 0, \tag{3.2} \end{equation}\] with the same proportionality constant \(\psi\) as in (3.1).\(\ \Box\)

Strictly speaking, the second part of this definition follows easily from the first (and vice versa), so it would be more correct to state one part as a definition and the other as a corollary. The important part of this definition is “\(\textit{for all }t \ge 0\),” and that the constant \(\psi\) does not depend on \(t\).

Think of the hazard functions as age-specific mortality for two groups, e.g., women and men. It is “well known” that women have lower mortality than men at all ages. It would therefore be reasonable to assume proportional hazards in that case. It would mean that the female relative advantage is equally large at all ages. See Example 3.1 for an investigation of this statement.

Example 3.1 (Male and female mortality, Sweden 2001–2020)

This is a continuation of Example 2.1, and here we involve the male mortality alongside the female, see Figure 3.1. We also expand the calendar time period to the years 2001–2020, because random fluctuations at low ages, caused by the extremely low mortality rates, would otherwise blur the picture.


FIGURE 3.1: Female and male mortality by age, Sweden 2001–2020. The right hand panel shows the ratio between the male and female mortality by age.

As you can see, the “common knowledge” of proportional hazards is disproved by the right hand panel in Figure 3.1. On the other hand, the left hand panel gives a rather different impression, and this illustrates the need to choose good approaches to graphical presentation, depending on what you want to show.

It must be emphasized that the proportional hazards assumption is an assumption that always must be carefully checked. In many situations, it would not be reasonable to assume proportional hazards. If in doubt, check data by plotting the Nelson-Aalen estimates for each group in the same plot. The left hand panel of Figure 3.1 would suit this purpose better if drawn on a log scale, see Figure 3.2.


FIGURE 3.2: Female and male mortality by age, Sweden 2001–2020. Log scale.

The advantage of the log scale is twofold: (i) Small numbers are magnified so you can see them, and (ii) on the log scale, proportional hazards implies a constant vertical distance between the curves, which is easier to see for the human eye. \(\Box\)
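The suggested check is easy to carry out in R. Here is a minimal sketch using the survival package; the data are simulated for illustration only, and in practice you plug in your own time, status, and group variables.

library(survival)
set.seed(1)
time   <- rexp(200, rate = 0.1)                # simulated life lengths
status <- rbinom(200, 1, 0.8)                  # 1 = event, 0 = censored
group  <- gl(2, 100, labels = c("F", "M"))     # two groups
sf <- survfit(Surv(time, status) ~ group)
plot(sf, fun = "cumhaz", lty = 1:2)            # Nelson-Aalen type curves
plot(sf, fun = "cumhaz", lty = 1:2, log = "y") # log scale: roughly parallel
                                               # curves suggest proportionality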

For an example of a perfect fit to the proportional hazards model, see Figure 3.3 (two Weibull hazard functions with the proportionality constant \(\psi = 2\)).


FIGURE 3.3: Two hazard functions that are proportional. The proportionality constant is 2.

In the right hand panel of Figure 3.3, note that both dimensions are on a log scale. This type of plot, constructed from empirical data, is called a Weibull plot in reliability applications: If the lines are straight lines, then data are well fitted by a Weibull distribution. Additionally, if the slope of the line is 1 (45 degrees), then an exponential model fits well.

To summarize Figure 3.3: (i) the hazard functions are proportional, because on the log-log scale the vertical distance between the curves is constant, (ii) both hazard functions represent a Weibull distribution, because both lines are straight, and (iii) neither represents an exponential distribution, because the slopes are not one. This last fact may be difficult to see because of the different scales on the axes (the aspect ratio is not one).
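A figure of this kind is easy to construct. Here is a minimal sketch in base R, with the Weibull shape and scale parameters chosen for illustration only:

x      <- seq(0.01, 10, length.out = 200)
p      <- 1.5; lambda <- 5                  # Weibull shape and scale
h0     <- (p / lambda) * (x / lambda)^(p - 1) # baseline Weibull hazard
h1     <- 2 * h0                            # proportional, psi = 2
oldpar <- par(mfrow = c(1, 2))
plot(x, h1, type = "l", xlab = "Time", ylab = "Hazard")
lines(x, h0, lty = 2)
plot(x, h1, type = "l", log = "xy",         # log-log scale: two parallel
     xlab = "Time", ylab = "Hazard")        # straight lines
lines(x, h0, lty = 2)
par(oldpar)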

Figure 3.4 shows the relationships between the cumulative hazard functions, the density functions, and the survival functions when the hazard functions are proportional. Note that the cumulative hazard functions are proportional by implication, with the same proportionality constant (\(\psi = 2\) in this case). On the other hand, for the density and survival functions, proportionality does not hold; it is in fact theoretically impossible except in the trivial case that the proportionality constant is unity.


FIGURE 3.4: The effect of proportional hazards on the density and survival functions.

3.2 The Log-Rank Test

The log-rank test is a \(k\)-sample test of equality of survival functions. It is a powerful test against proportional hazards alternatives, but may be very weak otherwise. We first look at the two-sample case, that is, \(k = 2\).

Suppose that we have the small data set illustrated in Figure 3.5. There are two samples, the letters (A, B, C, D, E) and the numbers (1, 2, 3, 4, 5).


FIGURE 3.5: Two-sample data, the letters (dashed) and the numbers (solid). Circles denote censored observations, plusses events.

The data in Figure 3.5 can be presented in tabular form, see Table ??.

We are interested in investigating whether letters and numbers have the same survival chances or not. Therefore, the hypothesis

\[\begin{equation*} H_0: \text{No difference in survival between numbers and letters} \end{equation*}\] is formulated. In order to test \(H_0\), we make five tables, one for each observed event time, see Table ??, where the first table, relating to failure time \(t_{(1)} = 1\), is shown.

Let us look at the table at failure time \(t_{(1)} = 1\), Table ??, from the viewpoint of the numbers.

  • The observed number of deaths among numbers: \(1\).
  • The expected number of deaths among numbers: \(2 \times 5 / 10 = 1\).

The expected number is calculated under \(H_0\), i.e., as if there is no difference between letters and numbers regarding mortality. It is further assumed that the two margins (Total) are given (fixed) and that the number of deaths follows a hypergeometric distribution, which gives the variance.

Then, given two deaths in total, and with five out of the ten observations at risk coming from the numbers group, the expected number of deaths is calculated as above.

This procedure is repeated for each of the five tables, and the results are summarized in Table ??.

Finally, the observed test statistic \(T\) is calculated as

\[\begin{equation*} T = \frac{0.2^2}{1.41} \approx 0.028 \end{equation*}\] Under the null hypothesis, this is an observed value from a \(\chi^2(1)\) distribution, and \(H_0\) should be rejected for large values of \(T\). At the 5% level of significance, the critical value is 3.84, far from our observed value of 0.028. The conclusion is therefore that there is no (statistically significant) difference in survival chances between letters and numbers. Note, however, that this conclusion relies on asymptotic (large sample) properties, which are not valid in this toy example.

For more detail about the underlying theory, see Appendix A.
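The bookkeeping behind the five tables is simple enough to code directly. The following sketch computes the two-sample log-rank statistic for right-censored data without left truncation; the function name is made up for this illustration, and results can be checked against survival::survdiff.

twosample_logrank <- function(time, status, group) {
    dtimes <- sort(unique(time[status == 1]))  # distinct event times
    O <- E <- V <- 0
    for (t in dtimes) {
        atrisk <- time >= t                    # the risk set at t
        n  <- sum(atrisk)                      # total number at risk
        n1 <- sum(atrisk & group == 1)         # at risk in group 1
        d  <- sum(time == t & status == 1)     # deaths at t, in total
        d1 <- sum(time == t & status == 1 & group == 1) # ... in group 1
        O <- O + d1                            # observed
        E <- E + d * n1 / n                    # expected (hypergeometric mean)
        V <- V + d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1) # variance
    }
    c(T = (O - E)^2 / V)                       # approximately chi-squared(1)
}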

Let us now look at a real data example, the old age mortality data set oldmort in eha. See Table ?? for a sample of five records with selected columns.

We are interested in comparing male and female mortality in the ages 60–85 with a log-rank test, and for that purpose we use the logrank function:

library(eha)
om <- age.window(oldmort, c(60, 85)) # Restrict to the ages 60-85.
fit <- logrank(Surv(enter, exit, event), group = sex, data = om)
fit

     The log-rank test

Call:
logrank(Y = Surv(enter, exit, event), group = sex, data = om)

 X-squared =  19.1971 , df =  1 , p-value =  1.1789e-05 

The log-rank test gives the \(p\)-value 0.0012 per cent, a very small number. Thus, there is a highly significant difference in mortality between men and women. But how large is the difference? The log-rank test has no answer to this question.

Remember that this result depends on the proportional hazards assumption. We can graphically check it, see Figure 3.6.


FIGURE 3.6: Cumulative hazards for women and men in ages 60–85, 19th century Sweden.

The proportionality assumption appears to be a good description judging by the left hand panel, but it looks more doubtful on the log scale. It is an often observed phenomenon that differences in mortality between human groups, not only women and men, tend to vanish at extremely high ages. The validity of our test result, however, is not affected by this deviation from proportionality, since the result was rejection. We can be confident that there is a difference, since a violation of the proportionality assumption in itself implies a difference.

3.2.1 Several samples

The result for the two-sample case is easily extended to the \(k\)-sample case. Instead of one \(2 \times 2\) table per observed event time we get one \(k\times 2\) table per observed event time and we have to calculate expected and observed numbers of events for \((k-1)\) groups at each failure time. The resulting test statistic will have \((k-1)\) degrees of freedom and still be approximately \(\chi^2\) distributed. This is illustrated with the same data set, oldmort, as above, but with the covariate civ, which is a factor with three levels (unmarried, married, widow), instead of sex. Furthermore, the investigation is limited to male mortality.

fit <- logrank(Surv(enter, exit, event), group = civ,  
                     data = om[om$sex == "male", ])
fit

     The log-rank test

Call:
logrank(Y = Surv(enter, exit, event), group = civ, data = om[om$sex == 
    "male", ])

 X-squared =  19.8622 , df =  2 , p-value =  4.86372e-05 

The number of degrees of freedom for the test is now 2, equal to the number of levels of civ minus one. Figure 3.7 shows the cumulative hazards for each group.


FIGURE 3.7: Old age male mortality by civil status, cumulative hazards.

Nothing much indicates non-proportionality in this case either. Furthermore, the ordering of the groups is what one would expect.

We do not go deeper into this matter here, mainly because the log-rank test is a special case of Cox regression.

3.3 Cox Regression Models

Starting with the definition of proportional hazards in Section 3.1, the concept of Cox regression is introduced in steps.

3.3.1 Two groups

The definition of proportionality, given in equation (3.1), can equivalently be written as

\[\begin{equation} h_x(t) = \psi^x h_0(t), \quad t > 0, \; x = 0, 1, \; \psi > 0. \tag{3.3} \end{equation}\]

It is easy to see that this “trick” is equivalent to (3.1): When \(x = 0\) it simply says that \(h_0(t) = h_0(t)\), and when \(x = 1\) it says that \(h_1(t) = \psi h_0(t)\). Since \(\psi > 0\), we can calculate \(\beta = \log(\psi)\), and rewrite (3.3) as

\[\begin{equation*} h_x(t) = e^{\beta x} h_0(t), \quad t > 0; \; x = 0, 1; \; -\infty < \beta < \infty, \end{equation*}\] or with a slight change in notation,

\[\begin{equation} h(t; x) = e^{\beta x} h_0(t), \quad t > 0; \; x = 0, 1; \; -\infty < \beta < \infty. \tag{3.4} \end{equation}\]

The sole idea behind this rewriting is to pave the way for the introduction of Cox’s regression model (Cox 1972), which in its elementary form is a proportional hazards model. In fact, we can already interpret equation (3.4) as a Cox regression model with an explanatory variable \(x\) with corresponding regression coefficient \(\beta\) (to be estimated from data). The covariate \(x\) is still only a dichotomous variate, but we will now show how it is possible to generalize this to a situation with explanatory variables of any form. The first step is to go from the two-sample situation to a \(k\)-sample one.
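In R, the model (3.4) is fitted like any other Cox regression. A minimal sketch, reusing the oldmort subset om from Section 3.2 with sex in the role of \(x\); exp(coef) in the output then estimates \(\psi\):

fit2 <- coxreg(Surv(enter, exit, event) ~ sex, data = om)
summary(fit2) # exp(coef) is the estimated psi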

3.3.2 Many groups

Can we generalize the results of Section 3.3.1 to \((k + 1)\) groups, \(k \ge 2\)? Yes, by expanding that procedure as follows:

\[\begin{eqnarray*} h_0(t) &\sim& \mbox{group 0} \\ h_1(t) &\sim& \mbox{group 1} \\ \cdots & & \cdots \\ h_k(t) &\sim& \mbox{group k} \end{eqnarray*}\]

The underlying model is: \(h_j(t) = \psi_j h_0(t), \quad t \ge 0\), \(j = 1, 2, \ldots, k\). That is, with \((k+1)\) groups, we need \(k\) proportionality constants \(\psi_1, \ldots, \psi_k\) in order to define proportional hazards. Note also that in this formulation (there are others), one group is “marked” as a reference group, that is, to this group there is no proportionality constant attached. All relations are relative to the reference group. Note also that it doesn’t matter which group is chosen as the reference. This choice does not change the model itself, only its representation.

With \((k + 1)\) groups, we need \(k\) indicators. Let

\[\begin{equation*} \mathbf{x} = (x_{1}, x_{2}, \ldots, x_{k}). \end{equation*}\] Then

\[\begin{eqnarray*} \mathbf{x} = (0, 0, \ldots, 0) & \Rightarrow & \mbox{group 0} \\ \mathbf{x} = (1, 0, \ldots, 0) & \Rightarrow & \mbox{group 1} \\ \mathbf{x} = (0, 1, \ldots, 0) & \Rightarrow & \mbox{group 2} \\ \cdots & & \cdots \\ \mathbf{x} = (0, 0, \ldots, 1) & \Rightarrow & \mbox{group k} \end{eqnarray*}\]

and

\[\begin{equation*} h(t; \mathbf{x}) = h_0(t) \prod_{\ell = 1}^k \psi_{\ell}^{x_{\ell}} = \left\{ \begin{array}{ll} h_0(t), & \mathbf{x} = (0, 0, \ldots, 0) \\ h_0(t) \psi_j, & x_{j} = 1, \quad j = 1, \ldots, k \end{array}\right. \end{equation*}\] With \(\psi_j = e^{\beta_j}, \; j = 1, \ldots, k\), we get

\[\begin{equation} h(t; \mathbf{x}) = h_0(t) e^{x_{1}\beta_1 + x_{2} \beta_2 + \cdots + x_{k} \beta_k} = h_0(t) e^{\mathbf{x} \boldsymbol{\beta}}, \tag{3.5} \end{equation}\]

where \(\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_k)\).
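This indicator coding is exactly what R’s model.matrix produces for a factor, with the first level as reference. A small sketch with a made-up three-level factor:

civ <- factor(c("unmarried", "married", "widow"),
              levels = c("unmarried", "married", "widow"))
model.matrix(~ civ)[, -1] # drop the intercept; rows are the x vectors

The three rows correspond to groups 0 (unmarried, all zeros), 1 (married), and 2 (widow).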

3.3.3 The general Cox regression model

We may now generalize (3.5) by letting the components of \(\mathbf{x}_i\) take any value. Let data and model take the following form:

Data:

\[\begin{equation} (t_{i0}, t_i, d_i, \mathbf{x}_i), \; i = 1, \ldots, n, \end{equation}\]

where \(t_{i0}\) is the left truncation time point (if \(t_{i0} = 0\) for all \(i\), then this variable may be omitted), \(t_i\) is the end time point, \(d_i\) is the “event indicator” (\(1\) or TRUE if event, else \(0\) or FALSE), and \(\mathbf{x}_i\) is a vector of explanatory variables.

Model:

\[\begin{equation} h(t; \mathbf{x}_i) = h_0(t) e^{\mathbf{x}_i \boldsymbol{\beta}}, \quad t > 0. \tag{3.6} \end{equation}\]

This is a regression model where the response variable is \((t_0, t, d)\) (we will call it a survival object) and the explanatory variable is \(\mathbf{x}\), possibly (often) vector valued.

In equation (3.6) there are two components to estimate, the regression coefficients \(\boldsymbol{\beta}\), and the baseline hazard function \(h_0(t), \; t > 0\). For the former task, the partial likelihood (Cox 1975) is used. See Appendix A for a brief summary.

3.4 Estimation of the Baseline Cumulative Hazard Function

The usual estimator (continuous time) of the baseline cumulative hazard function is

\[\begin{equation} \hat{H}_0(t) = \sum_{j:t_j \le t} \frac{d_j}{\sum_{m \in R_j} e^{\mathbf{x}_m \hat{\boldsymbol{\beta}}}}, \tag{3.7} \end{equation}\]

where \(d_j\) is the number of events at \(t_j\) and \(\hat{\boldsymbol{\beta}}\) is the result of maximizing the partial likelihood. Note that if \(\hat{\boldsymbol{\beta}} = 0\), this reduces to

\[\begin{equation} \hat{H}_0(t) = \sum_{j:t_j \le t} \frac{d_j}{n_j}, \tag{3.8} \end{equation}\]

the Nelson-Aalen estimator. In equation (3.8), \(n_j\) is the size of \(R_j\).
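Equation (3.7) is straightforward to compute. Here is a minimal sketch for right-censored data (no left truncation); time, status, the covariate matrix X, and the estimates beta are hypothetical inputs, and with beta = 0 the result reduces to the Nelson-Aalen estimator (3.8).

breslow <- function(time, status, X, beta) {
    score  <- exp(as.matrix(X) %*% beta)       # exp(x_m beta-hat), all m
    dtimes <- sort(unique(time[status == 1]))  # distinct event times t_j
    incr <- sapply(dtimes, function(t) {
        d_j <- sum(time == t & status == 1)    # number of events at t_j
        d_j / sum(score[time >= t])            # sum over the risk set R_j
    })
    data.frame(time = dtimes, H0 = cumsum(incr))
}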

In the R package eha, the baseline hazard is estimated at the value zero of the covariates (but at the reference category for a factor covariate). This differs from the practice in other packages, where some average of the covariate values is chosen as reference. That practice is generally a bad habit, because it frequently leads to a varying reference value as subsets of the original data set are taken. Instead, the practitioner must exercise judgment in choosing relevant reference values for (continuous) covariates. There are two instances where these choices make a difference: (i) when interactions are present, main effect estimates will vary with the choice (more about this later), and (ii) in the estimation of baseline hazards and survival functions.

In order to calculate the cumulative hazards function for an individual with a specific covariate vector \(\mathbf{x}\), use the formula

\[\begin{equation*} \hat{H}(t; \mathbf{x}) = \hat{H}_0(t) e^{\mathbf{x}\hat{\boldsymbol{\beta}}}. \end{equation*}\]

The corresponding survival functions may be estimated by the relation

\[\begin{equation*} \hat{S}(t; \mathbf{x}) = \exp\bigl(-\hat{H}(t; \mathbf{x})\bigr). \end{equation*}\]
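Continuing the sketch above, the two formulas translate directly into code; the covariate vector x and the estimates beta are again hypothetical:

bh <- breslow(time, status, X, beta)
Hx <- bh$H0 * exp(sum(x * beta)) # H-hat(t; x)
Sx <- exp(-Hx)                   # S-hat(t; x)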

It is also possible to use the terms in the sum in equation (3.7) to build an estimator analogous to the Kaplan-Meier estimator given in equation (2.9). In practice, there is no big difference between the two methods. For more on interpretation of parameter estimates and model selection, see the appropriate chapter.

3.5 Proportional Hazards in Discrete Time

In discrete time, the hazard function is, as we saw earlier, a set of conditional probabilities, and so its range is restricted to the interval \((0, 1)\). Therefore, the definition of proportional hazards used for continuous time is impractical: multiplying a probability by a constant may result in a quantity larger than one.

One way of motivating proportional hazards in discrete time is to regard the discreteness as a result of grouping truly continuous time data, for which the proportional hazards assumption holds. For instance, in a follow-up study of human mortality, we may only have data recorded once a year, and so life length can only be measured in years. Thus, we assume that there is a true exact life length \(T\), but we can only observe that it falls in an interval \((t_i, t_{i+1})\).

Assume continuous proportional hazards, and a partition of time:

\[ 0 = t_0 < t_1 < t_2 < \cdots < t_k = \infty. \] Then \[\begin{multline} P(t_{j-1} \le T < t_j \mid T \ge t_{j-1}; \; \mathbf{x}) = \frac{S(t_{j-1}\mid \mathbf{x}) - S(t_j\mid \mathbf{x})}{S(t_{j-1} \mid \mathbf{x})} \\ = 1 - \frac{S(t_j \mid \mathbf{x})}{S(t_{j-1} \mid \mathbf{x})} = 1 - \left(\frac{S_0(t_j)}{S_0(t_{j-1})}\right)^{\exp(\boldsymbol{\beta}\mathbf{x})} \\ = 1 - (1 - h_j)^{\exp(\boldsymbol{\beta}\mathbf{x})} \tag{3.9} \end{multline}\] with \(h_j = P(t_{j-1} \le T < t_j \mid T \ge t_{j-1}; \; \mathbf{x} = \mathbf{0})\), \(j = 1, \ldots, k\). We take equation (3.9) as the definition of proportional hazards in discrete time.

3.5.1 Logistic regression

It turns out that a proportional hazards model in discrete time, according to definition (3.9), is nothing else than a logistic regression model with the cloglog link (cloglog is short for “complementary log-log”: \(\log(-\log(1 - p))\), where \(p\) is the event probability). In order to see this, let

\[\begin{equation} (1 - h_j) = \exp(-\exp(\alpha_j)), \; j = 1, \ldots, k, \end{equation}\] and

\[\begin{equation*} X_j = \left\{ \begin{array}{ll} 1, & t_{j-1} \le T < t_j \\ 0, & \text{otherwise} \end{array} \right., \quad j = 1, \ldots, k. \end{equation*}\] Then

\[\begin{equation*} \begin{split} P(X_1 = 1; \; \mathbf{x}) &= 1 - \exp\big(-\exp(\alpha_1 + \boldsymbol{\beta}\mathbf{x})\big) \\ P(X_j = 1 \mid X_1 = \cdots = X_{j-1} = 0; \; \mathbf{x}) &= 1 - \exp\big(-\exp(\alpha_j + \boldsymbol{\beta}\mathbf{x})\big), \\ \quad j = 2, \ldots, k. \end{split} \end{equation*}\] This is logistic regression with a cloglog link. Note that extra parameters \(\alpha_1, \ldots, \alpha_k\) are introduced, one for each potential event time (interval). They correspond to the baseline hazard function in continuous time, and are estimated simultaneously with the regression parameters.
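In R, this model can be fitted with glm and the cloglog link, once data are expanded to one row per individual and interval (“person-period” form). A minimal sketch with simulated data; all names are made up for the illustration, and the interval factor plays the role of \(\alpha_1, \ldots, \alpha_k\).

set.seed(1)
longdata <- data.frame(
    ivl   = factor(rep(1:3, each = 50)),  # interval j = 1, 2, 3
    x     = rnorm(150),                   # a covariate
    event = rbinom(150, 1, 0.2)           # 1 if the event occurs in interval j
)
fit <- glm(event ~ ivl + x,
           family = binomial(link = "cloglog"), data = longdata)
summary(fit)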

3.6 Doing It in R

We utilize the child mortality data, Skellefteå 1850–1884, to illustrate some aspects of an ordinary Cox regression. Newborns in the parish are sampled between 1850 and 1884 and followed to death or to the age of 15, when they are right censored. So in effect, child mortality in the ages 0–15 is studied. The data set is named child and is available in eha.

There are two R functions that are handy for a quick look at a data frame, str and head. The function head prints the first few lines (observations) of a data frame (there is also a corresponding tail function that prints a few of the last rows).

library(eha) # Loads also the data frame 'child'.
ch <- child[, c("birthdate", "sex", "socBranch", 
                "enter", "exit", "event")] # Select columns.
head(ch)
     birthdate    sex socBranch enter   exit event
3   1853-05-23   male   farming     0 15.000     0
42  1853-07-19   male   farming     0 15.000     0
47  1861-11-17   male    worker     0 15.000     0
54  1872-11-16   male   farming     0 15.000     0
78  1855-07-19 female    worker     0  0.559     1
102 1855-09-29   male   farming     0  0.315     1

The function str gives a summary description of the structure of an R object, often a data frame:

str(ch)
'data.frame':   26574 obs. of  6 variables:
 $ birthdate: Date, format: "1853-05-23" ...
 $ sex      : Factor w/ 2 levels "male","female": 1 1 1 1 2 1 1 1 2 2 ...
 $ socBranch: Factor w/ 4 levels "official","farming",..: 2 2 4 2 4 2 2 2 2 2 ...
 $ enter    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ exit     : num  15 15 15 15 0.559 0.315 15 15 15 15 ...
 $ event    : num  0 0 0 0 1 1 0 0 0 0 ...

First, there is information about the data frame: It is a data.frame, with six variables measured on 26574 objects. Then each variable is individually described: name, type, and a few of the first values. The values are usually rounded to a few digits. The Factor lines are worth noticing: they of course describe factor covariates and their levels. Internally, the levels are coded 1, 2, …, respectively.

Also note the variable birthdate: It is of the Date type, and it has some quirks when used in regression models.

res <- coxreg(Surv(exit, event) ~ sex + socBranch + birthdate, 
              data = ch)
print(summary(res), digits = 4)
Covariate             Mean       Coef     Rel.Risk   S.E.    LR p
sex                                                         0.0019 
            male      0.510     0         1 (reference)
          female      0.490    -0.083     0.920     0.027
socBranch                                                   0.0001 
        official      0.021     0         1 (reference)
         farming      0.710    -0.017     0.983     0.092
        business      0.011     0.330     1.391     0.141
          worker      0.258     0.099     1.104     0.094
birthdate        1869-07-13    -0.000     1.000     0.000   0.0000 

Events                    5616 
Total time at risk        325030 
Max. log. likelihood      -56481 
LR test statistic         67.10 
Degrees of freedom        5 
Overall p-value           4.11227e-13

Note that the coefficient for birthdate is equal to 0.0000 (with four decimals), yet it is significantly different from zero. The explanation is that the time unit behind Dates is day, and it would be more reasonable to use year as time unit. It can be accomplished by creating a new covariate, call it cohort, which is birth year. It is done with the aid of the functions toTime (in eha) and floor:

ch$cohort <- floor(toTime(ch$birthdate))
fit <- coxreg(Surv(exit, event) ~ sex + socBranch + cohort, 
              data = ch)
print(summary(fit), digits = 4, short = TRUE)
Covariate             Mean       Coef     Rel.Risk   S.E.    LR p
sex                                                         0.0018 
            male      0.510     0         1 (reference)
          female      0.490    -0.083     0.920     0.027
socBranch                                                   0.0001 
        official      0.021     0         1 (reference)
         farming      0.710    -0.017     0.984     0.092
        business      0.011     0.330     1.390     0.141
          worker      0.258     0.099     1.104     0.094
cohort             1869.035    -0.008     0.992     0.001   0.0000 

Note the argument short = TRUE in the print statement. It suppresses the general statistics that were printed in the previous output below the table of coefficients.

3.6.1 The estimated baseline cumulative hazard function

The estimated baseline cumulative hazard function (see equation (3.7)) is preferably reported by a graph, produced with the plot command, see Figure 3.8.
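A call of the following kind produces a figure like Figure 3.8; the fn argument of eha's plot method for coxreg fits is assumed here to select the cumulative hazard:

plot(fit, fn = "cum")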

Estimated cumulative baseline hazard function.

FIGURE 3.8: Estimated cumulative baseline hazard function.

The figure shows the baseline cumulative hazard function for an individual with the value zero on all covariates. What does it mean here? The factors sex and socBranch are at their reference categories, male and official, and the covariate cohort has the reference value 0! So Figure 3.8 shows the estimated cumulative hazard function for a male official born on January 1 in the year 0 (which does not exist, by the way). This is not a reasonable extrapolation, and we need a new reference value for cohort. It should be within the range of the observed values, which is given by the range function:

range(ch$cohort)
[1] 1850 1884

So a reasonable value to subtract is 1860 (or any meaningful value in the range):

ch$cohort <- ch$cohort - 1860
res3 <- coxreg(Surv(exit, event) ~ sex + socBranch + cohort, 
               data = ch)
print(summary(res3), digits = 4, short = TRUE)
Covariate             Mean       Coef     Rel.Risk   S.E.    LR p
sex                                                         0.0018 
            male      0.510     0         1 (reference)
          female      0.490    -0.083     0.920     0.027
socBranch                                                   0.0001 
        official      0.021     0         1 (reference)
         farming      0.710    -0.017     0.984     0.092
        business      0.011     0.330     1.390     0.141
          worker      0.258     0.099     1.104     0.094
cohort                9.035    -0.008     0.992     0.001   0.0000 

As you can see, nothing changes regarding the parameter estimates (because no interactions are present). What about the baseline cumulative hazards graph?


FIGURE 3.9: Estimated cumulative baseline hazard function, cohort centered at 1860.

The two Figures 3.8 and 3.9 are identical, except for the numbers on the \(y\) axis. This would not hold for the corresponding plots of the baseline survival function (try it!).
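To try it, one can draw the baseline survival functions from the two fits side by side; fn = "surv" is assumed here to be the survival option of the same plot method:

oldpar <- par(mfrow = c(1, 2))
plot(fit, fn = "surv")   # cohort reference = year 0
plot(res3, fn = "surv")  # cohort reference = year 1860
par(oldpar)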

References

Cox, D. R. 1972. “Regression Models and Life Tables.” Journal of the Royal Statistical Society Series B (with Discussion) 34: 187–220.
———. 1975. “Partial Likelihood.” Biometrika 62: 269–76.