3.4 Estimation of the Baseline Hazard

The usual estimator (continuous time) of the baseline cumulative hazard function is

\[\begin{equation} \hat{H}_0(t) = \sum_{j:t_j \le t} \frac{d_j}{\sum_{m \in R_j} e^{\mathbf{x}_m \hat{\boldsymbol{\beta}}}}, \tag{3.7} \end{equation}\]

where \(d_j\) is the number of events at \(t_j\). Note that if \(\hat{\boldsymbol{\beta}} = 0\), this reduces to

\[\begin{equation} \hat{H}_0(t) = \sum_{j:t_j \le t} \frac{d_j}{n_j}, \tag{3.8} \end{equation}\]

the Nelson-Aalen estimator. In (3.8), \(n_j\) is the size of \(R_j\).
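The sum in (3.7) can be sketched in a few lines of code. The following is a minimal, language-neutral illustration in Python, not the implementation used by any particular package. It assumes right-censored data without left truncation, so that the risk set \(R_j\) consists of everyone whose observed time is at least \(t_j\). The function name `breslow_cumhaz` and the toy data are made up for the example.

```python
import math

def breslow_cumhaz(times, events, X, beta):
    """Return [(t_j, H0(t_j)), ...] following the sum in (3.7).

    times  : observed exit times
    events : 1 if the exit is an event, 0 if censored
    X      : one covariate row (list of floats) per individual
    beta   : coefficient vector (here supplied by hand, not estimated)
    """
    def risk_score(x):
        return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))

    # Distinct event times t_j in increasing order.
    event_times = sorted({t for t, e in zip(times, events) if e})
    H, out = 0.0, []
    for tj in event_times:
        # d_j: number of events at t_j.
        d_j = sum(1 for t, e in zip(times, events) if e and t == tj)
        # Denominator: sum of risk scores over the risk set R_j
        # (assumption: no left truncation, so R_j = {m : time_m >= t_j}).
        denom = sum(risk_score(x) for t, x in zip(times, X) if t >= tj)
        H += d_j / denom
        out.append((tj, H))
    return out

# Toy data, invented for illustration only.
times = [2.0, 3.0, 3.0, 5.0, 7.0]
events = [1, 1, 0, 1, 0]
X = [[0.0], [1.0], [0.0], [1.0], [0.0]]

# With beta = 0 every risk score is 1, so the denominators are the
# risk set sizes n_j = 5, 4, 2 and the result is Nelson-Aalen (3.8).
na = breslow_cumhaz(times, events, X, beta=[0.0])
for tj, H in na:
    print(tj, round(H, 4))
```

With these toy data the increments are \(1/5\), \(1/4\) and \(1/2\), so the printed cumulative values are 0.2, 0.45 and 0.95.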

In the R package eha, the baseline hazard is estimated at the means of the covariates (or, more precisely, at the means of the columns of the design matrix; this makes a difference for factors), so the estimator becomes

\[\begin{equation} \hat{H}_0(t) = \sum_{j:t_j \le t} \frac{d_j}{\sum_{m \in R_j} e^{(\mathbf{x}_m - \bar{\mbox{$\mathbf{x}$}})\hat{\boldsymbol{\beta}}}}. \tag{3.9} \end{equation}\]
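A short calculation shows how (3.9) relates to (3.7). Because \(\bar{\mathbf{x}}\) does not depend on \(m\), the centering factors out of every denominator:

\[\begin{equation*} \sum_{m \in R_j} e^{(\mathbf{x}_m - \bar{\mathbf{x}})\hat{\boldsymbol{\beta}}} = e^{-\bar{\mathbf{x}}\hat{\boldsymbol{\beta}}} \sum_{m \in R_j} e^{\mathbf{x}_m \hat{\boldsymbol{\beta}}}. \end{equation*}\]

Hence the centered estimator (3.9) equals the uncentered estimator (3.7) multiplied by the constant \(e^{\bar{\mathbf{x}}\hat{\boldsymbol{\beta}}}\). Centering changes which covariate vector is treated as the baseline, but not the shape of the estimated cumulative hazard.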

To calculate the cumulative hazard function for an individual with a specific covariate vector \(\mathbf{x}\), use the formula

\[\begin{equation*} \hat{H}(t; \mathbf{x}) = \hat{H}_0(t) e^{(\mathbf{x} - \bar{\mbox{$\mathbf{x}$}})\hat{\boldsymbol{\beta}}}. \end{equation*}\]

The corresponding survival functions may be estimated by the relation

\[\begin{equation*} \hat{S}(t; \mathbf{x}) = \exp\bigl(-\hat{H}(t; \mathbf{x})\bigr). \end{equation*}\]
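As an illustration of the two relations above, the following Python sketch computes \(\hat{S}(t; \mathbf{x})\) from a centered baseline and checks that the same curve results from an uncentered baseline, as long as the same centering vector is used in both the baseline and the individual's relative risk. It is a sketch under stated assumptions, not eha's implementation: right-censored data without left truncation, toy data, and a coefficient value `beta = [0.5]` chosen by hand rather than estimated. All function names are hypothetical.

```python
import math

def linpred(x, beta):
    """Linear predictor x * beta for one covariate row."""
    return sum(xi * bi for xi, bi in zip(x, beta))

def baseline_cumhaz(times, events, X, beta, center):
    """[(t_j, H0(t_j)), ...]; `center` is subtracted from every covariate row."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    H, out = 0.0, []
    for tj in event_times:
        d_j = sum(1 for t, e in zip(times, events) if e and t == tj)
        # Risk set: all individuals with time >= t_j (no left truncation assumed).
        denom = sum(
            math.exp(linpred([xi - ci for xi, ci in zip(x, center)], beta))
            for t, x in zip(times, X)
            if t >= tj
        )
        H += d_j / denom
        out.append((tj, H))
    return out

def survival_curve(H0, x, center, beta):
    """S(t; x) = exp(-H0(t) * exp((x - center) * beta)) at each event time."""
    scale = math.exp(linpred([xi - ci for xi, ci in zip(x, center)], beta))
    return [(t, math.exp(-h * scale)) for t, h in H0]

# Toy data and a hand-picked coefficient, for illustration only.
times = [2.0, 3.0, 3.0, 5.0, 7.0]
events = [1, 1, 0, 1, 0]
X = [[0.0], [1.0], [0.0], [1.0], [0.0]]
beta = [0.5]

# Column means of the design matrix.
xbar = [sum(col) / len(col) for col in zip(*X)]

H0_centered = baseline_cumhaz(times, events, X, beta, center=xbar)
H0_raw = baseline_cumhaz(times, events, X, beta, center=[0.0])

# Survival curve for an individual with x = 1, computed both ways.
S_centered = survival_curve(H0_centered, [1.0], xbar, beta)
S_raw = survival_curve(H0_raw, [1.0], [0.0], beta)
```

The two curves agree up to rounding error. The practical point is that a baseline estimated at the covariate means must be combined with \(\mathbf{x} - \bar{\mathbf{x}}\), not with \(\mathbf{x}\) itself.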

It is also possible to use the terms in the sum (3.7) to build an estimator of the baseline survival function analogous to the Kaplan-Meier estimator. In practice, the two methods give very similar results.
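One natural reading of the Kaplan-Meier analogue is the product-limit form \(\prod_{j: t_j \le t} (1 - h_j)\), where \(h_j\) denotes the \(j\)th term of the sum (3.7). That interpretation is an assumption of this sketch. The Python code below compares it with \(\exp(-\sum h_j)\) for two sets of increments: those from a five-subject toy example with \(\hat{\boldsymbol{\beta}} = 0\), and a hypothetical set of small increments of the kind that large risk sets produce.

```python
import math

def survival_from_increments(increments):
    """Return (exponential form, product-limit form) of baseline survival.

    increments : the terms h_j of the sum (3.7), in time order.
    Note: the product form is only sensible while every h_j < 1.
    """
    exp_form, prod_form = [], []
    H, P = 0.0, 1.0
    for h in increments:
        H += h              # cumulative hazard, as in (3.7)
        P *= 1.0 - h        # Kaplan-Meier style product
        exp_form.append(math.exp(-H))
        prod_form.append(P)
    return exp_form, prod_form

# Increments d_j / n_j = 1/5, 1/4, 1/2 from a tiny toy data set.
h_small_sample = [0.2, 0.25, 0.5]
# Hypothetical increments, as might arise with risk sets of size 50 to 100.
h_large_sample = [0.01, 0.0125, 0.02]

exp_s, prod_s = survival_from_increments(h_small_sample)
exp_l, prod_l = survival_from_increments(h_large_sample)

gap_small = exp_s[-1] - prod_s[-1]
gap_large = exp_l[-1] - prod_l[-1]
print(gap_small, gap_large)
```

Since \(1 - h \le e^{-h}\), the product form never exceeds the exponential form. The gap is visible only when individual increments are large, that is, when risk sets are very small.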