3.8 Proportional hazards in discrete time
In discrete time, the hazard function is, as we saw earlier, a set of conditional probabilities, and so its range is restricted to the interval \((0, 1)\). Therefore, the definition of proportional hazards used for continuous time is impractical: multiplying a probability by a constant may result in a quantity larger than one.
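The problem can be seen with a few made-up numbers. The baseline hazards and the multiplier below are purely illustrative assumptions, not values from any data set.

```python
# Hypothetical discrete baseline hazards (conditional probabilities per interval).
baseline_hazards = [0.10, 0.25, 0.40]
hazard_ratio = 3.0  # illustrative constant multiplier, as used in continuous time

# Naive "proportional" hazards: multiply each probability by the constant.
naive = [hazard_ratio * h for h in baseline_hazards]

# Collect the results that are no longer valid probabilities.
invalid = [h for h in naive if h > 1.0]

print("naive hazards:", naive)
print("values above one:", invalid)
```

With these numbers the last interval gets a "hazard" above one, which cannot be a conditional probability.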
One way of introducing proportional hazards in discrete time is to regard the discreteness as a result of grouping true continuous-time data, for which the proportional hazards assumption holds. For instance, in a follow-up study of human mortality, we may only have data recorded once a year, and so life length can only be measured in years. Thus, we assume that there is a true exact life length \(T\), but we can only observe that it falls in an interval \([t_{j-1}, t_j)\).
Assume continuous proportional hazards, and a partition of time:
\[ 0 = t_0 < t_1 < t_2 < \cdots < t_k = \infty. \] Then \[\begin{multline}\label{eq:dischaz} P(t_{j-1} \le T < t_j \mid T \ge t_{j-1}; \; \mathbf{x}) = \frac{S(t_{j-1}\mid \mathbf{x}) - S(t_j\mid \mathbf{x})}{S(t_{j-1} \mid \mathbf{x})} \\ = 1 - \frac{S(t_j \mid \mathbf{x})}{S(t_{j-1} \mid \mathbf{x})} = 1 - \left(\frac{S_0(t_j)}{S_0(t_{j-1})}\right)^{\exp(\boldsymbol{\beta}\mathbf{x})} \\ = 1 - (1 - h_j)^{\exp(\boldsymbol{\beta}\mathbf{x})} \end{multline}\] with \(h_j = P(t_{j-1} \le T < t_j \mid T \ge t_{j-1}; \; \mathbf{x} = \mathbf{0})\), \(j = 1, \ldots, k\). We take \eqref{eq:dischaz} as the definition of proportional hazards in discrete time.
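The derivation can be checked numerically. The following is a minimal sketch, assuming a Weibull baseline survivor function and an arbitrary value of \(\boldsymbol{\beta}\mathbf{x}\); the shape, scale, cut points, and the names `baseline_survival`, `survival`, and `grouped_hazard` are all illustrative choices that do not come from the text. It compares the grouped hazard computed directly from \(S(t \mid \mathbf{x})\) with the closed form \(1 - (1 - h_j)^{\exp(\boldsymbol{\beta}\mathbf{x})}\).

```python
import math

def baseline_survival(t, shape=1.5, scale=10.0):
    """Weibull baseline survivor function S_0(t) (an illustrative assumption)."""
    if math.isinf(t):
        return 0.0
    return math.exp(-((t / scale) ** shape))

def survival(t, lin_pred):
    """Continuous-time proportional hazards: S(t | x) = S_0(t) ** exp(beta * x)."""
    return baseline_survival(t) ** math.exp(lin_pred)

def grouped_hazard(t_prev, t_next, lin_pred):
    """P(t_prev <= T < t_next | T >= t_prev; x), computed from S(. | x)."""
    return 1.0 - survival(t_next, lin_pred) / survival(t_prev, lin_pred)

cut_points = [0.0, 1.0, 2.0, 5.0, 10.0, math.inf]  # t_0 < t_1 < ... < t_k
lin_pred = 0.7  # hypothetical value of beta * x

for t_prev, t_next in zip(cut_points[:-1], cut_points[1:]):
    h_j = grouped_hazard(t_prev, t_next, 0.0)              # baseline, x = 0
    direct = grouped_hazard(t_prev, t_next, lin_pred)      # from S(. | x)
    closed_form = 1.0 - (1.0 - h_j) ** math.exp(lin_pred)  # last line of the derivation
    print(f"[{t_prev}, {t_next}): h_j={h_j:.4f} "
          f"direct={direct:.4f} closed_form={closed_form:.4f}")
```

The two columns agree up to rounding, and every value stays within the unit interval, whatever the value of the linear predictor.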
3.8.1 Logistic regression
It turns out that a proportional hazards model in discrete time, according to definition \eqref{eq:dischaz}, is nothing else than a logistic regression model with the cloglog link (cloglog is short for “complementary log-log”, that is, the linear predictor equals \(\log(-\log(1 - p))\), where \(p\) is the event probability). In order to see that, let
\[\begin{equation} (1 - h_j) = \exp(-\exp(\alpha_j)), \; j = 1, \ldots, k \end{equation}\] and
\[\begin{equation*} X_j = \left\{ \begin{array}{ll} 1, & t_{j-1} \le T < t_j \\ 0, & \text{otherwise} \end{array} \right., \quad j = 1, \ldots, k \end{equation*}\] Then
\[\begin{equation*} \begin{split} P(X_1 = 1; \; \mathbf{x}) &= 1 - \exp\big(-\exp(\alpha_1 + \boldsymbol{\beta}\mathbf{x})\big) \\ P(X_j = 1 \mid X_1 = \cdots = X_{j-1} = 0; \; \mathbf{x}) &= 1 - \exp\big(-\exp(\alpha_j + \boldsymbol{\beta}\mathbf{x})\big), \quad j = 2, \ldots, k. \end{split} \end{equation*}\] This is logistic regression with a cloglog link. Note that extra parameters \(\alpha_1, \ldots, \alpha_k\) are introduced, one for each potential event age. They correspond to the baseline hazard function in continuous time, but are estimated simultaneously with the regression parameters.
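The equivalence can be illustrated with a small sketch. It assumes three hypothetical baseline hazards and an arbitrary linear predictor, and checks that the inverse cloglog of \(\alpha_j + \boldsymbol{\beta}\mathbf{x}\) reproduces \(1 - (1 - h_j)^{\exp(\boldsymbol{\beta}\mathbf{x})}\). The helper `person_period` is likewise an assumption: it shows one common way to lay out the binary responses \(X_1, \ldots, X_j\) for a single subject before fitting a binary regression, and is not taken from the text.

```python
import math

def cloglog(p):
    """Complementary log-log link: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))

def inv_cloglog(eta):
    """Inverse link: 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

# Hypothetical baseline hazards h_j for three intervals, and a linear predictor.
baseline_hazards = [0.05, 0.10, 0.20]
lin_pred = 0.7  # hypothetical value of beta * x

# alpha_j is defined through 1 - h_j = exp(-exp(alpha_j)), i.e. alpha_j = cloglog(h_j).
alphas = [cloglog(h) for h in baseline_hazards]

for alpha_j, h_j in zip(alphas, baseline_hazards):
    p_glm = inv_cloglog(alpha_j + lin_pred)         # binary regression form
    p_ph = 1.0 - (1.0 - h_j) ** math.exp(lin_pred)  # discrete proportional hazards form
    print(f"alpha_j={alpha_j:.4f}  glm={p_glm:.6f}  ph={p_ph:.6f}")

def person_period(interval_index, event):
    """Expand one subject into rows (j, X_j) for j = 1, ..., interval_index.

    interval_index: 1-based index of the last interval in which the subject
        was observed.
    event: True if the event occurred in that interval, False if the subject
        survived it and was censored afterwards.
    """
    rows = [(j, 0) for j in range(1, interval_index)]
    rows.append((interval_index, 1 if event else 0))
    return rows

print(person_period(3, True))
```

Each row of the expanded data contributes one Bernoulli term with its own intercept \(\alpha_j\), which is why the \(\alpha_j\) and \(\boldsymbol{\beta}\) can be estimated together.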