A.3 Model selection

In regression modeling, there are often several competing models for describing the same data. In general, there are no strict rules for "correct" selection. However, for nested models there are some formal guidelines. For a precise definition of this concept, see Appendix A.

A.3.1 Comparing nested models

The meaning of nesting of models is best described by an example.

Example A.1 (Two competing models)
  1. \({\cal M}_2:\; h(t; (x_1, x_2)) = h_0(t) \exp(\beta_1 x_1 + \beta_2 x_2)\)
  2. \({\cal M}_1:\; h(t; (x_1, x_2)) = h_0(t) \exp(\beta_1 x_1)\): \(x_2\) has no effect.

Thus, the model \({\cal M}_1\) is a special case of \({\cal M}_2\) (\(\beta_2 = 0\)). We say that \({\cal M}_1\) is nested in \({\cal M}_2\). Now, assume that \({\cal M}_2\) is true. Then testing the hypothesis \(H_0:\; {\cal M}_1\) is true (as well) is the same as testing the hypothesis \(H_0:\; \beta_2 = 0\).

The formal procedure for performing the likelihood ratio test (LRT) can be summarized as follows:

  1. Maximize \(\log L(\beta_1, \beta_2)\) under \({\cal M}_2\); gives \(\log L(\hat{\beta}_1, \hat{\beta}_2)\).

  2. Maximize \(\log L(\beta_1, \beta_2)\) under \({\cal M}_1\), that is, maximize \(\log L(\beta_1, 0)\); gives \(\log L(\beta_1^*, 0)\).

  3. Calculate the test statistic \[\begin{equation*} T = 2\big(\log L(\hat{\beta}_1, \hat{\beta}_2) - \log L(\beta_1^*, 0)\big) \end{equation*}\]

  4. Under \(H_0\), \(T\) has a \(\chi^2\) (chi-square) distribution with \(d\) degrees of freedom: \(T \sim \chi^2(d)\), where \(d\) is the difference in the number of parameters in the two competing models, in this case \(2 - 1 = 1\).

  5. Reject \(H_0\) if \(T\) is large enough. Exactly how large depends on the level of significance: if it is \(\alpha\), choose the critical value \(t_d\) equal to the \(100(1 - \alpha)\)th percentile of the \(\chi^2(d)\) distribution.

This result is a large-sample approximation.
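The five steps above can be sketched numerically. The maximized log-likelihood values below are hypothetical placeholders; in practice they come from fitting the two models. Only the Python standard library is used, exploiting that for one degree of freedom the \(\chi^2\) survival function reduces to a complementary error function.

```python
from math import erfc, sqrt

# Hypothetical maximized log-likelihoods (steps 1 and 2)
loglik_M2 = -187.4   # full model M2, two parameters
loglik_M1 = -190.1   # restricted model M1 (beta_2 = 0), one parameter

# Step 3: the likelihood ratio test statistic
T = 2 * (loglik_M2 - loglik_M1)

# Step 4: under H0, T ~ chi2(1); for one degree of freedom,
# P(chi2(1) > T) = erfc(sqrt(T / 2))
p_value = erfc(sqrt(T / 2))

# Step 5: reject H0 at the 5% level if T exceeds 3.84,
# the 95th percentile of the chi2(1) distribution
reject = T > 3.84
print(f"T = {T:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

With these illustrative numbers, \(T = 5.4\) exceeds the critical value 3.84, so \(H_0: \beta_2 = 0\) is rejected at the 5% level.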

The Wald test is theoretically performed as follows:

  1. Maximize \(\log L(\beta_1, \beta_2)\) under \({\cal M}_2\); this gives \(\log L(\hat{\beta}_1, \hat{\beta}_2)\), and \(\hat{\beta}_2\), se(\(\hat{\beta}_2\)).

  2. Calculate the test statistic \[\begin{equation*} T_W = \frac{\hat{\beta}_2}{\mbox{se}(\hat{\beta}_2)} \end{equation*}\]

  3. Under \(H_0\), \(T_W\) has a standard normal distribution: \(T_W \sim N(0, 1)\).

  4. Reject \(H_0\) if the absolute value of \(T_W\) is larger than 1.96 at a significance level of 5%.
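A corresponding sketch of the Wald test, again with a hypothetical estimate \(\hat{\beta}_2\) and standard error standing in for output from a fitted model:

```python
from math import erf, sqrt

# Hypothetical estimate and standard error from fitting M2 (step 1)
beta2_hat = 0.47
se_beta2 = 0.21

# Step 2: the Wald test statistic
T_W = beta2_hat / se_beta2

# Step 3: under H0, T_W ~ N(0, 1); two-sided p-value from the
# standard normal cdf, Phi(x) = (1 + erf(x / sqrt(2))) / 2
p_value = 2 * (1 - (1 + erf(abs(T_W) / sqrt(2))) / 2)

# Reject H0 at the 5% level if |T_W| > 1.96
reject = abs(T_W) > 1.96
print(f"T_W = {T_W:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

Note that only one model fit is needed here, in contrast to the two fits required by the LRT.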

This is a large-sample approximation, with the advantage that it is automatically available in all software. In comparison to the LRT, one model fewer has to be fitted. This saves time and effort, but unfortunately at the expense of accuracy, because the Wald test may occasionally give nonsensical results. This phenomenon is known as the Hauck-Donner effect (Hauck and Donner 1977). \(\Box\)

A.3.2 Comparing non-nested models

Non-nested models cannot be compared by a likelihood ratio test, but there are a couple of alternatives based on comparing maximized likelihood values, penalized by the number of parameters that need to be estimated. One such alternative is the Akaike Information Criterion (AIC); see Leeuw (1992).

References

Hauck, W. W., and A. Donner. 1977. “Wald’s Test as Applied to Hypotheses in Logit Analysis.” Journal of the American Statistical Association 72: 851–53.

Leeuw, J. de. 1992. “Introduction to Akaike (1973) Information Theory and an Extension of the Maximum Likelihood Principle.” In Breakthroughs in Statistics I, edited by S. Kotz and N. L. Johnson, 599–609. Springer.