A.1 Statistical inference
Statistical inference is the science that helps us draw conclusions about real-world phenomena by observing and analyzing samples from them. The theory rests on probability theory and the concept of random sampling. Statistical analysis never delivers absolute truths, only statements coupled to certain measures of their validity, and these measures are almost always probability statements.
The crucial concept is that of a model, despite the present trend in statistical inference towards nonparametric statistics. It is often claimed that with today's huge data sets, statistical models are unnecessary, but nothing could be more wrong.
The important idea in a statistical model is the concept of a parameter. It is often confused with its estimator from data. For instance, when we talk about mortality in a population, it is a hypothetical concept that is different from the ratio of the observed number of deaths to the population size (or any other measure based on data). The latter is, at best, an estimate of the former. The whole idea of statistical inference is to extract information about a population parameter from observed data.
A.1.1 Point estimation
The aim of *point estimation* is to find the best guess (in some sense) of a population parameter from data. That is, we try to find the single value that is closest to the true, but unknown, value of the population parameter.
Of course, a point estimator is useless if it is not connected to some measure of its uncertainty. That takes us to the concept of *interval estimation*.
A.1.2 Interval estimation
The philosophy behind interval estimation is that a guess at a single value of the unknown population parameter is useless without an accompanying measure of the uncertainty of that guess. A confidence interval is an interval which, with a certain degree of confidence (often 95 per cent), covers the true value of the population parameter.
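As an illustration (a minimal Python sketch, not part of the text), a normal-approximation confidence interval for a mean combines the point estimate with its standard error; the data values and the 95 per cent cutoff 1.96 below are made-up assumptions:

```python
import math

def mean_ci(x, z=1.96):
    """Normal-approximation confidence interval for the mean of x."""
    n = len(x)
    m = sum(x) / n
    # sample variance with denominator n - 1
    s2 = sum((xi - m) ** 2 for xi in x) / (n - 1)
    se = math.sqrt(s2 / n)
    return m - z * se, m + z * se

lo, hi = mean_ci([2.1, 1.8, 2.5, 2.2, 1.9, 2.4])
```

The interval is centered at the sample mean 2.15; a wider interval (larger `z`) corresponds to a higher degree of confidence.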
A.1.3 Hypothesis testing
We are often interested in a specific value of a parameter, and in regression problems this value is almost always zero (0). The reason is that regression parameters measure effects, and to test for an effect is then equivalent to testing that the corresponding parameter has value zero.
There is a link between interval estimation and hypothesis testing: To test the hypothesis that a parameter value is zero can be done through constructing a confidence interval for the parameter. The test rule is then: If the interval does not cover zero, reject the hypothesis, otherwise do not.
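The test rule above amounts to a single comparison; this tiny Python sketch (the interval endpoints are made-up numbers) just checks whether the confidence interval covers zero:

```python
def reject_zero(lower, upper):
    """Reject H0: parameter = 0 exactly when the confidence interval excludes zero."""
    return not (lower <= 0.0 <= upper)

above = reject_zero(0.4, 1.2)    # interval entirely above zero: reject
covers = reject_zero(-0.3, 0.8)  # interval covers zero: do not reject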
A.1.3.1 The log-rank test
The general hypothesis testing theory behind the log-rank test builds on the hyper-geometric distribution. The calculations under the null hypothesis of no difference in survival chances between the two groups are performed conditional on both margins. In Table A.1, if the margins are fixed there is only one degree of freedom left; for a given value of (say) \(d_1\), the three remaining values \(d_2\), \((n_1 - d_1)\), and \((n_2 - d_2)\) are determined.
Table A.1: Deaths and survivors in two groups.

| Group | Deaths | Survivors | Total |
|---|---|---|---|
| I | \(d_1\) | \(n_1 - d_1\) | \(n_1\) |
| II | \(d_2\) | \(n_2 - d_2\) | \(n_2\) |
| Total | \(d\) | \(n - d\) | \(n\) |
Utilizing the fact that, under the null hypothesis, \(d_1\) is hyper-geometrically distributed leads to the following algorithm for calculating a test statistic:
1. Observe \(O = d_1\).
2. Calculate the expected value \(E\) of \(O\) (under the null): \[ E = d \frac{n_1}{n}. \]
3. Calculate the variance \(V\) of \(O\) (under the null): \[V = \frac{(n - d) d n_1 n_2}{n^2 (n - 1)}.\]
4. Repeat steps 1.–3. for all tables and aggregate according to equation (A.1).
The log-rank test statistic \(T\) is
\[\begin{equation} T = \frac{\sum_{i=1}^k \left(O_i - E_i\right)}{\sqrt{\sum_{i=1}^k V_i}} \tag{A.1} \end{equation}\]
Note carefully that this procedure is not equivalent to aggregating all tables of raw data!
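The steps above can be sketched in Python; each tuple \((d_1, n_1, d, n)\) in the example call is a made-up table for one event time, and the function aggregates over tables as in equation (A.1):

```python
import math

def logrank(tables):
    """Log-rank statistic T from per-event-time tables.

    Each table is (d1, n1, d, n): deaths and number at risk in group I,
    and total deaths and total number at risk (both groups combined).
    """
    num = 0.0  # running sum of O_i - E_i
    var = 0.0  # running sum of V_i
    for d1, n1, d, n in tables:
        n2 = n - n1
        e = d * n1 / n                                   # expected deaths, group I
        v = (n - d) * d * n1 * n2 / (n ** 2 * (n - 1))   # hyper-geometric variance
        num += d1 - e
        var += v
    return num / math.sqrt(var)

t = logrank([(2, 10, 2, 20), (1, 8, 1, 18), (1, 7, 2, 16)])
```

Note that, in line with the warning above, the summation is over the per-table quantities \(O_i - E_i\) and \(V_i\); the raw tables themselves are never added together.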
Properties of the log-rank test:
- The test statistic \(T^2\) is approximately distributed as \(\chi^2(1)\).
- It is available in most statistical software.
- It can be generalized to comparisons of more than two groups.
- For \(s\) groups, the test statistic is approximately \(\chi^2(s-1)\).
- The test has high power against alternatives with proportional hazards, but can be weak against non-proportional alternatives.
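Since \(T^2\) is approximately \(\chi^2(1)\), a two-sided p-value can equivalently be read off the standard normal distribution of \(T\); a sketch using only the Python standard library (the input value 1.96 is illustrative):

```python
import math

def logrank_p(t):
    """Two-sided p-value for T approximately N(0, 1), i.e. T^2 approximately chi^2(1)."""
    # P(|Z| > |t|) = erfc(|t| / sqrt(2)) for a standard normal Z
    return math.erfc(abs(t) / math.sqrt(2.0))

p = logrank_p(1.96)
```

As expected, \(|T| = 1.96\) gives a p-value of about 0.05, the usual 5 per cent threshold.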