4.1 The Poisson Distribution

The Poisson distribution is used for count data, i.e., when the result may be any nonnegative integer 0, 1, 2, …, without upper limit. The probability density function (pdf; strictly speaking a probability mass function, since \(X\) is discrete) \(P\) of a random variable \(X\) following a Poisson distribution is

\[\begin{equation}\label{eq:poisdist} P(X = k) = \frac{\lambda^k}{k!}\exp(-\lambda), \quad \lambda > 0; \; k = 0, 1, 2, \ldots. \end{equation}\]

The parameter \(\lambda\) is both the mean and the variance of the distribution. In Figure 4.1 the pdf is plotted for some values of \(\lambda\).
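As a quick numerical check of the formula and of the statement about the mean and variance, the following sketch writes out the expression by hand and compares it with R's built-in function dpois. The value of lambda and the truncation point 100 are arbitrary choices made for illustration only.

```r
lambda <- 2.5   # arbitrary illustrative value (an assumption, not from the text)
k <- 0:100      # truncation point; the probability mass beyond 100 is negligible here

# The formula written out, compared with the built-in dpois
p_formula <- lambda^k / factorial(k) * exp(-lambda)
all.equal(p_formula, dpois(k, lambda))

# Mean and variance computed directly from the probabilities
m <- sum(k * p_formula)
v <- sum((k - m)^2 * p_formula)
c(mean = m, variance = v)
```

Both numbers should agree with lambda up to rounding error.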

FIGURE 4.1: The Poisson pdf for different values of the mean.

Note that when \(\lambda\) increases, the distribution looks more and more like a normal distribution.
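One way to make this observation concrete is to measure how far the Poisson probabilities are from the normal density with the same mean and variance. The sketch below does this for three illustrative values of \(\lambda\); the values themselves and the range 0 to 200 are assumptions chosen for the example.

```r
# Largest absolute gap between the Poisson probabilities and the
# normal density with matching mean and variance, for growing lambda
lambdas <- c(0.5, 5, 50)   # illustrative values (an assumption)
max_gap <- sapply(lambdas, function(lambda) {
  k <- 0:200               # wide enough to cover the bulk of each distribution
  max(abs(dpois(k, lambda) - dnorm(k, mean = lambda, sd = sqrt(lambda))))
})
round(setNames(max_gap, lambdas), 4)
```

The gap shrinks as \(\lambda\) grows, in line with the visual impression from the figure.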

In R, the Poisson distribution is represented by four functions, dpois, ppois, qpois, and rpois, representing the probability density function (pdf), the cumulative distribution function (cdf), the quantile function (the inverse of the cdf), and random number generation, respectively. See the help page for the Poisson distribution for more detail. The same naming scheme, with the prefixes d, p, q, and r, is used for all probability distributions available in R.
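A minimal sketch of how the four functions relate to each other is given below. The value of lambda, the probability 0.95, and the seed are arbitrary choices for illustration.

```r
lambda <- 0.5

# ppois accumulates the probabilities given by dpois
ppois(2, lambda)
sum(dpois(0:2, lambda))

# qpois returns the smallest k with P(X <= k) >= p
q <- qpois(0.95, lambda)
q

# rpois draws random counts; the seed only makes the run repeatable
set.seed(1)
x <- rpois(10, lambda)
x
```

The normal distribution, for instance, follows the same pattern with dnorm, pnorm, qnorm, and rnorm.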

For example, the upper left bar plot in Figure 4.1 is produced in R by

barplot(dpois(0:5, lambda = 0.5), axes = FALSE, 
        main = expression(paste(lambda, " = ", 0.5)))

Note that the function dpois is vectorized, so a vector of counts returns a vector of probabilities:

dpois(0:5, lambda = 0.5)
## [1] 0.6065306597 0.3032653299 0.0758163325 0.0126360554 0.0015795069
## [6] 0.0001579507
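The vectorization also applies to the parameter. As a small sketch (the values of lambda are chosen only for illustration), the probability of a zero count can be computed for several means at once and compared with \(\exp(-\lambda)\), which is what the formula gives for \(k = 0\).

```r
# One count, several means: P(X = 0) for each lambda, which equals exp(-lambda)
lams <- c(0.5, 1, 2)
p0 <- dpois(0, lambda = lams)
p0

# The probabilities over all counts add up to one (up to a negligible tail)
total <- sum(dpois(0:50, lambda = 0.5))
total
```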
Example 4.1 (Marital fertility)

As an example where the Poisson distribution may be relevant, we look at the number of children born to a woman after marriage. The data frame fert in eha can be used to calculate the number of births per married woman in Skellefteå during the 19th century; however, this data set contains only marriages with one or more births, so a count of zero cannot occur. Let us instead count the number of births beyond the first.

library(eha)
## Loading required package: survival
# Keep only the intervals that ended with a birth
f0 <- fert[fert$event == 1, ]
# Births per woman (rows per id), minus one
kids <- tapply(f0$id, f0$id, length) - 1
barplot(table(kids))

The result is shown in Figure 4.2.

FIGURE 4.2: Number of children beyond one for married women with at least one child.

The question is: Does this look like a Poisson distribution? One way of checking this is to plot the theoretical distribution with the same mean (the parameter \(\lambda\)) as the sample mean in the data.

lam <- mean(kids)
barplot(dpois(0:12, lambda = lam))

The result is shown in Figure 4.3.

FIGURE 4.3: Theoretical Poisson distribution.
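The comparison is easier to read when observed proportions and Poisson probabilities are drawn in the same plot. The sketch below shows one way of doing that. To keep it runnable without the eha package, it uses simulated counts as a hypothetical stand-in for kids; the negative binomial generator, its parameters, the sample size, and the seed are all assumptions made for illustration and are not the fert data.

```r
set.seed(42)  # arbitrary seed, for repeatability only

# Hypothetical stand-in for the counts in kids (NOT the fert data):
# negative binomial draws with mean 4.5 and variance 4.5 + 4.5^2 / 5
counts <- rnbinom(1000, mu = 4.5, size = 5)

k <- 0:max(counts)
observed <- as.vector(prop.table(table(factor(counts, levels = k))))
poisson  <- dpois(k, lambda = mean(counts))

# Observed proportions and Poisson probabilities, side by side
barplot(rbind(observed, poisson), beside = TRUE, names.arg = k,
        legend.text = c("Observed", "Poisson"))
```

With the real data, counts would simply be replaced by kids.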

Clearly, the fertility data do not follow the Poisson distribution very well. They are in fact overdispersed compared to the Poisson distribution, that is, their variance is larger than their mean. A simple way to check this is to calculate the sample mean and variance of the data. If the data come from a Poisson distribution, these numbers should be equal (theoretically) or reasonably close.

mean(kids)
## [1] 4.548682
var(kids)
## [1] 8.586838

They are not very close: the variance is almost twice the mean. The discrepancy is also apparent when comparing the graphs. \(\Box\)
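The comparison of mean and variance can be condensed into a single number, the ratio of variance to mean, which should be close to one for Poisson data. The sketch below computes it from the values printed above and, for reference, from simulated Poisson counts. The sample size 1000 and the seed are hypothetical choices, not taken from the data.

```r
# Values reported in the output above
m <- 4.548682
v <- 8.586838
ratio <- v / m   # dispersion index; close to 1 for Poisson data
ratio

# For comparison: the same ratio for simulated Poisson counts.
# The sample size 1000 and the seed are hypothetical choices.
set.seed(1)
x <- rpois(1000, lambda = m)
var(x) / mean(x)
```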