4.1 The Poisson Distribution
The Poisson distribution is used for count data, i.e., when the result may be any nonnegative integer 0, 1, 2, …, without upper limit. The probability density function (pdf) \(P\) of a random variable \(X\) following a Poisson distribution (strictly speaking a probability mass function, since \(X\) is discrete) is
\[\begin{equation}\label{eq:poisdist} P(X = k) = \frac{\lambda^k}{k!}\exp(-\lambda), \quad \lambda > 0; \; k = 0, 1, 2, \ldots. \end{equation}\]
The parameter \(\lambda\) is both the mean and the variance of the distribution. In Figure 4.1 the pdf is plotted for some values of \(\lambda\).
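As a quick numerical check of the formula and of the statement about the mean and the variance, the following sketch evaluates the pdf directly and compares it with R's built-in dpois. The value of lambda and the truncation point 100 are arbitrary choices made for illustration; they are not taken from the text.

```r
# Evaluate the Poisson pdf from the formula and compare with dpois().
lambda <- 2.5   # illustrative value (an assumption, not from the text)
k <- 0:100      # truncated support; the tail beyond 100 is negligible here

p_formula <- lambda^k / factorial(k) * exp(-lambda)
p_builtin <- dpois(k, lambda)

# Largest absolute discrepancy between the two computations
max_diff <- max(abs(p_formula - p_builtin))

# Mean and variance computed by summation over the (truncated) support
mean_x <- sum(k * p_builtin)
var_x  <- sum((k - mean_x)^2 * p_builtin)

c(max_diff = max_diff, mean = mean_x, variance = var_x)
```

Both the computed mean and the computed variance should agree with lambda up to rounding error.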
FIGURE 4.1: The Poisson pdf for different values of the mean.
Note that when \(\lambda\) increases, the distribution looks more and more like a normal distribution.
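One informal way to see this is to compare the Poisson pdf with the density of a normal distribution that has the same mean and variance, \(\lambda\). The sketch below does so for a few values of \(\lambda\). The helper name rel_gap, the chosen values of lambda, and the scaling by the peak of the pdf are our own illustrative choices.

```r
# Largest gap between the Poisson pdf and the matching normal density,
# relative to the peak of the Poisson pdf.
rel_gap <- function(lambda) {
  k <- 0:ceiling(lambda + 10 * sqrt(lambda))  # covers essentially all the mass
  p <- dpois(k, lambda)
  f <- dnorm(k, mean = lambda, sd = sqrt(lambda))
  max(abs(p - f)) / max(p)
}

lambdas <- c(1, 10, 100)   # illustrative values (assumption)
gaps <- sapply(lambdas, rel_gap)
data.frame(lambda = lambdas, rel_gap = gaps)
```

The relative gap shrinks as lambda grows, in line with the visual impression from the figure.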
In R, the Poisson distribution is represented by four functions, dpois, ppois, qpois, and rpois, representing the probability density function (pdf), the cumulative distribution function (cdf), the quantile function (the inverse of the cdf), and random number generation, respectively. See the help page for the Poisson distribution for more detail. In fact, this naming scheme is used for all probability distributions available in R.
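A minimal illustration of the four functions follows. The value of lambda, the seed, and the variable names are arbitrary choices for this sketch.

```r
lambda <- 0.5   # illustrative value (assumption)

d2 <- dpois(2, lambda)       # pdf: P(X = 2)
p2 <- ppois(2, lambda)       # cdf: P(X <= 2)
q95 <- qpois(0.95, lambda)   # quantile: smallest k with P(X <= k) >= 0.95

set.seed(1)                  # arbitrary seed, for reproducibility only
draws <- rpois(5, lambda)    # five random Poisson counts

# The cdf is the cumulative sum of the pdf
all.equal(p2, sum(dpois(0:2, lambda)))

# The same prefixes d, p, q, r apply to other distributions, e.g. the normal:
# dnorm, pnorm, qnorm, rnorm.
```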
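For example, a bar plot such as the upper left one in Figure 4.1 can be produced in R by passing probabilities computed by dpois to the function barplot. Note that the function dpois is vectorized: given a vector of counts, it returns the whole vector of corresponding probabilities at once. For the counts 0, 1, …, 5 and \(\lambda = 0.5\), the result is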
## [1] 0.6065306597 0.3032653299 0.0758163325 0.0126360554 0.0015795069
## [6] 0.0001579507
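The printed values are consistent with evaluating dpois at the counts 0 to 5 with \(\lambda = 0.5\) (the first value equals \(\exp(-0.5)\)). The original code is not shown here, so the following is a reconstruction under that assumption; the plot labels are our own.

```r
k <- 0:5
p <- dpois(k, lambda = 0.5)   # one call, six probabilities
print(p, digits = 10)

# Bar plot of the probabilities; axis labels and title are illustrative
barplot(p, names.arg = k, xlab = "k", ylab = "P(X = k)",
        main = "lambda = 0.5")
```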
As an example where the Poisson distribution may be relevant, we look at the number of children born to a woman after marriage. The data frame fert in eha can be used to calculate the number of births per married woman in Skellefteå during the 19th century; however, this data set contains only marriages with one or more births. Let us instead count the number of births beyond one.
The result is shown in Figure 4.2.
FIGURE 4.2: Number of children beyond one for married women with at least one child.
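The counting step can be sketched on a small made-up data frame. We assume one row per birth interval, a woman identifier id, and an indicator event that equals 1 when the interval ends with a birth. These column names and values are hypothetical stand-ins; the actual layout of fert should be checked on its help page before adapting the code.

```r
# Toy data (hypothetical): one row per birth interval
toy <- data.frame(
  id    = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  event = c(1, 1, 0, 1, 0, 1, 1, 1, 0)
)

births <- tapply(toy$event, toy$id, sum)  # total number of births per woman
kids <- births - 1                        # births beyond the first

table(kids)
barplot(table(kids), xlab = "Number of births beyond one",
        ylab = "Number of women")
```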
The question is: Does this look like a Poisson distribution? One way of checking this is to plot the theoretical distribution with the same mean (the parameter \(\lambda\)) as the sample mean in the data.
The result is shown in Figure 4.3.
FIGURE 4.3: Theoretical Poisson distribution.
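The comparison can also be made in a single plot by placing observed relative frequencies next to the Poisson probabilities with the same mean. The sketch below uses simulated counts rather than the fertility data; the negative binomial generator, its parameters, and the seed are assumptions chosen only to give an over-dispersed example.

```r
set.seed(42)                                  # arbitrary seed
counts <- rnbinom(1000, size = 5, mu = 4.5)   # simulated counts, not the fert data

lambda_hat <- mean(counts)                    # sample mean used as lambda
k <- 0:max(counts)

# Observed relative frequencies, including zero-frequency counts
observed <- as.numeric(table(factor(counts, levels = k))) / length(counts)
# Poisson probabilities with the same mean
expected <- dpois(k, lambda_hat)

barplot(rbind(observed, expected), beside = TRUE, names.arg = k,
        legend.text = c("Observed", "Poisson"),
        xlab = "Count", ylab = "Relative frequency")
```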
Evidently, the fertility data do not follow the Poisson distribution very well. They are in fact over-dispersed compared to the Poisson distribution. A simple way to check this is to calculate the sample mean and variance of the data. If the data come from a Poisson distribution, these numbers should be equal (theoretically) or reasonably close.
## [1] 4.548682
## [1] 8.586838
They are not very close, which is also obvious from comparing the graphs. \(\Box\)
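The mean-variance check is easy to try on simulated data, where the truth is known. In the sketch below, the helper name dispersion, the sample sizes, the seed, and the negative binomial parameters are illustrative assumptions; the numbers are not related to the fertility data.

```r
set.seed(123)                                     # arbitrary seed
pois_counts <- rpois(5000, lambda = 4.5)          # genuinely Poisson
nb_counts <- rnbinom(5000, size = 5, mu = 4.5)    # over-dispersed by construction

# Ratio of sample variance to sample mean; close to 1 for Poisson data
dispersion <- function(x) var(x) / mean(x)

c(poisson = dispersion(pois_counts),
  negative_binomial = dispersion(nb_counts))
```

A ratio well above one, as for the mean and variance reported above (roughly 8.59 / 4.55), points to over-dispersion.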