Demystifying the Poisson Distribution

19 Aug 2015 . Machine Learning . Comments

I could never wrap my head around the Poisson distribution, and I think its mainly because I was always presented with scary looking formula and how to use it. But how we got that formula, what are the assumptions of using it, it never registered in my memory. Yet, I encounter it very often in problems, that needs modelling a rate of some process. In this post, I will try and demystify the poisson distribution by giving example, relating it with binomial distribution and giving a formal proof. Let’s get started.

Prerequisite - the Binomial Distribution

This distribution deals with $n$ number of trials with binary outcome. The probablity of success (one of the binary outcome) in each trial is given by $p$.

If we define a random variable $X=\textrm{number of successes in }n\textrm{ trials}$. Then for binomial distibution, $P(X=k) = \binom{n}{k} p^k(1-p)^{n-k}$. It is easy to prove that the expectation of $X$ is given by

Poisson Distibution

Suppose there is a process in which some event occurs repeatedly and we find that empirically average empiral rate of occurence of this event is $\lambda \textrm{ events per unit time duration}$. Poisson distribution is used to answer the question “What is the probablity of $k$ events happening in a unit time duration given the averate rate $\lambda$ per unit time duration?”.

This cool distribution and its derivation is best explained using an example. Suppose you are the manager of a particular bank branch. Following many complaints from customers over long ques, you decide to add more serving counters in the branch. You somehow figured out that on an average, $\lambda$ people enter the bank per minute. To find the optimum number of counters, you wish to know what is the probablity of $k$ customers entering the bank branch at any minute. Suppose that this rate was uniform all through out the minute irrespective the time of day, day of week, etc. In this case, you could model this using the binomial distibution. Let $X$ denote the random variable such that

We are given the $E[X] = \lambda$ per minute. We know that for binomial distribution, $\lambda = np$. We can write $\lambda \text{ per minute} = 60 \text{ seconds per hour} \times \frac{\lambda}{60} \text{ per second}$. This means, that we have broken a minute into 60 seconds where each second is like an experiment with probablity of success being $\frac{\lambda}{60}$. In this case,

But in above case, what if more than one people pass in any second. Obvious solution is to start doing experiment every millisecond instead of second. In that case,

However, one could still ask what if more than 1 person enter the bank. I know in this example, it is infeasible but concenptually it is possible. Basically, we wish to increase the number of experiments each minute go to as high as possible. This is exactly what Poisson distibution is. Poisson distibution is Binomial distribution in the limit $n \to \infty$. That is

Low and behold. We have just derived the poisson distribution where we have used that result that $\lim_{x\to\infty}\left(1+\frac{1}{x}\right)^x = e$. Thus, $\lim_{x\to\infty}\left(1+\frac{a}{x}\right)^x = e^a$