Probability Distributions Every Data Scientist Should Know (2/3)


The second distribution in the series is the Poisson distribution. After the Bernoulli and Binomial distributions, Poisson is the last on our list of discrete distributions.

Poisson Distribution

Context

The Poisson distribution is a limiting case of the Binomial distribution. If the number of trials n is very large and the probability of success p is very small, such that the product n × p = λ is positive and finite, then the random variable (RV) X, the number of successes, follows a Poisson distribution in the limit.
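
The limiting behaviour can be checked numerically. The Python sketch below is illustrative only: the value λ = 2, the range k ≤ 10, and the helper names binomial_pmf, poisson_pmf and max_gap are assumptions chosen for the example. It uses the standard Poisson PMF, e^(-λ) λ^k / k!, holds np fixed while n grows, and reports the largest pointwise gap between the two PMFs.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def max_gap(n, lam, k_max=10):
    """Largest |binomial - Poisson| PMF difference over k = 0..k_max, with p = lam / n."""
    p = lam / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(k_max + 1)
    )


lam = 2.0  # assumed value of n * p, held fixed as n grows
for n in (10, 100, 10_000):
    print(f"n = {n:>6}, p = {lam / n:.4f}, largest PMF gap = {max_gap(n, lam):.6f}")
```

The gap should shrink as n increases, which is what "limiting case" means here.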

Denotation

If an RV X follows a Poisson distribution with parameter λ, we write:
X ~ P(λ)

Condition

  1. The number of trials (n) is indefinitely large i.e. n → ∞
  2. For each trial, the probability of success is very small i.e. p → 0
  3. λ = np is finite and positive i.e. 0 < λ < ∞
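
Under these three conditions the Poisson probabilities can be obtained from the binomial ones. What follows is a sketch of the usual argument, not a rigorous proof: substitute p = λ/n into the binomial PMF and let n → ∞ for a fixed count x (using x for the number of successes is a notational assumption).

```latex
\begin{aligned}
P(X = x) &= \binom{n}{x} p^{x} (1 - p)^{n - x}, \qquad p = \frac{\lambda}{n} \\
         &= \frac{n(n-1)\cdots(n-x+1)}{n^{x}} \cdot \frac{\lambda^{x}}{x!}
            \cdot \left(1 - \frac{\lambda}{n}\right)^{n}
            \cdot \left(1 - \frac{\lambda}{n}\right)^{-x} \\
         &\xrightarrow[\,n \to \infty\,]{} \; 1 \cdot \frac{\lambda^{x}}{x!} \cdot e^{-\lambda} \cdot 1
          \;=\; \frac{e^{-\lambda} \lambda^{x}}{x!}
\end{aligned}
```

The first and last factors tend to 1 because x is fixed, and the third factor tends to e^{-λ} by the standard limit (1 - λ/n)^n → e^{-λ}.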


Probability Mass Function

The probability mass function is given by:

$$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \qquad x = 0, 1, 2, \dots$$
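
As a rough illustration, the standard Poisson PMF, e^(-λ) λ^x / x!, can be evaluated in Python. This sketch works in log space with math.lgamma so that large values of λ or x do not overflow; the function name poisson_pmf and the test values are assumptions made for the example.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam), computed in log space for numerical stability."""
    if x < 0:
        return 0.0
    # log P = -lam + x * log(lam) - log(x!), where log(x!) = lgamma(x + 1)
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


lam = 3.0  # assumed rate, for illustration only
for x in range(6):
    print(f"P(X = {x}) = {poisson_pmf(x, lam):.4f}")

# The probabilities over all non-negative integers should sum to 1.
print("sum over x = 0..99:", sum(poisson_pmf(x, lam) for x in range(100)))
```

A direct evaluation with lam**x / math.factorial(x) gives the same values for small x but raises OverflowError once x is in the hundreds.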

Characteristics


The mean works out to np, which is also the value of the parameter λ, so λ in the PMF can equally be written as μ.
  1. In Poisson Distribution, only one parameter λ is needed to determine the probability of an event.
  2. The values of mean and variance are equal and the same as parameter λ.
  3. The Poisson distribution has the additive property: if X and Y are two independent Poisson random variables with parameters λ_1 and λ_2 respectively, then (X + Y) also follows a Poisson distribution with parameter (λ_1 + λ_2).
  4. The Poisson distribution can never be negatively skewed: its skewness is 1/√λ, which is always positive. It does, however, become more nearly symmetrical about its mean as λ increases. For large λ it is well approximated by a normal distribution with mean λ and variance λ. By the additive property, a Poisson variable with a large integer λ is a sum of λ independent Poisson variables with parameter 1, so the central limit theorem applies.
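
The characteristics above can be checked numerically from the PMF itself. The sketch below is a sanity check under assumptions, not a proof: the infinite sums are truncated at k_max = 200, and the helper names moments and convolved_pmf, as well as the λ values used, are chosen only for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def moments(lam, k_max=200):
    """Mean, variance and skewness of Poisson(lam) from a truncated sum over k = 0..k_max."""
    ks = range(k_max + 1)
    probs = [poisson_pmf(k, lam) for k in ks]
    mean = sum(k * p for k, p in zip(ks, probs))
    var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
    skew = sum((k - mean) ** 3 * p for k, p in zip(ks, probs)) / var**1.5
    return mean, var, skew


def convolved_pmf(k, lam1, lam2):
    """P(X + Y = k) for independent X ~ Poisson(lam1) and Y ~ Poisson(lam2)."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))


# Mean and variance both equal lambda; skewness is positive and shrinks as lambda grows.
for lam in (1.0, 4.0, 25.0):
    mean, var, skew = moments(lam)
    print(f"lambda = {lam:>4}: mean = {mean:.4f}, variance = {var:.4f}, skewness = {skew:.4f}")

# Additive property: the convolution matches a single Poisson with lambda_1 + lambda_2.
print(convolved_pmf(3, 1.5, 2.5), poisson_pmf(3, 4.0))
```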

Examples

  1. The number of phone calls received by a telephone operator in one hour.
  2. The number of spelling errors on each page of a document.
  3. The number of deaths due to cancer in a year.
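
To make the first example concrete, here is a small worked sketch in Python. The rate of 5 calls per hour is a hypothetical figure chosen for illustration, and poisson_pmf and poisson_cdf are helper names introduced here.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


calls_per_hour = 5.0  # hypothetical average rate, not taken from real data

# Probability that the operator receives at most 2 calls in an hour.
p_at_most_2 = poisson_cdf(2, calls_per_hour)
# Probability of more than 10 calls in an hour (complement of the CDF).
p_more_than_10 = 1.0 - poisson_cdf(10, calls_per_hour)

print(f"P(at most 2 calls)   = {p_at_most_2:.4f}")
print(f"P(more than 10 calls) = {p_more_than_10:.4f}")
```

Only the average rate is needed as input, which is the single-parameter property listed under Characteristics.
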
Here, we come to the end of this post. We learnt about the Poisson distribution, the second of the three distributions that we are going to cover in this series. Note that when n is large and p is small, the Poisson distribution gives nearly the same probabilities as the binomial distribution (often agreeing to 2 to 3 decimal places), and the two coincide in the limit as n tends to infinity with np held fixed. Because it has only one parameter, the mean, the Poisson distribution is also the simpler of the two to calculate.
