Probability Distributions every Data Scientist should know (3/3)



In the last two posts, we covered the Binomial and Poisson distributions, both of which are discrete. This third and final post in the series introduces the Normal distribution. It is arguably the most important distribution of all and is used very frequently. Unlike the previous two, it is continuous, and it is symmetric about its mean. So let's dig in and find out what this distribution is all about.

Normal Distribution

Context

The Normal distribution is a continuous distribution that is symmetric about its mean. It is also referred to as the Gaussian distribution or the normal curve of error. Several discrete distributions, such as the Binomial and the Poisson, tend to a normal distribution in the limit, for example as the number of trials n → ∞ for the Binomial.

Denotation

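The mean μ and the standard deviation σ are the parameters of the Normal distribution. A random variable X that follows it is written as:
X  ~  N( μ, σ^2 )
Note that the second argument in this notation is the variance σ^2, not the standard deviation σ.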
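Many libraries, including Python's standard-library statistics.NormalDist, are parameterised by the standard deviation rather than the variance, so some care is needed when translating N( μ, σ^2 ) into code. A minimal sketch follows; the values μ = 170 and σ^2 = 100 are made up purely for illustration.

```python
from statistics import NormalDist

# Hypothetical example: X ~ N(mu = 170, sigma^2 = 100), e.g. heights in cm.
mu = 170.0
variance = 100.0
sigma = variance ** 0.5  # standard deviation = square root of the variance

# NormalDist takes the standard deviation (sigma), not the variance.
heights = NormalDist(mu=mu, sigma=sigma)

print("mean:", heights.mean)
print("standard deviation:", heights.stdev)
print("variance:", heights.variance)
```

Passing the variance where the standard deviation is expected is an easy mistake to make, and it silently gives a much wider curve.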

The Normal Curve



  1. The distribution is continuous in nature.
  2. The distribution is bell-shaped, symmetric about the mean.
  3. The distribution is asymptotic to the x-axis: the curve extends indefinitely in both directions and gets closer and closer to the x-axis without ever touching it.
  4. The distribution is unimodal: it has a single peak (maximum), which occurs at the mean. The mean, median and mode therefore coincide.
  5. Total area under the distribution curve is 1.
  6. The first and the third quartile are equidistant from the mean.
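Several of these properties can be checked numerically. The sketch below uses Python's statistics.NormalDist with arbitrarily chosen parameters (μ = 50, σ = 8) and a simple midpoint sum for the area, so it is an illustration rather than a proof.

```python
from statistics import NormalDist

dist = NormalDist(mu=50.0, sigma=8.0)  # hypothetical parameters

# Symmetry about the mean: the density at mu - d equals the density at mu + d.
d = 5.0
left = dist.pdf(dist.mean - d)
right = dist.pdf(dist.mean + d)

# Total area under the curve: midpoint Riemann sum over mu +/- 10 sigma.
steps = 20000
lo = dist.mean - 10 * dist.stdev
hi = dist.mean + 10 * dist.stdev
width = (hi - lo) / steps
area = sum(dist.pdf(lo + (i + 0.5) * width) for i in range(steps)) * width

# First and third quartiles are equidistant from the mean.
q1 = dist.inv_cdf(0.25)
q3 = dist.inv_cdf(0.75)

print("symmetric:", abs(left - right) < 1e-12)
print("area under curve:", area)
print("mean - Q1:", dist.mean - q1, " Q3 - mean:", q3 - dist.mean)
```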

Conditions

  1. For a Binomial distribution, if the number of trials n → ∞ and neither p nor q = 1 − p is very small, the distribution tends to a normal distribution with mean np and variance npq.
  2. For a Poisson distribution, if the parameter λ → ∞, the distribution tends to a normal distribution with mean λ and variance λ.
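Both conditions can be illustrated by comparing the exact probabilities with a normal density that has the same mean and variance (np and npq for the Binomial, λ and λ for the Poisson). The helper names below are hypothetical, and p = 0.4 is an arbitrary choice; the sketch only measures the largest pointwise gap, which is one of several reasonable ways to judge the approximation.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """Exact Binomial probability P(X = k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Exact Poisson probability P(X = k), computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def max_gap_binomial(n, p):
    """Largest gap between the Binomial pmf and the matching normal density."""
    approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return max(abs(binomial_pmf(k, n, p) - approx.pdf(k)) for k in range(n + 1))

def max_gap_poisson(lam):
    """Largest gap between the Poisson pmf and the matching normal density."""
    approx = NormalDist(mu=lam, sigma=math.sqrt(lam))
    upper = int(lam + 10 * math.sqrt(lam)) + 1  # the tail beyond this is negligible
    return max(abs(poisson_pmf(k, lam) - approx.pdf(k)) for k in range(upper))

for n in (10, 100, 1000):
    print(f"Binomial n={n}, p=0.4: max gap = {max_gap_binomial(n, 0.4):.6f}")

for lam in (5, 50, 500):
    print(f"Poisson lambda={lam}: max gap = {max_gap_poisson(lam):.6f}")
```

The gap shrinks as n or λ grows, which is the behaviour the two conditions describe.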

Probability Density Function

A continuous random variable X is said to follow a normal distribution with mean μ and standard deviation σ if its probability density function is:
f(x) = ( 1 / ( σ √(2π) ) ) · exp( −(x − μ)^2 / (2σ^2) ),   −∞ < x < ∞,  σ > 0
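The density of a normal distribution can be evaluated directly from its closed form. Below is a minimal sketch with the formula written out in the function body; normal_pdf is a hypothetical helper name, the parameters μ = 100 and σ = 15 are made up, and the result is compared against the standard library's statistics.NormalDist as a sanity check.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x:
    exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

mu, sigma = 100.0, 15.0  # hypothetical parameters
reference = NormalDist(mu, sigma)

for x in (70.0, 100.0, 115.0):
    print(f"x={x}: formula={normal_pdf(x, mu, sigma):.6f}, library={reference.pdf(x):.6f}")
```

Note that the values returned are densities, not probabilities; for a continuous variable, probabilities come from areas under the curve between two points.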


Statistical Characteristics


The maximum value of the probability density, reached at x = μ, is equal to 1 / ( σ √(2π) ).
As the standard deviation σ increases, the curve becomes flatter and wider; as σ decreases, it becomes taller and narrower.
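The effect of σ on the height of the peak is easy to see numerically. This short sketch assumes the usual closed form for the density at the mean (written out in the code comment) and checks it against statistics.NormalDist for a few arbitrary values of σ; peak_height is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def peak_height(sigma):
    """Height of the normal density at x = mu: 1 / (sigma * sqrt(2 pi))."""
    return 1.0 / (sigma * math.sqrt(2.0 * math.pi))

mu = 0.0  # the peak height does not depend on the mean
for sigma in (0.5, 1.0, 2.0, 4.0):
    from_library = NormalDist(mu, sigma).pdf(mu)
    print(f"sigma={sigma}: peak={peak_height(sigma):.4f} (library: {from_library:.4f})")
```

Doubling σ halves the peak, while the total area under the curve stays equal to 1, so the curve has to spread out.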


Examples

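  1. Human characteristics such as height and weight are approximately normally distributed.
  2. House prices in a particular area are often modelled as normal, although real price data tend to be right-skewed.
  3. Test marks of students in a classroom are often roughly normal.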

Note: The Standard Normal distribution is a normal distribution with mean 0 and standard deviation 1. We would like to cover it in detail in another post.
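As a small preview, any normal variable can be mapped onto the standard normal by subtracting the mean and dividing by the standard deviation, z = (x − μ) / σ. The sketch below uses made-up test-score parameters (μ = 65, σ = 10) and shows that the cumulative probability is the same before and after standardising.

```python
from statistics import NormalDist

scores = NormalDist(mu=65.0, sigma=10.0)  # hypothetical test-score distribution
standard = NormalDist()                   # defaults to mean 0, standard deviation 1

x = 80.0
z = (x - scores.mean) / scores.stdev      # standardised value (z-score)

print("z-score:", z)
print("P(X <= 80) from the original distribution:", scores.cdf(x))
print("P(Z <= z) from the standard normal:", standard.cdf(z))
```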

We have now covered all three major distributions, which should help you get started on your journey. There are several other distributions worth exploring, however. Knowing whether your data follow one of these distributions will help you build a better model.
