What is a Confusion Matrix in Machine Learning? What are Type 1 and Type 2 Errors?

 


Why do we need the confusion matrix? Put plainly, it is one of the techniques for measuring the performance of a classification model. From a confusion matrix, you can calculate recall, precision, f-measure, and accuracy. It is simple to create once you have your model's predictions.


What is a confusion matrix?

A confusion matrix is an evaluation technique for machine learning models in which you compare the predicted values against the actual values and count how often each combination occurs.

Let us consider a binary target variable consisting of 0s and 1s, where 1 represents the Positive (True) case and 0 represents the Negative (False) case.

In the example confusion matrix --

  1. There are 5 instances when the actual value (y) is 1 and the predicted value (ŷ) is also 1. This is called a True Positive case, where True means that the values are the same (1 & 1) and Positive means that the predicted value is positive or 1.

  2. There are 4 instances when the actual value (y) is 0 and the predicted value (ŷ) is also 0. This is called a True Negative case, where True means that the values are the same (0 & 0) and Negative means that the predicted value is negative or 0.

  3. There are 3 instances when the actual value (y) is 0 and the predicted value (ŷ) is 1. This is called a False Positive case, where False means that the values are different (0 & 1) and Positive means that the predicted value is positive or 1.

  4. There are 2 instances when the actual value (y) is 1 and the predicted value (ŷ) is 0. This is called a False Negative case, where False means that the values are different (1 & 0) and Negative means that the predicted value is negative or 0.
In the example matrix, the values in green are correctly identified by the model and the values in red are wrongly identified by the model.
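
The four counts above can be reproduced with a few lines of plain Python. This is only a minimal sketch: the lists `y_true` and `y_pred` are hypothetical data arranged to give the same counts as the example (5, 4, 3, 2), and the helper name `confusion_counts` and the row/column order of `matrix` are assumptions made for illustration.

```python
# Hypothetical labels arranged to reproduce the counts in the example:
# 5 true positives, 4 true negatives, 3 false positives, 2 false negatives.
y_true = [1] * 5 + [0] * 4 + [0] * 3 + [1] * 2
y_pred = [1] * 5 + [0] * 4 + [1] * 3 + [0] * 2


def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for binary labels (1 = positive, 0 = negative)."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        else:  # a == 1 and p == 0
            counts["FN"] += 1
    return counts


counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 5, 'TN': 4, 'FP': 3, 'FN': 2}

# Assumed layout: rows are actual values, columns are predicted values,
# both in the order [0, 1].
matrix = [
    [counts["TN"], counts["FP"]],
    [counts["FN"], counts["TP"]],
]
print(matrix)  # [[4, 3], [2, 5]]
```

The diagonal of `matrix` (4 and 5) holds the correct predictions, and the off-diagonal entries (3 and 2) hold the wrong ones.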


Type 1 Error and Type 2 Error

  • Type 1 Error arises when the predicted value is positive while it is actually negative (False Positive).
    e.g. Your device predicts that it will rain today, but in reality it does not rain.
  • Type 2 Error arises when the predicted value is negative while it is actually positive (False Negative).
    e.g. Your device predicts that it will not rain today, but in reality it does rain.
Note - In the above two examples for type 1 and type 2 error, we have considered raining to be a positive case.
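
As a rough illustration of the two definitions, the sketch below labels a single prediction with its error type. The function name `error_type` is hypothetical, and rain is encoded as 1 (positive) and no rain as 0 (negative), following the note above.

```python
def error_type(actual, predicted):
    """Return the error type for one binary prediction, or None if it is correct."""
    if predicted == 1 and actual == 0:
        return "Type 1 Error (False Positive)"
    if predicted == 0 and actual == 1:
        return "Type 2 Error (False Negative)"
    return None  # prediction matches the actual value


# 1 = rain, 0 = no rain
print(error_type(actual=0, predicted=1))  # predicted rain, but it stayed dry
print(error_type(actual=1, predicted=0))  # predicted no rain, but it rained
print(error_type(actual=1, predicted=1))  # correct prediction, no error
```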

Complete summary of the confusion matrix


In the covid19 test example, the four cases can be summarized in the following way --

-- True Positive --

The covid test is positive and the patient is suffering from covid19.

-- True Negative --

The covid test is negative and the patient is not suffering from covid19.

-- False Positive --

The covid test is positive but the patient is not suffering from covid19.

-- False Negative --

The covid test is negative but the patient is suffering from covid19.

From the above four cases, the fourth case, i.e. False Negative (Type 2 error), is dangerous because an infected patient is told they are healthy, which can cost the patient's life. So, generally, False Negative cases are considered more dangerous than False Positive cases, but in a few applications like software testing, False Positive cases (Type 1 error) are the ones we try to minimize.
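
The four covid19 test cases can also be written down as a small lookup table. This is just an illustrative sketch: the names `OUTCOMES` and `describe` are hypothetical, and each key is a pair of booleans (test is positive, patient has covid19).

```python
# Keys are (test_is_positive, patient_has_covid).
OUTCOMES = {
    (True, True): "True Positive",
    (False, False): "True Negative",
    (True, False): "False Positive (Type 1 error)",
    (False, True): "False Negative (Type 2 error)",
}


def describe(test_is_positive, patient_has_covid):
    """Look up which of the four confusion matrix cases applies."""
    return OUTCOMES[(test_is_positive, patient_has_covid)]


# The test is negative but the patient is suffering from covid19.
print(describe(test_is_positive=False, patient_has_covid=True))
```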

In the next post, we will cover how to calculate accuracy, precision, recall, and f-measure from a confusion matrix, and what these values signify.
