After building a model, it is important to decide which metrics are most suitable for evaluating it. For simple linear regression, where we have just one dependent and one independent variable, the correlation between them gives a reasonable indication of how well the model fits. The same is not true of multiple linear regression: a single pairwise correlation cannot justify the accuracy of the model, so a measure of overall fit such as R-squared is more appropriate. Classification models need their own metrics again.
The following are three metrics that one can use to evaluate a KNN (K-Nearest Neighbors) classifier.
- Jaccard Index
- F1 - Score
- Log Loss
Let us discuss each of them one by one in detail.
Jaccard Index
The Jaccard index is also known as the Jaccard similarity coefficient. If y is the set of actual labels and ŷ is the set of predicted labels, the Jaccard index is the size of their intersection divided by the size of their union:
J(y, ŷ) = |y ∩ ŷ| / |y ∪ ŷ| = |y ∩ ŷ| / (|y| + |ŷ| - |y ∩ ŷ|)
Suppose you have a total of 50 observations, of which your model predicts 41 correctly. The Jaccard index is then:
J(y, ŷ) = 41 / (50 + 50 - 41) = 41 / 59 ≈ 0.69
A Jaccard index of 0.69 means that the predicted and actual label sets overlap by 69%. This is lower than the plain accuracy of 41/50 = 82%, because the union in the denominator counts both label sets. The Jaccard index ranges from 0 to 1, where a value of 1 means the predictions match the actual labels exactly.
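The 50-observation example above can be reproduced in a few lines of Python. This is only a minimal sketch: it assumes the intersection is counted as the number of matching labels, as in that example, and the function name jaccard_index and the toy labels are made up for illustration.

```python
def jaccard_index(y_true, y_pred):
    """Size of the intersection over size of the union of two label sets.

    Assumption: the intersection is the number of positions where the
    actual and predicted labels agree.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    union = len(y_true) + len(y_pred) - matches
    # Two empty label sets are treated as identical.
    return matches / union if union else 1.0


# Toy data: 50 observations, 41 of them predicted correctly.
y_true = [1] * 50
y_pred = [1] * 41 + [0] * 9

score = jaccard_index(y_true, y_pred)
print(round(score, 2))  # 0.69
```

Library implementations may define the index differently (for instance per class, from true positives, false positives and false negatives), so their output for the same data can differ from this sketch.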
F1 - Score
The F1-score is also known as the F-measure or F-score. It is the harmonic mean of precision and recall:
F1 = 2 × (precision × recall) / (precision + recall)
A high F1-score shows that the model has good values for both precision and recall. Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot reach a high F1-score by excelling at one of them while failing at the other, which the arithmetic mean would allow.
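To see why the harmonic mean is preferred over the arithmetic mean here, consider a quick sketch with one extreme value. The precision and recall numbers below are invented purely for illustration.

```python
from statistics import harmonic_mean, mean

# Hypothetical model: perfect precision, very poor recall.
precision, recall = 1.0, 0.1

arithmetic = mean([precision, recall])
f1 = harmonic_mean([precision, recall])  # same as 2*p*r / (p + r)

print(f"arithmetic mean: {arithmetic:.2f}")    # 0.55
print(f"harmonic mean (F1): {f1:.2f}")         # 0.18
```

The arithmetic mean makes this lopsided model look mediocre, while the harmonic mean stays close to the weaker of the two values.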
Precision
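Out of all the observations we predicted as positive, how many are actually positive: precision = TP / (TP + FP), where TP is the number of true positives and FP is the number of false positives.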
Recall
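Out of all the observations that are actually positive, how many we predicted correctly: recall = TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives.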
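Precision, recall and the F1-score can all be computed from the counts of true positives, false positives and false negatives. The following is a minimal sketch for the binary case; the helper name precision_recall_f1, the positive parameter and the toy labels are assumptions made for this example.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy data: 4 actual positives, 3 predicted positives, 2 of them correct.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.5 0.571
```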
Log Loss
Log loss, or logarithmic loss, measures the performance of a classifier whose predicted output is a probability value between 0 and 1.
For example, suppose the model predicts a probability of 0.21 for an observation whose actual label is 1. This is a poor prediction and results in a high log loss.
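We can calculate the log loss using the log loss equation, which measures how far each prediction is from the actual label:
LogLoss = -(1/n) Σ [ y_i × log(ŷ_i) + (1 - y_i) × log(1 - ŷ_i) ]
Here n is the number of rows in the test set, y_i is the actual label of row i (0 or 1) and ŷ_i is the predicted probability that the label is 1. In other words, we compute the log loss of each row and then average it across the test set.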
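The calculation can be sketched in plain Python for the binary case. This is an illustrative sketch rather than a reference implementation: it assumes the natural logarithm, and the function name log_loss, the eps clipping value and the toy probabilities are choices made for this example.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary log loss.

    y_true holds actual labels (0 or 1); y_prob holds the predicted
    probability that the label is 1.
    """
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip so that log(0) is never evaluated.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# A single poor prediction versus a single good one (actual label is 1).
print(round(log_loss([1], [0.21]), 2))  # 1.56
print(round(log_loss([1], [0.90]), 2))  # 0.11

# Average over a small, made-up test set.
y_true = [1, 0, 1, 1]
y_prob = [0.21, 0.10, 0.90, 0.80]
average = log_loss(y_true, y_prob)
print(round(average, 3))
```

The prediction of 0.21 for an actual label of 1 contributes far more to the average than the confident, correct predictions do.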
The value of log loss starts at 0 and has no upper bound: a confident but wrong prediction can push it well above 1. An ideal classifier has a log loss of 0, so the classifier with the lower log loss is the better one.
Note that the Jaccard index and the F1-score can also be used for multi-class classifiers.