Machine Learning Classification Metrics

The ML metrics table you’ll need

I haven’t talked much about machine learning model classification metrics aside from the confusion matrix. Let’s do a short recap of the confusion matrix and then add some more useful metrics that are derived from it.

Confusion Matrix

Despite its name, it’s actually pretty simple to understand. Let’s assume we are trying to classify observations into three groups: tall, medium, and short. Let’s arbitrarily label the tall group positive and any group that isn’t tall negative, so the medium and short groups are labeled negative. True positives (TP) are the correctly classified observations, e.g. observations predicted to be tall that actually are tall. True negatives (TN) are the correctly rejected observations, e.g. observations predicted to not be tall that actually aren’t tall. False positives (FP) are the incorrectly classified observations, e.g. observations predicted to be tall that actually aren’t tall. False negatives (FN) are the incorrectly rejected observations, e.g. observations predicted to not be tall that actually are tall.

Confusion Matrix

|                    | Actual Positive | Actual Negative |
|--------------------|-----------------|-----------------|
| Predicted Positive | TP              | FP              |
| Predicted Negative | FN              | TN              |
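
To make this concrete, here is a minimal sketch of tallying these four counts, assuming Python with scikit-learn installed; the actual and predicted labels are made up purely for illustration.

```python
from sklearn.metrics import confusion_matrix

# Toy example (made-up data): actual and predicted groups for ten observations
actual    = ["tall", "short", "tall", "medium", "tall", "short", "medium", "tall", "short", "medium"]
predicted = ["tall", "short", "medium", "medium", "tall", "tall", "medium", "tall", "short", "short"]

# Binarize: "tall" is the positive class (1), everything else is negative (0)
y_true = [1 if g == "tall" else 0 for g in actual]
y_pred = [1 if g == "tall" else 0 for g in predicted]

# For binary 0/1 labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")  # TP=3  TN=5  FP=1  FN=1
```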

Other Metrics

Alright so we got the confusion matrix down! It looks pretty helpful but it would be even more helpful if we could further quantify our model’s performance.

Machine learning model classification metric descriptions

| Name                            | Description                                         | Equation |
|---------------------------------|-----------------------------------------------------|----------|
| TP                              | Number of Correct Positive Predictions              | NA |
| TN                              | Number of Correct Negative Predictions              | NA |
| FP                              | Number of Incorrect Positive Predictions            | NA |
| FN                              | Number of Incorrect Negative Predictions            | NA |
| Sensitivity (Recall)            | Proportion of Actual Positives Correctly Predicted  | $\frac {TP}{TP+FN}$ |
| Specificity                     | Proportion of Actual Negatives Correctly Predicted  | $\frac {TN}{TN+FP}$ |
| Accuracy                        | Proportion of All Observations Correctly Predicted  | $\frac {TP + TN}{TP + TN + FP + FN}$ |
| Balanced Accuracy               | Accuracy Unbiased by Unequal Group Sizes            | $\frac {Sens + Spec}{2}$ |
| Precision (PPV)                 | Proportion of Positive Predictions That Are Correct | $\frac {TP}{TP+FP}$ |
| Negative Predictive Value (NPV) | Proportion of Negative Predictions That Are Correct | $\frac {TN}{TN+FN}$ |
| F1                              | Harmonic Mean of Sensitivity and PPV                | $\frac {2 * PPV * Sens}{PPV + Sens}$ |
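
The sections below walk through each of these metrics one at a time. As a single reference point, here is a minimal sketch that computes all of them from the four confusion-matrix counts; the function name and the example counts are my own, hypothetical choices.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics in the table above from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                      # recall: correct among actual positives
    specificity = tn / (tn + fp)                      # correct among actual negatives
    accuracy = (tp + tn) / (tp + tn + fp + fn)        # correct among all observations
    balanced_accuracy = (sensitivity + specificity) / 2
    precision = tp / (tp + fp)                        # PPV: correct among positive predictions
    npv = tn / (tn + fn)                              # correct among negative predictions
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity, "specificity": specificity,
        "accuracy": accuracy, "balanced_accuracy": balanced_accuracy,
        "precision": precision, "npv": npv, "f1": f1,
    }

# Example with the made-up counts from the sketch above
print(classification_metrics(tp=3, tn=5, fp=1, fn=1))
```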

Sensitivity (Recall)

Sensitivity tells us how well our model did at finding the observations that were actually tall. High sensitivity means our model is good at finding observations that are tall and doesn’t have many false negatives.

$$Sensitivity = \frac {TP}{TP + FN}$$
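
If you use scikit-learn, you don’t have to compute this by hand. A minimal sketch, reusing the made-up binarized labels from the earlier example:

```python
from sklearn.metrics import recall_score

# 1 = tall (positive), 0 = not tall (negative); labels are made up for illustration
y_true = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 1, 0, 0]

# recall_score computes TP / (TP + FN) for the positive class
print(recall_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
```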

Specificity

Specificity tells us how well our model did at finding the observations that weren’t tall. High specificity means our model is good at finding observations that aren’t tall and doesn’t have many false positives.

$$Specificity = \frac {TN}{TN + FP}$$

Accuracy

Accuracy is the number of correct predictions (TP & TN) divided by all the predictions made by the model, which gives a proportion you can report as a percentage. I generally don’t use accuracy because it becomes extremely biased when the groups you are trying to predict are unequal in size. I.e. if tall has 60 observations, medium has 30 observations, and short has 10 observations, accuracy will be misleading: a model that labels every observation as tall still scores 60% accuracy. This is a problem I run into a lot with public clinical neuroimaging datasets.

$$Accuracy = \frac {TP+ TN}{TP + TN + FP + FN}$$

Balanced Accuracy

Balanced Accuracy is not biased by unequal groups like accuracy is. It does this by taking the average of specificity and sensitivity.

$$Balanced\ Accuracy = \frac {Specificity + Sensitivity}{2}$$
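
To see why this matters, here is a minimal sketch using the hypothetical 60/30/10 split from above, binarized to 60 tall vs. 40 not-tall, with a deliberately lazy model that calls everything tall:

```python
# Lazy model that predicts "tall" for every observation: TP=60, FP=40, TN=0, FN=0
tp, fp, tn, fn = 60, 40, 0, 0

accuracy = (tp + tn) / (tp + tn + fp + fn)           # 0.60 -- looks decent
sensitivity = tp / (tp + fn)                         # 1.00 -- catches every tall observation
specificity = tn / (tn + fp)                         # 0.00 -- catches no non-tall observations
balanced_accuracy = (sensitivity + specificity) / 2  # 0.50 -- no better than chance

print(accuracy, balanced_accuracy)  # 0.6 0.5
```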

Precision (PPV)

Precision tells us how well our model did at making positive predictions, e.g. of the observations predicted to be tall, how many actually are tall. High precision means our model is good at finding tall observations and doesn’t have many false positives.

$$Precision = \frac {TP}{TP + FP}$$

Negative Predictive Value (NPV)

NPV tells us how well our model did at making negative predictions, e.g. of the observations predicted to not be tall, how many actually aren’t tall. High NPV means our model is good at finding observations that aren’t tall and doesn’t have many false negatives.

$$NPV = \frac {TN}{TN + FN}$$

F1

F1 is the harmonic mean of precision (PPV) and sensitivity (recall). A high F1 means your model is good at identifying tall observations while not having many false positives or false negatives.

$$F1 = \frac {2 * PPV * Sensitivity}{PPV + Sensitivity}$$
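
With scikit-learn you can get F1 directly, or verify it against the formula. A minimal sketch, again reusing the made-up binarized labels from before:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# 1 = tall (positive), 0 = not tall (negative); labels are made up for illustration
y_true = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 1, 0, 0]

ppv = precision_score(y_true, y_pred)   # TP / (TP + FP) = 3 / 4
sens = recall_score(y_true, y_pred)     # TP / (TP + FN) = 3 / 4
manual_f1 = 2 * ppv * sens / (ppv + sens)

print(f1_score(y_true, y_pred), manual_f1)  # both 0.75
```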

What Metric Matters?

What metrics you use to judge your model depends on the problem you’re trying to solve. For the example in this post, trying to identify tall observations, precision and recall would be good to use, or the combination of both, the F1 metric, which in my opinion is the most useful metric in most cases. High precision would mean our model isn’t incorrectly identifying non-tall observations as tall. High recall would mean our model is finding the tall observations without missing many of them. So a high F1 means your model is good at identifying tall observations while not having many false positives or false negatives. Again, this is very dependent on the problem you’re solving, and low values on some of these measures are acceptable in different contexts.
