Machine Learning Classification Metrics

The ML metrics table you’ll need

Sep 10, 2019 5 min read Statistics, Machine Learning

I haven’t talked much about machine learning model classification metrics aside from the confusion matrix. Let’s briefly recap it and then add some more useful metrics that are derived from it.

Confusion Matrix

Despite its name, it’s actually pretty simple to understand. Let’s assume we are trying to classify observations into three groups: tall, medium, and short. Let’s arbitrarily label the tall group positive and any group that isn’t tall negative, so the medium and short groups are both negative. True positives (TP) are the correctly classified observations, e.g. the number of observations predicted tall that really are tall. True negatives (TN) are the correctly rejected observations, e.g. the number of observations predicted not tall that really aren’t tall. False positives (FP) are the incorrectly classified observations, e.g. the number of observations predicted tall that aren’t actually tall. False negatives (FN) are the incorrectly rejected observations, e.g. the number of observations predicted not tall that actually are tall.

Confusion Matrix
	Actual Positive	Actual Negative
Predicted Positive	TP	FP
Predicted Negative	FN	TN
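
As a minimal sketch, the four counts can be tallied by comparing actual and predicted labels one observation at a time. The labels below are made up for illustration, and `confusion_counts` is a hypothetical helper rather than a library function.

```python
def confusion_counts(actual, predicted, positive="tall"):
    """Tally TP, FP, FN, TN, treating `positive` as the positive class
    and every other label as negative."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted tall, actually tall
        elif p == positive:
            fp += 1  # predicted tall, actually medium or short
        elif a == positive:
            fn += 1  # predicted not tall, actually tall
        else:
            tn += 1  # predicted not tall, actually not tall
    return tp, fp, fn, tn


# Made-up labels for six observations
actual = ["tall", "tall", "medium", "short", "tall", "medium"]
predicted = ["tall", "medium", "tall", "short", "tall", "short"]

tp, fp, fn, tn = confusion_counts(actual, predicted)
print(tp, fp, fn, tn)  # 2 1 1 2
```

Note that the last observation (actually medium, predicted short) still counts as a true negative, because both labels fall in the "not tall" group.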

Other Metrics

Alright so we got the confusion matrix down! It looks pretty helpful but it would be even more helpful if we could further quantify our model’s performance.

Machine learning model classification metric descriptions
Name	Description	Equation
TP	Number of Correct Positive Predictions	NA
TN	Number of Correct Negative Predictions	NA
FP	Number of Incorrect Positive Predictions	NA
FN	Number of Incorrect Negative Predictions	NA
Sensitivity (Recall)	Proportion of Actual Positives Correctly Predicted	$\frac {TP}{TP+FN}$
Specificity	Proportion of Actual Negatives Correctly Predicted	$\frac {TN}{TN+FP}$
Accuracy	Proportion of Correctly Predicted Observations	$\frac {TP + TN}{TP + TN + FP + FN}$
Balanced Accuracy	Average of Sensitivity and Specificity	$\frac {Sens + Spec}{2}$
Precision (PPV)	Proportion of Positive Predictions That Are Correct	$\frac {TP}{TP+FP}$
Negative Predictive Value (NPV)	Proportion of Negative Predictions That Are Correct	$\frac {TN}{TN+FN}$
F1	Harmonic Mean of Sensitivity and PPV	$\frac {2 * PPV * Sens}{PPV + Sens}$

Sensitivity (Recall)

Sensitivity tells us what share of the observations that were actually tall our model predicted as tall. High sensitivity means our model is good at finding observations that are tall and doesn’t have many false negatives.

$$Sensitivity = \frac {TP}{TP + FN}$$
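
A quick sketch of the formula in Python, using hypothetical counts (60 observations that are actually tall, 45 of which the model predicted as tall):

```python
def sensitivity(tp, fn):
    """Share of actually-positive observations that were predicted positive."""
    return tp / (tp + fn)


# Hypothetical counts: 45 tall observations found, 15 missed
print(sensitivity(tp=45, fn=15))  # 0.75
```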

Specificity

Specificity tells us what share of the observations that weren’t tall our model predicted as not tall. High specificity means our model is good at finding observations that aren’t tall and doesn’t have many false positives.

$$Specificity = \frac {TN}{TN + FP}$$
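
The same idea for the negative class, again with hypothetical counts (40 observations that are not tall, 35 of which the model predicted as not tall):

```python
def specificity(tn, fp):
    """Share of actually-negative observations that were predicted negative."""
    return tn / (tn + fp)


# Hypothetical counts: 35 non-tall observations correctly rejected, 5 wrongly called tall
print(specificity(tn=35, fp=5))  # 0.875
```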

Accuracy

Accuracy is the number of correct predictions (TP & TN) divided by all the predictions made by the model. This gives a proportion between 0 and 1, which can be multiplied by 100 to get a percentage. I generally don’t use accuracy because it becomes extremely biased when the groups you are trying to predict are unequal in size. E.g. if tall has 60 observations, medium has 30 observations, and short has 10 observations, accuracy will be unreliable, because a model can score well just by favoring the largest group. This is a problem I run into a lot with public clinical neuroimaging datasets.

$$Accuracy = \frac {TP+ TN}{TP + TN + FP + FN}$$
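
To see how unequal groups can flatter accuracy, here is a small sketch with the 60/30/10 split from above. It assumes a hypothetical "lazy" model that predicts tall for every observation, so it learns nothing about the other groups.

```python
def accuracy(tp, tn, fp, fn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# 60 tall and 40 not tall (30 medium + 10 short).
# A model that always predicts "tall" gets every tall observation right
# and every non-tall observation wrong.
print(accuracy(tp=60, tn=0, fp=40, fn=0))  # 0.6
```

An accuracy of 0.6 looks better than a coin flip, even though this model cannot identify a single non-tall observation.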

Balanced Accuracy

Balanced Accuracy is not biased by unequal groups like accuracy is. It does this by taking the average of specificity and sensitivity.

$$BalancedAccuracy = \frac {Specificity + Sensitivity}{2}$$
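
As a rough illustration, take a hypothetical model that labels all 100 observations tall when only 60 of them are. Its sensitivity is perfect but its specificity is zero, and balanced accuracy reflects that:

```python
def balanced_accuracy(tp, tn, fp, fn):
    """Average of sensitivity and specificity."""
    sens = tp / (tp + fn)  # share of tall observations found
    spec = tn / (tn + fp)  # share of non-tall observations found
    return (sens + spec) / 2


# Always-"tall" model on 60 tall and 40 non-tall observations
print(balanced_accuracy(tp=60, tn=0, fp=40, fn=0))  # 0.5
```

A score of 0.5 is what a two-class model earns when it ignores one class entirely, which is a fairer summary of this model than its plain accuracy of 0.6.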

Precision (PPV)

Precision tells us how well our model did when it predicted the positive class, e.g. of the observations predicted tall, how many are actually tall. High precision means that when our model says an observation is tall it is usually right, so it doesn’t have many false positives.

$$Precision = \frac {TP}{TP + FP}$$
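
Precision on its own can be misleading, which the sketch below tries to show. It assumes a hypothetical, very cautious model that only predicts tall 10 times and is right every time, while missing the other 50 tall observations.

```python
def precision(tp, fp):
    """Share of positive predictions that were actually positive."""
    return tp / (tp + fp)


# Cautious model: 10 tall predictions, all correct, but 50 tall observations missed
tp, fp, fn = 10, 0, 50

print(precision(tp, fp))           # 1.0
print(round(tp / (tp + fn), 3))    # sensitivity: 0.167
```

Perfect precision here comes with very low sensitivity, which is why the two are usually read together.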

Negative Predictive Value (NPV)

NPV tells us how well our model did when it predicted the negative class, e.g. of the observations predicted not tall, how many actually aren’t tall. High NPV means that when our model says an observation isn’t tall it is usually right, so it doesn’t have many false negatives.

$$NPV = \frac {TN}{TN + FN}$$
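
A short sketch with hypothetical counts, where the model made 50 "not tall" predictions and 35 of them were correct:

```python
def npv(tn, fn):
    """Share of negative predictions that were actually negative."""
    return tn / (tn + fn)


# Hypothetical counts: 35 correct "not tall" predictions, 15 that were actually tall
print(npv(tn=35, fn=15))  # 0.7
```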

F1

F1 is the harmonic mean of precision (PPV) and sensitivity. A high F1 means your model is good at identifying tall observations while not having many false positives or false negatives.

$$F1 = \frac {2 * PPV * Sensitivity}{PPV + Sensitivity}$$
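
Here is a minimal sketch of the F1 calculation with hypothetical counts. It also checks the result against the equivalent form written directly in terms of the counts, $\frac {2TP}{2TP + FP + FN}$.

```python
import math


def f1_score(ppv, sens):
    """Harmonic mean of precision (PPV) and sensitivity."""
    return 2 * ppv * sens / (ppv + sens)


# Hypothetical counts, not from a real model
tp, fp, fn = 45, 5, 15
ppv = tp / (tp + fp)   # 0.9
sens = tp / (tp + fn)  # 0.75

f1 = f1_score(ppv, sens)
print(round(f1, 3))  # 0.818

# Same value computed straight from the counts
assert math.isclose(f1, 2 * tp / (2 * tp + fp + fn))
```

The result sits slightly below the simple average of 0.9 and 0.75 (0.825), because a harmonic mean is pulled toward the lower of the two values.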

What Metric Matters?

Which metrics you use to judge how good your model is depends on the problem you’re trying to solve. For the example in this post, trying to identify tall observations, precision and recall would be good to use, or the combination of both, the F1 metric, which in my opinion is the most useful metric in most cases. High precision would mean our model isn’t incorrectly identifying non-tall observations as tall. High recall would mean our model is finding most of the tall observations, without incorrectly labeling tall observations as not tall. So a high F1 will mean your model is good at identifying tall observations while not having many false positives or false negatives. Again, this is very dependent on the problem you’re solving, and low values on some of these measures are acceptable in different contexts.

Mohan Gupta

Postdoctoral Scholar

My research interests include what the best ways to learn are, why those are the best ways, and whether I can build computational models to predict what people will learn in both motor and declarative learning.