Understanding Machine Learning Performance Metrics in Simple Terms
4 min | March 10, 2025
Introduction
Machine learning models are used to make predictions, but how do we know if they are doing a good job? Just as we measure a student's performance using test scores, we have different ways to measure the success of a machine learning model. These measurements are called performance metrics.
In this blog, we'll break down key performance metrics into simple, easy-to-understand explanations, so you don't need to be a tech expert to get the idea!
Metrics for Predicting Numbers (Regression Models)
Imagine you're trying to predict house prices based on factors like size and location. The actual price of a house might be $100,000, while your model predicts $95,000. How do we measure how close or far off these predictions are?
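To keep the code snippets below runnable, let's first set up some made-up actual and predicted prices (the numbers are purely illustrative):

```python
# Hypothetical house prices in dollars (illustrative only)
y_true = [100_000, 150_000, 200_000, 250_000]  # actual sale prices
y_pred = [95_000, 160_000, 190_000, 240_000]   # model's predictions
```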
1. Mean Absolute Error (MAE)
Think of this as the average mistake your model makes. If MAE is 5,000, it means your predictions are, on average, $5,000 off from the actual prices.
Formula: $\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$
Python Code:
```python
from sklearn.metrics import mean_absolute_error

mae = mean_absolute_error(y_true, y_pred)
print("MAE:", mae)
```
👉 Lower MAE = Better predictions
2. Mean Squared Error (MSE)
This metric is like MAE but with a twist: It squares the errors before averaging them. This means that big mistakes are penalized more than small ones.
Formula: $\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$
Python Code:
```python
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_true, y_pred)
print("MSE:", mse)
```
👉 Smaller MSE = Fewer large mistakes
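To see why squaring matters, compare two sets of prediction errors with the same average size; the numbers here are made up purely for illustration:

```python
import numpy as np

# Two error profiles with the same average absolute error (MAE = 4)
errors_even  = np.array([4, 4, 4])    # consistently small mistakes
errors_spiky = np.array([1, 1, 10])   # mostly tiny, one big mistake

print("MAE:", errors_even.mean(), "vs", errors_spiky.mean())            # 4.0 vs 4.0
print("MSE:", (errors_even**2).mean(), "vs", (errors_spiky**2).mean())  # 16.0 vs 34.0
```

Both models are off by 4 on average, but MSE flags the second one because of its single large mistake.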
3. Root Mean Squared Error (RMSE)
RMSE is just the square root of MSE. It also punishes big mistakes but brings the final number back to the original unit (like dollars instead of squared dollars).
Formula: $\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$
Python Code:
```python
from sklearn.metrics import mean_squared_error

# squared=False returns the square root of the MSE
# (newer scikit-learn versions also offer root_mean_squared_error)
rmse = mean_squared_error(y_true, y_pred, squared=False)
print("RMSE:", rmse)
```
👉 Lower RMSE = Better accuracy with a focus on big mistakes
4. R-squared (R² Score)
This tells us how well the model explains the data. If R² is 0.90, it means 90% of the variation in house prices can be explained by the model.
Formula: $R^2 = 1 - \frac{\sum_{i}\left(y_i - \hat{y}_i\right)^2}{\sum_{i}\left(y_i - \bar{y}\right)^2}$
Python Code:
```python
from sklearn.metrics import r2_score

r2 = r2_score(y_true, y_pred)
print("R² Score:", r2)
```
👉 Higher R² (closer to 1) = Model is doing a good job
Metrics for Categorizing Things (Classification Models)
Now, imagine you’re designing a model to detect whether an email is spam or not spam. How do we measure if the model is working well?
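As in the regression section, let's define some made-up labels (1 = spam, 0 = not spam) so the snippets below run as-is:

```python
# Hypothetical email labels: 1 = spam, 0 = not spam (illustrative only)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # what the emails actually were
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # what the model predicted
```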
1. Accuracy
This is the simplest metric—what percentage of predictions were correct?
Formula: $\text{Accuracy} = \frac{\text{correct predictions}}{\text{total predictions}}$
Python Code:
```python
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_true, y_pred)
print("Accuracy:", accuracy)
👉 Higher accuracy = More correct predictions
2. Precision
Precision focuses on how many of the emails predicted as spam were actually spam. For example, if the model flags 10 emails as spam and 8 of them really are spam, precision is 0.8.
Formula: $\text{Precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}$
Python Code:
```python
from sklearn.metrics import precision_score

precision = precision_score(y_true, y_pred)
print("Precision:", precision)
```
👉 High precision = Fewer false alarms
3. Recall (Sensitivity)
Recall measures how many of the actual spam emails were caught by the model. For example, if there were 20 spam emails in total and the model caught 8 of them, recall is 0.4.
Formula: $\text{Recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$
Python Code:
```python
from sklearn.metrics import recall_score

recall = recall_score(y_true, y_pred)
print("Recall:", recall)
```
👉 High recall = Model catches more spam emails
4. F1 Score
F1 Score balances precision and recall by combining them into a single number (their harmonic mean), so it is only high when both are high.
Formula: $F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
Python Code:
```python
from sklearn.metrics import f1_score

f1 = f1_score(y_true, y_pred)
print("F1 Score:", f1)
```
👉 Higher F1 Score = Better balance between precision and recall
5. Confusion Matrix
A confusion matrix is like a report card for the model: it counts the true positives, true negatives, false positives, and false negatives in one table.
Python Code:
```python
from sklearn.metrics import confusion_matrix

# For 0/1 labels, scikit-learn lays the matrix out as:
# [[true negatives, false positives],
#  [false negatives, true positives]]
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:\n", cm)
```
👉 Lower false positives and false negatives = Better performance
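If you want the four counts as plain numbers, a common trick is to flatten the 2×2 matrix; here it is applied to the matrix from above, recomputing precision and recall by hand:

```python
# Unpack the four cells of the 2x2 confusion matrix
tn, fp, fn, tp = cm.ravel()

print("Precision:", tp / (tp + fp))  # same result as precision_score
print("Recall:", tp / (tp + fn))     # same result as recall_score
```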
6. ROC Curve & AUC Score
The ROC curve shows how well the model separates spam from non-spam emails across different decision thresholds, and the AUC score boils that ability down to a single number.
Python Code:
```python
from sklearn.metrics import roc_auc_score

# y_pred_prob holds predicted probabilities, not 0/1 labels (see below)
auc = roc_auc_score(y_true, y_pred_prob)
print("AUC Score:", auc)
```
👉 Higher AUC (closer to 1) = Better model
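One detail worth noting: roc_auc_score needs predicted probabilities (y_pred_prob above), not hard 0/1 labels. Most scikit-learn classifiers expose these via predict_proba. Here's a minimal sketch using a logistic regression on made-up features, just to show the mechanics:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical two-feature toy data (illustrative only)
X = [[0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [0.9, 0.1]]
y = [1, 0, 1, 0]

clf = LogisticRegression().fit(X, y)
y_pred_prob = clf.predict_proba(X)[:, 1]  # probability of class 1 (spam)
print("AUC Score:", roc_auc_score(y, y_pred_prob))
```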
Conclusion
Performance metrics help us understand how well a machine learning model is working. The choice of metric depends on what we are trying to achieve:
✅ If predicting numbers, focus on MAE, MSE, RMSE, or R².
✅ If classifying categories, use accuracy, precision, recall, F1 Score, confusion matrix, and AUC.
The key is to pick the right metric for the right job. Hopefully, this guide makes machine learning metrics a little less intimidating!