# Regression Metrics

## Introduction

This post is more theoretical: it explains in detail the different regression metrics used to evaluate regression models, along with their advantages and disadvantages.

While we discuss the different regression metrics in this post, also take a while to go through the post which discusses the mathematical assumptions we make while solving problems using regression.

Let’s start with an example, so that we can explain the concepts better:

### Example 1:

We have built a regression model to predict the salaries of 5 employees. The actual vs. the predicted salaries of the 5 employees are as follows:

| Employee | Actual Salary | Predicted Salary |
| --- | --- | --- |
| 1 | 1000 | 1000 |
| 2 | 1400 | 1350 |
| 3 | 2500 | 2480 |
| 4 | 3000 | 3000 |
| 5 | 4000 | 3950 |

Let’s calculate the overall prediction error that we got:

Overall Predicted Total: 1000 + 1350 + 2480 + 3000 + 3950 = 11,780

Overall Actual Total: 1000 + 1400 + 2500 + 3000 + 4000 = 11,900

Difference between the Actual and Predicted salaries = 11,900 – 11,780 = 120

### Example 2:

Let’s say we used another model and predicted the salaries of the 5 employees as below:

| Employee | Actual Salary | Predicted Salary |
| --- | --- | --- |
| 1 | 1000 | 1000 |
| 2 | 1400 | 1400 |
| 3 | 2500 | 2500 |
| 4 | 3000 | 3000 |
| 5 | 4000 | 4120 |

Let’s calculate the overall prediction error that we got:

Overall Predicted Total: 1000 + 1400 + 2500 + 3000 + 4120 = 12,020

Overall Actual Total: 1000 + 1400 + 2500 + 3000 + 4000 = 11,900

Difference between the Actual and Predicted salaries = 11,900 – 12,020 = -120

Let’s have a close look at the results of both regression models and ask ourselves:

1. Does the sign (positive or negative) make a difference? Which one would you prefer here: 120 or -120?
2. Even though the absolute difference is the same in both cases, the predictions were very different. The second model predicted every salary exactly except the last one, which the overall difference cannot reveal.
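Both scenarios can be reproduced in a few lines of Python (the salary figures are the ones from the examples above), showing how summing signed differences hides where the errors actually occurred:

```python
# Actual salaries and the two sets of predictions from the examples above
actual = [1000, 1400, 2500, 3000, 4000]
model_1 = [1000, 1350, 2480, 3000, 3950]  # Example 1: several small under-predictions
model_2 = [1000, 1400, 2500, 3000, 4120]  # Example 2: one larger over-prediction

# Summing the signed differences collapses very different error patterns
# into the same magnitude
diff_1 = sum(actual) - sum(model_1)
diff_2 = sum(actual) - sum(model_2)
print(diff_1, diff_2)  # 120 -120
```

Both models end up with the same absolute total difference of 120, even though their per-employee errors are distributed very differently.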

So, long story short, how do you better analyze the performance of the model here? What is the secret sauce for it?

Well, there is indeed a secret sauce: model performance metrics. I guess this sounded serious and boring. Well, actually it isn’t. In fact, it’s pretty interesting.

The metrics we use most often are as follows:

1. Mean Absolute Error
2. Mean Squared Error
3. Root Mean Squared Error
4. Root Mean Squared Logarithmic Error
5. R-Squared

### 1. Mean Absolute Error (MAE)

Mean Absolute Error is the average of the absolute differences between each actual and predicted value. The formula for Mean Absolute Error is:

MAE = (1/n) × Σ |yᵢ − ŷᵢ|

In the equation above, ŷ (y-hat) is the predicted value and y is the actual value.

Let’s take the example below and calculate the Mean Absolute Error by hand. Later we will also see how MAE is calculated programmatically.

|3000 – 3800| = 800

|4400 – 4600| = 200

|2500 – 2300|= 200

|3000 – 3400|= 400

|4000 – 4120| = 120

Average of the summed-up error = (800 + 200 + 200 + 400 + 120) / 5 = 344
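The same arithmetic can be checked in a couple of lines (the values are the actual/predicted pairs listed above):

```python
actual    = [3000, 4400, 2500, 3000, 4000]
predicted = [3800, 4600, 2300, 3400, 4120]

# MAE: mean of the absolute differences between actual and predicted values
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
print(mae)  # 344.0
```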

Now, what does $344 mean? The question that comes to mind is: what can be concluded from this? How did my model perform? So, before jumping to that, let’s get a few things clear: the result we got basically says that we should expect our predictions to be off by $344 from the actual value, on average.

Okay, let’s now work out whether our model’s performance was up to par:

• Evaluating how well a model performed depends on the scenario and the dataset. You could say that evaluating performance is more subjective than objective. To see this, note that our model performed great if the salaries range from $100 to $1M, but if the range is between $3000 and $4400, then an error of $344 signifies that our model has underperformed.
• For some problems we may be given an MAE threshold. This gives you more perspective on your model’s performance and helps you gauge it. If no threshold is specified, the range of the dependent variable helps us assess the model. In the above example the range is between $3000 and $4400, so an MAE of $344 is pretty high given the range of our dependent variable.

Let’s implement this in code and see how it works:

```python
from sklearn.metrics import mean_absolute_error

actual_values = [3, -0.5, 2, 7]
predicted_values = [2.5, 0.0, 2, 8]
print("Mean Absolute Error: ", mean_absolute_error(actual_values, predicted_values))
```

O/P: Mean Absolute Error: 0.5

### 2. Mean Squared Error (MSE)

• Takes the average of the squared differences between the actual values and the predicted values.
• Because the errors are squared, the effect of larger errors (often outliers) becomes more pronounced than that of smaller errors, so the model is penalized more for predictions that differ greatly from the corresponding actual values.
• Before applying MSE, we must eliminate all nulls/infinities from the input.
• Not robust to outliers.
• Range: [0, +∞).

(3000 – 3800)² = 640,000

(4400 – 4600)² = 40,000

(2500 – 2300)² = 40,000

(3000 – 3400)² = 160,000

(4000 – 4120)² = 14,400

The average of the squared error summations is:

(640,000 + 40,000 + 40,000 + 160,000 + 14,400) / 5 = 178,880
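The worked calculation above can be verified in code (same salary pairs as before):

```python
actual    = [3000, 4400, 2500, 3000, 4000]
predicted = [3800, 4600, 2300, 3400, 4120]

# MSE: mean of the squared differences; large errors dominate the result
mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
print(mse)  # 178880.0
```

Note that a single large error (here 800²) contributes far more to the total than several small ones, which is the penalizing behavior described above.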

Let’s implement this in code and see how it works:

```python
from sklearn.metrics import mean_squared_error

actual_values = [3, -0.5, 2, 7]
predicted_values = [2.5, 0.0, 2, 8]
print("Mean Squared Error: ", mean_squared_error(actual_values, predicted_values))
```

O/P: Mean Squared Error: 0.375

### 3. Root Mean Squared Error (RMSE)

Root Mean Squared Error is defined as the square root of the average of the squared differences between the predicted values and the actual values.

Let’s take the below example to understand this:

(3000 – 3800)² = 640,000

(4400 – 4600)² = 40,000

(2500 – 2300)² = 40,000

(3000 – 3400)² = 160,000

(4000 – 4120)² = 14,400

The average of the squared error summations is:

(640,000 + 40,000 + 40,000 + 160,000 + 14,400) / 5 = 178,880

Square root of the averaged squared error:

Square root of 178,880 ≈ 423

Notice that the RMSE is greater than the MAE. The reason is that RMSE squares the difference between the predictions and the ground truth, so any significant difference becomes more substantial once squared; RMSE is therefore more sensitive to outliers.

So, if large errors are particularly undesirable and you want to penalize them, RMSE is the better choice for evaluating model performance. Also, as with MAE, the smaller the value of RMSE, the better the performance of your model.

Also, MAE and RMSE are in the same units as the dependent variable. Hence, we can compare MAE/RMSE with the dependent variable.
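A small sketch (with made-up error values, purely for illustration) makes the outlier sensitivity concrete: two models with the same MAE can have very different RMSE once one large error is involved.

```python
from math import sqrt

def mae(actual, predicted):
    # Mean of absolute differences
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    # Square root of the mean of squared differences
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [100, 100, 100, 100]
uniform_errors = [110, 110, 110, 110]   # every prediction off by 10
one_outlier    = [100, 100, 100, 140]   # one prediction off by 40

print(mae(actual, uniform_errors), rmse(actual, uniform_errors))  # 10.0 10.0
print(mae(actual, one_outlier), rmse(actual, one_outlier))        # 10.0 20.0
```

Both prediction sets have the same MAE of 10, but the single large error doubles the RMSE.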

Let’s implement this in code and see how it works:

```python
from sklearn.metrics import mean_squared_error
from math import sqrt

actual_values = [3, -0.5, 2, 7]
predicted_values = [2.5, 0.0, 2, 8]
a = mean_squared_error(actual_values, predicted_values)
root_mean_squared_error = sqrt(a)
print("Root Mean Squared Error:", root_mean_squared_error)
```

O/P: Root Mean Squared Error: 0.6123724356957945

### 4. Root Mean Squared Logarithmic Error

• In this metric, we take the log of (one plus) the predicted and actual values before computing the squared error.
• What changes is that we measure the relative error rather than the absolute error.
• RMSLE is usually used when we don’t want to heavily penalize huge differences between the predicted and actual values when both are large numbers.
• If both predicted and actual values are small: RMSE and RMSLE are about the same.
• If either the predicted or the actual value is big: RMSE > RMSLE.
• If both predicted and actual values are big: RMSE >> RMSLE (RMSLE becomes almost negligible).
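A quick sketch illustrates these bullet points: the same 10% relative error at a small scale and at a large scale produces wildly different RMSE values but nearly identical RMSLE values (RMSLE is computed here as the square root of sklearn's `mean_squared_log_error`).

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_squared_log_error

def rmse(y_true, y_pred):
    return np.sqrt(mean_squared_error(y_true, y_pred))

def rmsle(y_true, y_pred):
    # sklearn returns the MSLE; take the square root for RMSLE
    return np.sqrt(mean_squared_log_error(y_true, y_pred))

# Same relative error (10%) at a small scale and at a large scale
small_true, small_pred = [10.0], [11.0]
large_true, large_pred = [10000.0], [11000.0]

print(rmse(small_true, small_pred), rmsle(small_true, small_pred))  # 1.0, ~0.087
print(rmse(large_true, large_pred), rmsle(large_true, large_pred))  # 1000.0, ~0.095
```

RMSE grows a thousandfold with the scale of the values, while RMSLE stays roughly constant because it responds to the ratio, not the absolute gap.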

Let’s implement this in code and see how it works (note that sklearn’s mean_squared_log_error returns the MSLE, so we take its square root to get the RMSLE):

```python
import numpy as np
from sklearn import metrics

actual_values = [3, 0.5, 2, 7]
predicted_values = [2.5, 0.0, 2, 8]
msle = metrics.mean_squared_log_error(actual_values, predicted_values)
print("Root Mean Squared Logarithmic Error:", np.sqrt(msle))
```

O/P: Root Mean Squared Logarithmic Error: ~0.2214

### 5. R-Squared

R-squared (R²) explains the degree to which your input variables explain the variation of your output/predicted variable. So, if R-squared is 0.8, it means 80% of the variation in the output variable is explained by the input variables. In simple terms, the higher the R-squared, the more variation is explained by your input variables, and hence the better your model.

However, the problem with R-squared is that it will either stay the same or increase with the addition of more variables, even if they have no relationship with the output variable.

Let’s implement this in code and see how it works:

```python
from sklearn import metrics
import numpy as np

X = [1, 2, 3]
actual_values = [3, -0.5, 2, 7]
predicted_values = [2.5, 0.0, 2, 8]
print(metrics.r2_score(actual_values, predicted_values))
```

O/P: 0.9486081370449679
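We can verify sklearn’s result by computing R² directly from its definition, 1 minus the ratio of the residual sum of squares to the total sum of squares:

```python
import numpy as np
from sklearn.metrics import r2_score

actual_values = np.array([3, -0.5, 2, 7])
predicted_values = np.array([2.5, 0.0, 2, 8])

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((actual_values - predicted_values) ** 2)
ss_tot = np.sum((actual_values - actual_values.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual)                                  # ~0.9486
print(r2_score(actual_values, predicted_values))  # same value
```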

This is where “Adjusted R-squared” comes to help. Adjusted R-squared penalizes you for adding variables that do not improve your existing model.

Therefore, if you are building a linear regression on multiple variables, it is always suggested that you use Adjusted R-squared to judge the goodness of the model. If you only have one input variable, R-squared and Adjusted R-squared are exactly the same.

Typically, the more non-significant variables you add to the model, the larger the gap between R-squared and Adjusted R-squared becomes. The formula for Adjusted R-Squared is:

Adjusted R² = 1 − (1 − R²) × (n − 1) / (n − k − 1)

where n is the number of observations and k is the number of input variables.

Let’s implement this in code and see how it works:

We will calculate Adjusted R-Squared using the R-Squared value from above, with n = 4 observations and k = 1 input variable:

```python
from sklearn import metrics

actual_values = [3, -0.5, 2, 7]
predicted_values = [2.5, 0.0, 2, 8]

n = len(actual_values)  # number of observations
k = 1                   # number of input variables
r_squared = metrics.r2_score(actual_values, predicted_values)
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
print("Adjusted R-Squared:", adjusted_r_squared)
```

O/P: Adjusted R-Squared: 0.9229122055674519