Regression Model Scoring with Scikit-Learn

Introduction

In the last tutorial, we looked at the various ways in which classification models may be evaluated, and the subtle ways in which each metric may be interpreted.

In this tutorial, we look at the evaluation of regression models which, luckily, is much more straightforward. There are no nerve-wracking notions such as ‘false positives’, nor metrics that have different names depending on the context, such as recall, a.k.a. true positive rate, a.k.a. sensitivity.

All regression metrics are just different ways of comparing the arrays of actual and predicted values, just as in classification evaluation, so we don’t need to be concerned with model selection or the fitting of data, although we use a linear prediction for illustration.

We will cover five key metrics that we can use to evaluate the performance of a regression model:

  • Mean Absolute Error (MAE) - The easiest to reason about, but a bit blunt.
  • Mean Squared Error (MSE) - The ‘standard’ that punishes outliers, for better and sometimes for worse!
  • Root Mean Squared Error (RMSE) - Just like MSE but without ‘upscaling’ the values.
  • Median Absolute Error (MedAE) - A better MAE for cases when Bill Gates gets on the bus …
  • R2 Score - The go-to metric for all use cases unless there’s a good reason not to.

For each of the metrics, we will display three graphs, reflecting three data sets/scenarios:

  1. One ‘baseline’ fairly linear data set with values close to the mean
  2. The same baseline data set with positive outliers
  3. The same baseline data set with negative outliers

Every time we discuss one of the metrics, we will display the results for all five of them, against the three data sets, so that we can always compare them.

Obligatory Imports and Boilerplate

import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display, Markdown
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import median_absolute_error
from sklearn.linear_model import LinearRegression

Data Sets and Visualisation

The code below creates the three sample data sets and computes the results for each of the five metrics discussed. You don’t need to understand this code for the tutorial; it is here just for convenience, so feel free to skip ahead.

samples = 100
np.random.seed(0)

def add_outliers(array, step):
    # Shift every |step|-th value by step/41 (up for a positive
    # step, down for a negative one)
    return [ v+(step/41) if i % step == 0 else v
             for i,v in enumerate(array) ]

x = np.linspace(0.0,1.0,samples)
r = np.random.rand(samples)
y = np.array(list(map(lambda t: t[0]/2 + t[1]/5, zip(x,r))))
y_plus  = add_outliers(y,15)
y_minus = add_outliers(y,-15)
X = x.reshape(samples,-1)

# Baseline
X_train, X_test, y_train, y_test = train_test_split(X,y,random_state=0)
model = LinearRegression().fit(X_train,y_train)
y_pred = model.predict(X_test)

# Positive Outliers
X_train_plus, X_test_plus, y_train_plus, y_test_plus = train_test_split(X,y_plus,random_state=0)
model_plus = LinearRegression().fit(X_train_plus,y_train_plus)
y_pred_plus = model_plus.predict(X_test_plus)

# Negative Outliers
X_train_minus, X_test_minus, y_train_minus, y_test_minus = train_test_split(X,y_minus,random_state=0)
model_minus = LinearRegression().fit(X_train_minus,y_train_minus)
y_pred_minus = model_minus.predict(X_test_minus)

def visualise():
    plt.rcParams['figure.figsize'] = [16, 4]
    plt.rcParams['figure.dpi'] = 100

    ax = plt.subplot(1, 3, 1)
    plt.title("Baseline")
    plt.scatter(X,y,s=0.5)
    plt.plot(X,model.predict(X),'-r', label="Prediction")
    plt.legend(loc='upper left')

    plt.subplot(1, 3, 2, sharey=ax)
    plt.title("Positive outliers")
    plt.scatter(X,y_plus,s=0.5)
    plt.plot(X,model_plus.predict(X),'-r', label="Prediction")
    plt.legend(loc='upper left')

    plt.subplot(1, 3, 3, sharey=ax)
    plt.title("Negative outliers")
    plt.scatter(X,y_minus,s=0.5)
    plt.plot(X,model_minus.predict(X),'-r', label="Prediction")
    plt.legend(loc='upper left')
    plt.show()

    print("                   MAE        MSE        RMSE       MedAE      R2")
    print("         Baseline: {:.8f} {:.8f} {:.8f} {:.8f} {:.8f}"
        .format(mean_absolute_error(y_test, y_pred),
                mean_squared_error(y_test,y_pred),
                mean_squared_error(y_test,y_pred,squared=False),
                median_absolute_error(y_test,y_pred),
                r2_score(y_test,y_pred)))
    print("Positive Outliers: {:.8f} {:.8f} {:.8f} {:.8f} {:.8f}"
        .format(mean_absolute_error(y_test_plus, y_pred_plus),
                mean_squared_error(y_test_plus,y_pred_plus),
                mean_squared_error(y_test_plus,y_pred_plus,squared=False),
                median_absolute_error(y_test_plus,y_pred_plus),
                r2_score(y_test_plus,y_pred_plus)))
    print("Negative Outliers: {:.8f} {:.8f} {:.8f} {:.8f} {:.8f}"
        .format(mean_absolute_error(y_test_minus, y_pred_minus),
                mean_squared_error(y_test_minus,y_pred_minus),
                mean_squared_error(y_test_minus,y_pred_minus,squared=False),
                median_absolute_error(y_test_minus,y_pred_minus),
                r2_score(y_test_minus,y_pred_minus)))

Regression Evaluation Metrics

Mean Absolute Error (MAE)

Mean absolute error (MAE) is obtained using the mean_absolute_error(y_actual,y_pred) function. As an error function, the closer to zero, the better.

This is the simplest metric for evaluating regression models, since it consists of calculating the average of all absolute distances between the actual and predicted values.

Calculating MAE for two values

Let’s now look at the effect of using MAE with a larger data set:

visualise()

                   MAE        MSE        RMSE       MedAE      R2
         Baseline: 0.05046426 0.00339777 0.05829042 0.05499852 0.84103010
Positive Outliers: 0.08362021 0.01648675 0.12840072 0.05757718 0.53651638
Negative Outliers: 0.09174612 0.02243254 0.14977495 0.05757718 0.36711532

This metric is suitable when the impact of the difference between observations and predictions is linear. For example, if the actual value is 2, and the prediction was 7, it would be off by ‘5’ points. If the prediction was 12, it would be off by ‘10’ points.
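
We can sanity-check that example with scikit-learn, using two hypothetical points that are off by 5 and 10 points respectively:

mean_absolute_error([2, 2], [7, 12]) # (5 + 10) / 2
7.5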

We can calculate the mean absolute error ourselves using the code below:

def mae(y_actual,y_pred):
    r = np.mean([ abs(ya-yp) for ya,yp in zip(y_actual,y_pred) ])
    # r = np.mean(abs(y_actual-y_pred)) # NumPy-idiomatic way
    return r
mae(y_test,y_pred) # baseline
0.0504642646765226

Mean Squared Error (MSE)

Mean squared error (MSE) is obtained using the mean_squared_error(y_actual,y_pred) function. As an error function, the closer to zero, the better.

This metric is similar to MAE except that the difference between the actual and predicted values is made positive by squaring it rather than by applying abs(). Apologies to the math folk for the Pythonic explanation!

Calculating MSE for two values

By squaring the differences, outliers penalise the resulting score more than they do with MAE, but only when the differences are larger than 1.0. More on this in a moment.
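
Using the same two hypothetical points as before, squaring turns the errors of 5 and 10 points into 25 and 100:

mean_squared_error([2, 2], [7, 12]) # (25 + 100) / 2
62.5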

Let’s first take a look at MSE applied to a larger data set.

visualise()

                   MAE        MSE        RMSE       MedAE      R2
         Baseline: 0.05046426 0.00339777 0.05829042 0.05499852 0.84103010
Positive Outliers: 0.08362021 0.01648675 0.12840072 0.05757718 0.53651638
Negative Outliers: 0.09174612 0.02243254 0.14977495 0.05757718 0.36711532

We can calculate the mean squared error ourselves using the code below:

def mse(y_actual,y_pred):
    r = np.mean([ (ya-yp)**2 for ya,yp in zip(y_actual,y_pred) ])
    # r = np.mean((y_actual - y_pred)**2) # NumPy-idiomatic way
    return r
mse(y_test,y_pred) # baseline
0.0033977735623684923

The Issue with Fractional Distances

If we look at the distances between the actual and predicted values, we can observe that most of them are fractional numbers (i.e., their magnitude is lower than 1.0).

distances = [ (ya-yp) for ya,yp in zip(y_test,y_pred) ]
distances[:10]
[-0.08240320001531051,
 0.02421857651852455,
 -0.002778311611712825,
 -0.0640012559810067,
 -0.07821132666637365,
 0.06640940270316664,
 -0.11213731522943875,
 0.03389879138769458,
 -0.05499852206107747,
 -0.039196338275046205]

When we square a fractional number, we obtain a smaller number, rather than a larger one!

10**2
100
0.1**2
0.010000000000000002

Therefore, the squared distances are smaller than the absolute ones!

pd.DataFrame({"distances" : distances[:10],
              "absolute": list(map(abs, distances))[:10],
              "squared": list(map(lambda x: x**2, distances))[:10]
})

distances absolute squared
0 -0.082403 0.082403 0.006790
1 0.024219 0.024219 0.000587
2 -0.002778 0.002778 0.000008
3 -0.064001 0.064001 0.004096
4 -0.078211 0.078211 0.006117
5 0.066409 0.066409 0.004410
6 -0.112137 0.112137 0.012575
7 0.033899 0.033899 0.001149
8 -0.054999 0.054999 0.003025
9 -0.039196 0.039196 0.001536

Note that if we scale the values up by 100, we ‘fix’ the ‘issue’, and MSE reports a harsher score (further away from 0.0) than MAE, which is normally the expectation:

mean_absolute_error(y_test * 100, y_pred * 100)
5.0464264676522586
mean_squared_error(y_test * 100, y_pred * 100)
33.97773562368493

In conclusion, while MSE is known to let outliers exert more influence than MAE does, this is not the case when the distances are below 1.0.
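
A single, made-up fractional distance is enough to see this in action:

mean_absolute_error([0.0], [0.5])
0.5
mean_squared_error([0.0], [0.5])
0.25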

Root Mean Squared Error (RMSE)

This is exactly like MSE except that the square root is applied to the result, so that the error is on a scale commensurate with the data set’s values.

For RMSE we only need to pass the extra squared=False argument to mean_squared_error(y_actual,y_pred) as follows:

mean_squared_error(y_test,y_pred,squared=False) # baseline
0.0582904242767926
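
Note that in recent scikit-learn releases (1.4 and later), the squared= parameter is deprecated in favour of a dedicated function that returns the same value:

from sklearn.metrics import root_mean_squared_error
root_mean_squared_error(y_test,y_pred) # baseline
0.0582904242767926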

Let’s now see RMSE in action.

visualise()

                   MAE        MSE        RMSE       MedAE      R2
         Baseline: 0.05046426 0.00339777 0.05829042 0.05499852 0.84103010
Positive Outliers: 0.08362021 0.01648675 0.12840072 0.05757718 0.53651638
Negative Outliers: 0.09174612 0.02243254 0.14977495 0.05757718 0.36711532

If we want to implement RMSE ourselves, it is the same code as mse() plus the sqrt function:

def rmse(y_actual,y_pred):
    return np.sqrt(mse(y_actual,y_pred))
rmse(y_test,y_pred) # baseline
0.0582904242767926

Median Absolute Error (MedAE)

Median absolute error (MedAE) is obtained using the median_absolute_error(y_actual,y_pred) function. As an error function, the closer to zero, the better.

visualise()

                   MAE        MSE        RMSE       MedAE      R2
         Baseline: 0.05046426 0.00339777 0.05829042 0.05499852 0.84103010
Positive Outliers: 0.08362021 0.01648675 0.12840072 0.05757718 0.53651638
Negative Outliers: 0.09174612 0.02243254 0.14977495 0.05757718 0.36711532

MedAE may be seen as another variation of MAE, but the variation is not in how the differences between actuals and predictions are made positive; it lies in summarising the results using median() rather than mean(). My apologies again, math people.

This approach minimises the influence of outliers that sit at the extremes, but it may cause problems if we do want those outliers to be accounted for in the score.

If Bill Gates gets on a bus, the average passenger becomes a millionaire, but the median passenger’s net worth remains largely unchanged.
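
We can see this with a quick illustration, using made-up net worth figures for a bus of five passengers:

incomes = [30_000, 35_000, 40_000, 45_000, 50_000]
np.mean(incomes), np.median(incomes) # everyone is comfortably average
(40000.0, 40000.0)
incomes.append(100_000_000_000) # Bill Gates boards the bus
np.mean(incomes), np.median(incomes) # the mean explodes; the median barely moves
(16666700000.0, 42500.0)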

We can calculate MedAE ourselves using the code below:

def medae(y_actual,y_pred):
    r = np.median([ abs(ya-yp) for ya,yp in zip(y_actual,y_pred) ])
    # r = np.median(abs(y_actual-y_pred)) # NumPy-idiomatic way
    return r
medae(y_test,y_pred) # baseline
0.05499852206107747

R-Squared (R2)

This is the ‘de facto’ metric for evaluating regression models, and the one used by model.score(), where model may be LinearRegression, SVR, or any other regressor. It can also be used directly as r2_score(y_actual,y_pred). Unlike the error metrics, the closer the score gets to 1.0, the better.
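
For regressors, the two are interchangeable; model.score() simply computes R2 on the model’s own predictions:

model.score(X_test,y_test) # same as r2_score(y_test,y_pred)
0.8410301048023623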

visualise()

                   MAE        MSE        RMSE       MedAE      R2
         Baseline: 0.05046426 0.00339777 0.05829042 0.05499852 0.84103010
Positive Outliers: 0.08362021 0.01648675 0.12840072 0.05757718 0.53651638
Negative Outliers: 0.09174612 0.02243254 0.14977495 0.05757718 0.36711532

A peculiarity of this metric is that it can produce negative scores; the bottom limit isn’t 0.0. A model that always predicts the mean of the actual values scores exactly 0.0, and a model that does worse than that goes negative.
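
Two contrived predictions, with made-up values, demonstrate both ends:

r2_score([1, 2, 3, 4], [2.5, 2.5, 2.5, 2.5]) # predicting the mean
0.0
r2_score([1, 2, 3, 4], [4, 3, 2, 1]) # worse than predicting the mean
-3.0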

r2 is similar in spirit to MSE in the sense that it uses squared differences, but the formula is more involved: R2 = 1 - RSS/TSS, where RSS is the residual sum of squares and TSS is the total sum of squares, as can be appreciated below:

def r2(y_actual,y_pred):
    mean = np.mean(y_actual)
    rss = sum([ (ya-yp)**2 for ya,yp in zip(y_actual,y_pred) ]) # residual sum of squares
    tss = sum([ (ya-mean)**2 for ya in y_actual ])              # total sum of squares
    return 1-rss/tss
r2(y_test,y_pred) # baseline
0.8410301048023623

Conclusion

We looked at five different metrics to evaluate regression models:

  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE)
  • Median Absolute Error (MedAE)
  • R2 Score

Given that R2 is the only metric that provides a consistent score range with an upper limit of 1.0, similar to most classification metrics, it is no wonder that it is the most popular one, and the one implemented by most models via the model.score() method.

The choice among the other, ‘error’ metrics boils down to the following trade-offs:

Trade-off #1

Whether to allow big outliers to exert greater influence (MSE) or not (MAE)

# MAE
mean_absolute_error([1,2,3,4,5,6,7,8,9],
                    [1,2,3,4,5,6,7,8,25])
1.7777777777777777
# MSE
mean_squared_error([1,2,3,4,5,6,7,8,9],
                   [1,2,3,4,5,6,7,8,25])
28.444444444444443

Trade-off #2

Whether to ignore outliers that sit at the extremes (MedAE) or not (MAE)

# MAE
mean_absolute_error([1,2,3,4,5,6,7,8,9],
                    [1,2,3,4,5,6,7,8,25])
1.7777777777777777
# MedAE
median_absolute_error([1,2,3,4,5,6,7,8,9],
                      [1,2,3,4,5,6,7,8,25])
0.0
