SVM with Scikit-Learn

Introduction

Support Vector Machines (SVM) present themselves with a scary name, suggesting that something somewhat sophisticated—or macabre—might be at play. Here, instead, we will look at SVM from a practical perspective, rather than a theoretical one, using Scikit-Learn.

SVMs are often used for classification and, as we will see, can place a better decision boundary than logistic regression.

Imports and Boilerplate

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

plt.rcParams['figure.figsize'] = [8, 7]
plt.rcParams['figure.dpi'] = 100

A Classification Problem. Red Dots vs Blue Dots.

We use classification to predict the likelihood of one class versus another, given one or more features.

Let us contemplate two classes (i.e., a binary/binomial classification), consisting of red and blue dots. Do not worry too much about the code below; jump straight to the plot and we’ll continue the conversation.

# Generate two classes of dots: red (upper region) and blue (lower region)
np.random.seed(0)
samples = 80
red_x = np.linspace(1, 8, samples)
red_y = [v + 0.5 + np.random.rand() * (10 - v - 2) for v in red_x]

blue_x = np.linspace(1, 10, samples)
blue_y = [v - (np.random.rand() * v) for v in blue_x]

# Features (x, y coordinates) and labels (0 = red, 1 = blue)
X = np.concatenate((
    np.array([[x, y] for (x, y) in zip(red_x, red_y)]),
    np.array([[x, y] for (x, y) in zip(blue_x, blue_y)])
))
y = np.concatenate((
    np.repeat(0, len(red_x)),
    np.repeat(1, len(blue_x))
))

def visualise(models, names):
    plt.scatter(red_x, red_y, color='red')
    plt.scatter(blue_x, blue_y, color='blue')
    plt.xticks(range(0, 11))
    plt.yticks(range(0, 11))
    plt.xlabel("X")
    plt.ylabel("Y")
    for i, m in enumerate(models):
        # Approximate the decision boundary: grid points where P(class 1) ~= 0.5
        class_boundary = [(x, y)
                          for x in np.linspace(0, 10, 200)
                          for y in np.linspace(0, 10, 200)
                          if abs((m.predict_proba([[x, y]])[0][1]) - 0.5)
                          <= 0.001]
        plt.plot([t[0] for t in class_boundary],
                 [t[1] for t in class_boundary],
                 label=names[i])
    if len(models) > 0:
        plt.legend(loc='upper left')

visualise([], [])

OK, above we have X and Y axes (both in the 0..10 range), and depending on the coordinates, a dot is more likely to be either red or blue.

Logistic Regression

An easy predictive model that we can use is logistic regression (check my tutorial), which will predict (barring regularisation, etc.) either red or blue, depending on whether a dot falls above or below the teal decision boundary line.

model_logistic = LogisticRegression()
model_logistic.fit(X, y)
visualise([model_logistic], ["Logistic"])

As you can see above, the logistic regression model has done a good job. Normally, you don’t need a model more complicated than this.
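
If you want to put a rough number on that “good job”, a quick look at the training accuracy will do (just a sanity check on the training data, not a proper evaluation on held-out data):

# Accuracy on the very data we trained on (a rough sanity check only)
print(model_logistic.score(X, y))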

But there’s something not quite right… intuitively, you know you could draw the decision boundary line yourself without overlapping or touching any dots, right?

Enter support-vector machines!

Support-Vector Machines (SVM)

Support Vectors

Well, the support vectors in SVM are—in layman terms—the dots that sit closest to the dividing line; they pin down the lines that hug each class, without touching any other dots! Let us see.

model_svm = SVC(kernel='linear', probability=True)
model_svm.fit(X, y)
visualise([model_logistic, model_svm], ["Logistic", "Linear SVM"])

Now you can see the difference between the two. The decision boundary suggested by the SVM model sits perfectly between the two clusters of dots, without touching or overlapping with any of them.
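
If you are curious which dots the model actually leans on, the fitted SVC exposes them through its support_vectors_ attribute. Here is a minimal sketch, reusing the model_svm fitted above:

# The dots the linear SVM actually uses to place the boundary
print(model_svm.support_vectors_)   # coordinates of the support vectors
print(model_svm.n_support_)         # number of support vectors per class

# Optionally, circle them on top of the usual plot
visualise([model_svm], ["Linear SVM"])
plt.scatter(model_svm.support_vectors_[:, 0],
            model_svm.support_vectors_[:, 1],
            s=120, facecolors='none', edgecolors='black',
            label='Support vectors')
plt.legend(loc='upper left')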

Let us not jump to the conclusion that SVM is necessarily “superior” to a simple logistic model.

Purpose

The SVM model is suitable when what matters is not the aggregate influence of every data point in each class. Instead, in SVM the goal is to place the decision boundary as far away as possible from the most proximate dots—in visualisation terms.

This ‘being away from the dots’ affair is what the statisticians behind Skynet call the margin, as shown in the figure featuring a lovely T-800.

The margin is the width between—again, in layman terms—the line through the support vectors ‘above’ and the line through the support vectors ‘below’.

If this is still too cryptic, you can imagine that in logistic regression all the dots exert a gravitational force over the decision boundary line, whereas in SVM, only the dots that are on the edge are relevant for drawing the decision boundary line.
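
For the linear kernel you can even put a number on that margin: its width is 2 / ||w||, where w holds the coefficients of the separating line. A small sketch, assuming the model_svm fitted earlier:

# Margin width of the fitted linear SVM: 2 / ||w||
# (coef_ is only exposed when kernel='linear')
w = model_svm.coef_[0]
print(2 / np.linalg.norm(w))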

Regularisation

Like most other models, SVM can be regularised using the C parameter. The lower the value, the stronger the regularisation and the more biased the model becomes. In layman terms, the less “obedient” it becomes in the presence of more dots on either side.

model_svm2 = SVC(kernel='linear', probability=True, C=0.01)
model_svm2.fit(X, y)
visualise([model_logistic, model_svm, model_svm2],
          ["Logistic", "SVM", "SVM with regularisation"])

Please note that for our data set, the SVM model without regularisation is already sufficient. The exaggerated effect on the green line is on purpose, so that the impact of C is clearly visible.
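
In practice, rather than guessing, a sensible C is usually picked with cross-validation. A minimal sketch using Scikit-Learn’s GridSearchCV (the grid of values below is arbitrary):

from sklearn.model_selection import GridSearchCV

# Try a handful of C values with 5-fold cross-validation and keep the best
grid = GridSearchCV(SVC(kernel='linear'),
                    param_grid={'C': [0.01, 0.1, 1, 10, 100]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)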

Non-Linear Models and Kernels

You might have heard about kernelised support vector machines. What does Linus Torvalds have to do with SVM? Nothing. As if there weren’t enough cryptic words in SVM already! Relax, the math folk love to confound mere mortals!

Remember polynomial regression (see my previous blog post), where, even when you had a curve, you could still use linear regression (in the form of polynomial regression): you keep the same model and just transform the features.

Well, the general method of not changing the model itself but tinkering with the features before they are fed into the underlying model is called a kernel.

These transformed features result in what machine learning experts call a ‘higher-dimensional space’. It reminds me of this.
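
To make this concrete, here is a small sketch of my own of the ‘keep the model, transform the features’ idea: a one-dimensional data set with the positive class stuck in the middle cannot be split by a single threshold, but adding a squared feature makes it separable by a plain linear SVM.

# One feature: the positive class sits in the middle, so no single
# threshold on x can separate it
xs = np.linspace(-3, 3, 100)
labels = (np.abs(xs) < 1).astype(int)

X_original = xs.reshape(-1, 1)              # the original 1-D feature space
X_mapped = np.column_stack([xs, xs ** 2])   # add x^2: a 'higher-dimensional space'

print(SVC(kernel='linear').fit(X_original, labels).score(X_original, labels))  # poor
print(SVC(kernel='linear').fit(X_mapped, labels).score(X_mapped, labels))      # close to 1.0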

Now, the good news is that the kernel implementation is completely encapsulated in Scikit-Learn, so we don’t really need to tinker with all of this twilight zone affair.

Let us cut the rant short and look at an example, consisting of green and purple dots for a change. As usual, don’t worry about the code below; just skip to the plot.

# Generate a horizontal band of dots: purple in the middle (4 < x < 5.5), green elsewhere
np.random.seed(1)
samples = 200

dataset = np.array([(x, 2 + (np.random.rand() * 2),
                     1 if 5.5 > x > 4 else 0)
                    for x in np.linspace(0, 8, samples)])

green_x = [x for (x, y, v) in dataset if not v]
green_y = [y for (x, y, v) in dataset if not v]
purple_x = [x for (x, y, v) in dataset if v]
purple_y = [y for (x, y, v) in dataset if v]

X = np.array([[x, y] for (x, y, _) in dataset])
y = np.array([z for (_, _, z) in dataset])

def visualise(models, names):
    plt.scatter(green_x, green_y, color='green')
    plt.scatter(purple_x, purple_y, color='purple')
    plt.yticks([0, 2, 4, 6])
    plt.xticks([0, 2, 4, 6, 8])
    plt.xlabel("X")
    plt.ylabel("Y")
    for i, m in enumerate(models):
        # Approximate the decision boundary: grid points where P(class 1) ~= 0.5
        class_boundary = [(x, y)
                          for x in np.linspace(0, 10, 200)
                          for y in np.linspace(0, 7, 200)   # search over the plotted y range
                          if abs((m.predict_proba([[x, y]])[0][1]) - 0.5)
                          <= 0.01]
        plt.plot([t[0] for t in class_boundary],
                 [t[1] for t in class_boundary],
                 label=names[i])
    if len(models) > 0:
        plt.legend(loc='upper left')

visualise([], [])

Things certainly got more interesting. Unlike the red and blue dots seen a few moments ago, the green and purple dots above are not all living happily in their own neighbourhoods. We get greens, then purples, then greens again.

In this particular instance, we try the Radial Basis Function (RBF) kernel, without any extra hyperparameters, regularisation, etc.

np.random.seed(1)

model_svm = SVC(kernel='rbf', probability=True, gamma='auto')
model_svm.fit(X, y)

visualise([model_svm], ["SVM RBF"])
display(model_svm.score(X, y))

0.985

We won’t delve into the math underlying RBF, but suffice it to say that it works well when a class forms a cluster, since it measures similarity as a Gaussian function of the distance between points. In the above example, we can see how RBF achieves a model that separates the middle purple dots from the green ones.
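
For reference, the RBF kernel scores the similarity between two points as a Gaussian function of the distance between them, K(a, b) = exp(-gamma * ||a - b||^2). A tiny sketch of that formula:

# RBF similarity: close points score near 1, distant points near 0
def rbf_similarity(a, b, gamma):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.exp(-gamma * np.sum((a - b) ** 2))

print(rbf_similarity([4.5, 3.0], [4.6, 3.1], gamma=0.5))   # nearby, roughly 0.99
print(rbf_similarity([4.5, 3.0], [0.0, 3.0], gamma=0.5))   # far apart, close to 0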

Conclusion

We saw the practical application of Support Vector Machines (SVM). When used for classification, the key advantage of SVM over traditional logistic regression is that it focuses on the distance (margin) between the classes, as opposed to letting every single data point influence the decision boundary, as is the case with logistic regression.

We demonstrated the usefulness of the linear kernel, and an example of a more advanced kernel in the form of RBF. Scikit-Learn supports other kernels, such as ‘poly’, ‘sigmoid’, and ‘precomputed’. These might be topics for future posts.
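
For the curious, trying one of those other kernels is a one-line change. For example, a quick sketch with the polynomial kernel on the same green/purple data (hyperparameters left at rough defaults, not tuned):

# Same data, different kernel: a degree-3 polynomial
model_poly = SVC(kernel='poly', degree=3, probability=True, gamma='auto')
model_poly.fit(X, y)
print(model_poly.score(X, y))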

Before You Leave

🤘 Subscribe to my 100% spam-free newsletter!
