Support Vector Machine (SVM) — Dual Form

This article describes how to arrive at the dual form of the SVM constrained optimization problem through the Lagrangian function.

Why Support Vector Machine?

Consider a binary classification problem in 2D where we have positive and negative points, and we are supposed to identify a line that separates them. Although many such lines exist, the question is: among all the possible lines, which one is the best? The Support Vector Machine gives the answer: the line with maximum margin. The theory of how to derive the SVM objective function and its constraint is not described here, as we are going to focus only on the dual formulation.

SVM selects the best line (hyperplane) so that if a new data point seen during test/prediction lies very near the separating line, it is still classified correctly; in other words, a line with a good margin between the two classes of points.

Dual Formulation

To find such a hyperplane with maximum margin, we need to solve the following SVM optimization problem with constraints:

max over w, b of 2/|w|   subject to   yi (w.xi + b) ≥ 1 for all i

which can be rewritten as

min over w, b of 1/2 |w|²   subject to   yi (w.xi + b) ≥ 1 for all i   (A)

The idea for solving problem A is to convert it into a Lagrangian function.

Lagrangian Method

One way to solve a constrained optimization problem is to use Lagrange multipliers. The idea is to add one more variable (called a Lagrange multiplier) for each constraint and solve.

We may think the problem becomes harder to solve when we add new variables, but the variables we add essentially connect the constraints with the optimization function.

To get a high-level understanding of the Lagrangian function, we can describe it as

L = (OPTIMIZATION_FUNCTION) - for_each_constraint(VARIABLE * (CONSTRAINT))

The variable here is what we call the LAGRANGE MULTIPLIER, usually denoted by alpha (α).

We take the optimization function and subtract from it the sum of all the constraints, each multiplied by its Lagrange multiplier. Each constraint has its own Lagrange multiplier.

L = (OPTIMIZATION_FUNCTION) - Σ α * (CONSTRAINT)
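To make this concrete, here is a minimal sketch of the Lagrange multiplier idea on a toy problem that is not part of the SVM itself (minimize x² subject to x ≥ 1, solved symbolically with sympy):

```python
import sympy as sp

# Toy constrained problem (illustration only, not the SVM problem):
# minimize f(x) = x**2 subject to the constraint x - 1 >= 0
x, alpha = sp.symbols('x alpha', real=True)

# Lagrangian: L = OPTIMIZATION_FUNCTION - alpha * CONSTRAINT
L = x**2 - alpha * (x - 1)

# Stationarity: dL/dx = 0
stationarity = sp.Eq(sp.diff(L, x), 0)
# Complementary slackness: alpha * (x - 1) = 0
slackness = sp.Eq(alpha * (x - 1), 0)

solutions = sp.solve([stationarity, slackness], [x, alpha], dict=True)
print(solutions)
```

Of the two stationary solutions, x = 0 (with α = 0) violates the constraint x ≥ 1, so the feasible answer is x = 1 with α = 2: the multiplier connects the constraint to the objective exactly as described above.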

In our case, the Lagrangian is defined as

L(w, b, α) = 1/2 |w|² - Σi αi [ yi (w.xi + b) - 1 ]   (C)

Steps to solve

  1. Define the Lagrangian function (C)
  2. Solve the below set of equations:

∂L/∂w = 0,   ∂L/∂b = 0

3. Solving the above equations gives a value for w, which can be substituted into Lagrangian equation C:

w = Σi αi yi xi   (D)

and ∂L/∂b = 0 gives the condition

Σi αi yi = 0   (E)

4. Substitute D in C. In C we have |w|², which is nothing but the dot product of w with itself, i.e. w.w:

L = 1/2 (Σi αi yi xi).(Σj αj yj xj) - Σi αi [ yi ( (Σj αj yj xj).xi + b ) - 1 ]

5. Rearranging the terms. Multiplying the second term through by yi as well as by minus one, bringing similar terms together, and multiplying the minus sign into the 'b' term as well:

L = 1/2 Σi Σj αi αj yi yj (xi.xj) - Σi Σj αi αj yi yj (xi.xj) - b Σi αi yi + Σi αi

The third term is zero as per equation E, and the first two terms are of the form 1/2 (something) - 1 (something) = -1/2 (something), so this can be written as

L(α) = Σi αi - 1/2 Σi Σj αi αj yi yj (xi.xj)

This is the dual form of the SVM: maximize L(α) over α subject to αi ≥ 0 and Σi αi yi = 0. We can observe the dot product of xi and xj, which is a similarity value between data point i and data point j. If we solve this optimization problem, we get values for alpha; alpha is zero everywhere except at the support vector points, where it is non-zero.

So at this point we have alpha, x (features) and y (labels), and these can be plugged into equation D to get the weight vector.
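We can see both facts in practice with scikit-learn's SVC, which solves this dual problem internally (a sketch assuming scikit-learn is available; the dataset below is made up for illustration). SVC exposes αi·yi for the support vectors as dual_coef_, so equation D becomes a one-line matrix product:

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable 2D dataset (illustrative, not from the article)
X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 1.0],
              [6.0, 5.0], [7.0, 7.5], [8.0, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin SVM derived above
clf = SVC(kernel='linear', C=1e6)
clf.fit(X, y)

# alpha is non-zero only for the support vectors;
# sklearn stores alpha_i * y_i for those points in dual_coef_
print(clf.support_)     # indices of the support vector points
print(clf.dual_coef_)   # alpha_i * y_i for the support vectors

# Equation D: w = sum_i alpha_i * y_i * x_i, using only support vectors
w_from_dual = clf.dual_coef_ @ clf.support_vectors_
print(w_from_dual)
print(clf.coef_)        # sklearn's own w; should match w_from_dual
```

The recovered w matches clf.coef_, confirming that the weight vector is a combination of support vectors only.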

Advantages of Dual Problem

The advantage of the dual problem is that the features appear only through the similarity matrix of dot products xi.xj, which is useful for kernelization and helps make the optimization problem easier to solve.
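Because the dual touches the data only through xi.xj, kernelizing means swapping the linear Gram matrix for any other kernel matrix without changing the rest of the problem. A small numpy sketch (the points and the gamma value are arbitrary choices for illustration):

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])

# Linear Gram matrix: K[i, j] = xi . xj
# This is the only way the features enter the dual problem.
K_linear = X @ X.T

# Kernel trick: replace xi . xj with k(xi, xj) and leave the dual untouched.
# RBF kernel as an example; gamma is an arbitrary illustrative choice.
gamma = 0.5
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_rbf = np.exp(-gamma * sq_dists)

print(K_linear)
print(K_rbf)  # diagonal is all ones, since k(x, x) = exp(0)
```

The optimizer never needs the (possibly infinite-dimensional) mapped features, only this n-by-n similarity matrix.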

Credits

Banner image from : https://www.vectorstock.com/royalty-free-vector/svm-support-graph-statistic-vector-4724676

To explore further, there are many online resources covering the KKT conditions, the kernel trick, etc.

Solomon

Passionate about Data Science and applying Machine Learning and Deep Learning algorithms