Support Vector Machine(SVM) — Dual Form
This article describes about arriving to dual form problem given the constrained optimization problem of SVM through Lagrangian function.
Why Support Vector Machine?
Consider a binary classification in 2D where we have positive and negative points, and we are supposed to identify a line to separate the positive and negative points. Though we identify one line, there will be many lines that are possible to classify the points, and the question is among all the possible lines which one is the best line, and the solution is given by Support Vector Machine and the answer is , the line which have Maximum Margin. The theory about how to derive SVM objective function and constraint is not described as we are going to focus only on dual formulation.
SVM selects the best line(hyperplane) such that if we get a new datapoint during test/prediction is very near to the classifying line it should be classified correctly, so a line having good margin between the two different data points.
To find such a hyperplane which has maximum margin, we need to solve the below SVM optimization problem with constraints.
which can be rewritten as
The idea to solve problem A is to convert the above to Lagrangian function.
One of the way to solve constraint optimisation problem is to use Lagrange Multiplier. The idea of that is to add one more variable(called Lagrange Multipier) for each constraint and solve it.
We may think that it will become more unsolvable problem when we add new variable, but the variable we add will basically connects the Constraints with the optimization function.
To get high level understanding about Lagrangian function, we can describe Lagrangian function as
L=(OPTIMIZATION_FUNCTION ) -for_each_constraint(VARIABLE *(CONSTRAINT))
The variable here is what we call LAGRANGE MULTIPLIER usually denoted by alpha α.
We take the optimization function and subtract it with sum of all the constraints multiplied by lagrangian multiplier. Each constraint will have its own Langrangian Multiplier.
L=(OPTIMIZATION_FUNCTION ) - Σ α *(CONSTRAINT)
In our case , The Lagrangian is defined as
Steps to solve
- Define the Langrangian function
- Solve the below set of equation
3. Solving the above equations will give value for w which can be substituted in Lagrangian Equation C
4. Substitute D in C
In C we have |w|² which is nothing but dot product of w and w i.e w.w
5. Rearranging the terms
Multiplying the second term with yi as well as with minus one
Bringing similar terms together and multiplying the minus value with ‘b’ as well.
The third term is Zero as per the equation E also the first two terms are of the form 1/2(something) - 1(something) = -1/2(something)
This is the dual form of SVM, we can observe there is dot product of xi and xj which is a similary value between feature i and feature j. If we solve the above optimization problem, we will get values for alpha, and alpha will be zero for other areas but have non zero for the support vector points.
So at this point we have alpha, x(feature) y(labels) and this can be plugged into equation D to get the weight vectors.
Advantages of Dual Problem
The advantage of dual problem is the existence of the similarity matrix of features represented as xi and xj which is useful in kernalization , helping the optimization problem easy to solve
To further explore, there are many online resources to know about KKT Conditions, Kernal trick e.t.c