This post is not deep technical material on any of the topics mentioned in the title. I have been trying to use cvxopt to implement an SVM-style max-margin classifier for an unrelated Reinforcement Learning problem, and along the way I had trouble figuring out how to use the cvxopt library to correctly set up the quadratic program for an SVM. Since I eventually figured it out, I am sharing it here. Let us get into it then.
We will generate linearly separable, 2-class data using 2-dimensional Gaussians. Below is Python code to generate, visualize and save the data.
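Something like the following minimal sketch does the job; the Gaussian means and covariance, the sample counts, the random seed and the file names (`data.pickle`, `data.png`) are illustrative choices on my part, not prescribed values:

```python
import pickle
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1)  # for reproducibility

# Two well-separated 2-D Gaussians, one per class. A hard-margin SVM needs
# linearly separable data, so the means are kept far apart relative to the
# covariance.
mean1, mean2 = np.array([0.0, 3.0]), np.array([3.0, 0.0])
cov = np.array([[0.5, 0.2], [0.2, 0.5]])
x1 = np.random.multivariate_normal(mean1, cov, 100)  # class +1
x2 = np.random.multivariate_normal(mean2, cov, 100)  # class -1

x = np.vstack((x1, x2))                             # (200, 2) inputs
y = np.concatenate((np.ones(100), -np.ones(100)))   # labels in {+1, -1}

# Save the data for the SVM script.
with open('data.pickle', 'wb') as f:
    pickle.dump((x, y), f)

# Visualize the two classes.
plt.scatter(x1[:, 0], x1[:, 1], color='b', label='class +1')
plt.scatter(x2[:, 0], x2[:, 1], color='r', label='class -1')
plt.legend()
plt.savefig('data.png')
```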
The code has comments and should be an easy read. It will generate a pickle file with the data and a plot like the one below.
Now for the second part, let us look at the SVM formulation and the interface that CVXOPT provides. Below is the primal SVM objective.
$$\begin{eqnarray} \min_{w,b}\frac{1}{2}||w||^{2} \nonumber \\\ \textrm{s.t.}\quad y_{i}(w^{T}x_{i} + b) \ge 1 \quad \forall i \end{eqnarray}$$
And this is the corresponding dual problem, where \(K\) is the label-weighted Gram matrix with entries \(K_{ij} = y_{i}y_{j}x_{i}^{T}x_{j}\).
$$\begin{eqnarray} \min_{\alpha}\frac{1}{2} \alpha^{T}K\alpha - 1^{T}\alpha \nonumber \\\ \textrm{s.t.}\quad \alpha_{i} \ge 0 \quad \forall i \\\ \textrm{and}\quad y^{T}\alpha = 0 \end{eqnarray}$$
This is all good. Now let us look at the interface that cvxopt provides. It has a QP solver that can be called as `cvxopt.solvers.qp(P, q[, G, h[, A, b[, solver[, initvals]]]])`. The problem that it solves is:
$$\begin{eqnarray} \min_{x}\frac{1}{2} x^{T}Px + q^{T}x \nonumber \\\ \textrm{s.t.}\quad Gx \preceq h \\\ \textrm{and}\quad Ax = b \end{eqnarray}$$
All we need to do is map our formulation to the cvxopt interface. We are already almost there: the \(\alpha\)s are the \(x\)s, \(K\) is \(P\), \(q\) is a vector of \(-1\)s, \(G\) is a negative identity matrix (so that our greater-than constraint \(\alpha_{i} \ge 0\) is transformed into the less-than constraint \(-\alpha_{i} \le 0\)), \(h\) is a vector of zeros, \(A\) is \(y^{T}\) and \(b\) is \(0\). That is all we need to do. So below is the code that does this, given the training data `x` and labels `y`.
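Here is a sketch of that mapping, assuming `x` is an \((n, 2)\) NumPy array and `y` an \((n,)\) array of \(\pm 1\) labels from the data-generation step; cvxopt wants its own `matrix` type with double-precision entries, hence the conversions:

```python
import numpy as np
from cvxopt import matrix, solvers

n = x.shape[0]

# The dual's K is the label-weighted Gram matrix: K_ij = y_i * y_j * <x_i, x_j>.
yx = y[:, None] * x
K = yx @ yx.T

P = matrix(K)                 # P = K
q = matrix(-np.ones(n))       # q = vector of -1s
G = matrix(-np.eye(n))        # G = -I turns alpha_i >= 0 into -alpha_i <= 0
h = matrix(np.zeros(n))       # h = vector of zeros
A = matrix(y.reshape(1, -1))  # A = y^T, a 1 x n matrix (entries must be floats)
b = matrix(np.zeros(1))       # b = 0

sol = solvers.qp(P, q, G, h, A, b)
alphas = np.array(sol['x']).flatten()  # the dual variables alpha
```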
Note that `sol['x']` contains the \(x\) that was part of the cvxopt interface: it is our \(\alpha\). Now, if you are familiar with SVMs, you will know that only a few of the alphas should be non-zero, and the corresponding points are our support vectors. Using these alphas, we can obtain \(w\) and \(b\) of our original SVM problem; together they define the decision boundary. \(w\) is equal to \(\sum_{i}\alpha_{i}y_{i}x_{i}\) and \(b\) is equal to \(y_{i} - w^{T}x_{i}\) for any \(i\) such that \(\alpha_{i} > 0\). So we do the following to obtain them.
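A sketch of that computation, continuing from the solution above; the \(10^{-4}\) support-vector threshold is my own choice, needed because the solver returns tiny positive values rather than exact zeros:

```python
# w = sum_i alpha_i * y_i * x_i
w = np.sum(alphas[:, None] * y[:, None] * x, axis=0)

# Support vectors are the points with alpha_i > 0; use a small tolerance
# (1e-4 is an arbitrary choice) since the solver's zeros are only approximate.
sv = alphas > 1e-4

# b = y_i - w^T x_i for any support vector i; averaging over all of them
# is slightly more robust numerically.
b = np.mean(y[sv] - x[sv] @ w)
```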
Finally, we’ll plot the decision boundary for a nice visualization. Since it is a line in this case, we need to obtain its slope and intercept from the weights and bias. Here is the code.
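Since the boundary is the set of points where \(w^{T}x + b = 0\), the slope is \(-w_{1}/w_{2}\) and the intercept is \(-b/w_{2}\). A sketch, continuing from the snippets above (the output file name is an assumption):

```python
import numpy as np
import matplotlib.pyplot as plt

# w[0]*x1 + w[1]*x2 + b = 0  =>  x2 = -(w[0]/w[1])*x1 - b/w[1]
slope = -w[0] / w[1]
intercept = -b / w[1]

xs = np.linspace(x[:, 0].min(), x[:, 0].max(), 10)
plt.scatter(x[:, 0], x[:, 1], c=y)           # training points, colored by label
plt.plot(xs, slope * xs + intercept, 'k-')   # the decision boundary
plt.savefig('boundary.png')
```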
The entire code is on my github. Once we run it, we get the following final plot.
Looks pretty neat and is good for visualizing 2-D stuff.