Won’t implement one in raw code, but an SVM is very similar to a logistic regressor. The key difference is that an SVM tries to maximize the margin between the hyperplane (linear separator) and the datapoints nearest to the hyperplane in each category.
The most common way to train an SVM is the Pegasos Algorithm (Primal Estimated Sub-Gradient Solver for SVM), introduced in 2007, which uses an iterative stochastic sub-gradient descent approach to update the weights.
Essentially you start with zero weights (similar to a perceptron) and randomly sample datapoints. At each sampled datapoint $(x_t, y_t)$, if the margin is violated, i.e. $y_t(\theta_t \cdot x_t) < 1$:
$$ \theta_{t+1} = (1-\eta_t \lambda)\theta_t + \eta_ty_tx_t $$
Otherwise:
$$ \theta_{t+1} = (1-\eta_t \lambda)\theta_t $$
This utilizes a “scaling down” factor $(1-\eta_t \lambda)$ that is applied to the weights at every step, regardless of whether the sampled point is classified correctly; it comes from the regularization term of the SVM objective.
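A minimal sketch of these updates in NumPy, assuming labels in $\{-1, +1\}$ and the standard Pegasos step size $\eta_t = 1/(\lambda t)$ (the function name and default hyperparameters are illustrative, not from the original paper):

```python
import numpy as np

def pegasos(X, y, lam=0.01, n_iters=10000, seed=0):
    """Stochastic sub-gradient updates for a linear SVM (Pegasos-style sketch).

    X: (n_samples, n_features) array, y: labels in {-1, +1}.
    lam (regularization strength) and n_iters are illustrative defaults.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)                      # start with zero weights
    for t in range(1, n_iters + 1):
        i = rng.integers(n)                  # randomly sample one datapoint
        eta = 1.0 / (lam * t)                # decaying step size
        if y[i] * (theta @ X[i]) < 1:        # margin violated: scale down and push toward the point
            theta = (1 - eta * lam) * theta + eta * y[i] * X[i]
        else:                                # margin satisfied: only apply the scaling-down factor
            theta = (1 - eta * lam) * theta
    return theta
```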
We typically use hinge loss as the loss function in SVMs.
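In the notation above, the hinge loss on a single labeled point $(x, y)$ with $y \in \{-1, +1\}$ is:

$$ \text{Loss}_h(y(\theta \cdot x)) = \max\{0,\ 1 - y(\theta \cdot x)\} $$

It is zero whenever the point is classified correctly with a margin of at least 1, which is exactly the condition checked in the update rule above.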
Although SVMs are inherently designed for binary classification, they can be extended to multi-class classification problems, typically using one of the following approaches: