Overview of Support Vector Machines

Chew, Hong Gunn

19 November, 1998

Vapnik first introduced Support Vector Machines (SVMs) in 1979. However, they have only gained attention in the last few years. SVMs have been shown to be well suited to pattern recognition.

[Figure 1: A maximum-margin hyperplane separating two classes of vectors]

Training a classification SVM involves fitting a hyperplane such that the largest margin is formed between two classes of vectors while minimising the effect of classification errors (Figure 1). This is solved by formulating it as a convex quadratic programming problem. Without going into the details, this gives the Lagrangian:

$$L_D = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \qquad (1)$$

where $L_D$ is maximised with respect to the Lagrange multipliers $\alpha_i$ (one for each of the $l$ training vectors $x_i$ with class labels $y_i \in \{-1, +1\}$),

subject to

$$0 \le \alpha_i \le C \quad \text{and} \quad \sum_{i=1}^{l} \alpha_i y_i = 0.$$

The parameter $C$ is to be chosen by the user; a larger $C$ assigns a higher penalty to classification errors. The training vectors for which $\alpha_i > 0$ are termed the support vectors.
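As a concrete, if naive, illustration, the dual problem (1) can be handed to a general-purpose optimiser. The toy data, the value of $C$, and the 1e-6 threshold below are assumptions for illustration only; practical SVM trainers use specialised quadratic programming methods instead.

    import numpy as np
    from scipy.optimize import minimize

    X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [4.0, 0.5]])  # training vectors
    y = np.array([1.0, 1.0, -1.0, -1.0])                            # labels in {-1, +1}
    C = 10.0                                                        # error penalty

    H = (y[:, None] * y[None, :]) * (X @ X.T)   # H_ij = y_i y_j (x_i . x_j)

    def neg_LD(alpha):
        # Negative of the dual objective L_D in (1); minimising this maximises L_D.
        return 0.5 * alpha @ H @ alpha - alpha.sum()

    res = minimize(neg_LD, np.zeros(len(y)),
                   bounds=[(0.0, C)] * len(y),                          # 0 <= alpha_i <= C
                   constraints={"type": "eq", "fun": lambda a: a @ y})  # sum_i alpha_i y_i = 0
    alpha = res.x
    support = alpha > 1e-6   # support vectors: alpha_i > 0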

Thus, to classify a vector $x$, we simply calculate the sign of

$$f(x) = \sum_{i=1}^{N_s} \alpha_i y_i \, s_i \cdot x + b \qquad (2)$$

where $N_s$ is the number of support vectors,
$s_i$ is support vector $i$ with class $y_i$, and
$b$ can be found by choosing a support vector $s_j$ with $\alpha_j < C$ and using:

$$b = y_j - \sum_{i=1}^{N_s} \alpha_i y_i \, s_i \cdot s_j$$

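Continuing the sketch above, $b$ can be recovered from one such support vector and new vectors classified by the sign of (2); the margin tolerance is again an assumed illustration value. Summing over all training vectors is equivalent to summing over the support vectors, since the other $\alpha_i$ are zero.

    margin = (alpha > 1e-6) & (alpha < C - 1e-6)   # support vectors with alpha_j < C
    j = int(np.argmax(margin))                     # pick one such support vector
    b = y[j] - (alpha * y) @ (X @ X[j])            # b = y_j - sum_i alpha_i y_i (s_i . s_j)

    def classify(x):
        # Class of x is the sign of the decision function (2).
        return np.sign((alpha * y) @ (X @ x) + b)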
For completeness, the hyperplane is defined as:

$$w \cdot x + b = 0 \qquad (3)$$

where

$$w = \sum_{i=1}^{N_s} \alpha_i y_i \, s_i$$

The reason SVMs perform so well lies in the addition of kernel functions to the above algorithm. Kernel functions map the input vectors into some higher-dimensional space in which a more suitable hyperplane can be found with minimal classification errors. The kernel function is defined as

$$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j) \qquad (4)$$

where $\Phi$ maps the vector $x$ to some other Euclidean space.
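To make (4) concrete, consider a homogeneous polynomial kernel of degree 2 on 2-dimensional inputs, for which one possible $\Phi$ (it is not unique) can be written out and checked numerically. This small example is an illustration only, not part of the algorithm above.

    import numpy as np

    def K(xi, xj):
        # Homogeneous polynomial kernel of degree 2.
        return (xi @ xj) ** 2

    def phi(x):
        # One explicit feature map for this kernel on 2-D inputs.
        return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

    xi, xj = np.array([1.0, 2.0]), np.array([3.0, 4.0])
    assert np.isclose(K(xi, xj), phi(xi) @ phi(xj))   # K(xi, xj) = Phi(xi) . Phi(xj)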

Since we can replace the dot product $x_i \cdot x_j$ with $K(x_i, x_j)$ in the algorithm, there is never a need to define $\Phi$ explicitly. Thus the training Lagrangian becomes:

$$L_D = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j) \qquad (5)$$

and the classification function becomes:

$$f(x) = \sum_{i=1}^{N_s} \alpha_i y_i \, K(s_i, x) + b \qquad (6)$$

where $b$ can be found by choosing a support vector $s_j$ with $\alpha_j < C$ and using:

$$b = y_j - \sum_{i=1}^{N_s} \alpha_i y_i \, K(s_i, s_j)$$

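In code the change is equally small: only the dot product in the earlier classification sketch is replaced by a call to the kernel. A sketch of (6), reusing the names alpha, X, y and b from the sketches above:

    def classify_kernel(x, K, alpha, X, y, b):
        # Class of x is the sign of (6); terms with alpha_i = 0 contribute nothing,
        # so the sum effectively runs over the support vectors only.
        return np.sign(sum(a * yi * K(si, x) for a, yi, si in zip(alpha, y, X)) + b)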
The formulation of kernel functions is restricted only by the condition that they must satisfy Mercer's condition. It states that for any function $g(x)$ for which $\int g(x)^2 \, dx$ is finite,

$$\int K(x, y) \, g(x) \, g(y) \, dx \, dy \ge 0.$$
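A standard corollary, not spelled out above, is that every Gram matrix built from a Mercer kernel is positive semi-definite, which gives a quick numerical way to rule out a candidate kernel. A sketch, with the sample size and tolerance as assumptions; passing the check does not prove Mercer's condition holds.

    import numpy as np

    def gram_psd_check(K, points, tol=1e-9):
        # Build the Gram matrix G_ij = K(p_i, p_j) and test that its smallest
        # eigenvalue is non-negative (up to numerical tolerance).
        G = np.array([[K(p, q) for q in points] for p in points])
        return np.linalg.eigvalsh(G).min() >= -tol

    pts = np.random.default_rng(0).normal(size=(20, 3))
    print(gram_psd_check(lambda a, b: np.exp(-np.sum((a - b) ** 2)), pts))   # True for RBF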

Table 1 lists some of the common kernel functions.

Type                               Kernel function
Polynomial of degree p             $K(x_i, x_j) = (x_i \cdot x_j + 1)^p$
Radial basis function              $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$
2-layer sigmoidal neural network   $K(x_i, x_j) = \tanh(\kappa \, x_i \cdot x_j - \delta)$

Table 1: Common kernel functions
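For illustration, the kernels of Table 1 translate directly into code; the default parameter values below are placeholders for values to be chosen by the user.

    import numpy as np

    def poly_kernel(xi, xj, p=3):
        # Polynomial of degree p.
        return (xi @ xj + 1.0) ** p

    def rbf_kernel(xi, xj, sigma=1.0):
        # Radial basis function.
        return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

    def sigmoid_kernel(xi, xj, kappa=1.0, delta=1.0):
        # 2-layer sigmoidal neural network; satisfies Mercer's condition
        # only for some values of kappa and delta.
        return np.tanh(kappa * (xi @ xj) - delta)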

This provides an overview of SVMs. Burges' (1998) tutorial gives a more in-depth look at them.

References

Burges, C. J. C., "A Tutorial on Support Vector Machines for Pattern Recognition", Data Mining and Knowledge Discovery, Vol. 2, No. 2, 1998.

Vapnik, V., "Estimation of Dependences Based on Empirical Data" [in Russian], Moscow: Nauka, 1979. (English translation: New York: Springer-Verlag, 1982.)