A novel multi-class SVM model using second-order cone constraints | Applied Intelligence
Skip to main content

A novel multi-class SVM model using second-order cone constraints

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

In this work we present a novel maximum-margin approach for multi-class Support Vector Machines based on second-order cone programming. The proposed method consists of a single optimization model to construct all classification functions, in which the number of second-order cone constraints corresponds to the number of classes. This is a key difference from traditional SVM, where the number of constraints is usually related to the number of training instances. This formulation is extended further to kernel-based classification, while the duality theory provides an interesting geometric interpretation: the method finds an equidistant point between a set of ellipsoids. Experiments on benchmark datasets demonstrate the virtues of our method in terms of predictive performance compared with various other multicategory SVM approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (Japan)

Instant access to the full article PDF.

Fig. 1

Similar content being viewed by others

Explore related subjects

Discover the latest articles, news and stories from top researchers in related subjects.

References

  1. Ñanculef R, Concha C, Allende H, Candel D, Moraga C (2009) Ad-svms: A light extension of svms for multicategory classification. Int J Hybrid Intell Syst 6(2):69–79

    MATH  Google Scholar 

  2. Ñanculef R, Frandi E, Sartori C, Allende H (2014) A novel frank-wolfe algorithm. analysis and applications to large-scale svm training. Inf Sci 285:66–99

    Article  Google Scholar 

  3. Alizadeh F, Goldfarb D (2003) Second-order cone programming. Math Program 95:3–51

    Article  MathSciNet  MATH  Google Scholar 

  4. Alvarez F, López J, Ramírez CH (2010) Interior proximal algorithm with variable metric for second-order cone programming: applications to structural optimization and support vector machines. Optim Methods Softw 25 (6):859–881

    Article  MathSciNet  MATH  Google Scholar 

  5. Asuncion A, Newman D (2007) UCI machine learning repository. http://archive.ics.uci.edu/ml/

  6. Attouch H (1996) Viscosity solutions of minimization problems. SIAM J Optim 6(3):769–806

    Article  MathSciNet  MATH  Google Scholar 

  7. Bertsekas D (1982) Constrained optimization and lagrange multiplier methods. Academic Press, New York

    MATH  Google Scholar 

  8. Bosch P, López J, Ramírez H, Robotham H (2013) Support vector machine under uncertainty: An application for hydroacoustic classification of fish-schools in chile. Expert Syst Appl 40(10):4029–4034

    Article  Google Scholar 

  9. Bravo C, Thomas L, Weber R (2015) Improving credit scoring by differentiating defaulter behaviour. J Oper Res Soc 66:771–781

    Article  Google Scholar 

  10. Bredensteiner EJ, Bennett KP (1999) Multicategory classification by support vector machines. Comput Optim Appl 12:53–79

    Article  MathSciNet  MATH  Google Scholar 

  11. Debnath R, Muramatsu M, Takahashi H (2005) An efficient support vector machine learning method with second-order cone programming for large-scale problems. Appl Intell 23:219–239

    Article  MATH  Google Scholar 

  12. Friedman J (1996) Another approach to polychotomous classification. Tech rep, Department of Statistics, Stanford University. http://www-stat.stanford.edu/~jhf/ftp/poly.ps.Z

  13. Hao PY, Chiang JH, Lin YH (2009) A new maximal-margin spherical-structured multi-class support vector machine. Appl Intell 30:98–111

    Article  Google Scholar 

  14. Hsu C, Lin C (2002) A comparison of methods for multiclass support vector machines. IEEE Trans Neural Netw 13(2):415–425

    Article  Google Scholar 

  15. Hsu CW, Chang CC, Lin CJ (2010) A practical guide to support vector classification. Tech rep, Department of Computer Science, National Taiwan University

  16. Jenssen R, Kloft M, Zien A, Sonnenburg S, Müller K (2012) A scatter-based prototype framework and multi-class extension of support vector machines. PLoS ONE 7(10):e42,947

    Article  Google Scholar 

  17. Kressel UG (1999) Advances in kernel methods. MIT Press, Cambridge, pp 255–268. USA, chap Pairwise classification and support vector machines

    Google Scholar 

  18. Lanckriet G, Ghaoui L, Bhattacharyya C, Jordan M (2003) A robust minimax approach to classification. J Mach Learn Res 3:555–582

    MathSciNet  MATH  Google Scholar 

  19. Li C, Liu K, Wang H (2011) The incremental learning algorithm with support vector machine based on hyperplane-distance. Appl Intell 34:19–27

    Article  MATH  Google Scholar 

  20. López J, Maldonado S (2015) Feature selection for multiclass support vector machines using second-order cone programming. Intelligent Data Analysis Accepted

  21. Maldonado S, López J (2014a) Alternative second-order cone programming formulations for support vector classification. Inf Sci 268:328–341

    Article  Google Scholar 

  22. Maldonado S, López J (2014b) Imbalanced data classification using second-order cone programming support vector machines. Pattern Recogn 47:2070–2079

    Article  Google Scholar 

  23. Maldonado S, Montecinos C (2014) Robust classification of imbalanced data using ensembles of one-class and two-class svms. Intelligent Data Analysis, Special Issue on Business Analytics and Intelligent Optimization 18:95–112

    Google Scholar 

  24. Mangasarian OL (1994) Nonlinear programming. Classics in Applied Mathematics, Society for Industrial and Applied Mathematics

  25. Mercer J (1909) Functions of positive and negative type, and their connection with the theory of integral equations. Philos Trans R Soc Lond 209:415–446

    Article  MATH  Google Scholar 

  26. Nath S, Bhattacharyya C (2007) Maximum margin classifiers with specified false positive and false negative error rates. In: Proceedings of the SIAM International Conference on Data mining

  27. Rifkin R, Klautau A (2004) In defense of one-vs-all classification. J Mach Learn Res 5:101–141

    MathSciNet  MATH  Google Scholar 

  28. Rockafellar RT (1997) Convex analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton. reprint of the 1970 original, Princeton Paperbacks

    Google Scholar 

  29. Schölkopf B, Smola AJ (2002) Learning with Kernels. MIT Press

  30. Sturm J (1999) Using sedumi 1.02, a matlab toolbox for optimization over symmetric cones. Optim Methods Softw 11(12):625–653. special issue on Interior Point Methods (CD supplement with software)

    Article  MathSciNet  Google Scholar 

  31. Tikhonov A, Arsénine V (1976) Méthodes de resolution de problèmes mal posés. Éditions Mir, Moscow. french translation of the 1974 Russian edition

  32. Vapnik V (1998) Statistical learning theory. John Wiley and Sons

  33. Weston J, Watkins C (1999) Multi-class support vector machines. In: Proceedings of the Seventh European Symposium on Artificial Neural Networks

  34. Weston J, Elisseeff A, BakIr G, Sinz F (2005) The spider machine learning toolbox. http://www.kyb.tuebingen.mpg.de/bs/people/spider/

  35. Yajima Y (2005) Linear programming approaches for multicategory support vector machines. Eur J Oper Res 162(2):514–531

    Article  MathSciNet  MATH  Google Scholar 

  36. Yang K, Cai Z, Li J, Lin G (2006) A stable gene selection in microarray data analysis. BMC Bioinformatics 7:228

    Article  Google Scholar 

  37. Zhong P, Fukushima M (2007) Second-order cone programming formulations for robust multiclass classification. Neural Comput 19:258–282

    Article  MathSciNet  MATH  Google Scholar 

Download references

Acknowledgments

The first author was supported by FONDECYT project 11110188 and by CONICYT Anillo ACT1106, the second one was funded by FONDECYT project 11121196, and third author was supported by FONDECYT project 1130905. The work reported in this paper has been partially funded by the Complex Engineering Systems Institute (ICM: P-05-004-F, CONICYT: FB016).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sebastián Maldonado.

Appendix : A: Dual formulation for multi-class SOCP-SVM

Appendix : A: Dual formulation for multi-class SOCP-SVM

1.1 A.1 Proof of Theorem 1

Proof

The Lagrangian function associated with Problem (P θ ) is given by:

$$\begin{array}{@{}rcl@{}} L(\tilde{\textbf{w}},\textbf{b},\alpha_{i},\beta) &=& \frac{1}{2}\|\textbf{Q}^{1/2}(\theta )\tilde{\textbf{w}}\|^{2} + \sum\limits_{i=1}^{K} \alpha_{i}(\kappa_{i}\|S_{i}^{\top} H^{i}\tilde{\textbf{w}}\| \\&&- (H^{i}\tilde{\textbf{w}})^{\top} {\boldsymbol\mu}_{i}-(\textbf{d}^{i})^{\top} \textbf{b})\\ &&+\sum\limits_{i=1}^{K} \alpha_{i}+\beta(\textbf{e}^{\top} \textbf{b}). \end{array} $$

Since the relationship ∥v∥= max∥u∥≤1v u holds for any v ∈ ℜn, we can rewrite the Lagrangian as follows

$$\begin{array}{@{}rcl@{}} L(\tilde{\textbf{w}},\textbf{b},\alpha_{i},\beta)&=&\max\limits_{\textbf{u}^{i}}\{L_{1}(\tilde{\textbf{w}},\textbf{b}, \alpha_{i},\beta,\textbf{u}^{i}):\|\textbf{u}^{i}\|\\ &&\qquad\le 1, i=1,\ldots,K\}, \end{array} $$

where L 1 is given by

$$\begin{array}{@{}rcl@{}} &&L_{1}(\tilde{\textbf{w}},\textbf{b},\alpha_{i},\beta,\textbf{u}^{i})\\&=& \frac{1}{2}\|\textbf{Q}^{1/2}(\theta )\tilde{\textbf{w}}\|^{2}\\ && + \sum\limits_{i=1}^{K} \alpha_{i}\left( \kappa_{i}(S_{i}^{\top} H^{i}\tilde{\textbf{w}})^{\top} \textbf{u}^{i}- (H^{i}\tilde{\textbf{w}})^{\top} {\boldsymbol\mu}_{i}\right)\\ &&+\sum\limits_{i=1}^{K} \alpha_{i}(1-(\textbf{d}^{i})^{\top} \textbf{b})+\beta(\textbf{e}^{\top} \textbf{b}). \end{array} $$
(34)

Thus, Problem (P θ ) can be written equivalently as

$$\min\limits_{\tilde{\textbf{w}},\textbf{b}} \max\limits_{\alpha_{i},\beta,\textbf{u}^{i}}\!\{L_{1}\!(\tilde{\textbf{w}},\textbf{b}, \alpha_{i},\beta,\textbf{u}^{i})\!\!:\!\!\|\textbf{u}^{i}\|\!\le\!1,\!\ \alpha_{i}\!\ge\!0,\! i\,=\,1,\ldots,\!K\}. $$

Hence, the dual problem (see e.g. [7, 24, 28]) of (P θ ) is given by

$$\begin{array}{@{}rcl@{}} &&\max\limits_{\alpha_{i},\beta,\textbf{u}^{i}}\min\limits_{\tilde{\textbf{w}},\textbf{b}} \{L_{1}(\tilde{\textbf{w}},\textbf{b},\alpha_{i},\beta,\textbf{u}^{i}):\|\textbf{u}^{i}\|\le1,\ \alpha_{i}\\ &&\ge0,\ i=1,\ldots,K\}. \end{array} $$
(35)

The expression (35) now enables us to eliminate the primal variables to give the dual. Computing the first order optimization condition of the internal minimum problem (35) yields

$$\begin{array}{@{}rcl@{}} \nabla_{\tilde{\textbf{w}}} L_{1}&=&\textbf{Q}(\theta )\tilde{\textbf{w}} + \sum\limits_{i=1}^{K} \alpha_{i}\left( \kappa_{i}{H^{i}}^{\top} S_{i} \textbf{u}^{i}- {H^{i}}^{\top} {\boldsymbol\mu}_{i}\right)\\&=&0, \end{array} $$
(36)
$$\begin{array}{@{}rcl@{}} \nabla_{\textbf{b}} L_{1} &=& -\sum\limits_{i=1}^{K} \alpha_{i}\textbf{d}^{i}+\beta\textbf{e}=0. \end{array} $$
(37)

It follows from the previous (37) that α i = β for all i = 1,…,K, and therefore, replacing in (34) we get

$$\begin{array}{@{}rcl@{}} L_{1}(\tilde{\textbf{w}},\textbf{b},\alpha_{i},\beta,\textbf{u}^{i})&=& \frac{1}{2}\|\textbf{Q}^{1/2}(\theta) \tilde{\textbf{w}}\|^{2} \\ &&+ \beta\sum\limits_{i=1}^{K} \tilde{\textbf{w}}^{\top} {H^{i\top}} (\kappa_{i}S_{i}^{\top} \textbf{u}^{i}- {\boldsymbol\mu}_{i})\\ &&+K\beta. \end{array} $$
(38)

Additionally, from (36) we get

$$ \textbf{Q}(\theta)\tilde{\textbf{w}}=\beta{\sum}_{i=1}^{K} {H^{i}}^{\top}({\boldsymbol\mu}_{i}-\kappa_{i}S_{i} \textbf{u}^{i}). $$
(39)

Replacing (39) in (38) we obtain the reduced form

$$ L_{1}(\tilde{\textbf{w}},\textbf{b},\alpha_{i},\beta,\textbf{u}^{i})=-\frac{1}{2} \|\textbf{Q}^{1/2}(\theta)\tilde{\textbf{w}}\|^{2}+K\beta. $$
(40)

In order to use (39) to compute \(\textbf {Q}^{1/2}(\theta )\tilde {\textbf {w}}\), and then to obtain the dual problem, we distinguish two main cases: θ>0, in which we have strong convexity of the objective function and we obtain good mathematical properties as uniqueness of the optimal solution; and the case θ = 0, in which most of this mathematical structure is lost but we can still derive a dual problem for this alternative by using some properties of the matrix Q(0).

Case

θ>0: In this case we note first that (see [35, Proposition 3.4] for details)

$$\textbf{Q}^{-1/2}(\theta)=\frac{1}{\sqrt{K+\theta}}I_{nK}+\frac{\sqrt{K+\theta}-\sqrt{\theta}}{K\sqrt{\theta} \sqrt{K+\theta}}{\mathcal J}, $$

then

$$ {\textbf{Q}^{-1/2}}(\theta){H^{i}}^{\top}=\frac{1}{\sqrt{K+\theta}}{H^{i}}^{\top}. $$
(41)

Therefore, multiplying (39) by Q −1/2(θ) we obtain

$$\textbf{Q}^{1/2}(\theta)\tilde{\textbf{w}}=\frac{\beta}{\sqrt{K+\theta}}\sum\limits_{i=1}^{K} {H^{i}}^{\top} \textbf{z}_{i},\text{ where }\ \textbf{z}_{i}={\boldsymbol\mu}_{i}-\kappa_{i}S_{i} \textbf{u}^{i}. $$

Replacing this expression in the reduced Lagrangian function (40), we get

$$L_{1}= K \beta-\frac{\beta^{2}}{2(K+\theta)}\|\sum\limits_{i=1}^{K} {H^{i}}^{\top}\textbf{z}_{i}\|^{2}, $$

and therefore we obtain the following dual formulation

$$ \begin{array}{rl} \max\limits_{\textbf{u}^{i}, \beta}&\ K \beta- \frac{\beta^{2}}{2(K+\theta)}\|\sum\limits_{i=1}^{K} {H^{i}}^{\top}\textbf{z}_{i}\|^{2}\\ \text{s.t.}&\ \textbf{z}_{i}={\boldsymbol\mu}_{i}-\kappa_{i}S_{i} \textbf{u}^{i},\\ & \|\textbf{u}^{i}\|\le1,\ i=1,\ldots,K,\\ &\ \beta\ge0. \end{array} $$
(42)

Note that, after some algebraic manipulation, we have

$$ \|\sum\limits_{i=1}^{K} {H^{i\top}}\textbf{z}_{i}\|^{2}=\sum\limits_{i=1}^{K}\|\textbf{z}_{i}-\textbf{p}\|^{2}, \quad\text{ with }\quad \textbf{p}=\frac{1}{K}\sum\limits_{i=1}^{K}\textbf{z}_{i}. $$
(43)

In consequence, Problem (42) can be written as

$$ \begin{array}{rl} \max\limits_{\textbf{z}_{i},\textbf{u}^{i},\beta}&\ K \beta- \frac{\beta^{2}}{2(K+\theta)}\sum\limits_{i=1}^{K}\|\textbf{z}_{i}-\textbf{p}\|^{2}\\ \text{s.t.}&\ \textbf{z}_{i}={\boldsymbol\mu}_{i}-\kappa_{i}S_{i} \textbf{u}^{i},\\ &\ \|\textbf{u}^{i}\|\le1,\ i=1,\ldots,K,\\ & \ \textbf{p}=\frac{1}{K}\sum\limits_{i=1}^{K}\textbf{z}_{i},\\ &\ \beta\ge0. \end{array} $$
(44)

Case

θ = 0: We note the following property \(\textbf {Q}(0)=\sqrt {K}\textbf {Q}^{1/2}(0)\). Replacing this in (39) one has

$$\textbf{Q}^{1/2}(0)\tilde{\textbf{w}}=\frac{\beta}{\sqrt{K}}\sum\limits_{i=1}^{K} {H^{i}}^{\top} ({\boldsymbol\mu}_{i}-\kappa_{i}S_{i} \textbf{u}^{i}). $$

Using this in (40) and proceeding in a similar way to the case θ>0, we obtain the same formulation in (44) with θ = 0.

1.2 A.2 Proof of Proposition 1

Proof

It is suffices to prove that the formulation (1) is equivalent to (17). We note that, given θ≥0, the objective function of the dual problem (17) is concave with respect to β therefore attains its maximum value at

$$ \beta=\frac{K(K+\theta)}{{\sum}_{i=1}^{K}\|\textbf{z}_{i}-\textbf{p}\|^{2}}, $$
(45)

also its optimal value is given by

$$\frac{K^{2}(K+\theta)}{2{\sum}_{i=1}^{K}\|\textbf{z}_{i}-\textbf{p}\|^{2}}. $$

Hence, replacing this in (17) and using the definition of E(μ,S,κ) we get that (17) is equivalent to problem:

$$ \begin{array}{rl} \max\limits_{\textbf{z}_{i},\textbf{u}^{i}} & \ \frac{K^{2}(K+\theta)}{2{\sum}_{i=1}^{K}\|\textbf{z}_{i}-\textbf{p}\|^{2}}\\ \text{s.t.}&\ \ \textbf{z}_{i}\in \textbf{E}({\boldsymbol\mu}_{i},S_{i},\kappa_{i}),\quad i=1,\ldots,K,\\ & \ \ \textbf{p}=\frac{1}{K}\sum\limits_{i=1}^{K}\textbf{z}_{i}, \ \end{array} $$
(46)

turn the max term to min we obtain the result. □

1.3 A.3 Proof of Proposition 2

Proof

It is not difficult to see that

$$ \sum\limits_{i=1}^{K} {H^{i}}^{\top}\textbf{z}_{i}=\frac{1}{K}\textbf{Q}(0)\tilde{\textbf{z}}, $$
(47)

where we denote \( \widetilde {\textbf {z}}=[\textbf {z}_{1}^{\top },\textbf {z}_{2}^{\top },\ldots ,\textbf {z}_{K}^{\top }]^{\top }\in \Re ^{nK}. \) Therefore, from (39) the relation between w i and z i can be written equivalently as

$$\textbf{Q}(\theta)\tilde{\textbf{w}}=\frac{\beta}{K}\textbf{Q}(0)\tilde{\textbf{z}}. $$

Since Q(θ) is invertible, when θ>0, we have

$$\tilde{\textbf{w}}=\frac{\beta}{K} \textbf{Q}(\theta)^{-1}\textbf{Q}(0)\tilde{\textbf{z}}=\frac{\beta}{K(K+\theta)} \textbf{Q}(0) \tilde{\textbf{z}}. $$

where the last inequality follows from (41) and (47). Then, we conclude that w i can be written in terms of z i and p as

$$\textbf{w}_{i}=\frac{\beta}{(K+\theta)}(\textbf{z}_{i}-\textbf{p})=\frac{K}{{\sum}_{i=1}^{K} \|\textbf{z}_{i}-\textbf{p}\|^{2}}(\textbf{z}_{i}-\textbf{p}), $$
$$i=1,\ldots,K. $$

We note that in the case θ = 0,\(\tilde {\textbf {w}}\) and \(\tilde {\textbf {z}}\), are related by

$$\textbf{Q}(0)\tilde{\textbf{w}}=\frac{\beta}{K} \textbf{Q}(0)\tilde{\textbf{z}}, $$

but the uniqueness of solution is lost, because the matrix Q(0) is symmetric positive semi-definite.

1.4 A.4 Proof of Proposition 3

Proof

The KKT conditions of the dual formulation (1) can be summarized as follows

$$\begin{array}{@{}rcl@{}} -\frac{4\kappa_{i}}{K^{2}(K+\theta)}\, S_{i}^{\top}(\textbf{z}_{i} - \textbf{p})+\gamma_{i}\textbf{u}_{i}&=&0,\\ \gamma_{i}(\|\textbf{u}_{i}\|-1)&=&0, \end{array} $$
(48)
$$\begin{array}{@{}rcl@{}} {\sum}_{i}(\textbf{z}_{i} - \textbf{p}) &=&0,\\ \|\textbf{u}_{i}\|\le1,\ \gamma_{i}&\ge&0. \end{array} $$
(49)

Since Σ i are symmetric positive definite matrices, and z i p for all i = 1,…,K, we have that z i p does not belong to the null space of \(S_{i}^{\top }\). Consequently, the Lagrangian multipliers γ i are strictly positive. This implies that ∥u i ∥=1 holds. In such situation, z i = μ i κ i S i u i belong to the boundary of the Ellipsoid E(μ i ,S i ,κ i ).

Furthermore, by (37) and (45) we have that α i = β>0 for i = 1,…,K. This implies that the constraints in (P θ ) are active. Then, since \(\bar {\textbf {w}}=0\) (cf. Remark 3) and ∥u i ∥=1, we get from (10) that

$$\kappa_{i}\|u_{i}\|\|S_{i}^{\top}\textbf{w}_{i}\|= \textbf{w}_{i}^{\top} {\boldsymbol\mu}_{i}+{b}_{i}-1 \quad i=1,\ldots,K. $$

We note at optimality \(S_{i}^{\top }(\textbf {z}_{i} - \textbf {p})\) is parallel to u i , for i = 1,…,K, thus, we have that \(\|\textbf {u}_{i}\|\|S_{i}^{\top }\textbf {w}_{i}\|=\textbf {u}_{i}^{\top } S_{i}^{\top } \textbf {w}_{i},\) obtaining from the above expression that

$$0=\textbf{w}_{i}^{\top} ({\boldsymbol\mu}_{i}-\kappa_{i}S_{i}\textbf{u}_{i})+{b}_{i}-1 \quad i=1,\ldots,K. $$

Then, we get the following conditions:

$$\textbf{w}_{i}^{\top} \textbf{z}_{i}+b_{i}=1,\quad \text{ for \(i=1,\ldots,K\)}. $$

This geometrically means that the hyperplanes \(\textbf {w}_{i}^{\top } \textbf {x}+b_{i}=1\) are tangents to the ellipsoids E(μ i ,S i ,κ i ), for i = 1,…,K. Using the above relation and (45), one can compute the value of b i obtaining the result in (22).

Finally, as decision functions are given by \(f^{i}(\textbf {x}):=\textbf {w}_{i}^{\top }\textbf {x}+b_{i},\) the result in (23) is obtained by replacing the values of b i from (22) and w i from Proposition 2. □

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

López, J., Maldonado, S. & Carrasco, M. A novel multi-class SVM model using second-order cone constraints. Appl Intell 44, 457–469 (2016). https://doi.org/10.1007/s10489-015-0712-8

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-015-0712-8

Keywords