Abstract
In this work we present a novel maximum-margin approach for multi-class Support Vector Machines based on second-order cone programming. The proposed method consists of a single optimization model to construct all classification functions, in which the number of second-order cone constraints corresponds to the number of classes. This is a key difference from traditional SVM, where the number of constraints is usually related to the number of training instances. This formulation is extended further to kernel-based classification, while the duality theory provides an interesting geometric interpretation: the method finds an equidistant point between a set of ellipsoids. Experiments on benchmark datasets demonstrate the virtues of our method in terms of predictive performance compared with various other multicategory SVM approaches.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Ñanculef R, Concha C, Allende H, Candel D, Moraga C (2009) Ad-svms: A light extension of svms for multicategory classification. Int J Hybrid Intell Syst 6(2):69–79
Ñanculef R, Frandi E, Sartori C, Allende H (2014) A novel frank-wolfe algorithm. analysis and applications to large-scale svm training. Inf Sci 285:66–99
Alizadeh F, Goldfarb D (2003) Second-order cone programming. Math Program 95:3–51
Alvarez F, López J, Ramírez CH (2010) Interior proximal algorithm with variable metric for second-order cone programming: applications to structural optimization and support vector machines. Optim Methods Softw 25 (6):859–881
Asuncion A, Newman D (2007) UCI machine learning repository. http://archive.ics.uci.edu/ml/
Attouch H (1996) Viscosity solutions of minimization problems. SIAM J Optim 6(3):769–806
Bertsekas D (1982) Constrained optimization and lagrange multiplier methods. Academic Press, New York
Bosch P, López J, Ramírez H, Robotham H (2013) Support vector machine under uncertainty: An application for hydroacoustic classification of fish-schools in chile. Expert Syst Appl 40(10):4029–4034
Bravo C, Thomas L, Weber R (2015) Improving credit scoring by differentiating defaulter behaviour. J Oper Res Soc 66:771–781
Bredensteiner EJ, Bennett KP (1999) Multicategory classification by support vector machines. Comput Optim Appl 12:53–79
Debnath R, Muramatsu M, Takahashi H (2005) An efficient support vector machine learning method with second-order cone programming for large-scale problems. Appl Intell 23:219–239
Friedman J (1996) Another approach to polychotomous classification. Tech rep, Department of Statistics, Stanford University. http://www-stat.stanford.edu/~jhf/ftp/poly.ps.Z
Hao PY, Chiang JH, Lin YH (2009) A new maximal-margin spherical-structured multi-class support vector machine. Appl Intell 30:98–111
Hsu C, Lin C (2002) A comparison of methods for multiclass support vector machines. IEEE Trans Neural Netw 13(2):415–425
Hsu CW, Chang CC, Lin CJ (2010) A practical guide to support vector classification. Tech rep, Department of Computer Science, National Taiwan University
Jenssen R, Kloft M, Zien A, Sonnenburg S, Müller K (2012) A scatter-based prototype framework and multi-class extension of support vector machines. PLoS ONE 7(10):e42,947
Kressel UG (1999) Advances in kernel methods. MIT Press, Cambridge, pp 255–268. USA, chap Pairwise classification and support vector machines
Lanckriet G, Ghaoui L, Bhattacharyya C, Jordan M (2003) A robust minimax approach to classification. J Mach Learn Res 3:555–582
Li C, Liu K, Wang H (2011) The incremental learning algorithm with support vector machine based on hyperplane-distance. Appl Intell 34:19–27
López J, Maldonado S (2015) Feature selection for multiclass support vector machines using second-order cone programming. Intelligent Data Analysis Accepted
Maldonado S, López J (2014a) Alternative second-order cone programming formulations for support vector classification. Inf Sci 268:328–341
Maldonado S, López J (2014b) Imbalanced data classification using second-order cone programming support vector machines. Pattern Recogn 47:2070–2079
Maldonado S, Montecinos C (2014) Robust classification of imbalanced data using ensembles of one-class and two-class svms. Intelligent Data Analysis, Special Issue on Business Analytics and Intelligent Optimization 18:95–112
Mangasarian OL (1994) Nonlinear programming. Classics in Applied Mathematics, Society for Industrial and Applied Mathematics
Mercer J (1909) Functions of positive and negative type, and their connection with the theory of integral equations. Philos Trans R Soc Lond 209:415–446
Nath S, Bhattacharyya C (2007) Maximum margin classifiers with specified false positive and false negative error rates. In: Proceedings of the SIAM International Conference on Data mining
Rifkin R, Klautau A (2004) In defense of one-vs-all classification. J Mach Learn Res 5:101–141
Rockafellar RT (1997) Convex analysis. Princeton Landmarks in Mathematics. Princeton University Press, Princeton. reprint of the 1970 original, Princeton Paperbacks
Schölkopf B, Smola AJ (2002) Learning with Kernels. MIT Press
Sturm J (1999) Using sedumi 1.02, a matlab toolbox for optimization over symmetric cones. Optim Methods Softw 11(12):625–653. special issue on Interior Point Methods (CD supplement with software)
Tikhonov A, Arsénine V (1976) Méthodes de resolution de problèmes mal posés. Éditions Mir, Moscow. french translation of the 1974 Russian edition
Vapnik V (1998) Statistical learning theory. John Wiley and Sons
Weston J, Watkins C (1999) Multi-class support vector machines. In: Proceedings of the Seventh European Symposium on Artificial Neural Networks
Weston J, Elisseeff A, BakIr G, Sinz F (2005) The spider machine learning toolbox. http://www.kyb.tuebingen.mpg.de/bs/people/spider/
Yajima Y (2005) Linear programming approaches for multicategory support vector machines. Eur J Oper Res 162(2):514–531
Yang K, Cai Z, Li J, Lin G (2006) A stable gene selection in microarray data analysis. BMC Bioinformatics 7:228
Zhong P, Fukushima M (2007) Second-order cone programming formulations for robust multiclass classification. Neural Comput 19:258–282
Acknowledgments
The first author was supported by FONDECYT project 11110188 and by CONICYT Anillo ACT1106, the second one was funded by FONDECYT project 11121196, and third author was supported by FONDECYT project 1130905. The work reported in this paper has been partially funded by the Complex Engineering Systems Institute (ICM: P-05-004-F, CONICYT: FB016).
Author information
Authors and Affiliations
Corresponding author
Appendix : A: Dual formulation for multi-class SOCP-SVM
Appendix : A: Dual formulation for multi-class SOCP-SVM
1.1 A.1 Proof of Theorem 1
Proof
The Lagrangian function associated with Problem (P θ ) is given by:
Since the relationship ∥v∥= max∥u∥≤1v ⊤ u holds for any v ∈ ℜn, we can rewrite the Lagrangian as follows
where L 1 is given by
Thus, Problem (P θ ) can be written equivalently as
Hence, the dual problem (see e.g. [7, 24, 28]) of (P θ ) is given by
The expression (35) now enables us to eliminate the primal variables to give the dual. Computing the first order optimization condition of the internal minimum problem (35) yields
It follows from the previous (37) that α i = β for all i = 1,…,K, and therefore, replacing in (34) we get
Additionally, from (36) we get
Replacing (39) in (38) we obtain the reduced form
In order to use (39) to compute \(\textbf {Q}^{1/2}(\theta )\tilde {\textbf {w}}\), and then to obtain the dual problem, we distinguish two main cases: θ>0, in which we have strong convexity of the objective function and we obtain good mathematical properties as uniqueness of the optimal solution; and the case θ = 0, in which most of this mathematical structure is lost but we can still derive a dual problem for this alternative by using some properties of the matrix Q(0).
Case
θ>0: In this case we note first that (see [35, Proposition 3.4] for details)
then
Therefore, multiplying (39) by Q −1/2(θ) we obtain
Replacing this expression in the reduced Lagrangian function (40), we get
and therefore we obtain the following dual formulation
Note that, after some algebraic manipulation, we have
In consequence, Problem (42) can be written as
Case
θ = 0: We note the following property \(\textbf {Q}(0)=\sqrt {K}\textbf {Q}^{1/2}(0)\). Replacing this in (39) one has
Using this in (40) and proceeding in a similar way to the case θ>0, we obtain the same formulation in (44) with θ = 0.
1.2 A.2 Proof of Proposition 1
Proof
It is suffices to prove that the formulation (1) is equivalent to (17). We note that, given θ≥0, the objective function of the dual problem (17) is concave with respect to β therefore attains its maximum value at
also its optimal value is given by
Hence, replacing this in (17) and using the definition of E(μ,S,κ) we get that (17) is equivalent to problem:
turn the max term to min we obtain the result. □
1.3 A.3 Proof of Proposition 2
Proof
It is not difficult to see that
where we denote \( \widetilde {\textbf {z}}=[\textbf {z}_{1}^{\top },\textbf {z}_{2}^{\top },\ldots ,\textbf {z}_{K}^{\top }]^{\top }\in \Re ^{nK}. \) Therefore, from (39) the relation between w i and z i can be written equivalently as
Since Q(θ) is invertible, when θ>0, we have
where the last inequality follows from (41) and (47). Then, we conclude that w i can be written in terms of z i and p as
□
We note that in the case θ = 0,\(\tilde {\textbf {w}}\) and \(\tilde {\textbf {z}}\), are related by
but the uniqueness of solution is lost, because the matrix Q(0) is symmetric positive semi-definite.
1.4 A.4 Proof of Proposition 3
Proof
The KKT conditions of the dual formulation (1) can be summarized as follows
Since Σ i are symmetric positive definite matrices, and z i ≠p for all i = 1,…,K, we have that z i −p does not belong to the null space of \(S_{i}^{\top }\). Consequently, the Lagrangian multipliers γ i are strictly positive. This implies that ∥u i ∥=1 holds. In such situation, z i = μ i −κ i S i u i belong to the boundary of the Ellipsoid E(μ i ,S i ,κ i ).
Furthermore, by (37) and (45) we have that α i = β>0 for i = 1,…,K. This implies that the constraints in (P θ ) are active. Then, since \(\bar {\textbf {w}}=0\) (cf. Remark 3) and ∥u i ∥=1, we get from (10) that
We note at optimality \(S_{i}^{\top }(\textbf {z}_{i} - \textbf {p})\) is parallel to u i , for i = 1,…,K, thus, we have that \(\|\textbf {u}_{i}\|\|S_{i}^{\top }\textbf {w}_{i}\|=\textbf {u}_{i}^{\top } S_{i}^{\top } \textbf {w}_{i},\) obtaining from the above expression that
Then, we get the following conditions:
This geometrically means that the hyperplanes \(\textbf {w}_{i}^{\top } \textbf {x}+b_{i}=1\) are tangents to the ellipsoids E(μ i ,S i ,κ i ), for i = 1,…,K. Using the above relation and (45), one can compute the value of b i obtaining the result in (22).
Finally, as decision functions are given by \(f^{i}(\textbf {x}):=\textbf {w}_{i}^{\top }\textbf {x}+b_{i},\) the result in (23) is obtained by replacing the values of b i from (22) and w i from Proposition 2. □
Rights and permissions
About this article
Cite this article
López, J., Maldonado, S. & Carrasco, M. A novel multi-class SVM model using second-order cone constraints. Appl Intell 44, 457–469 (2016). https://doi.org/10.1007/s10489-015-0712-8
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-015-0712-8