Abstract
We address the problem of strictly separating two sets of points by means of a sphere, considering the two cases in which the center of the sphere is fixed or free, respectively. For the former we present a fast and simple solution algorithm, whereas for the latter we use the DC-Algorithm (DCA) based on a DC decomposition of the error function. Numerical results for both cases are reported on several classical binary datasets drawn from the literature.
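The paper's fixed-center algorithm is not reproduced in this excerpt; as a purely illustrative sketch of why the fixed-center case admits a fast solution, the code below exploits the fact that, with the center \(x_0\) fixed, the classification error as a function of \(z=R^2\) is piecewise linear and convex, so only the squared distances from the center need to be examined. All names are assumptions, and the margin term of the actual model is omitted.

```python
import numpy as np

def fixed_center_sphere(A, B, x0):
    """Illustrative fixed-center spherical separation (hypothetical helper).

    A: points to be enclosed by the sphere; B: points to be excluded;
    x0: fixed center. With x0 fixed, the error as a function of
    z = R^2 is piecewise linear and convex, so its minimum is attained
    at a breakpoint, i.e. at one of the squared distances from x0.
    """
    da = np.sum((np.asarray(A, dtype=float) - x0) ** 2, axis=1)  # squared distances of A
    db = np.sum((np.asarray(B, dtype=float) - x0) ** 2, axis=1)  # squared distances of B

    def error(z):
        # A-points left outside the sphere plus B-points caught inside it
        return np.maximum(da - z, 0).sum() + np.maximum(z - db, 0).sum()

    candidates = np.sort(np.concatenate([da, db]))  # breakpoints of the error
    errors = [error(z) for z in candidates]
    k = int(np.argmin(errors))
    return candidates[k], errors[k]  # optimal z = R^2 and its error
```

Sorting the breakpoints makes the whole procedure \(O(n\log n)\) in the number of points, which matches the abstract's claim of a fast and simple algorithm in spirit, though not necessarily in detail.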
Appendix
In this appendix we report the proofs of Theorems 2.1, 2.2 and 2.3.
Theorem 2.1
Let \(S\subseteq \mathbb{R}^{n}\) be a nonempty compact set. If \(x_{0}\in S\) and \(C\geq 1/(2k)\), then there exists an optimal solution to problem (5).
Proof
If \(x_{0}\in S\), then
Taking into account the definition of h, the constraint \(z\geq q\), and inequality (18), we obtain
where \(D\stackrel {\triangle }{=}\sum_{l=1}^{k} D_{l}\).
From the above inequality, if \(C\geq\frac{1}{2k}\), the thesis follows by taking into account that h is coercive on the set \(\varOmega \stackrel {\triangle }{=}\{ (x_{0},z,q)~|~x_{0}\in S \mbox{ and } 0\leq q\leq z \}\). □
Theorem 2.2
There exists an optimal solution for problem (6) with z>0.
Proof
Let \((x_{0}^{*},z^{*},q^{*},\xi^{*},\mu^{*})\) be any optimal solution for problem (6) with \(z^{*}=0\). As a consequence, \(q^{*}=0\), \(\xi^{*}_{i}=\|a_{i}-x_{0}^{*}\|^{2}\) for \(i=1,\dots,m\) and \(\mu_{l}^{*}=0\) for \(l=1,\dots,k\).
We define the sets
which cannot be simultaneously empty, by the assumption that the sets \(\mathcal {A}\) and \(\mathcal {B}\) are disjoint.
Now we consider the solution
with
and the components of the vectors \(\bar{\xi}\) and \(\bar{\mu}\) defined as follows:
or
according to the three possible cases:
(i) \(x_{0}^{*}\neq a_{i}\) for \(i=1,\dots,m\) and \(x_{0}^{*}\neq b_{l}\) for \(l=1,\dots,k\);

(ii) \(x_{0}^{*}= a_{j}\) for some \(j\in\{1,\dots,m\}\);

(iii) \(x_{0}^{*}= b_{j}\) for some \(j\in\{1,\dots,k\}\).
It is easy to show that the above solution (20), characterized by \(\bar{z}>0\), is feasible for problem (6), with an objective function value not worse than the one corresponding to \((x_{0}^{*},z^{*},q^{*},\xi^{*},\mu^{*})\). □
Theorem 2.3
Let \((x_{0}^{*}, z^{*}, q^{*}, \xi^{*}, \mu^{*})\) be any optimal solution of problem (6), with \(C>1\). Then the sphere \(S(x_{0}^{*},R^{*})\), with \(R^{*}=\sqrt{z^{*}}\), strictly separates the sets \(\mathcal {A}\) and \(\mathcal {B}\) if and only if \(q^{*}>0\).
Proof
(⇒) Assume that \(S(x_{0}^{*},R^{*})\), with \(R^{*}=\sqrt{z^{*}}\), strictly separates the sets \(\mathcal {A}\) and \(\mathcal {B}\), and suppose by contradiction that \(q^{*}=0\). Thus we have
and, consequently,
Note that the optimality of \((x_{0}^{*}, z^{*}, q^{*}, \xi^{*}, \mu^{*})\), together with (21), implies \(\xi^{*}=0\), \(\mu^{*}=0\), and an optimal objective function value equal to zero.
Now define a feasible solution \((\bar{x}_{0}, \bar{z}, \bar{q}, \bar{\xi}, \bar{\mu})\) to problem (6) as follows:
It is characterized by an objective function value equal to \(-\epsilon<0\), which contradicts the optimality of \((x_{0}^{*}, z^{*}, q^{*}, \xi^{*}, \mu^{*})\).
(⇐) Now suppose \(q^{*}>0\), and assume by contradiction that the sphere \(S(x_{0}^{*},R^{*})\), with \(R^{*}=\sqrt{z^{*}}\), does not strictly separate \(\mathcal {A}\) and \(\mathcal {B}\). Then at least one of the following two cases occurs:
1. there exists \(a_{j}\in \mathcal {A}\) such that \(\|a_{j}-x_{0}^{*}\|^{2}\geq z^{*}\), which implies
$$ \xi_j^*\geq q^*-z^* + \bigl\|a_j-x_0^* \bigr\|^2 \geq q^*>0; $$ (22)

2. there exists \(b_{j}\in \mathcal {B}\) such that \(\|b_{j}-x_{0}^{*}\|^{2}\leq z^{*}\), which implies
$$ \mu_j^*\geq q^*+z^* - \bigl\|b_j-x_0^* \bigr\|^2 \geq q^*>0. $$ (23)
Case 1. The value of the objective function at the optimal solution is
Now we consider the solution \((\bar{x}_{0}, \bar{z}, \bar{q}, \bar{\xi}, \bar{\mu})\) defined as follows:
which is easily shown to be feasible for problem (6), taking into account (22). Moreover, the corresponding objective function value is
If C>1 then
which contradicts the optimality of \((x_{0}^{*}, z^{*}, q^{*}, \xi^{*}, \mu^{*})\).
Case 2. Analogous considerations hold. □
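The strict-separation condition characterized in Theorem 2.3 is straightforward to check numerically. The sketch below (hypothetical helper names, not from the paper) returns the achievable margin \(q\), which is positive exactly when every point of \(\mathcal{A}\) lies strictly inside the sphere and every point of \(\mathcal{B}\) strictly outside it:

```python
import numpy as np

def strictly_separates(A, B, x0, R):
    """Check strict separation of A and B by the sphere S(x0, R).

    Returns (separates, q): q = min(z - ||a_i - x0||^2, ||b_l - x0||^2 - z)
    over all points, with z = R^2; strict separation holds iff q > 0,
    mirroring the role of q^* in Theorem 2.3.
    """
    z = R ** 2
    da = np.sum((np.asarray(A, dtype=float) - x0) ** 2, axis=1)
    db = np.sum((np.asarray(B, dtype=float) - x0) ** 2, axis=1)
    q = min(np.min(z - da), np.min(db - z))  # smallest two-sided gap
    return q > 0, q
```

For instance, with \(\mathcal{A}\) clustered near the origin and \(\mathcal{B}\) far from it, any radius strictly between the two groups of distances yields \(q>0\), while a radius touching a point of either set yields \(q\leq 0\).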
Astorino, A., Fuduli, A. & Gaudioso, M. Margin maximization in spherical separation. Comput Optim Appl 53, 301–322 (2012). https://doi.org/10.1007/s10589-012-9486-7