Abstract
Maximal margin based frameworks have emerged as a powerful tool for supervised learning. The extension of these ideas to the unsupervised case, however, is problematic since the underlying optimization entails a discrete component. In this paper, we first study the computational complexity of maximal hard margin clustering and show that the hard margin clustering problem can be precisely solved in O(n d+2) time where n is the number of the data points and d is the dimensionality of the input data. However, since it is well known that many datasets commonly ‘express’ themselves primarily in far fewer dimensions, our interest is in evaluating if a careful use of dimensionality reduction can lead to practical and effective algorithms. We build upon these observations and propose a new algorithm that gradually increases the number of features used in the separation model in each iteration, and analyze the convergence properties of this scheme. We report on promising numerical experiments based on a ‘truncated’ version of this approach. Our experiments indicate that for a variety of datasets, good solutions equivalent to those from other existing techniques can be obtained in significantly less time.
Similar content being viewed by others
References
Vapnik V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
Schölkopf B., Smola A.: Learning with Kernels Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, Cambridge (2002)
Xu, L., Neufeld, J., Larson, B., Schuurmans, D.: Maximum margin clustering. In: Advances in Neural Information Processing Systems (NIPS). MIT Press, Cambridge (2005)
Xu, L., Schuurmans, D.: Unsupervised and semi-supervised multi-class support vector machines. In: Proceedings of the 20th National Conference on Artificial Intelligence (AAAI) (2005)
Valizadegan, H., Jin, R.: Generalized maximum margin clustering and unsupervised kernel learning. In: Advances in Neural Information Processing Systems (NIPS). MIT Press, Cambridge (2006)
Bennett, K., Demiriz, A.: Semi-supervised support vector machines. In: Advances in Neural Information Processing Systems (NIPS). MIT Press, Cambridge (1998)
Bie, T. D., Cristianini, N.: Convex methods for transduction. In: Advances in Neural Information Processing Systems (NIPS). MIT Press, Cambridge (2004)
De Bie T., Cristianini N.: Fast SDP relaxations of graph cut clustering, transduction, and other combinatorial problems. J. Mach. Learn. Res. 7, 1409–1436 (2006)
Vandenberghe L., Boyd S.: Semidefinite programming. SIAM Rev. 38, 49–95 (1996)
Zhang, K., Tsang, I.W., Kwok, J.T.: Maximum margin clustering made practical. In: International Conference on Machine learning (ICML), pp. 1119–1126 (2007)
Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Advances in Neural Information Processing Systems (NIPS). MIT Press, Cambridge (2002)
Prokopyev O.A., Busygin S., Pardalos P.M.: An optimization based approach for data classification. Optim. Methods Softw. 2, 3–9 (2007)
Xu R., Wunsch D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16, 645–648 (2005)
Sherali H.D., Desai J.: A global optimization rlt-based approach for solving the hard clustering problem. J. Global Optim. 32, 281–306 (2005)
Sherali H.D., Desai J.: A global optimization rlt-based approach for solving the fuzzy clustering approach. J. Global Optim. 33, 597–615 (2005)
Butenko S., Chaovalitwongse W., Pardalos P.M.: Clustering Challenges in Biological Networks. World Scientific, Singapore (2009)
Bradley P.S., Mangasarian O.L.: k-plane clustering. J. Global Optim. 16, 23–32 (2000)
Du D., Jung Y., Park H., Drake B.L.: A decision criterion for the optimal number of clusters in hierarchical clustering. J. Global Optim. 25, 91–111 (2003)
Shi J., Malik J.: Normalized cut and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22, 888–905 (2000)
Dhillon, I.S., Guan, Y., Kulis B.: Kernel k-means, spectral clustering and normalized cuts. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 551–556 (2004)
Chen, H., Peng, J.: 0–1 semidefinite programming for graph-cut clustering: modelling and approximation. In: Pardalos, P.M., Hansen, P. (eds.) Data Mining and Mathematical Programming. CRM Proceedings and Lecture Notes of the American Mathematical Society, pp. 15–40 (2008)
Johnson W. B., Lindenstrauss J.: Extensions of lipshitz mapping into hilbert space. Contemp. Math. 26, 189–206 (1984)
Musicant, D. R.: NDC: normally distributed clustered datasets (1998) http://www.cs.wisc.edu/dmi/svm/ndc/
Achlioptas, D.: Database-friendly random projections. In: ACM Symposium on Principles of Database Systems (PODS), pp. 274–281 (2001)
Graham, D. B., Allinson, N. M.: In: Face Recognition: From Theory to Applications, vol. 163, chapter Characterizing Virtual Eigensignatures for General Purpose Face Recognition. NATO ASI Series F, Computer and Systems Sciences, pp. 446–456 (1998)
Samaria, F., Harter, A.: Parameterisation of a stochastic model for human face identification. In: Proceedings of 2nd IEEE Workshop on Applications of Computer Vision (1994)
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Peng, J., Mukherjee, L., Singh, V. et al. An efficient algorithm for maximal margin clustering. J Glob Optim 52, 123–137 (2012). https://doi.org/10.1007/s10898-011-9691-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10898-011-9691-4