Abstract
Clustering is an automated and powerful technique for data analysis. It aims to divide a given set of data points into clusters which are homogeneous and/or well separated. A major challenge with clustering is to define an appropriate clustering criterion that can express a good separation of data into homogeneous groups such that the obtained clustering solution is meaningful and useful to the user. To circumvent this issue, it is suggested that the domain expert could provide background information about the dataset, which can be incorporated by a clustering algorithm in order to improve the solution. Performing clustering under this assumption is known as semi-supervised clustering. This work explores semi-supervised clustering through the k-medoids model. Results obtained by a Variable Neighborhood Search (VNS) heuristic show that the k-medoids model presents classification accuracy compared to the traditional k-means approach. Furthermore, the model demonstrates high flexibility and performance by combining kernel projections with pairwise constraints.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Hansen, P., Jaumard, B.: Cluster analysis and mathematical programming. Math. Program. 79(1–3), 191–215 (1997)
Delattre, M., Hansen, P.: Bicriterion cluster analysis. IEEE TPAMI 4, 277–291 (1980)
Aggarwal, C.C., Reddy, C.K.: Data Clustering: Algorithms and Applications. Chapman & Hall/CRC, Boca Raton (2013)
Basu, S., Davidson, I., Wagstaff, K.: Constrained Clustering: Advances in Algorithms, Theory, and Applications, 1st edn. Chapman & Hall/CRC, Boca Raton (2008)
Hansen, P., Mladenović, N.: Variable neighborhood search: principles and applications. EJOR 130(3), 449–467 (2001)
Wagstaff, K., Cardie, C., Rogers, S., Schroedl, S.: Constrained k-means clustering with background knowledge. In: ICML, pp. 577–584 (2001)
Bekkerman, R., Sahami, M.: Semi-supervised clustering using combinatorial MRFs. In: ICML (2006)
Yan, B., Domeniconi, C.: An adaptive kernel method for semi-supervised clustering. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 521–532. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_49
Law, M.H.C., Topchy, A., Jain, A.K.: Model-based clustering with probabilistic constraints. In: SIAM-SDM, pp. 641–645 (2005)
Ruiz, C., Spiliopoulou, M., Menasalvas, E.: Density-based semi-supervised clustering. Data Min. Knowl. Disc. 21(3), 345–370 (2010)
Xia, Y.: A global optimization method for semi-supervised clustering. Data Min. Knowl. Disc. 18(2), 214–256 (2009)
Tuy, H.: Concave programming under linear constraints. Soviet Math. 5, 1437–1440 (1964)
Kulis, B., Basu, S., Dhillon, I., Mooney, R.: Semi-supervised graph clustering: a kernel approach. Mach. Learn. 74(1), 1–22 (2009)
Schölkopf, B., Smola, A., Müller, K.R.: Nonlinear component analysis as a kernel eigenvalue problem. Neural Comp. 10(5), 1299–1319 (1998)
Christofides, N.: Graph Theory: An Algorithmic Approach (Computer Science and Applied Mathematics). Academic Press Inc., Orlando (1975)
Steinley, D.: K-medoids and other criteria for crisp clustering. In: Handbook of Cluster Analysis. Chapman and Hall/CRC Handbooks of Modern Statistical Methods. CRC Press (2015)
Kaufman, L., Rousseeuw, P.J.: Partitioning around medoids (program PAM). In: Finding Groups in Data: An Introduction to Cluster Analysis, pp. 68–125 (1990)
Teitz, M.B., Bart, P.: Heuristic methods for estimating the generalized vertex median of aweighted graph. Oper. Res. 16(5), 955–961 (1968)
Whitaker, R.: A fast algorithm for the greedy interchange for large-scale clustering and median location problems. INFOR 21(2), 95–108 (1983)
Hansen, P., Mladenović, N.: Variable neighborhood search for the P-median. Locat. Sci. 5(4), 207–226 (1997)
Resende, M.G.C., Werneck, R.F.: A fast swap-based local search procedure for location problems. ANOR 150(1), 205–230 (2007)
Davidson, I., Ravi, S.: Clustering with constraints: feasibility issues and the k-means algorithm. In: SIAM-SDM, pp. 138–149 (2005)
Mladenović, N., Hansen, P.: Variable neighborhood search. Comput. Oper. Res. 24(11), 1097–1100 (1997)
Costa, L.R., Aloise, D., Mladenović, N.: Less is more: basic variable neighborhood search heuristic for balanced minimum sum-of-squares clustering. Inf. Sci. 415, 247–253 (2017)
Hansen, P., Ruiz, M., Aloise, D.: A VNS heuristic for escaping local extrema entrapment in normalized cut clustering. Pattern Recog. 45(12), 4337–4345 (2012)
Hansen, P., Mladenović, N.: J-means: a new local search heuristic for minimum sum of squares clustering. Pattern Recog. 34(2), 405–413 (2001)
Santi, É., Aloise, D., Blanchard, S.J.: A model for clustering data from heterogeneous dissimilarities. EJOR 253(3), 659–672 (2016)
Kleinberg, J.: An impossibility theorem for clustering. In: Advances in Neural Information Processing Systems, pp. 463–470 (2003)
Hubert, L., Arabie, P.: Comparing partitions. J. Classif. 2(1), 193–218 (1985)
Lichman, M.: UCI machine learning repository (2013)
Bilenko, M., Basu, S., Mooney, R.J.: Integrating constraints and metric learning in semi-supervised clustering. In: Proceedings of the Twenty-First International Conference on Machine Learning. ICML 2004, pp. 81–88. ACM, New York (2004)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Randel, R., Aloise, D., Mladenović, N., Hansen, P. (2019). On the k-Medoids Model for Semi-supervised Clustering. In: Sifaleras, A., Salhi, S., Brimberg, J. (eds) Variable Neighborhood Search. ICVNS 2018. Lecture Notes in Computer Science(), vol 11328. Springer, Cham. https://doi.org/10.1007/978-3-030-15843-9_2
Download citation
DOI: https://doi.org/10.1007/978-3-030-15843-9_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15842-2
Online ISBN: 978-3-030-15843-9
eBook Packages: Computer ScienceComputer Science (R0)