Abstract
This paper addresses overlapping clustering and presents two extensions of the OKM approach, denoted OKMED and WOKM. OKMED generalizes the well-known k-medoid method to overlapping clustering and helps organize data given any proximity matrix as input. WOKM (Weighted-OKM) proposes a model with local weighting of the clusters; this variant is well suited to overlapping clustering, since a single data item can match several classes according to different features. On text clustering, we show that OKMED behaves similarly to OKM while allowing metrics other than the Euclidean distance. We then observe a significant improvement when using the weighted extension of OKM.
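To illustrate how a medoid-based method can produce overlapping assignments from nothing but a proximity matrix, here is a minimal sketch. It is not taken from the chapter: the greedy assignment rule, the notion of an "image" object, and the names `image`, `assign`, `positions`, and `D` are all assumptions made for illustration. The image of a set of medoids is taken to be the object whose summed dissimilarity to those medoids is smallest, and an item joins further clusters only while doing so brings its image strictly closer.

```python
# Illustrative sketch only: an assumed greedy rule for overlapping
# assignment to medoids, driven purely by a dissimilarity matrix D.

def image(assigned, D):
    """Index of the object with the smallest summed dissimilarity
    to the medoids listed in `assigned` (hypothetical helper)."""
    return min(range(len(D)), key=lambda j: sum(D[j][m] for m in assigned))


def assign(i, medoids, D):
    """Return the medoids that item i is assigned to (one or more)."""
    # Consider medoids from nearest to farthest.
    ranked = sorted(medoids, key=lambda m: D[i][m])
    chosen = [ranked[0]]
    best = D[i][image(chosen, D)]
    for m in ranked[1:]:
        candidate = D[i][image(chosen + [m], D)]
        if candidate < best:
            # Adding this cluster brings the image closer to item i.
            chosen.append(m)
            best = candidate
        else:
            break
    return chosen


# Toy data: four items on a line, compared by squared difference.
# Any other dissimilarity matrix could be substituted for D.
positions = [0, 2, 4, 10]
D = [[(a - b) ** 2 for b in positions] for a in positions]
medoids = [0, 2]  # indices of the items acting as medoids

memberships = {i: assign(i, medoids, D) for i in range(len(positions))}
print(memberships)
```

In this toy example the item at position 2 lies halfway between the two medoids and ends up in both clusters, while the other items each stay in a single cluster. Only entries of `D` are ever read, so the same code runs unchanged with a non-Euclidean dissimilarity.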
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Cleuziou, G. (2010). Two Variants of the OKM for Overlapping Clustering. In: Guillet, F., Ritschard, G., Zighed, D.A., Briand, H. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 292. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00580-0_9
DOI: https://doi.org/10.1007/978-3-642-00580-0_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00579-4
Online ISBN: 978-3-642-00580-0
eBook Packages: Engineering, Engineering (R0)