Abstract
Clustering is an important tool to explore the hidden structure of large databases. There are several algorithms based on different approaches (hierarchical, partitional, density-based, model-based, etc.). Most of these algorithms have some discrepancies, e.g. they are not able to detect clusters with convex shapes, the number of the clusters should be a priori known, they suffer from numerical problems, like sensitiveness to the initialization, etc. In this paper we introduce a new clustering algorithm based on the sinergistic combination of the hierarchial and graph theoretic minimal spanning tree based clustering and the partitional Gaussian mixture model-based clustering algorithms. The aim of this hybridization is to increase the robustness and consistency of the clustering results and to decrease the number of the heuristically defined parameters of these algorithms to decrease the influence of the user on the clustering results. As the examples used for the illustration of the operation of the new algorithm will show, the proposed algorithm can detect clusters from data with arbitrary shape and does not suffer from the numerical problems of the Gaussian mixture based clustering algorithms.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Augustson, J.G., Minker, J.: An analysis of some graph theoretical clustering techniques. J. ACM 17(4), 571–588 (1970)
Backer, F.B., Hubert, L.J.: A graph-theoretic approach to goodness-of-fit in complete-link hierarchical clustering. J. Am. Stat. Assoc. 71, 870–878 (1976)
Barrow, J.D., Bhavsar, S.P., Sonoda, D.H.: Minimal spanning trees, filaments and galaxy clustering. Monthly Notices of the Royal Astronomical Society 216, 17–35 (1985)
Ben-Dor, A., Yakhini, Z.: Clustering gene expression patterns. In: Proceedings of the 3rd Annual International Conference on Computational Molecular Biology (RECOMB 1999), pp. 11–14 (1999)
Bezdek, J.C., Clarke, L.P., Silbiger, M.L., Arrington, J.A., Bensaid, A.M., Hall, L.O., Murtagh, R.F.: Validity-guided (re)clustering with applications to image segmentation. IEEE Transactions on Fuzzy Systems 4, 112–123 (1996)
Castellano, G., Fanelli, A.M., Mencar, C.: A fuzzy clustering approach for mining diagnostic rules. In: Proceedings of 2003 IEEE International Conference on Systems, Man & Cybernetics (IEEE SMC 2003), vol. 1, pp. 2007–2012 (2003)
Cooley, R., Mobasher, B., Srivastava, J.: Data preparation for mining world wide web browsing. Journal of Knowledge Information Systems 1(1), 5–32 (1999)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological) 39(1), 1–38 (1977)
Dubes, R.C.: How many clusters are best? – an experiment. Pattern Recogn. 20(6), 645–663 (1987)
Dunn, J.C.: Well separated clusters and optimal fuzzy partitions. Journal Cybernetics 4, 95–104 (1974)
Forina, M., Oliveros, C., Concepcion, M., Casolino, C., Casale, M.: Minimum spanning tree: ordering edges to identify clustering structure. Analytica Chimica Acta 515, 43–53 (2004)
Gath, I., Geva, A.B.: Unsupervised Optimal Fuzzy Clustering. IEEE Trans. on Pattern Analysis and Machine Intelligence 11, 773–781 (1989)
Gonzáles-Barrios, J.M., Quiroz, A.J.: A clustering procedure based on the comparsion between the k nearest neighbors graph and the minimal spanning tree. Statistics & Probability Letter 62, 23–34 (2003)
Gotlieb, C.C., Kumar, S.: Semantic Clustering of Index Terms. J. ACM 15(4), 493–513 (1968)
Gower, J.C., Ross, G.J.S.: Minimal Spanning Trees and Single Linkage Cluster Analysis. Applied Statistics 18, 54–64 (1969)
Heer, J., Chi, E.: Identification of Web user traffic composition using multimodal clustering and information scent. In: 1st SIAM ICDM, Workshop on Web Mining, Chicago, IL, pp. 51–58 (2001)
Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall advanced reference series. Prentice-Hall, Inc., Englewood Cliffs (1988)
Kruskal, J.B.: On the shortest spanning subtree of a graph and the traveling salesman problem. American Mathematical Society 7, 48–50 (1956)
Mitchell, T.: Machine Learning. McGraw-Hill, Inc., New York (1997)
Päivinen, N.: Clustering with a minimum spanning tree of scale-free-like structure. Pattern Recognition Letters 26(7), 921–930 (2005)
Prim, R.: Shortest connection networks and some generalizations. Bell System Technical Journal 36, 1389–1401 (1957)
Raghavan, V.V., Yu, C.T.: A comparison of the stability characteristics of some graph theoretic clustering methods. IEEE Trans. Pattern Anal. Mach. Intell. 3, 393–402 (1981)
Sander, J., Ester, M., Kriegel, H.-P., Xu, X.: Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Mining and Knowledge Discovery 2(2), 169–194 (1998)
Varma, S., Simon, R.: Iterative class discovery and feature selection using Minimal Spanning Trees. BMC Bioinformatics 5, 126–134 (2004)
Zahn, C.T.: Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Transaction on Computers C20, 68–86 (1971)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vathy-Fogarassy, A., Kiss, A., Abonyi, J. (2006). Hybrid Minimal Spanning Tree and Mixture of Gaussians Based Clustering Algorithm. In: Dix, J., Hegner, S.J. (eds) Foundations of Information and Knowledge Systems. FoIKS 2006. Lecture Notes in Computer Science, vol 3861. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11663881_18
Download citation
DOI: https://doi.org/10.1007/11663881_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31782-1
Online ISBN: 978-3-540-31784-5
eBook Packages: Computer ScienceComputer Science (R0)