Abstract
Although various dissimilarity functions for symbolic data clustering are available in the literature, little attention has so far been paid to comparing these distance measures. This paper presents a comparative study of some well-known dissimilarity functions for symbolic data. A version of the fuzzy c-means clustering algorithm is used to create groups of individuals characterized by symbolic variables of mixed types. The proposed approach provides a fuzzy partition and a prototype for each cluster by optimizing a criterion that depends on the chosen dissimilarity function. Experiments on benchmark data sets are carried out to compare the accuracy of each function. To analyse the results, we apply an external criterion that compares different partitions of the same data set.
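As a rough illustration of how a fuzzy partition can depend on the chosen dissimilarity, the sketch below computes membership degrees from a precomputed matrix of dissimilarities between individuals and cluster prototypes. It assumes a criterion of the usual fuzzy c-means form, the sum over individuals i and clusters k of u_ik^m * D[i, k]; the abstract does not state the exact criterion, so the function name `fuzzy_memberships`, the fuzzifier `m`, and the toy matrix `D` are all hypothetical.

```python
import numpy as np

def fuzzy_memberships(D, m=2.0, eps=1e-12):
    """Membership degrees for fixed prototypes (illustrative sketch).

    D[i, k] is the dissimilarity between individual i and the prototype
    of cluster k, computed with whichever dissimilarity function is being
    compared. Assumes the criterion sum_i sum_k u_ik**m * D[i, k] with
    each row of memberships summing to 1.
    """
    # Clamp zeros so an individual that coincides with a prototype
    # does not cause a division by zero.
    D = np.maximum(np.asarray(D, dtype=float), eps)
    inv = D ** (-1.0 / (m - 1.0))
    # Normalise each row so memberships of one individual sum to 1.
    return inv / inv.sum(axis=1, keepdims=True)

# Toy dissimilarities: 3 individuals, 2 cluster prototypes (made-up values).
D = np.array([[1.0, 3.0],
              [2.0, 2.0],
              [4.0, 1.0]])
U = fuzzy_memberships(D)
print(U)
```

Swapping the dissimilarity function changes only how `D` is computed; the membership update itself stays the same under this assumed criterion, which is what makes a like-for-like comparison of dissimilarities possible.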
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
da Silva, A., Lechevallier, Y., de Carvalho, F. (2009). Comparing Clustering on Symbolic Data. In: Nedjah, N., de Macedo Mourelle, L., Kacprzyk, J., França, F.M.G., De Souza, A.F. (eds) Intelligent Text Categorization and Clustering. Studies in Computational Intelligence, vol 164. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85644-3_5
DOI: https://doi.org/10.1007/978-3-540-85644-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85643-6
Online ISBN: 978-3-540-85644-3
eBook Packages: Engineering (R0)