Abstract
This paper attempts to shed light on the quality and the shortcomings of several recent clustering methods, both incremental and non-incremental, when applied to "real world data". An extended evaluation of the methods is carried out on textual datasets of increasing complexity. The third test dataset is a highly polythematic dataset that constitutes a static simulation of evolving data, which makes it an interesting benchmark for comparing the behaviour of incremental and non-incremental methods. The focus is on neural clustering methods, but the standard K-means method is included as a reference in the comparison. Generic quality measures are used for quality evaluation.
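The abstract uses the standard K-means method as the reference baseline. As a rough illustration only, a minimal Lloyd-style K-means over dense numeric vectors might look like the sketch below. The initialization (random sampling of input points), the Euclidean distance, the convergence test, and the names `kmeans` and `squared_distance` are assumptions made for this sketch; they are not the configuration used in the study.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    data: Sequence[Sequence[float]],
    k: int,
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[Vector], List[int]]:
    """Lloyd-style K-means: returns (centroids, cluster label of each point).

    Hypothetical helper; initialization and distance are assumptions,
    not the settings used in the paper.
    """
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise centroids with k input points chosen at random.
    centroids = [list(p) for p in rng.sample(list(data), k)]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in data
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


if __name__ == "__main__":
    points = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
    cents, labs = kmeans(points, k=2)
    print(labs)
```

In a text setting the input vectors would be document representations (for example term-weight vectors); the representation actually used in the study is not described in this abstract.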
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lamirel, JC., Mall, R., Ahmad, M. (2011). Comparative Behaviour of Recent Incremental and Non-incremental Clustering Methods on Text: An Extended Study. In: Mehrotra, K.G., Mohan, C.K., Oh, J.C., Varshney, P.K., Ali, M. (eds) Modern Approaches in Applied Intelligence. IEA/AIE 2011. Lecture Notes in Computer Science, vol 6703. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21822-4_3
DOI: https://doi.org/10.1007/978-3-642-21822-4_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21821-7
Online ISBN: 978-3-642-21822-4
eBook Packages: Computer Science, Computer Science (R0)