Abstract
A new model for document clustering is proposed to handle the conceptual aspects of documents. To measure the degree to which a concept is present in a document (or in a document collection), a concept frequency formula is introduced. This formula is based on new fuzzy formulas that calculate the degrees of synonymy and polysemy between terms. To address several shortcomings of classical clustering algorithms, a soft approach based on a hybrid model is proposed. The clustering procedure is implemented by two connected, tailored algorithms that aim to build a fuzzy hierarchical structure: a fuzzy hierarchical clustering algorithm determines an initial clustering, and an improved soft clustering algorithm completes the process. Experiments show that clustering with this model tends to perform better than the classical approach.
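The abstract does not state the formulas themselves, so the following is only a minimal illustrative sketch of the general idea, not the chapter's actual definitions. It assumes that the synonymy degree between two terms is the share of the first term's meanings that the second term also has, and that a concept frequency is the sum of term counts weighted by that degree. The sense inventory `SENSES` and the function names `synonymy_degree` and `concept_frequency` are hypothetical.

```python
from collections import Counter

# Hypothetical toy sense inventory (assumption): term -> set of meaning ids.
SENSES = {
    "car": {"vehicle"},
    "auto": {"vehicle"},
    "bank": {"finance", "river"},
    "shore": {"river"},
}


def synonymy_degree(t1, t2, senses=SENSES):
    """Assumed fuzzy synonymy: fraction of t1's meanings shared with t2."""
    m1 = senses.get(t1, set())
    m2 = senses.get(t2, set())
    if not m1:
        # Terms missing from the inventory only match themselves.
        return 1.0 if t1 == t2 else 0.0
    return len(m1 & m2) / len(m1)


def concept_frequency(tokens, concept_term, senses=SENSES):
    """Assumed concept frequency: term counts weighted by synonymy degree."""
    counts = Counter(tokens)
    return sum(
        n * synonymy_degree(term, concept_term, senses)
        for term, n in counts.items()
    )


doc = ["car", "auto", "bank", "shore"]
print(concept_frequency(doc, "car"))    # counts "car" and "auto" fully
print(concept_frequency(doc, "shore"))  # "bank" contributes only partially
```

Under these assumptions a polysemous term such as "bank" contributes only partially to the "shore" concept, because just one of its two meanings matches, while exact synonyms contribute fully.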
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Romero, F.P., Soto, A., Olivas, J.A. (2007). A Hybrid Model for Document Clustering Based on a Fuzzy Approach of Synonymy and Polysemy. In: Castillo, O., Melin, P., Ross, O.M., Sepúlveda Cruz, R., Pedrycz, W., Kacprzyk, J. (eds) Theoretical Advances and Applications of Fuzzy Logic and Soft Computing. Advances in Soft Computing, vol 42. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72434-6_18
DOI: https://doi.org/10.1007/978-3-540-72434-6_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72433-9
Online ISBN: 978-3-540-72434-6
eBook Packages: Engineering, Engineering (R0)