Abstract
The demand for accuracy and speed in Information Retrieval has revealed the need for a good classification of the large collections of documents held in databases and on Web servers. Representing documents in the vector space model, with terms as features, makes it possible to apply Machine Learning techniques. This paper presents a filter method that selects the most relevant features before the classification process. A Genetic Algorithm (GA) is used as a powerful tool to search for solutions in the domain of relevant features. The method has been implemented and some preliminary experiments have been carried out. Applying this technique to the vector space model in Information Retrieval is outlined as future work.
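The general idea of a GA-driven filter can be illustrated with a minimal sketch. The abstract does not specify the encoding, the fitness function, or the genetic operators, so everything below is an assumption made for illustration: a binary mask over terms, a hypothetical relevance score (absolute difference between class means of each term weight) with a size penalty, tournament selection, one-point crossover, bit-flip mutation, and elitism. The names `feature_relevance`, `fitness`, and `select_features` are hypothetical and do not come from the paper.

```python
import random


def feature_relevance(X, y):
    """Assumed filter criterion: |mean weight in class 1 - mean weight in class 0| per term."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores


def fitness(mask, relevance, penalty=0.1):
    """Sum of relevance over selected terms, minus an assumed penalty per selected term."""
    return sum(r for bit, r in zip(mask, relevance) if bit) - penalty * sum(mask)


def select_features(X, y, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Search binary feature masks with a simple generational GA (illustrative only)."""
    rng = random.Random(seed)
    relevance = feature_relevance(X, y)
    n = len(relevance)  # requires at least two features for one-point crossover
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=lambda m: fitness(m, relevance))

    def tournament():
        # Binary tournament: pick two distinct individuals, keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a, relevance) >= fitness(b, relevance) else b

    for _ in range(generations):
        next_pop = [best[:]]  # elitism: carry the best mask over unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation applied independently to each position.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=lambda m: fitness(m, relevance))
    return best


# Toy term-weight vectors (rows are documents, columns are terms) with binary class labels.
X = [
    [1.0, 0.0, 0.5, 0.2],
    [0.9, 0.1, 0.5, 0.2],
    [0.0, 1.0, 0.5, 0.2],
    [0.1, 0.9, 0.5, 0.3],
]
y = [1, 1, 0, 0]

best = select_features(X, y)
print(best)
```

Under the assumed criterion, the highest-scoring mask for this toy data keeps the first two terms and drops the last two, since only the first two terms separate the classes by more than the penalty. Because this particular score is additive over terms, a GA is not strictly needed to optimize it; the sketch is meant only to show the search loop, which becomes useful when the filter criterion evaluates subsets of features jointly.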
© 1998 Springer-Verlag Berlin Heidelberg
Cite this paper
Martín-Bautista, M.J., Vila, M.A. (1998). Applying genetic algorithms to the feature selection problem in information retrieval. In: Andreasen, T., Christiansen, H., Larsen, H.L. (eds) Flexible Query Answering Systems. FQAS 1998. Lecture Notes in Computer Science, vol 1495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0056008
DOI: https://doi.org/10.1007/BFb0056008
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65082-9
Online ISBN: 978-3-540-49655-7
eBook Packages: Springer Book Archive