Abstract
In this, our first participation in CLEF, we applied Natural Language Processing techniques to the conflation of single-word and multiword terms. Our experiments tested several approaches at different levels of text processing: first, we lemmatized the text to remove inflectional variation; second, we expanded the queries with synonyms that met a fixed similarity threshold; third, we used morphological families to deal with derivational variation; and fourth, we tested a mixed approach that combines such families with syntactic dependencies to deal with the syntactic content of the document.
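The first three levels described above can be pictured with a minimal sketch. It assumes tiny hand-made dictionaries; every name and value in it (`LEMMAS`, `SYNONYMS`, `FAMILIES`, the similarity scores, the function names) is hypothetical and chosen only for illustration, and none of it comes from the system used in the experiments. The syntactic-dependency level is not covered.

```python
from typing import Dict, List, Tuple

# Toy resources invented for illustration only (not the authors' lexicons).
# Inflected form -> lemma.
LEMMAS: Dict[str, str] = {
    "casas": "casa",
    "cantaban": "cantar",
    "cantantes": "cantante",
}
# Lemma -> (synonym, assumed similarity score in [0, 1]) pairs.
SYNONYMS: Dict[str, List[Tuple[str, float]]] = {
    "casa": [("vivienda", 0.9), ("hogar", 0.8), ("edificio", 0.5)],
}
# Lemma -> representative of its (assumed) morphological family.
FAMILIES: Dict[str, str] = {
    "cantar": "cantar",
    "cantante": "cantar",
}


def lemmatize(tokens: List[str]) -> List[str]:
    """Replace each token by its lemma; unknown tokens are only lowercased."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]


def expand_query(lemmas: List[str], threshold: float) -> List[str]:
    """Append synonyms whose similarity reaches the fixed threshold."""
    expanded = list(lemmas)
    for lemma in lemmas:
        for synonym, similarity in SYNONYMS.get(lemma, []):
            if similarity >= threshold and synonym not in expanded:
                expanded.append(synonym)
    return expanded


def conflate_families(lemmas: List[str]) -> List[str]:
    """Map each lemma to its family representative, if one is known."""
    return [FAMILIES.get(lemma, lemma) for lemma in lemmas]


if __name__ == "__main__":
    query = lemmatize(["Casas", "cantantes"])
    print(expand_query(query, threshold=0.8))
    print(conflate_families(query))
```

In this sketch the threshold is a plain parameter, so raising it trades recall for precision in the expanded query; how the actual similarity scores and families were obtained is not stated in the abstract.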
© 2003 Springer-Verlag Berlin Heidelberg
Cite this paper
Vilares, J., Alonso, M.A., Ribadas, F.J., Vilares, M. (2003). COLE Experiments in the CLEF 2002 Spanish Monolingual Track. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds) Advances in Cross-Language Information Retrieval. CLEF 2002. Lecture Notes in Computer Science, vol 2785. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45237-9_22
DOI: https://doi.org/10.1007/978-3-540-45237-9_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40830-7
Online ISBN: 978-3-540-45237-9
eBook Packages: Springer Book Archive