Abstract
In this paper, we describe ongoing work, carried out within the BioMiRe project, on the automatic detection of gene and protein names in biomedical texts. Our approach is based on robust linguistic analysis of these texts. We first present the specific lexical and syntactic problems encountered in the corpora. We then describe how the linguistic processing tools we use carry out this task. Next, we address the problem of evaluation and report our first results. Finally, we sketch how we intend to continue the work.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hagège, C., Sándor, Á., Schiller, A. (2002). Linguistic Processing of Biomedical Texts. In: Ranchhod, E., Mamede, N.J. (eds) Advances in Natural Language Processing. PorTAL 2002. Lecture Notes in Computer Science, vol 2389. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45433-0_29
DOI: https://doi.org/10.1007/3-540-45433-0_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43829-8
Online ISBN: 978-3-540-45433-5
eBook Packages: Springer Book Archive