Abstract
Biomedical named entity recognition (NER) is a difficult problem in biomedical information processing because terms are widely ambiguous out of context and show extensive lexical variation. This paper presents a two-phase biomedical NER method consisting of term boundary detection and semantic labeling. Dividing the problem lets us adopt an effective model for each phase. In our study, we use two exponential models, conditional random fields and maximum entropy, at each phase. Moreover, the results of this machine-learning-based model are refined by rule-based postprocessing implemented with a finite-state method. Experiments show that the method achieves an F-score of 71.19% on the JNLPBA 2004 shared task of identifying five classes of biomedical NEs.
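The two-phase decomposition described in the abstract can be illustrated with a minimal sketch: a first step marks term boundaries with class-less B/I/O tags, and a second step assigns a semantic class to each detected span. Everything in this block is an assumption made for illustration. The function names, the toy token set, and the head-word rule are hypothetical stand-ins for the trained models, and the rule-based postprocessing step is not modeled. The class names follow the five JNLPBA 2004 entity classes (protein, DNA, RNA, cell_line, cell_type).

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # [start, end) token offsets


def spans_from_bio(tags: List[str]) -> List[Span]:
    """Collect [start, end) spans from class-less B/I/O boundary tags."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new term begins here; close any term still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def two_phase_ner(
    tokens: List[str],
    boundary_tagger: Callable[[List[str]], List[str]],
    span_classifier: Callable[[List[str], int, int], str],
) -> List[str]:
    """Phase 1 finds term boundaries; phase 2 labels each detected term."""
    boundary_tags = boundary_tagger(tokens)            # phase 1
    labeled = ["O"] * len(tokens)
    for start, end in spans_from_bio(boundary_tags):
        label = span_classifier(tokens, start, end)    # phase 2
        labeled[start] = f"B-{label}"
        for i in range(start + 1, end):
            labeled[i] = f"I-{label}"
    return labeled


# Hypothetical stand-ins for the trained models (NOT the paper's models).
ENTITY_TOKENS = {"IL-2", "gene", "T", "cells"}
HEAD_TO_CLASS = {"gene": "DNA", "cells": "cell_type"}


def toy_boundary_tagger(tokens: List[str]) -> List[str]:
    """Tag runs of known entity tokens as B, I, ...; everything else as O."""
    tags, inside = [], False
    for tok in tokens:
        if tok in ENTITY_TOKENS:
            tags.append("I" if inside else "B")
            inside = True
        else:
            tags.append("O")
            inside = False
    return tags


def toy_span_classifier(tokens: List[str], start: int, end: int) -> str:
    """Pick a class from the last token of the span, defaulting to protein."""
    return HEAD_TO_CLASS.get(tokens[end - 1], "protein")


tokens = ["IL-2", "gene", "expression", "in", "T", "cells"]
print(two_phase_ner(tokens, toy_boundary_tagger, toy_span_classifier))
```

Because the second step only sees spans proposed by the first, each step can use its own model and features, which is the motivation the abstract gives for dividing the problem.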
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, S., Yoon, J., Park, KM., Rim, HC. (2005). Two-Phase Biomedical Named Entity Recognition Using A Hybrid Method. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_57
DOI: https://doi.org/10.1007/11562214_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)