Abstract
Gazetteers, or entity dictionaries, are an important element of Named Entity Recognition, which is itself an essential component of Information Extraction. Gazetteers work as specialized dictionaries that support initial tagging. They provide quick entity identification, thus creating a richer document representation. However, the compilation of such gazetteers is sometimes cited as a stumbling block in Named Entity Recognition. Machine learning, rule-based, and look-up based approaches are often used to perform this process. In this paper, a gazetteer developed from MUC-3 annotated data for the ‘person name’ entity type is presented. The process used has a small computational cost. We combine rule-based grammars and a simple filtering technique to induce the gazetteer automatically. We conclude with experiments that compare the content of the induced gazetteer with that of a manually crafted one.
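To make the pipeline described in the abstract concrete, the following is a minimal sketch of how a rule-based grammar plus a simple filter might induce a person-name gazetteer and how the result could be compared with a manual list. It is not the authors' implementation: the abstract does not give the grammar rules or the filter. The honorific list, the stoplist, the function names `induce_gazetteer` and `compare`, and the sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical trigger words; the paper's real grammar is not given in the abstract.
TITLES = r"(?:Mr|Mrs|Ms|Dr|Gen|Col|President)\.?"

# Rule: a title followed by one to three capitalized tokens.
NAME_RULE = re.compile(
    r"\b" + TITLES + r"\s+([A-Z][a-z]+(?:\s[A-Z][a-z]+){0,2})"
)

# Hypothetical stoplist used by the filtering step.
STOPLIST = {"The", "Government", "Army"}


def induce_gazetteer(sentences, min_count=1):
    """Collect candidate person names with the rule, then filter them."""
    counts = Counter()
    for sentence in sentences:
        for match in NAME_RULE.finditer(sentence):
            candidate = match.group(1)
            # Filter: drop candidates that contain a stoplisted token.
            if not any(tok in STOPLIST for tok in candidate.split()):
                counts[candidate] += 1
    # Filter: keep only candidates seen at least min_count times.
    return {name for name, n in counts.items() if n >= min_count}


def compare(induced, manual):
    """Return (precision, recall) of the induced set against a manual one."""
    overlap = induced & manual
    precision = len(overlap) / len(induced) if induced else 0.0
    recall = len(overlap) / len(manual) if manual else 0.0
    return precision, recall


# Invented sample sentences, not taken from MUC-3.
sentences = [
    "President Juan Perez condemned the attack.",
    "Gen. Carlos Ruiz met Col. Luis Mora.",
    "Dr. The Government said nothing.",
]
gazetteer = induce_gazetteer(sentences)
manual = {"Juan Perez", "Carlos Ruiz", "Maria Lopez"}
precision, recall = compare(gazetteer, manual)
```

A single regular expression and two set-based filters keep the cost linear in the corpus size, which is in the spirit of the small computational cost the abstract claims. A real system would need a richer grammar and a filter tuned on the annotated data.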
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zamin, N., Oxley, A. (2011). Building a Corpus-Derived Gazetteer for Named Entity Recognition. In: Zain, J.M., Wan Mohd, W.M.b., El-Qawasmeh, E. (eds) Software Engineering and Computer Systems. ICSECS 2011. Communications in Computer and Information Science, vol 180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22191-0_6
DOI: https://doi.org/10.1007/978-3-642-22191-0_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22190-3
Online ISBN: 978-3-642-22191-0
eBook Packages: Computer Science, Computer Science (R0)