Abstract
Recognition of sci-tech compound phrase entities (e.g., the names of projects, books and patents) is a fundamental task in science and technology data processing and discovery. However, relatively little work has been done on this problem. In this paper, we first describe the characteristics that distinguish sci-tech entities from personal names and other traditional entities. We then introduce a self-learning rule-based approach to the problem. The approach consists of three stages: rule-based text truncation, BlackPOS-based text split and WhiteKey-based confirmation. Constructing the best WhiteKey list is an NP-hard problem, so we design dynamic programming and greedy algorithms to address it. Experimental results show that our approach achieves a precision of 94.78%, a recall of 89.19% and an F1 measure of 91.9% on average. Moreover, our approach is universal and orthogonal to prior named entity recognition work.
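The abstract does not spell out how the WhiteKey list is built, so the following is only a minimal sketch of what a greedy heuristic for such a problem might look like. It assumes the task resembles minimum set cover: each candidate keyword "covers" the training entities whose names contain it, and the goal is a small keyword list covering all of them. The function name `greedy_white_keys`, the toy keywords and the entity ids are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def greedy_white_keys(candidates: Dict[str, Set[int]], universe: Set[int]) -> List[str]:
    """Greedily pick keywords until every coverable entity id is covered.

    Assumption (not from the paper): WhiteKey selection is modelled as
    minimum set cover over ids of known entities.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Pick the keyword covering the most still-uncovered entities.
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(
            sorted(candidates),
            key=lambda k: len(candidates[k] & uncovered),
            default=None,
        )
        if best is None or not (candidates[best] & uncovered):
            break  # the remaining entities contain no candidate keyword
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Hypothetical toy data: keyword -> ids of entities whose names contain it.
toy = {
    "system": {1, 2, 3},
    "method": {3, 4},
    "research": {4, 5},
    "device": {5},
}
white_keys = greedy_white_keys(toy, {1, 2, 3, 4, 5})
print(white_keys)  # ['system', 'research']
```

The greedy rule is a standard approximation for set cover and does not guarantee the smallest list. The paper's own greedy and dynamic programming algorithms may use a different objective or selection criterion.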
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Liu, T., Zhang, Y., Yan, Y., Shi, J., Guo, L. (2015). A Self-learning Rule-Based Approach for Sci-tech Compound Phrase Entity Recognition. In: Cheng, R., Cui, B., Zhang, Z., Cai, R., Xu, J. (eds) Web Technologies and Applications. APWeb 2015. Lecture Notes in Computer Science, vol 9313. Springer, Cham. https://doi.org/10.1007/978-3-319-25255-1_60
DOI: https://doi.org/10.1007/978-3-319-25255-1_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25254-4
Online ISBN: 978-3-319-25255-1
eBook Packages: Computer Science, Computer Science (R0)