Named Entity Recognition for Open Domain Data Based on Distant Supervision

Wu, Junshuang; Zhang, Richong; Deng, Ting; Huai, Jinpeng

doi:10.1007/978-981-15-1956-7_17


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1134)

Included in the following conference series:

China Conference on Knowledge Graph and Semantic Computing


Abstract

Named Entity Recognition (NER) for open domain data is a critical task for natural language processing applications and attracts much research attention. However, the complexity of semantic dependencies and the sparsity of context information make it difficult to identify correct entities in a corpus. In addition, the lack of annotated training data makes it impossible to predict fine-grained entity types for detected entities. To solve these problems, we propose an extractor that takes both the near arguments and the long dependencies of relations into consideration when discovering entity and relation mentions. We then employ distant-supervision methods to automatically label the mention types in the training data sets, and we propose a neural network model for learning the type classifier. Empirical studies on two real-world raw text corpora, NYT and YELP, demonstrate that our proposed NER approach outperforms the existing models.
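The distant-supervision labeling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the knowledge base `TOY_KB`, its type names, and the helper `distant_label` are hypothetical, and exact string matching stands in for whatever entity linking the paper actually uses. The idea shown is only that detected mentions inherit candidate types from a knowledge base instead of from manual annotation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy knowledge base: normalized surface form -> candidate types.
# A real system would query a resource such as Freebase or DBpedia instead.
TOY_KB: Dict[str, Set[str]] = {
    "new york times": {"organization", "organization/news_agency"},
    "barack obama": {"person", "person/politician"},
    "chicago": {"location", "location/city"},
}


def distant_label(
    mentions: List[str], kb: Dict[str, Set[str]]
) -> List[Tuple[str, Set[str]]]:
    """Attach KB types to each mention whose normalized form is in the KB.

    Mentions without a KB match receive an empty set, meaning they stay
    unlabeled and would be left for a learned type classifier to predict.
    """
    labeled = []
    for mention in mentions:
        # Normalize case and collapse runs of whitespace before lookup.
        key = " ".join(mention.lower().split())
        labeled.append((mention, set(kb.get(key, set()))))
    return labeled


if __name__ == "__main__":
    for mention, types in distant_label(
        ["Barack Obama", "Chicago", "deep dish pizza"], TOY_KB
    ):
        print(mention, sorted(types))
```

Labels produced this way are noisy, since a mention may receive every type its knowledge-base entry carries regardless of context. That is one reason a separately trained type classifier is still needed.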



Notes

1. These five labels are introduced in the Stanford Dependencies notation. http://nlp.stanford.edu/software/dependencies_manual.pdf


Acknowledgments

This work is supported in part by the National Natural Science Foundation of China (No. 61772059, 61602023 and 61421003), by the Beijing Advanced Innovation Center for Big Data and Brain Computing (BDBC), by the State Key Laboratory of Software Development Environment (No. SKLSDE-2018ZX-17), by the Fundamental Research Funds for the Central Universities, and by the Beijing S&T Committee.

Author information

Authors and Affiliations

SKLSDE, School of Computer Science and Engineering, Beihang University, Beijing, China
Junshuang Wu, Richong Zhang, Ting Deng & Jinpeng Huai
Beijing Advanced Institution on Big Data and Brain Computing, Beihang University, Beijing, China
Junshuang Wu, Richong Zhang, Ting Deng & Jinpeng Huai


Corresponding author

Correspondence to Richong Zhang.

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, China
Xiaoyan Zhu
Harbin Institute of Technology, Harbin, China
Bing Qin
Queen's University, Kingston, Canada
Xiaodan Zhu
Harbin Institute of Technology, Harbin, China
Ming Liu
Soochow University, Soochow, China
Longhua Qian


About this paper

Cite this paper

Wu, J., Zhang, R., Deng, T., Huai, J. (2019). Named Entity Recognition for Open Domain Data Based on Distant Supervision. In: Zhu, X., Qin, B., Zhu, X., Liu, M., Qian, L. (eds) Knowledge Graph and Semantic Computing: Knowledge Computing and Language Understanding. CCKS 2019. Communications in Computer and Information Science, vol 1134. Springer, Singapore. https://doi.org/10.1007/978-981-15-1956-7_17


DOI: https://doi.org/10.1007/978-981-15-1956-7_17
Published: 03 January 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-1955-0
Online ISBN: 978-981-15-1956-7
eBook Packages: Computer Science, Computer Science (R0)
