Abstract
Information extraction from microblogs has recently attracted researchers in the fields of knowledge discovery and data mining, owing to the short nature of these texts. Annotating data is one of the significant issues in applying machine learning approaches to these sources. Active learning (AL) and semi-supervised learning (SSL) are two distinct approaches to reducing annotation costs. SSL exploits high-confidence samples, whereas AL queries the most informative samples. Because the two approaches draw on different kinds of samples, they can produce better results when applied jointly. This paper proposes a combination of AL and SSL to reduce the labeling effort for named entity recognition (NER) from tweet streams by using both machine-labeled and manually labeled data. The AL query algorithms select the most informative samples, which are then labeled by a human annotator. In addition, a Conditional Random Field (CRF) is chosen as the underlying model to select high-confidence samples. Experimental results on a tweet dataset demonstrate that the proposed method achieves promising results in reducing the human labeling effort and that it can significantly improve the performance of NER systems.
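The interplay between the two strategies can be illustrated with a minimal sketch of one labeling round. It is not the paper's implementation: the CRF is replaced by a toy token-lookup model so that the example runs without external libraries, and the names `train_toy`, `split_pool`, `hybrid_round`, the confidence threshold `high`, and the query budget `n_queries` are all assumptions introduced for illustration. High-confidence predictions are kept as machine-labeled data (the SSL side), while the least confident samples are sent to an annotator (the AL side).

```python
from typing import Callable, Dict, List, Tuple

Tokens = Tuple[str, ...]
Labels = Tuple[str, ...]
Example = Tuple[Tokens, Labels]
# A predictor maps a token sequence to (predicted labels, confidence in [0, 1]).
Predictor = Callable[[Tokens], Tuple[Labels, float]]


def train_toy(labeled: List[Example]) -> Predictor:
    """Toy stand-in for a CRF (assumption): remember the label seen for each token."""
    vocab: Dict[str, str] = {}
    for tokens, labels in labeled:
        vocab.update(zip(tokens, labels))

    def predict(tokens: Tokens) -> Tuple[Labels, float]:
        labels = tuple(vocab.get(tok, "O") for tok in tokens)
        # Confidence here is the share of tokens seen in training;
        # a real CRF would supply a sequence probability instead.
        confidence = sum(tok in vocab for tok in tokens) / len(tokens)
        return labels, confidence

    return predict


def split_pool(pool: List[Tokens], predict: Predictor, high: float, n_queries: int):
    """Return (machine-labeled examples, sequences to query a human about)."""
    scored = [(seq, *predict(seq)) for seq in pool]
    machine = [(seq, labels) for seq, labels, conf in scored if conf >= high]
    uncertain = sorted((s for s in scored if s[2] < high), key=lambda s: s[2])
    queries = [seq for seq, _, _ in uncertain[:n_queries]]
    return machine, queries


def hybrid_round(labeled, pool, oracle, high=0.9, n_queries=1):
    """One round: train, self-label confident samples, query the least confident."""
    predict = train_toy(labeled)
    machine, queries = split_pool(pool, predict, high, n_queries)
    human = [(seq, oracle(seq)) for seq in queries]
    used = {seq for seq, _ in machine} | set(queries)
    remaining = [seq for seq in pool if seq not in used]
    return labeled + machine + human, remaining


# Hypothetical data: one seed annotation and a small unlabeled pool.
seed = [(("obama", "visits", "paris"), ("PER", "O", "LOC"))]
pool = [
    ("paris", "visits", "obama"),
    ("merkel", "meets", "macron"),
    ("obama", "in", "berlin"),
]
# Hypothetical annotator, simulated by a lookup table.
gold = {("merkel", "meets", "macron"): ("PER", "O", "PER")}
new_labeled, remaining = hybrid_round(seed, pool, oracle=gold.__getitem__)
```

In this sketch the first pool item is fully covered by the seed vocabulary and is labeled by the model, the second has the lowest confidence and is sent to the simulated annotator, and the third stays in the pool for a later round. The threshold and query budget control how the labeling work is divided between machine and human.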
Acknowledgment
This work was supported by the BK21+ program of the National Research Foundation (NRF) of Korea.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Tran, V.C., Hoang, D.T., Nguyen, N.T., Hwang, D. (2017). A Hybrid Method for Named Entity Recognition on Tweet Streams. In: Nguyen, N., Tojo, S., Nguyen, L., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2017. Lecture Notes in Computer Science, vol 10191. Springer, Cham. https://doi.org/10.1007/978-3-319-54472-4_25
DOI: https://doi.org/10.1007/978-3-319-54472-4_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54471-7
Online ISBN: 978-3-319-54472-4
eBook Packages: Computer Science, Computer Science (R0)