Abstract
For a range of Natural Language Processing (NLP) applications, including Sentiment Analysis, Sarcasm Detection, Information Retrieval (IR), Question Answering, and Named Entity Identification, the text of social media users' posts and comments is a significant source of information. All such applications require part-of-speech (POS) tagging to add tag information to the raw text. Code-mixing, the natural tendency of social media users to write content in multiple languages, makes POS tagging difficult. In addition, sophisticated and free-style writing further increases the complexity of the problem. To address this, a supervised POS tagger for code-mixed Indian social media text has been developed using a Hidden Markov Model (HMM) with the Viterbi algorithm. The proposed system has been trained and tested on publicly accessible social media text in Indian languages (ILs), particularly Bengali, Telugu, English, and Hindi. The accuracy of the system-annotated tags has been assessed using the F-measure.
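The abstract names three technical pieces: a supervised HMM tagger, Viterbi decoding, and evaluation by F-measure. The sketch below illustrates all three under stated assumptions that are not taken from the paper: parameters are estimated by relative-frequency counting with add-one smoothing, words are lowercased, all unknown words share one reserved emission slot, and the F-measure is macro-averaged over tags. The function names (`train_hmm`, `viterbi`, `macro_f1`), the toy romanized Hindi-English corpus, and the tag names are hypothetical illustrations, not the authors' implementation or tagset.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Collect first-order HMM counts from sentences of (word, tag) pairs."""
    start = Counter()             # how often each tag opens a sentence
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sentence in tagged_sentences:
        prev = None
        for word, tag in sentence:
            word = word.lower()   # assumption: case-folding for noisy social media text
            vocab.add(word)
            emit[tag][word] += 1
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            prev = tag
    # Every token emits a word, so the keys of `emit` are exactly the tag set.
    return start, trans, emit, sorted(emit), vocab


def log_prob(counts, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log-probability, so unseen events are never impossible."""
    total = sum(counts.values())
    return math.log((counts[key] + alpha) / (total + alpha * n_outcomes))


def viterbi(words, model):
    """Return the most probable tag sequence for `words` under the HMM."""
    start, trans, emit, tags, vocab = model
    if not words:
        return []
    words = [w.lower() for w in words]
    n_tags = len(tags)
    n_words = len(vocab) + 1      # +1 reserves probability mass for unknown words
    # best[i][t]: log score of the best tag path for words[:i+1] ending in tag t
    best = [{t: log_prob(start, t, n_tags) + log_prob(emit[t], words[0], n_words)
             for t in tags}]
    back = [{}]                   # back[i][t]: predecessor tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev_t = max(tags, key=lambda p: best[i - 1][p] + log_prob(trans[p], t, n_tags))
            best[i][t] = (best[i - 1][prev_t] + log_prob(trans[prev_t], t, n_tags)
                          + log_prob(emit[t], words[i], n_words))
            back[i][t] = prev_t
    # Pick the best final tag, then follow back-pointers to recover the full path.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def macro_f1(gold, pred):
    """Per-tag F-measure, averaged over every tag seen in gold or predictions."""
    tags = set(gold) | set(pred)
    if not tags:
        return 0.0
    scores = []
    for t in tags:
        tp = sum(g == t and p == t for g, p in zip(gold, pred))
        fp = sum(g != t and p == t for g, p in zip(gold, pred))
        fn = sum(g == t and p != t for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f)
    return sum(scores) / len(scores)


# Hypothetical toy corpus of romanized Hindi-English (code-mixed) sentences;
# the tag names are illustrative, not the tagset used in the paper.
corpus = [
    [("main", "PRON"), ("ghar", "NOUN"), ("ja", "VERB"), ("raha", "VERB"), ("hoon", "VERB")],
    [("I", "PRON"), ("love", "VERB"), ("this", "DET"), ("song", "NOUN")],
    [("yeh", "DET"), ("song", "NOUN"), ("awesome", "ADJ"), ("hai", "VERB")],
]
model = train_hmm(corpus)
predicted = viterbi(["I", "love", "this", "song"], model)
print(predicted)  # → ['PRON', 'VERB', 'DET', 'NOUN']
print(macro_f1(["PRON", "VERB", "DET", "NOUN"], predicted))  # → 1.0
```

Scores are kept in log space so that long posts do not underflow to zero. A practical code-mixed tagger would likely need richer unknown-word handling (for example, suffix or language-identity cues), which this sketch deliberately leaves out.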
Acknowledgement
The work presented here is part of the experiments being conducted under Research Project Grant Ref. No. N-21/17/2020-NeGD, supported by the MeitY Quantum Computing Applications Lab (QCAL) and Amazon Braket. We also extend our gratitude to the Department of CSE, NIT Silchar, and the Center for Natural Language Processing for their support.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Basisth, N.J., Sachan, T., Kumari, N., Pandey, S., Pakray, P. (2024). An Automatic POS Tagger System for Code Mixed Indian Social Media Text. In: Dasgupta, K., Mukhopadhyay, S., Mandal, J.K., Dutta, P. (eds) Computational Intelligence in Communications and Business Analytics. CICBA 2023. Communications in Computer and Information Science, vol 1956. Springer, Cham. https://doi.org/10.1007/978-3-031-48879-5_21
DOI: https://doi.org/10.1007/978-3-031-48879-5_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48878-8
Online ISBN: 978-3-031-48879-5
eBook Packages: Computer Science, Computer Science (R0)