Abstract
With the current upsurge in the usage of social media platforms, the use of short text, or microtext, in place of standard English has risen significantly. This work incorporates microtext normalization into a robot’s chatbot. It is motivated by the fact that humans tend to write in many different unconstrained ways. The work also includes a binary classifier that detects microtext, which helps reduce the execution time of the microtext normalization module. The results show that the chatbot’s understanding and performance improve for most forms of unconstrained language found on social media. The BLEU score is used to compare sentences before and after normalization. Overall, the results suggest that microtext normalization can improve unconstrained-text understanding in a pre-trained chatbot.
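The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the chapter's implementation: the lexicon, the lookup-based `is_microtext` detector (a stand-in for the binary classifier), the `normalize` function, and the simplified single-reference BLEU with n-grams up to 2 are all assumptions made for the example. It also assumes the detector is used as a gate, so the normalizer runs only on sentences flagged as microtext.

```python
import math
from collections import Counter

# Hypothetical toy lexicon; the chapter's actual resources are not shown here.
MICROTEXT_LEXICON = {
    "u": "you", "r": "are", "c": "see", "gr8": "great", "2moro": "tomorrow",
}

def is_microtext(sentence):
    """Stand-in for the binary classifier: flag a sentence if any token is in the lexicon."""
    return any(tok in MICROTEXT_LEXICON for tok in sentence.lower().split())

def normalize(sentence):
    """Toy normalizer: replace each known microtext token with its standard form."""
    return " ".join(MICROTEXT_LEXICON.get(tok, tok) for tok in sentence.lower().split())

def preprocess(sentence):
    """Run the normalizer only when the detector fires; otherwise pass the text through."""
    return normalize(sentence) if is_microtext(sentence) else sentence

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU with uniform weights and no smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped overlap: a candidate n-gram counts at most as often as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_precision += math.log(overlap / total) / max_n
    # Brevity penalty for candidates that are not longer than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_precision)

raw = "c u 2moro"
reference = "see you tomorrow"
print(bleu(raw, reference), bleu(preprocess(raw), reference))
```

Comparing the two scores mirrors the before/after evaluation described above; in practice an established BLEU implementation and a trained classifier would replace these toy functions.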
Notes
Reproduced by Permission © 1995–2018 NetLingo® The Internet Dictionary at http://www.netlingo.com.
Acknowledgment
This research is supported by the BeingTogether Centre, a collaboration between Nanyang Technological University (NTU) Singapore and University of North Carolina (UNC) at Chapel Hill. The BeingTogether Centre is supported by the National Research Foundation, Prime Minister’s Office, Singapore under its International Research Centres in Singapore Funding Initiative.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Satapathy, R., Cambria, E., Thalmann, N.M. (2023). Microtext Normalization for Chatbots. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_21
DOI: https://doi.org/10.1007/978-3-031-24337-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24336-3
Online ISBN: 978-3-031-24337-0
eBook Packages: Computer Science, Computer Science (R0)