Microtext Normalization for Chatbots

Conference paper, Computational Linguistics and Intelligent Text Processing (CICLing 2019)

Abstract

With the rise of social media platforms, the use of short text, or microtext, in place of standard English has grown significantly. This work incorporates microtext normalization into a robot’s chatbot, exploiting the fact that humans write in many different, unconstrained ways. The system also uses a binary classifier to detect microtext, which reduces the execution time of the microtext normalization module. The results show that the chatbot’s understanding and performance improve on most forms of unconstrained language found on social media. The BLEU score is used to evaluate performance before and after sentences are normalized. Overall, the results suggest that microtext normalization can improve a pre-trained chatbot’s understanding of unconstrained text.
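The abstract describes two technical pieces: a binary classifier that flags microtext ahead of the normalization module, and a BLEU comparison before and after normalization. The Python sketch below illustrates both ideas under stated assumptions. It is not the authors’ implementation. `SLANG_LEXICON`, `tokenize`, `is_microtext`, `normalize`, `ngrams` and `sentence_bleu` are hypothetical names. `is_microtext` is a simple rule-based stand-in for the trained classifier, whose details are not given here. Scoring the normalized input against a hand-written standard-English reference is an assumed evaluation setup. The BLEU function follows the usual definition (clipped n-gram precisions, geometric mean, brevity penalty), with add-one smoothing chosen purely for illustration.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy lexicon for illustration only; a real system would use
# much larger slang/abbreviation dictionaries.
SLANG_LEXICON = {
    "u": "you",
    "r": "are",
    "gr8": "great",
    "pls": "please",
    "tmrw": "tomorrow",
    "idk": "i do not know",
    "2day": "today",
}


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def is_microtext(text):
    """HYPOTHETICAL rule-based stand-in for a trained binary classifier.

    Flags the text if any token is a known slang form or mixes letters
    and digits (e.g. 'b4').
    """
    for tok in tokenize(text):
        if tok in SLANG_LEXICON:
            return True
        if re.search(r"[a-z]", tok) and re.search(r"[0-9]", tok):
            return True
    return False


def normalize(text):
    """Run lexicon-based normalization only when the classifier fires."""
    if not is_microtext(text):
        return text  # clean input skips the normalizer and is returned as-is
    return " ".join(SLANG_LEXICON.get(tok, tok) for tok in tokenize(text))


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, hypothesis, max_n=4):
    """Simplified sentence-level BLEU with add-one smoothed precisions."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clip each hypothesis n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        # Uniform weights 1/max_n give the geometric mean of the precisions.
        log_precision += math.log((overlap + 1) / (total + 1)) / max_n
    # Brevity penalty: hypotheses shorter than the reference are penalized.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision)


if __name__ == "__main__":
    raw = "idk if u r free tmrw"
    reference = "i do not know if you are free tomorrow"  # assumed gold text
    fixed = normalize(raw)
    print(fixed)  # i do not know if you are free tomorrow
    print(sentence_bleu(reference, raw) < sentence_bleu(reference, fixed))  # True
```

Because of the gate, clean sentences such as "What time is it?" bypass normalization and are returned unchanged. With a heavier normalizer, for example a sequence-to-sequence model, this skipped work is where a time saving would come from. For real evaluations, NLTK’s `nltk.translate.bleu_score.sentence_bleu` is a more standard choice than this hand-rolled version.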

Acknowledgment

This research is supported by the BeingTogether Centre, a collaboration between Nanyang Technological University (NTU) Singapore and the University of North Carolina (UNC) at Chapel Hill. The BeingTogether Centre is supported by the National Research Foundation, Prime Minister’s Office, Singapore, under its International Research Centres in Singapore Funding Initiative.

Author information

Correspondence to Erik Cambria.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Satapathy, R., Cambria, E., Thalmann, N.M. (2023). Microtext Normalization for Chatbots. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_21

  • DOI: https://doi.org/10.1007/978-3-031-24337-0_21
  • Publisher Name: Springer, Cham
  • Print ISBN: 978-3-031-24336-3
  • Online ISBN: 978-3-031-24337-0
  • eBook Packages: Computer Science; Computer Science (R0)
