Improving the Categorization of Intent of a Chatbot in Portuguese with Data Augmentation Obtained by Reverse Translation

  • Conference paper
Intelligent Systems Design and Applications (ISDA 2022)

Abstract

A retrieval-based chatbot generally needs real data produced by a domain expert before it can begin interacting with users and keep improving from those interactions. Creating knowledge bases that cover all linguistic variations is time-consuming and costly. This work proposes and evaluates a data augmentation methodology for a Portuguese-language corpus, based on paraphrasing and reverse translation (back-translation), that generates data for intent classification in conversational agents, with the aim of reducing the time needed to obtain and prepare quality data. Experiments in which the augmented data were evaluated with an NLU model showed that the method is effective, with an average accuracy gain of 6%.
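As a rough illustration of the reverse-translation idea described in the abstract, the following Python sketch translates each labelled utterance into a pivot language and back, and keeps the result only when it differs from the utterances already in the corpus. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the intent labels, and the dictionary-based toy translators are all hypothetical stand-ins for a real machine translation service, and the paraphrasing step is not covered.

```python
from typing import Callable, List, Tuple

# A translator is any function that maps a sentence to a sentence.
Translator = Callable[[str], str]


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate text into a pivot language and back into the source language."""
    return from_pivot(to_pivot(text))


def augment_intents(
    examples: List[Tuple[str, str]],
    to_pivot: Translator,
    from_pivot: Translator,
) -> List[Tuple[str, str]]:
    """Return the original (utterance, intent) pairs plus novel back-translated variants."""
    # Track normalized utterances so exact round-trip copies are not added twice.
    seen = {utterance.strip().lower() for utterance, _ in examples}
    augmented = list(examples)
    for utterance, intent in examples:
        candidate = back_translate(utterance, to_pivot, from_pivot)
        key = candidate.strip().lower()
        if key and key not in seen:
            seen.add(key)
            # The variant inherits the intent label of its source utterance.
            augmented.append((candidate, intent))
    return augmented


# Hypothetical toy lookup tables standing in for a real translation service.
PT_TO_EN = {
    "qual é o meu saldo?": "what is my balance?",
    "quero cancelar meu cartão": "i want to cancel my card",
}
EN_TO_PT = {
    "what is my balance?": "qual é o meu saldo?",  # round trip yields no new variant
    "i want to cancel my card": "eu quero cancelar o meu cartão",
}


def toy_pt_to_en(text: str) -> str:
    return PT_TO_EN.get(text.lower(), text)


def toy_en_to_pt(text: str) -> str:
    return EN_TO_PT.get(text.lower(), text)


seed = [
    ("Qual é o meu saldo?", "consultar_saldo"),
    ("Quero cancelar meu cartão", "cancelar_cartao"),
]
augmented = augment_intents(seed, toy_pt_to_en, toy_en_to_pt)
for utterance, intent in augmented:
    print(f"{intent}: {utterance}")
```

In this toy run only the second utterance yields a new variant, because the first one survives the round trip unchanged and is filtered out as a duplicate. With a real translation service, the two translator arguments would wrap calls to that service instead of dictionary lookups.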

Notes

  1. https://translate.google.com.br/

  2. https://www.deepl.com/translator


Author information

Corresponding author

Correspondence to Jéferson do Nascimento Soares.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

do Nascimento Soares, J., Maia, J.E.B. (2023). Improving the Categorization of Intent of a Chatbot in Portuguese with Data Augmentation Obtained by Reverse Translation. In: Abraham, A., Pllana, S., Casalino, G., Ma, K., Bajaj, A. (eds) Intelligent Systems Design and Applications. ISDA 2022. Lecture Notes in Networks and Systems, vol 715. Springer, Cham. https://doi.org/10.1007/978-3-031-35507-3_40
