Abstract
A retrieval-based chatbot usually needs real data produced by a domain expert before it can begin interacting with users and improve from those interactions. Building knowledge bases that cover all the relevant linguistic variations is time-consuming and costly. This work proposes and evaluates a data augmentation methodology for a Portuguese-language corpus, based on paraphrasing and reverse translation (back-translation), that generates data for intent classification in conversational agents, with the aim of reducing the time needed to obtain and prepare quality data. Experiments in which the augmented data were evaluated with an NLU model showed that the method is effective, with an average accuracy gain of 6%.
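The following is a minimal sketch of the general idea of reverse translation for intent data, not the authors' implementation. It assumes a pair of translation functions (Portuguese to a pivot language and back); the names `back_translate`, `augment_intents`, `toy_pt_to_en`, and `toy_en_to_pt`, the intent labels, and the toy lookup tables are all hypothetical stand-ins for a real machine translation system.

```python
from typing import Callable, List, Tuple

# A translator is any function mapping a sentence to its translation.
Translator = Callable[[str], str]


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate text into a pivot language and back into the source language."""
    return from_pivot(to_pivot(text))


def augment_intents(
    examples: List[Tuple[str, str]],
    to_pivot: Translator,
    from_pivot: Translator,
) -> List[Tuple[str, str]]:
    """Return the original (utterance, intent) pairs plus new back-translated variants.

    A variant is kept only if it differs from every utterance seen so far,
    and it inherits the intent label of the utterance it was derived from.
    """
    seen = {utterance.strip().lower() for utterance, _ in examples}
    augmented = list(examples)
    for utterance, intent in examples:
        candidate = back_translate(utterance, to_pivot, from_pivot)
        key = candidate.strip().lower()
        if key and key not in seen:
            seen.add(key)
            augmented.append((candidate, intent))
    return augmented


# Toy lookup tables standing in for a real Portuguese<->English translator
# (assumption: a real system would call a machine translation model here).
_PT_EN = {
    "quero cancelar meu pedido": "i want to cancel my order",
    "qual é o saldo da minha conta": "what is my account balance",
}
_EN_PT = {
    "i want to cancel my order": "eu quero cancelar o meu pedido",
    "what is my account balance": "qual é o saldo da minha conta",
}


def toy_pt_to_en(text: str) -> str:
    return _PT_EN.get(text.lower(), text)


def toy_en_to_pt(text: str) -> str:
    return _EN_PT.get(text.lower(), text)


seed = [
    ("quero cancelar meu pedido", "cancelar_pedido"),
    ("qual é o saldo da minha conta", "consultar_saldo"),
]
augmented = augment_intents(seed, toy_pt_to_en, toy_en_to_pt)
for utterance, intent in augmented:
    print(f"{intent}\t{utterance}")
```

In this toy run the first utterance comes back reworded and is added under the same intent, while the second comes back unchanged and is skipped as a duplicate. With a real translator, a further filtering step (for example, a semantic similarity check) may be needed to discard round trips that drift from the original intent.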
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
do Nascimento Soares, J., Maia, J.E.B. (2023). Improving the Categorization of Intent of a Chatbot in Portuguese with Data Augmentation Obtained by Reverse Translation. In: Abraham, A., Pllana, S., Casalino, G., Ma, K., Bajaj, A. (eds) Intelligent Systems Design and Applications. ISDA 2022. Lecture Notes in Networks and Systems, vol 715. Springer, Cham. https://doi.org/10.1007/978-3-031-35507-3_40
DOI: https://doi.org/10.1007/978-3-031-35507-3_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35506-6
Online ISBN: 978-3-031-35507-3
eBook Packages: Intelligent Technologies and Robotics (R0)