Abstract
The field of natural language processing (NLP) has evolved significantly with the advent of state-of-the-art models, which have revolutionised how NLP tasks such as machine translation and sentiment analysis are performed. However, despite their strong performance, these models are vulnerable to adversarial attacks. Adversarial attacks introduce perturbations that are imperceptible to humans but can severely degrade a model’s learning and prediction accuracy. Current defenses for text data, such as spell-checking and adversarial training, have limitations against state-of-the-art adversarial attacks. This paper puts forward an effective transformation-based defense, TRIESTE (TRanslatIon basEd defenSe for Text classifiErs). The proposed defense overcomes the shortcomings of existing defenses by translating the input text from the source language to a target language and then back to the source language before passing it to the text classifier. Because translation takes the entire text into consideration, the sentiment of the translated text remains similar to that of the input text, while the adversarial perturbations are removed. Rigorous evaluation on publicly available datasets shows that TRIESTE is successful against state-of-the-art attacks without a significant drop in classifier accuracy.
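The round-trip translation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`back_translate`, `make_defended_classifier`) are hypothetical, and the toy lexicon "translators" and keyword classifier are stand-ins chosen only so the example runs without external models. In practice the two translator callables would presumably wrap neural machine translation models.

```python
from typing import Callable

# Hypothetical type aliases: a translator maps text to text,
# a classifier maps text to a label.
Translator = Callable[[str], str]
Classifier = Callable[[str], str]


def back_translate(text: str, to_target: Translator, to_source: Translator) -> str:
    """Translate source -> target -> source and return the round-tripped text."""
    return to_source(to_target(text))


def make_defended_classifier(
    classifier: Classifier, to_target: Translator, to_source: Translator
) -> Classifier:
    """Wrap a classifier so every input is back-translated before prediction."""
    def defended(text: str) -> str:
        return classifier(back_translate(text, to_target, to_source))
    return defended


# --- Toy stand-ins (assumptions, NOT real machine translation) ---
# Several English words collapse onto one target-language word, so the
# round trip maps synonym substitutions back to a canonical word.
EN_TO_FR = {"good": "bon", "great": "bon", "fine": "bon",
            "bad": "mauvais", "awful": "mauvais"}
FR_TO_EN = {"bon": "good", "mauvais": "bad"}


def toy_to_target(text: str) -> str:
    # Words missing from the lexicon pass through unchanged.
    return " ".join(EN_TO_FR.get(w, w) for w in text.split())


def toy_to_source(text: str) -> str:
    return " ".join(FR_TO_EN.get(w, w) for w in text.split())


def toy_classifier(text: str) -> str:
    # A deliberately brittle keyword classifier that a synonym swap can fool.
    words = text.split()
    if "good" in words:
        return "positive"
    if "bad" in words:
        return "negative"
    return "unknown"


defended = make_defended_classifier(toy_classifier, toy_to_target, toy_to_source)
```

With these stand-ins, `toy_classifier("the film was fine")` fails to recognise the substituted word, whereas `defended("the film was fine")` first recovers "the film was good" and then classifies it. Passing the translators as callables keeps the wrapper independent of any particular translation model or target language.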
Availability of data and materials
The datasets are publicly available at: IMDB (www.ai.stanford.edu/~amaas/data/sentiment/), Yelp Polarity Reviews (http://www.course.fast.ai/datasets), and Rotten Tomatoes Movie Reviews (www.cs.cornell.edu/people/pabo/movie-review-data/). All models and pretrained weights are from the Hugging Face repository (www.huggingface.co), and the attacks are performed using the open-source TextAttack framework (www.github.com/QData/TextAttack).
Acknowledgements
We would like to thank the respective authors for providing code and pretrained models. We are also thankful to the anonymous reviewers for their valuable suggestions to improve the quality of the paper. Anup Kumar Gupta acknowledges the support of the Prime Minister’s Research Fellowship (PMRF) program of the Government of India.
Funding
The work of Anup Kumar Gupta is partially supported by Prime Minister’s Research Fellowship (PMRF), the Ministry of Education, Government of India (PMRF-192002-1909).
Author information
Contributions
All authors contributed to the study’s conception and design. Material preparation, data collection and analysis were performed by Anup Kumar Gupta, Vardhan Paliwal and Aryan Rastogi. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding the publication of the research article.
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
All authors of the paper have provided their consent for publication.
Code availability
The implementation is available at https://github.com/AnupKumarGupta/TRIESTE.
About this article
Cite this article
Gupta, A.K., Paliwal, V., Rastogi, A. et al. TRIESTE: translation based defense for text classifiers. J Ambient Intell Human Comput 14, 16385–16396 (2023). https://doi.org/10.1007/s12652-022-03859-0
DOI: https://doi.org/10.1007/s12652-022-03859-0