TRIESTE: translation based defense for text classifiers

Gupta, Anup Kumar; Paliwal, Vardhan; Rastogi, Aryan; Gupta, Puneet

doi:10.1007/s12652-022-03859-0

Original Research
Published: 30 May 2022

Volume 14, pages 16385–16396, (2023)

Journal of Ambient Intelligence and Humanized Computing

Anup Kumar Gupta (ORCID: orcid.org/0000-0003-1090-6036)¹,
Vardhan Paliwal²,
Aryan Rastogi² &
Puneet Gupta¹

Abstract

The field of natural language processing (NLP) has evolved significantly with the advent of state-of-the-art models. These models have revolutionised how NLP tasks such as machine translation, sentiment analysis and many others are performed. However, despite their high efficacy, these models are prone to adversarial attacks. Adversarial attacks introduce perturbations that are imperceptible to humans but can severely degrade the model’s learning and prediction accuracy. Current defenses for text data include approaches such as spell-checking and adversarial training, which have limitations against state-of-the-art adversarial attacks. This paper puts forward an effective transformation-based defense, TRIESTE (TRanslatIon basEd defenSe for Text classifiErs). The proposed defense overcomes the shortcomings of existing defenses by translating the input text from the source language to a target language and then back to the source language before providing it to the text classifier. Because translation takes the entire text into consideration, the sentiment of the translated text remains similar to that of the input text while adversarial perturbations are removed. Rigorous evaluation on publicly available datasets shows that TRIESTE is successful against state-of-the-art attacks without a significant drop in classifier accuracy.
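The round-trip idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`back_translate`, `defended_classify`) are hypothetical, and the dictionary-based translators and keyword classifier are toy stand-ins for the neural translation models and text classifiers that a real pipeline would use. The sketch only shows the wiring: the input is translated to a target language, translated back, and only then passed to the classifier.

```python
from typing import Callable

# Hypothetical type aliases: any function mapping text to text can act as a translator.
Translator = Callable[[str], str]
Classifier = Callable[[str], str]


def back_translate(text: str, to_target: Translator, to_source: Translator) -> str:
    """Translate source -> target -> source and return the round-tripped text."""
    return to_source(to_target(text))


def defended_classify(text: str, classifier: Classifier,
                      to_target: Translator, to_source: Translator) -> str:
    """Classify the round-tripped text instead of the raw (possibly perturbed) input."""
    return classifier(back_translate(text, to_target, to_source))


# Toy stand-ins (assumptions for illustration only, not real translation models).
_EN_TO_FR = {"a": "un", "good": "bon", "great": "bon", "bad": "mauvais", "movie": "film"}
_FR_TO_EN = {"un": "a", "bon": "good", "mauvais": "bad", "film": "movie"}


def toy_en_to_fr(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(_EN_TO_FR.get(word, word) for word in text.lower().split())


def toy_fr_to_en(text: str) -> str:
    return " ".join(_FR_TO_EN.get(word, word) for word in text.lower().split())


def toy_classifier(text: str) -> str:
    # Deliberately brittle: it only recognises the exact word "good".
    return "positive" if "good" in text.split() else "negative"


if __name__ == "__main__":
    perturbed = "a great movie"  # a synonym swap that fools the brittle toy classifier
    print(toy_classifier(perturbed))
    print(defended_classify(perturbed, toy_classifier, toy_en_to_fr, toy_fr_to_en))
```

In this toy setting the round trip maps the substituted word back to a canonical form before classification. Whether real translation models remove a given perturbation depends on the models and the attack, which is what the paper's evaluation addresses.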

Availability of data and materials

The datasets are publicly available at: IMDB (www.ai.stanford.edu/~amaas/data/sentiment/), Yelp Polarity Reviews (http://www.course.fast.ai/datasets), and Rotten Tomatoes Movie Reviews (www.cs.cornell.edu/people/pabo/movie-review-data/). All the models and pretrained weights are from the Hugging Face repository (www.huggingface.co), and the attacks are performed using the open-source TextAttack framework (www.github.com/QData/TextAttack).

Acknowledgements

We would like to thank the respective authors for providing code and pretrained models. We are also thankful to the anonymous reviewers for their valuable suggestions to improve the quality of the paper. Anup Kumar Gupta acknowledges the support of the Prime Minister’s Research Fellowship (PMRF) program of the Government of India.

Funding

The work of Anup Kumar Gupta is partially supported by the Prime Minister’s Research Fellowship (PMRF), Ministry of Education, Government of India (PMRF-192002-1909).

Author information

Anup Kumar Gupta and Vardhan Paliwal have contributed equally to this work.

Authors and Affiliations

Department of Computer Science and Engineering, Indian Institute of Technology Indore, Indore, Madhya Pradesh, 452020, India
Anup Kumar Gupta & Puneet Gupta
Department of Electrical Engineering, Indian Institute of Technology Indore, Indore, Madhya Pradesh, 452020, India
Vardhan Paliwal & Aryan Rastogi

Contributions

All authors contributed to the study’s conception and design. Material preparation, data collection and analysis were performed by Anup Kumar Gupta, Vardhan Paliwal and Aryan Rastogi. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Anup Kumar Gupta.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest regarding the publication of the research article.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

All authors of the paper have provided their consent for publication.

Code availability

The implementation is available at https://github.com/AnupKumarGupta/TRIESTE.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 107 KB)

About this article

Cite this article

Gupta, A.K., Paliwal, V., Rastogi, A. et al. TRIESTE: translation based defense for text classifiers. J Ambient Intell Human Comput 14, 16385–16396 (2023). https://doi.org/10.1007/s12652-022-03859-0

Received: 08 October 2021
Accepted: 14 April 2022
Published: 30 May 2022
Issue Date: December 2023
DOI: https://doi.org/10.1007/s12652-022-03859-0
