TRIESTE: translation based defense for text classifiers

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

The field of natural language processing (NLP) has evolved significantly with the advent of state-of-the-art models. These models have transformed how NLP tasks such as machine translation and sentiment analysis are performed. However, despite their strong performance, these models are prone to adversarial attacks. Adversarial attacks introduce perturbations that are imperceptible to humans but can severely degrade the model’s learning and prediction accuracy. Current defenses for text data include approaches such as spell-checking and adversarial training, which have limitations against state-of-the-art adversarial attacks. This paper puts forward an effective transformation-based defense, TRIESTE (TRanslatIon basEd defenSe for Text classifiErs). The proposed defense overcomes the shortcomings of existing defenses by translating the input text from the source language to a target language and then back to the source language before providing it to the text classifier. Because translation takes the entire text into consideration, the sentiment of the translated text remains similar to that of the input text while adversarial perturbations are removed. Rigorous evaluation on publicly available datasets shows that TRIESTE is successful against state-of-the-art attacks without a significant drop in classifier accuracy.
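
The round-trip idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`round_trip`, `defended_classify`), the word-level lexicons, and the brittle keyword classifier are all hypothetical stand-ins chosen so the example runs without downloading any model. In a realistic setup the two translator callables would wrap actual machine translation models and `classify` would wrap a trained text classifier.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]
Classifier = Callable[[str], str]


def round_trip(text: str, to_pivot: Translator, to_source: Translator) -> str:
    """Translate source -> pivot language -> source."""
    return to_source(to_pivot(text))


def defended_classify(
    text: str, to_pivot: Translator, to_source: Translator, classify: Classifier
) -> str:
    """Classify the round-tripped text instead of the raw input."""
    return classify(round_trip(text, to_pivot, to_source))


# --- Toy stand-ins below (hypothetical, for illustration only) ---

def make_word_translator(lexicon: Dict[str, str]) -> Translator:
    # Word-by-word lookup; words missing from the lexicon pass through unchanged.
    return lambda text: " ".join(lexicon.get(w, w) for w in text.lower().split())


# Two English synonyms map to one pivot word, so the round trip normalises them.
EN_TO_FR = {"the": "le", "film": "film", "movie": "film", "was": "etait",
            "wonderful": "merveilleux", "marvelous": "merveilleux"}
FR_TO_EN = {"le": "the", "film": "movie", "etait": "was",
            "merveilleux": "wonderful"}


def toy_classifier(text: str) -> str:
    # Deliberately brittle: it only recognises the word "wonderful".
    return "positive" if "wonderful" in text.split() else "negative"


to_fr = make_word_translator(EN_TO_FR)
to_en = make_word_translator(FR_TO_EN)

# A synonym substitution stands in for an adversarial perturbation.
adversarial = "the film was marvelous"

print(toy_classifier(adversarial))                                # undefended
print(defended_classify(adversarial, to_fr, to_en, toy_classifier))  # defended
```

In this toy setting the synonym swap flips the undefended prediction, while the round trip maps the perturbed word back to a form the classifier recognises. Passing the translators and classifier in as callables keeps the defense independent of any particular translation system or model.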

Availability of data and materials

The datasets are publicly available at: IMDB (www.ai.stanford.edu/~amaas/data/sentiment/), Yelp Polarity Reviews (http://www.course.fast.ai/datasets), and Rotten Tomatoes Movie Reviews (www.cs.cornell.edu/people/pabo/movie-review-data/). All the models and pretrained weights are from the Hugging Face repository (www.huggingface.co), and the attacks are performed using the open-source TextAttack framework (www.github.com/QData/TextAttack).

Acknowledgements

We would like to thank the respective authors for providing code and pretrained models. We are also thankful to the anonymous reviewers for their valuable suggestions to improve the quality of the paper. Anup Kumar Gupta acknowledges the support of the Prime Minister’s Research Fellowship (PMRF) program of the Government of India.

Funding

The work of Anup Kumar Gupta is partially supported by the Prime Minister’s Research Fellowship (PMRF), Ministry of Education, Government of India (PMRF-192002-1909).

Author information

Contributions

All authors contributed to the study’s conception and design. Material preparation, data collection and analysis were performed by Anup Kumar Gupta, Vardhan Paliwal and Aryan Rastogi. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Anup Kumar Gupta.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest regarding the publication of the research article.

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

All authors have provided their consent for publication.

Code availability

The implementation is available at https://github.com/AnupKumarGupta/TRIESTE.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 107 KB)

About this article

Cite this article

Gupta, A.K., Paliwal, V., Rastogi, A. et al. TRIESTE: translation based defense for text classifiers. J Ambient Intell Human Comput 14, 16385–16396 (2023). https://doi.org/10.1007/s12652-022-03859-0

  • DOI: https://doi.org/10.1007/s12652-022-03859-0
