
DFaP: Data Filtering and Purification Against Backdoor Attacks

  • Conference paper
Artificial Intelligence Security and Privacy (AIS&P 2023)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14509))


Abstract

The rapid development of deep learning has led to a dramatic increase in the demand for training data. As a result, users are often compelled to acquire data from unsecured external sources through automated collection or outsourcing. This exposes the training data collection phase of the DNN pipeline to severe backdoor attacks, in which adversaries contaminate the training data so that they can stealthily steer the DNN toward attacker-chosen outputs. In this paper, we propose a novel backdoor defense framework called DFaP (Data Filter and Purify). DFaP renders backdoor samples carrying local-patch or full-image triggers harmless without requiring additional clean samples, allowing users to safely train clean DNN models on unsecured data. We conduct experiments on two networks (AlexNet, ResNet-34) and two datasets (CIFAR-10, GTSRB). The experimental results show that DFaP defends against six state-of-the-art backdoor attacks and, compared with four other defense methods, achieves superior performance with an average reduction in attack success rate of 98.01%.


References

  1. Chen, C., Seff, A., Kornhauser, A., et al.: DeepDriving: learning affordance for direct perception in autonomous driving. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2722–2730 (2015)

  2. Tian, Y., Pei, K., Jana, S., et al.: DeepTest: automated testing of deep-neural-network-driven autonomous cars. In: Proceedings of the 40th International Conference on Software Engineering, pp. 303–314 (2018)

  3. Jung, C., Shim, D.H.: Incorporating multi-context into the traversability map for urban autonomous driving using deep inverse reinforcement learning. IEEE Robot. Autom. Lett. 6(2), 1662–1669 (2021)

  4. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)

  5. Guo, J., Han, K., Wang, Y., et al.: Distilling object detectors via decoupled features. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2154–2164 (2021)

  6. Devlin, J., Chang, M.W., Lee, K., et al.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  7. Xie, W., Feng, Y., Gu, S., et al.: Importance-based neuron allocation for multilingual neural machine translation. arXiv preprint arXiv:2107.06569 (2021)

  8. Gao, Y., Doan, B.G., Zhang, Z., et al.: Backdoor attacks and countermeasures on deep learning: a comprehensive review. arXiv preprint arXiv:2007.10760 (2020)

  9. Gu, T., Dolan-Gavitt, B., Garg, S.: BadNets: identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733 (2017)

  10. Turner, A., Tsipras, D., Madry, A.: Label-consistent backdoor attacks. stat 1050, 6 (2019)

  11. Li, S., Xue, M., Zhao, B.Z.H., et al.: Invisible backdoor attacks on deep neural networks via steganography and regularization. IEEE Trans. Dependable Secure Comput. 18(5), 2088–2105 (2020)

  12. Wang, T., Yao, Y., Xu, F., et al.: Backdoor attack through frequency domain. arXiv preprint arXiv:2111.10991 (2021)

  13. Pang, R., Zhang, Z., Gao, X., et al.: TROJANZOO: towards unified, holistic, and practical evaluation of neural backdoors. In: 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), pp. 684–702. IEEE (2022)

  14. Chou, E., Tramer, F., Pellegrino, G.: SentiNet: detecting localized universal attacks against deep learning systems. In: 2020 IEEE Security and Privacy Workshops (SPW), pp. 48–54. IEEE (2020)

  15. Zhong, H., Liao, C., Squicciarini, A.C., et al.: Backdoor embedding in convolutional neural network models via invisible perturbation. In: Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy, pp. 97–108 (2020)

  16. Shafahi, A., Huang, W.R., Najibi, M., et al.: Poison frogs! targeted clean-label poisoning attacks on neural networks. In: Advances in Neural Information Processing Systems, vol. 31 (2018)

  17. Zhu, C., Huang, W.R., Li, H., et al.: Transferable clean-label poisoning attacks on deep neural nets. In: International Conference on Machine Learning, pp. 7614–7623. PMLR (2019)

  18. Barni, M., Kallas, K., Tondi, B.: A new backdoor attack in CNNs by training set corruption without label poisoning. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 101–105. IEEE (2019)

  19. Quanxin, Z., Wencong, M.A., Yajie, W., et al.: Backdoor attacks on image classification models in deep neural networks. Chin. J. Electron. (2022). https://doi.org/10.1049/cje.2021.00.126

  20. Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al.: Generative adversarial networks. Commun. ACM 63(11), 139–144 (2020)

  21. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)

  22. Li, Y., Sha, T., Baker, T., et al.: Adaptive vertical federated learning via feature map transferring in mobile edge computing. Computing, 1–17 (2022). https://doi.org/10.1007/s00607-022-01117-x

  23. Yang, J., Baker, T., Gill, S.S., et al.: A federated learning attack method based on edge collaboration via cloud. Softw. Pract. Exp. (2022)

  24. Zheng, J., Zhang, Y., Li, Y., et al.: Towards evaluating the robustness of adversarial attacks against image scaling transformation. Chin. J. Electron. 32(1), 151–158 (2023)

  25. Liu, Y., Ma, S., Aafer, Y., et al.: Trojaning attack on neural networks. In: 25th Annual Network and Distributed System Security Symposium (NDSS 2018). Internet Society (2018)

  26. Zhang, Y., Tan, Y., Sun, H., et al.: Improving the invisibility of adversarial examples with perceptually adaptive perturbation. Inf. Sci. 635, 126–137 (2023)

  27. Wang, Y., Tan, Y., Lyu, H., et al.: Toward feature space adversarial attack in the frequency domain. Int. J. Intell. Syst. 37(12), 11019–11036 (2022)

  28. Wang, B., Yao, Y., Shan, S., et al.: Neural Cleanse: identifying and mitigating backdoor attacks in neural networks. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 707–723. IEEE (2019)

  29. Liu, K., Dolan-Gavitt, B., Garg, S.: Fine-Pruning: defending against backdooring attacks on deep neural networks. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (eds.) RAID 2018. LNCS, vol. 11050, pp. 273–294. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00470-5_13

  30. Li, Y., Lyu, X., Koren, N., et al.: Neural attention distillation: erasing backdoor triggers from deep neural networks. arXiv preprint arXiv:2101.05930 (2021)

  31. Zeng, Y., Chen, S., Park, W., et al.: Adversarial unlearning of backdoors via implicit hypergradient. In: International Conference on Learning Representations

  32. Tran, B., Li, J., Madry, A.: Spectral signatures in backdoor attacks. In: Advances in Neural Information Processing Systems, vol. 31 (2018)

  33. Hayase, J., Kong, W., Somani, R., et al.: SPECTRE: defending against backdoor attacks using robust statistics. In: International Conference on Machine Learning, pp. 4129–4139. PMLR (2021)

  34. Gao, Y., Xu, C., Wang, D., et al.: STRIP: a defence against trojan attacks on deep neural networks. In: Proceedings of the 35th Annual Computer Security Applications Conference, pp. 113–125 (2019)

  35. Yang, J., Zheng, J., Zhang, Z., et al.: Security of federated learning for cloud-edge intelligence collaborative computing. Int. J. Intell. Syst. 37(11), 9290–9308 (2022)

  36. Doan, B.G., Abbasnejad, E., Ranasinghe, D.C.: Februus: input purification defense against trojan attacks on deep neural network systems. In: Annual Computer Security Applications Conference, pp. 897–912 (2020)

  37. Tang, D., Wang, X.F., Tang, H., et al.: Demon in the variant: statistical analysis of DNNs for robust backdoor contamination detection. In: USENIX Security Symposium, pp. 1541–1558 (2021)

  38. Selvaraju, R.R., Cogswell, M., Das, A., et al.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)

  39. Telea, A.: An image inpainting technique based on the fast marching method. J. Graph. Tools 9(1), 23–34 (2004)

  40. Batson, J., Royer, L.: Noise2Self: blind denoising by self-supervision. In: International Conference on Machine Learning, pp. 524–533. PMLR (2019)

  41. Stallkamp, J., Schlipsing, M., Salmen, J., et al.: Man vs. computer: benchmarking machine learning algorithms for traffic sign recognition. Neural Netw. 32, 323–332 (2012)

  42. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)

  43. Deng, J., Dong, W., Socher, R., et al.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)

  44. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)

  45. He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  46. Guo, W., Wang, L., Xing, X., et al.: TABOR: a highly accurate approach to inspecting and restoring trojan backdoors in AI systems. arXiv preprint arXiv:1908.01763 (2019)

  47. Subramanya, A., Pillai, V., Pirsiavash, H.: Fooling network interpretation in image classification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2020–2029 (2019)


Author information


Corresponding author

Correspondence to Haochen Wang.


Appendix

1.1 A. Backdoor Attack Configurations

The six state-of-the-art backdoor attacks detailed in Table 3 employ various methodologies. BadNets implants the backdoor by stamping a local-patch trigger onto training samples. In contrast, CL employs FGSM [21] perturbations so that the backdoor can be implanted without altering the labels. Trojan WM generates its trigger through reverse engineering, whereas \(\ell _{0}\) inv formulates trigger generation as a regularized optimization problem. The Steganography attack embeds a covert full-image trigger by modifying the least significant bits of images. Lastly, FTrojan introduces a frequency-domain backdoor attack, discreetly embedding a full-image trigger in the RGB space.

Table 3. A configuration summary for the backdoor attacks.
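To make the two trigger families concrete, the sketch below gives a minimal NumPy illustration of a BadNets-style local-patch trigger and a simplified least-significant-bit (LSB) full-image trigger. The patch placement and size, and the pseudo-random LSB pattern, are illustrative assumptions rather than the exact configurations listed in Table 3.

```python
import numpy as np

def badnets_patch_trigger(image: np.ndarray, patch_size: int = 3) -> np.ndarray:
    """Stamp a white square into the bottom-right corner of an HxWxC uint8
    image -- a BadNets-style local-patch trigger (illustrative placement/size)."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:, :] = 255
    return poisoned

def lsb_full_image_trigger(image: np.ndarray, seed: int = 0) -> np.ndarray:
    """Overwrite the least significant bit of every pixel with a fixed
    pseudo-random bit pattern -- a simplified stand-in for the
    steganography-based full-image trigger; the real attack hides a
    structured payload rather than random bits."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=image.shape, dtype=np.uint8)
    return (image & 0xFE) | bits
```

The local patch alters only a handful of pixels, while the LSB trigger touches every pixel but only in its least significant bit, which is why the two families are referred to as local-patch and full-image triggers.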

1.2 B. Defense Against Attacks with Different Injection Rates

Adversaries may raise the backdoor sample injection rate to make data filtering more difficult. We therefore test DFaP against BadNets and Steganography attacks with injection rates ranging from 10% to 50%, comparing it with the best-performing defense methods under each attack as a reference. Specifically, STRIP, Februus, and DFaP are tested under the BadNets attack, and Fine-Pruning, I-BAU, and DFaP are tested under the Steganography attack. The ASRs for attacks with different injection rates under the various defense methods are shown in Fig. 6(a). Fine-Pruning and Februus fail to defend once the injection rate reaches 30% and 40%, respectively, while the remaining defense methods perform consistently across injection rates. The BAs (on CIFAR-10) for BadNets with different injection rates are shown in Fig. 6(b): DFaP and STRIP achieve comparable results, both higher than Februus. The BAs (on GTSRB) for Steganography with different injection rates are shown in Fig. 6(c): DFaP and I-BAU achieve comparable results, both above Fine-Pruning. Overall, the ASR of the model retrained after DFaP drops from 100% to nearly 0%, and there is no gap between its BA and that of the infected model. The performance of DFaP is therefore robust to backdoor attacks with different injection rates, even though higher injection rates are more challenging to defend against.

Fig. 6. The performance of DFaP evaluated at different injection rates: (a) ASRs for attacks with different injection rates under the various defense methods; (b) BAs (on CIFAR-10) for BadNets with different injection rates; (c) BAs (on GTSRB) for Steganography with different injection rates.
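For context on how the injection-rate setup can be reproduced, the following sketch shows one common way to poison a configurable fraction of a training set. The helper name `poison_at_rate` and the relabel-to-target behavior are assumptions for illustration (a clean-label attack such as CL would skip the relabeling step).

```python
import numpy as np
from typing import Callable, Tuple

def poison_at_rate(images: np.ndarray, labels: np.ndarray,
                   apply_trigger: Callable[[np.ndarray], np.ndarray],
                   target_label: int, injection_rate: float,
                   seed: int = 0) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Stamp `apply_trigger` onto a random `injection_rate` fraction of the
    samples and relabel them to `target_label` (dirty-label setting).
    Returns the poisoned images, poisoned labels, and the poisoned indices."""
    rng = np.random.default_rng(seed)
    n_poison = int(round(len(images) * injection_rate))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images, labels = images.copy(), labels.copy()
    for i in idx:
        images[i] = apply_trigger(images[i])
        labels[i] = target_label
    return images, labels, idx

# Example: a 30% BadNets-style injection (rates from 10% to 50% are tested above).
# x_train, y_train = ...  # e.g. CIFAR-10 images and labels as uint8/int arrays
# x_p, y_p, poisoned_idx = poison_at_rate(x_train, y_train,
#                                         badnets_patch_trigger,
#                                         target_label=0, injection_rate=0.3)
```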

1.3 C. Sensitivity Study

In this section, we evaluate the sensitivity of the CAM Filter to the erasure repair threshold. We use the True Acceptance Rate (TAR) and the False Acceptance Rate (FAR) to measure the data filtering capability of DFaP: TAR is the percentage of local-patch backdoor samples judged to be erasure-prone, and FAR is the percentage of clean samples incorrectly judged to be erasure-prone.
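As a minimal sketch of how these two rates can be computed, assuming the CAM Filter's per-sample decision is available as a boolean "erasure-prone" flag (variable names below are hypothetical):

```python
import numpy as np
from typing import Tuple

def tar_far(is_backdoor: np.ndarray, flagged: np.ndarray) -> Tuple[float, float]:
    """TAR: fraction of local-patch backdoor samples flagged as erasure-prone.
    FAR: fraction of clean samples (wrongly) flagged as erasure-prone.
    `is_backdoor` is the ground-truth mask; `flagged` is the CAM Filter decision."""
    is_backdoor = np.asarray(is_backdoor, dtype=bool)
    flagged = np.asarray(flagged, dtype=bool)
    tar = float(flagged[is_backdoor].mean()) if is_backdoor.any() else 0.0
    far = float(flagged[~is_backdoor].mean()) if (~is_backdoor).any() else 0.0
    return tar, far
```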

In Fig. 7, we show the data filtering effect of DFaP for erasure repair thresholds from 0.3 to 0.6 on three datasets under a 10% BadNets injection rate. The results reveal that when the threshold drops from 0.6 to 0.3, the TAR still reaches about 80%, indicating that DFaP can preserve the dataset's usefulness even at a lower threshold. Meanwhile, even with a higher threshold, such as CIFAR-10 at an erasure repair threshold of 0.6, the FAR reaches 6.74% (i.e., an injection rate of 0.006), which is still far below the injection rate required for a successful backdoor attack. Therefore, DFaP does not require complex hyperparameter tuning.

Fig. 7. The performance of DFaP with different erasure repair thresholds.
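The paper's exact CAM Filter decision rule is not reproduced here; purely as an illustration of how an erasure repair threshold in the 0.3 to 0.6 range could act on a normalized Grad-CAM heatmap, one simple flagging criterion might look like the sketch below. The compactness test via `patch_fraction` is our own assumption, not the paper's rule.

```python
import numpy as np

def is_erasure_prone(cam_heatmap: np.ndarray, threshold: float = 0.5,
                     patch_fraction: float = 0.05) -> bool:
    """Illustrative flagging rule: normalize the Grad-CAM heatmap to [0, 1],
    keep the region above `threshold`, and flag the sample if that
    high-attention region is small and patch-like (covers at most
    `patch_fraction` of the image). A sketch only, not the paper's criterion."""
    heat = cam_heatmap.astype(np.float64)
    heat = (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)
    hot = heat >= threshold
    return bool(0.0 < hot.mean() <= patch_fraction)
```

Under such a rule, lowering the threshold enlarges the high-attention region and makes fewer samples look patch-like, which is directionally consistent with the TAR and FAR trends reported above.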


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, H., Mu, T., Feng, G., Wu, S., Li, Y. (2024). DFaP: Data Filtering and Purification Against Backdoor Attacks. In: Vaidya, J., Gabbouj, M., Li, J. (eds) Artificial Intelligence Security and Privacy. AIS&P 2023. Lecture Notes in Computer Science, vol 14509. Springer, Singapore. https://doi.org/10.1007/978-981-99-9785-5_7


  • DOI: https://doi.org/10.1007/978-981-99-9785-5_7


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-9784-8

  • Online ISBN: 978-981-99-9785-5

  • eBook Packages: Computer Science, Computer Science (R0)
