Abstract
The rapid development of deep learning has led to a dramatic increase in the demand for training data. As a result, users are often compelled to acquire data from unsecured external sources through automated collection or outsourcing. This exposes the training data collection phase of the DNN pipeline to severe backdoor attacks, in which adversaries contaminate the training data so that they can stealthily steer the DNN toward attacker-chosen or unintended outputs. In this paper, we propose a novel backdoor defense framework called DFaP (Data Filter and Purify). DFaP neutralizes backdoor samples carrying local-patch or full-image triggers without requiring additional clean samples. With DFaP, users can safely train clean DNN models on unsecured data. We conducted experiments on two networks (AlexNet, ResNet-34) and two datasets (CIFAR10, GTSRB). The experimental results show that DFaP can defend against six state-of-the-art backdoor attacks. Compared with four other defense methods, DFaP demonstrates superior performance, with an average reduction in attack success rate of 98.01%.
References
Chen, C., Seff, A., Kornhauser, A., et al.: DeepDriving: learning affordance for direct perception in autonomous driving. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2722–2730 (2015)
Tian, Y., Pei, K., Jana, S., et al.: DeepTest: automated testing of deep-neural-network-driven autonomous cars. In: Proceedings of the 40th International Conference on Software Engineering, pp. 303–314 (2018)
Jung, C., Shim, D.H.: Incorporating multi-context into the traversability map for urban autonomous driving using deep inverse reinforcement learning. IEEE Robot. Autom. Lett. 6(2), 1662–1669 (2021)
Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Guo, J., Han, K., Wang, Y., et al.: Distilling object detectors via decoupled features. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2154–2164 (2021)
Devlin, J., Chang, M.W., Lee, K., et al.: Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Xie, W., Feng, Y., Gu, S., et al.: Importance-based neuron allocation for multilingual neural machine translation. arXiv preprint arXiv:2107.06569 (2021)
Gao, Y., Doan, B.G., Zhang, Z., et al.: Backdoor attacks and countermeasures on deep learning: a comprehensive review. arXiv preprint arXiv:2007.10760 (2020)
Gu, T., Dolan-Gavitt, B., Garg, S.: BadNets: identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733 (2017)
Turner, A., Tsipras, D., Madry, A.: Label-Consistent Backdoor Attacks. stat 1050, 6 (2019)
Li, S., Xue, M., Zhao, B.Z.H., et al.: Invisible backdoor attacks on deep neural networks via steganography and regularization. IEEE Trans. Dependable Secure Comput. 18(5), 2088–2105 (2020)
Wang, T., Yao, Y., Xu, F., et al.: Backdoor attack through frequency domain. arXiv preprint arXiv:2111.10991 (2021)
Pang, R., Zhang, Z., Gao, X., et al.: TROJANZOO: towards unified, holistic, and practical evaluation of neural backdoors. In: 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), pp. 684–702. IEEE (2022)
Chou, E., Tramer, F., Pellegrino, G.: SentiNet: detecting localized universal attacks against deep learning systems. In: 2020 IEEE Security and Privacy Workshops (SPW), pp. 48–54. IEEE (2020)
Zhong, H., Liao, C., Squicciarini, A.C., et al.: Backdoor embedding in convolutional neural network models via invisible perturbation. In: Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy, pp. 97–108 (2020)
Shafahi, A., Huang, W.R., Najibi, M., et al.: Poison frogs! targeted clean-label poisoning attacks on neural networks. In: Advances in Neural Information Processing Systems, 31 (2018)
Zhu, C., Huang, W.R., Li, H., et al.: Transferable clean-label poisoning attacks on deep neural nets. In: International Conference on Machine Learning. PMLR, pp. 7614–7623 (2019)
Barni, M., Kallas, K., Tondi, B.: A new backdoor attack in CNNs by training set corruption without label poisoning. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 101–105. IEEE (2019)
Quanxin, Z., Wencong, M.A., Yajie, W., et al.: Backdoor attacks on image classification models in deep neural networks. Chin. J. Electron. (2022). https://doi.org/10.1049/cje.2021.00.126
Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al.: Generative adversarial networks. Commun. ACM 63(11), 139–144 (2020)
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
Li, Y., Sha, T., Baker, T., et al.: Adaptive vertical federated learning via feature map transferring in mobile edge computing. Computing, 1–17 (2022). https://doi.org/10.1007/s00607-022-01117-x
Yang, J., Baker, T., Gill, S.S., et al.: A federated learning attack method based on edge collaboration via cloud. Softw. Pract. Exp. (2022)
Zheng, J., Zhang, Y., Li, Y., et al.: Towards evaluating the robustness of adversarial attacks against image scaling transformation. Chin. J. Electron. 32(1), 151–158 (2023)
Liu, Y., Ma, S., Aafer, Y., et al.: Trojaning attack on neural networks. In: 25th Annual Network and Distributed System Security Symposium (NDSS 2018). Internet Soc (2018)
Zhang, Y., Tan, Y., Sun, H., et al.: Improving the invisibility of adversarial examples with perceptually adaptive perturbation. Inf. Sci. 635, 126–137 (2023)
Wang, Y., Tan, Y., Lyu, H., et al.: Toward feature space adversarial attack in the frequency domain. Int. J. Intell. Syst. 37(12), 11019–11036 (2022)
Wang, B., Yao, Y., Shan, S., et al.: Neural cleanse: identifying and mitigating backdoor attacks in neural networks. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 707–723. IEEE (2019)
Liu, K., Dolan-Gavitt, B., Garg, S.: Fine-Pruning: defending against backdooring attacks on deep neural networks. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (eds.) RAID 2018. LNCS, vol. 11050, pp. 273–294. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00470-5_13
Li, Y., Lyu, X., Koren, N., et al.: Neural attention distillation: erasing backdoor triggers from deep neural networks. arXiv preprint arXiv:2101.05930 (2021)
Zeng, Y., Chen, S., Park, W., et al.: Adversarial unlearning of backdoors via implicit hypergradient. In: International Conference on Learning Representations (2022)
Tran, B., Li, J., Madry, A.: Spectral signatures in backdoor attacks. In: Advances in Neural Information Processing Systems, 31 (2018)
Hayase, J., Kong, W., Somani, R., et al.: SPECTRE: defending against backdoor attacks using robust statistics. In: International Conference on Machine Learning, pp. 4129–4139. PMLR (2021)
Gao, Y., Xu, C., Wang, D., et al.: STRIP: a defence against trojan attacks on deep neural networks. In: Proceedings of the 35th Annual Computer Security Applications Conference, pp. 113–125 (2019)
Yang, J., Zheng, J., Zhang, Z., et al.: Security of federated learning for cloud-edge intelligence collaborative computing. Int. J. Intell. Syst. 37(11), 9290–9308 (2022)
Doan, B.G., Abbasnejad, E., Ranasinghe, D.C.: Februus: input purification defense against trojan attacks on deep neural network systems. In: Annual Computer Security Applications Conference, pp. 897–912 (2020)
Tang, D., Wang, X.F., Tang, H., et al.: Demon in the variant: statistical analysis of DNNs for robust backdoor contamination detection. In: USENIX Security Symposium, pp. 1541–1558 (2021)
Selvaraju, R.R., Cogswell, M., Das, A., et al.: Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)
Telea, A.: An image inpainting technique based on the fast marching method. J. Graph. Tools 9(1), 23–34 (2004)
Batson, J., Royer, L.: Noise2self: blind denoising by self-supervision. In: International Conference on Machine Learning. PMLR, pp. 524–533 (2019)
Stallkamp, J., Schlipsing, M., Salmen, J., et al.: Man vs. computer: benchmarking machine learning algorithms for traffic sign recognition. Neural Netw. 32, 323–332 (2012)
Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)
Deng, J., Dong, W., Socher, R., et al.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Guo, W., Wang, L., Xing, X., et al.: TABOR: a highly accurate approach to inspecting and restoring trojan backdoors in AI systems. arXiv preprint arXiv:1908.01763 (2019)
Subramanya, A., Pillai, V., Pirsiavash, H.: Fooling network interpretation in image classification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2020–2029 (2019)
Appendix
A. Backdoor Attack Configurations
The six state-of-the-art backdoor attacks detailed in Table 3 employ different methodologies. BadNets implants the backdoor by adding a local-patch trigger to a subset of training samples. In contrast, CL uses FGSM [21] to add perturbations to samples, implanting the backdoor without altering their labels. Trojan WM generates its trigger through reverse engineering, while \(\ell _{0}\) inv formulates trigger generation as a regularized optimization problem. The Steganography attack embeds a covert full-image trigger by modifying the least significant bits of images. Lastly, FTrojan introduces a frequency-domain backdoor attack that discreetly embeds a full-image trigger in the RGB space.
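For concreteness, the following sketch illustrates how a BadNets-style local-patch trigger can be stamped onto training data. The patch size, placement, target label, and array shapes are illustrative assumptions, not the exact configurations listed in Table 3.

```python
# Minimal sketch of BadNets-style data poisoning: stamp a small local-patch
# trigger onto a fraction of the training images and relabel them to the
# attacker's target class. All parameters below are illustrative assumptions.
import numpy as np

def poison_badnets(images, labels, injection_rate=0.10, target_label=0, patch_size=3):
    """images: uint8 array of shape (N, H, W, C); labels: int array of shape (N,)."""
    images, labels = images.copy(), labels.copy()
    n = len(images)
    poisoned_idx = np.random.choice(n, size=int(injection_rate * n), replace=False)
    for i in poisoned_idx:
        images[i, -patch_size:, -patch_size:, :] = 255  # white square trigger in the bottom-right corner
        labels[i] = target_label                        # dirty-label poisoning toward the target class
    return images, labels, poisoned_idx
```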
B. Defending Against Attacks with Different Injection Rates
Adversaries may increase the backdoor sample injection rate to make data filtering more difficult. We therefore test DFaP against BadNets and Steganography attacks with different injection rates (from 10% to 50%), comparing against the best-performing defenses under each attack as references. Specifically, STRIP, Februus, and DFaP were tested under BadNets attacks with different injection rates, and Fine-Pruning, I-BAU, and DFaP were tested under Steganography attacks with different injection rates. The ASRs of attacks with different injection rates under the various defense methods are shown in Fig. 6(a). Fine-Pruning and Februus failed to defend once the injection rate reached 30% and 40%, respectively; the remaining defense methods maintain consistent performance across injection rates. The BAs (on CIFAR-10) for BadNets at different injection rates under the various defense methods are shown in Fig. 6(b): DFaP and STRIP achieve comparable results, both higher than Februus. The BAs (on GTSRB) for Steganography at different injection rates under the various defense methods are shown in Fig. 6(c): DFaP and I-BAU achieve comparable results, both above Fine-Pruning. The experimental results show that the ASR of the model retrained after DFaP drops from 100% to nearly 0%, and there is no gap between the BA of the retrained model and that of the infected model. Therefore, although higher injection rates are more challenging to defend against, the performance of DFaP is robust to backdoor attacks with different injection rates.
Fig. 6. The performance of DFaP under different injection rates. (a) ASRs of attacks with different injection rates under the various defense methods; (b) BAs (on CIFAR-10) for BadNets at different injection rates under the various defense methods; (c) BAs (on GTSRB) for Steganography at different injection rates under the various defense methods.
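For reference, the two metrics reported in Fig. 6 can be computed along the following lines. Here `model.predict` (assumed to return class indices), `apply_trigger`, and the target label are placeholders assumed for illustration, not DFaP's actual evaluation code.

```python
# Sketch of the two metrics: BA (benign accuracy) on the clean test set and
# ASR (attack success rate) on triggered copies of non-target-class samples.
import numpy as np

def benign_accuracy(model, x_clean, y_clean):
    # BA: accuracy of the (retrained or infected) model on the clean test set.
    preds = model.predict(x_clean)
    return float(np.mean(preds == y_clean))

def attack_success_rate(model, x_clean, y_clean, apply_trigger, target_label=0):
    # ASR: fraction of triggered samples (excluding the target class) that are
    # classified as the attacker's target label.
    mask = y_clean != target_label
    x_triggered = apply_trigger(x_clean[mask])
    preds = model.predict(x_triggered)
    return float(np.mean(preds == target_label))
```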
C. Sensitivity Study
In this section, we evaluate the sensitivity of the CAM Filter to the erasure repair threshold. We compute the True Acceptance Rate (TAR) and the False Acceptance Rate (FAR) to measure the data filtering capability of DFaP: TAR is the percentage of local-patch backdoor samples judged as erasure-prone, while FAR is the percentage of clean samples judged as erasure-prone.
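A minimal sketch of this bookkeeping is given below, assuming a sample is flagged as erasure-prone when its CAM attention score exceeds the erasure repair threshold; the synthetic scores and the flagging rule are illustrative assumptions, not the exact CAM Filter implementation.

```python
# Sketch of TAR/FAR computation for a threshold sweep like the one in Fig. 7.
import numpy as np

def tar_far(cam_scores, is_backdoor, threshold):
    """cam_scores: per-sample CAM attention scores in [0, 1];
    is_backdoor: boolean mask of ground-truth local-patch backdoor samples."""
    flagged = cam_scores > threshold                # samples judged as erasure-prone
    tar = float(np.mean(flagged[is_backdoor]))      # backdoor samples correctly flagged
    far = float(np.mean(flagged[~is_backdoor]))     # clean samples incorrectly flagged
    return tar, far

# Example sweep over the erasure repair thresholds studied in Fig. 7,
# using synthetic scores purely for illustration.
rng = np.random.default_rng(0)
cam_scores = rng.uniform(0.0, 1.0, size=1000)
is_backdoor = rng.uniform(size=1000) < 0.10         # 10% injection rate
for t in (0.3, 0.4, 0.5, 0.6):
    print(t, tar_far(cam_scores, is_backdoor, t))
```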
In Fig. 7, we demonstrate the data filtering effect of DFaP for erasure repair thresholds from 0.3 to 0.6 on three datasets, under a 10% BadNets injection rate. The experimental results reveal that when the threshold is lowered from 0.6 to 0.3, the TAR still reaches about 80%, indicating that DFaP at a lower threshold can still preserve the dataset's usefulness. Meanwhile, even at a higher threshold, such as 0.6 on CIFAR-10, the FAR reaches only 6.74% (i.e., an injection rate of 0.006), which is still far below the injection rate required for a successful backdoor attack. Therefore, DFaP does not require complex hyperparameter selection.