Abstract
The rapid development of deep learning has led to a dramatic increase in the demand for training data. As a result, users are often compelled to acquire data from unsecured external sources through automated collection or outsourcing. This exposes the training data collection phase of the DNN pipeline to severe backdoor attacks, in which adversaries contaminate the training data so that they can stealthily steer the DNN toward attacker-chosen or unintended outputs. In this paper, we propose a novel backdoor defense framework called DFaP (Data Filter and Purify). DFaP neutralizes backdoor samples carrying local-patch or full-image triggers without requiring additional clean samples. With DFaP, users can safely train clean DNN models on unsecured data. We conducted experiments on two networks (AlexNet, ResNet-34) and two datasets (CIFAR10, GTSRB). The experimental results show that DFaP can defend against six state-of-the-art backdoor attacks. Compared with four other defense methods, DFaP demonstrates superior performance, with an average reduction in attack success rate of 98.01%.
References
Chen, C., Seff, A., Kornhauser, A., et al.: DeepDriving: learning affordance for direct perception in autonomous driving. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2722–2730 (2015)
Tian, Y., Pei, K., Jana, S., et al.: DeepTest: automated testing of deep-neural-network-driven autonomous cars. In: Proceedings of the 40th International Conference on Software Engineering, pp. 303–314 (2018)
Jung, C., Shim, D.H.: Incorporating multi-context into the traversability map for urban autonomous driving using deep inverse reinforcement learning. IEEE Robot. Autom. Lett. 6(2), 1662–1669 (2021)
Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Guo, J., Han, K., Wang, Y., et al.: Distilling object detectors via decoupled features. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2154–2164 (2021)
Devlin, J., Chang, M.W., Lee, K., et al.: Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Xie, W., Feng, Y., Gu, S., et al.: Importance-based neuron allocation for multilingual neural machine translation. arXiv preprint arXiv:2107.06569 (2021)
Gao, Y., Doan, B.G., Zhang, Z., et al.: Backdoor attacks and countermeasures on deep learning: a comprehensive review. arXiv preprint arXiv:2007.10760 (2020)
Gu, T., Dolan-Gavitt, B., Garg, S.: BadNets: identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733 (2017)
Turner, A., Tsipras, D., Madry, A.: Label-Consistent Backdoor Attacks. stat 1050, 6 (2019)
Li, S., Xue, M., Zhao, B.Z.H., et al.: Invisible backdoor attacks on deep neural networks via steganography and regularization. IEEE Trans. Dependable Secure Comput. 18(5), 2088–2105 (2020)
Wang, T., Yao, Y., Xu, F., et al.: Backdoor attack through frequency domain. arXiv preprint arXiv:2111.10991 (2021)
Pang, R., Zhang, Z., Gao, X., et al.: TROJANZOO: towards unified, holistic, and practical evaluation of neural backdoors. In: 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P), pp. 684–702. IEEE (2022)
Chou, E., Tramer, F., Pellegrino, G.: SentiNet: detecting localized universal attacks against deep learning systems. In: 2020 IEEE Security and Privacy Workshops (SPW), pp. 48–54. IEEE (2020)
Zhong, H., Liao, C., Squicciarini, A.C., et al.: Backdoor embedding in convolutional neural network models via invisible perturbation. In: Proceedings of the Tenth ACM Conference on Data and Application Security and Privacy, pp. 97–108 (2020)
Shafahi, A., Huang, W.R., Najibi, M., et al.: Poison frogs! targeted clean-label poisoning attacks on neural networks. In: Advances in Neural Information Processing Systems, 31 (2018)
Zhu, C., Huang, W.R., Li, H., et al.: Transferable clean-label poisoning attacks on deep neural nets. In: International Conference on Machine Learning. PMLR, pp. 7614–7623 (2019)
Barni, M., Kallas, K., Tondi, B.: A new backdoor attack in CNNs by training set corruption without label poisoning. In: 2019 IEEE International Conference on Image Processing (ICIP), pp. 101–105. IEEE (2019)
Quanxin, Z., Wencong, M.A., Yajie, W., et al.: Backdoor attacks on image classification models in deep neural networks. Chin. J. Electron. (2022). https://doi.org/10.1049/cje.2021.00.126
Goodfellow, I., Pouget-Abadie, J., Mirza, M., et al.: Generative adversarial networks. Commun. ACM 63(11), 139–144 (2020)
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
Li, Y., Sha, T., Baker, T., et al.: Adaptive vertical federated learning via feature map transferring in mobile edge computing. Computing, 1–17 (2022). https://doi.org/10.1007/s00607-022-01117-x
Yang, J., Baker, T., Gill, S.S., et al.: A federated learning attack method based on edge collaboration via cloud. Softw. Pract. Exp. (2022)
Zheng, J., Zhang, Y., Li, Y., et al.: Towards evaluating the robustness of adversarial attacks against image scaling transformation. Chin. J. Electron. 32(1), 151–158 (2023)
Liu, Y., Ma, S., Aafer, Y., et al.: Trojaning attack on neural networks. In: 25th Annual Network and Distributed System Security Symposium (NDSS 2018). Internet Soc (2018)
Zhang, Y., Tan, Y., Sun, H., et al.: Improving the invisibility of adversarial examples with perceptually adaptive perturbation. Inf. Sci. 635, 126–137 (2023)
Wang, Y., Tan, Y., Lyu, H., et al.: Toward feature space adversarial attack in the frequency domain. Int. J. Intell. Syst. 37(12), 11019–11036 (2022)
Wang, B., Yao, Y., Shan, S., et al.: Neural cleanse: identifying and mitigating backdoor attacks in neural networks. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 707–723. IEEE (2019)
Liu, K., Dolan-Gavitt, B., Garg, S.: Fine-Pruning: defending against backdooring attacks on deep neural networks. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (eds.) RAID 2018. LNCS, vol. 11050, pp. 273–294. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00470-5_13
Li, Y., Lyu, X., Koren, N., et al.: Neural attention distillation: erasing backdoor triggers from deep neural networks. arXiv preprint arXiv:2101.05930 (2021)
Zeng, Y., Chen, S., Park, W., et al.: Adversarial unlearning of backdoors via implicit hypergradient. In: International Conference on Learning Representations (2022)
Tran, B., Li, J., Madry, A.: Spectral signatures in backdoor attacks. In: Advances in Neural Information Processing Systems, 31 (2018)
Hayase, J., Kong, W., Somani, R., et al.: SPECTRE: defending against backdoor attacks using robust statistics. In: International Conference on Machine Learning, pp. 4129–4139. PMLR (2021)
Gao, Y., Xu, C., Wang, D., et al.: STRIP: a defence against trojan attacks on deep neural networks. In: Proceedings of the 35th Annual Computer Security Applications Conference, pp. 113–125 (2019)
Yang, J., Zheng, J., Zhang, Z., et al.: Security of federated learning for cloud-edge intelligence collaborative computing. Int. J. Intell. Syst. 37(11), 9290–9308 (2022)
Doan, B.G., Abbasnejad, E., Ranasinghe, D.C.: Februus: input purification defense against trojan attacks on deep neural network systems. In: Annual Computer Security Applications Conference, pp. 897–912 (2020)
Tang, D., Wang, X.F., Tang, H., et al.: Demon in the variant: statistical analysis of DNNs for robust backdoor contamination detection. In: USENIX Security Symposium, pp. 1541–1558 (2021)
Selvaraju, R.R., Cogswell, M., Das, A., et al.: Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)
Telea, A.: An image inpainting technique based on the fast marching method. J. Graph. Tools 9(1), 23–34 (2004)
Batson, J., Royer, L.: Noise2self: blind denoising by self-supervision. In: International Conference on Machine Learning. PMLR, pp. 524–533 (2019)
Stallkamp, J., Schlipsing, M., Salmen, J., et al.: Man vs. computer: benchmarking machine learning algorithms for traffic sign recognition. Neural Netw. 32, 323–332 (2012)
Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)
Deng, J., Dong, W., Socher, R., et al.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
He, K., Zhang, X., Ren, S., et al.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Guo, W., Wang, L., Xing, X., et al.: TABOR: a highly accurate approach to inspecting and restoring trojan backdoors in AI systems. arXiv preprint arXiv:1908.01763 (2019)
Subramanya, A., Pillai, V., Pirsiavash, H.: Fooling network interpretation in image classification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2020–2029 (2019)
Appendix
A. Backdoor Attack Configurations
The six state-of-the-art backdoor attacks detailed in Table 3 employ different methodologies. BadNets implants the backdoor by adding a local-patch trigger to a subset of training samples. In contrast, CL uses FGSM [21] to add perturbations to samples, implanting the backdoor without altering their labels. Trojan WM generates its trigger through reverse engineering, while \(\ell _{0}\) inv formulates trigger generation as a regularized optimization problem. The Steganography attack embeds a covert full-image trigger by modifying the least significant bits of images. Lastly, FTrojan introduces a frequency-domain backdoor attack that discreetly embeds a full-image trigger in the RGB space.
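For concreteness, the following sketch illustrates how a BadNets-style local-patch trigger can be stamped onto training data. The patch size, placement, target label, and array shapes are illustrative assumptions, not the exact configurations listed in Table 3.

```python
# Minimal sketch of BadNets-style data poisoning: stamp a small local-patch
# trigger onto a fraction of the training images and relabel them to the
# attacker's target class. All parameters below are illustrative assumptions.
import numpy as np

def poison_badnets(images, labels, injection_rate=0.10, target_label=0, patch_size=3):
    """images: uint8 array of shape (N, H, W, C); labels: int array of shape (N,)."""
    images, labels = images.copy(), labels.copy()
    n = len(images)
    poisoned_idx = np.random.choice(n, size=int(injection_rate * n), replace=False)
    for i in poisoned_idx:
        images[i, -patch_size:, -patch_size:, :] = 255  # white square trigger in the bottom-right corner
        labels[i] = target_label                        # dirty-label poisoning toward the target class
    return images, labels, poisoned_idx
```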
B. Defending Against Attacks with Different Injection Rates
Adversaries may increase the backdoor sample injection rate to make data filtering more difficult. We therefore test DFaP against BadNets and Steganography attacks with different injection rates (from 10% to 50%), comparing against the best-performing defenses under each attack as references. Specifically, STRIP, Februus, and DFaP were tested under BadNets attacks with different injection rates, and Fine-Pruning, I-BAU, and DFaP were tested under Steganography attacks with different injection rates. The ASRs of attacks with different injection rates under the various defense methods are shown in Fig. 6(a). Fine-Pruning and Februus failed to defend once the injection rate reached 30% and 40%, respectively; the remaining defense methods maintain consistent performance across injection rates. The BAs (on CIFAR-10) for BadNets at different injection rates under the various defense methods are shown in Fig. 6(b): DFaP and STRIP achieve comparable results, both higher than Februus. The BAs (on GTSRB) for Steganography at different injection rates under the various defense methods are shown in Fig. 6(c): DFaP and I-BAU achieve comparable results, both above Fine-Pruning. The experimental results show that the ASR of the model retrained after DFaP drops from 100% to nearly 0%, and there is no gap between the BA of the retrained model and that of the infected model. Therefore, although higher injection rates are more challenging to defend against, the performance of DFaP is robust to backdoor attacks with different injection rates.
Fig. 6. The performance of DFaP under different injection rates. (a) ASRs of attacks with different injection rates under the various defense methods; (b) BAs (on CIFAR-10) for BadNets at different injection rates under the various defense methods; (c) BAs (on GTSRB) for Steganography at different injection rates under the various defense methods.
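For reference, the two metrics reported in Fig. 6 can be computed along the following lines. Here `model.predict` (assumed to return class indices), `apply_trigger`, and the target label are placeholders assumed for illustration, not DFaP's actual evaluation code.

```python
# Sketch of the two metrics: BA (benign accuracy) on the clean test set and
# ASR (attack success rate) on triggered copies of non-target-class samples.
import numpy as np

def benign_accuracy(model, x_clean, y_clean):
    # BA: accuracy of the (retrained or infected) model on the clean test set.
    preds = model.predict(x_clean)
    return float(np.mean(preds == y_clean))

def attack_success_rate(model, x_clean, y_clean, apply_trigger, target_label=0):
    # ASR: fraction of triggered samples (excluding the target class) that are
    # classified as the attacker's target label.
    mask = y_clean != target_label
    x_triggered = apply_trigger(x_clean[mask])
    preds = model.predict(x_triggered)
    return float(np.mean(preds == target_label))
```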
C. Sensitivity Study
In this section, we evaluate the sensitivity of the CAM Filter to the erasure repair threshold. We compute the True Acceptance Rate (TAR) and the False Acceptance Rate (FAR) to measure the data filtering capability of DFaP: TAR is the percentage of local-patch backdoor samples judged as erasure-prone, while FAR is the percentage of clean samples judged as erasure-prone.
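A minimal sketch of this bookkeeping is given below, assuming a sample is flagged as erasure-prone when its CAM attention score exceeds the erasure repair threshold; the synthetic scores and the flagging rule are illustrative assumptions, not the exact CAM Filter implementation.

```python
# Sketch of TAR/FAR computation for a threshold sweep like the one in Fig. 7.
import numpy as np

def tar_far(cam_scores, is_backdoor, threshold):
    """cam_scores: per-sample CAM attention scores in [0, 1];
    is_backdoor: boolean mask of ground-truth local-patch backdoor samples."""
    flagged = cam_scores > threshold                # samples judged as erasure-prone
    tar = float(np.mean(flagged[is_backdoor]))      # backdoor samples correctly flagged
    far = float(np.mean(flagged[~is_backdoor]))     # clean samples incorrectly flagged
    return tar, far

# Example sweep over the erasure repair thresholds studied in Fig. 7,
# using synthetic scores purely for illustration.
rng = np.random.default_rng(0)
cam_scores = rng.uniform(0.0, 1.0, size=1000)
is_backdoor = rng.uniform(size=1000) < 0.10         # 10% injection rate
for t in (0.3, 0.4, 0.5, 0.6):
    print(t, tar_far(cam_scores, is_backdoor, t))
```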
In Fig. 7, we demonstrate the data filtering effect of DFaP for erasure repair thresholds from 0.3 to 0.6 on three datasets, under a 10% BadNets injection rate. The experimental results reveal that when the threshold is lowered from 0.6 to 0.3, the TAR still reaches about 80%, indicating that DFaP at a lower threshold can still preserve the dataset's usefulness. Meanwhile, even at a higher threshold, such as 0.6 on CIFAR-10, the FAR reaches only 6.74% (i.e., an injection rate of 0.006), which is still far below the injection rate required for a successful backdoor attack. Therefore, DFaP does not require complex hyperparameter selection.