Abstract
Neural networks are successfully used in a variety of applications, many of which have safety and security concerns. As a result, researchers have proposed formal verification techniques for verifying neural network properties. While previous efforts have mainly focused on checking local robustness in neural networks, we instead study another neural network security issue, namely model poisoning. Here, an attacker inserts a trigger into a subset of the training data so that, at test time, the presence of this trigger in an input causes the trained model to misclassify it to some target class. We show how to formulate the check for model poisoning as a property that can be checked with off-the-shelf verification tools, such as Marabou and nnenum, where counterexamples to failed checks constitute the triggers. We further show that the discovered triggers are ‘transferable’ from a small model to a larger, better-trained model, allowing us to analyze state-of-the-art performant models trained for image classification tasks.
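The check itself amounts to a reachability query to an off-the-shelf verifier. As a minimal sketch only, and not the paper's exact encoding (which lives in the repository linked in the Notes), the following Python fragment uses Marabou's maraboupy API to ask whether a small patch of free pixels, with every other pixel of a clean image held fixed, can push a target-class score above the true-class score. The model file, image file, image size, patch location, class indices, and margin are all assumed placeholders, and the return shape of solve() varies across Marabou versions.

```python
# A minimal sketch, not the paper's exact encoding: ask Marabou whether a
# k-by-k patch of free pixels can force the network to prefer a target class.
import numpy as np
from maraboupy import Marabou

network = Marabou.read_onnx("model.onnx")   # assumed ONNX export of the model
x = np.load("clean_image.npy").flatten()    # assumed clean input, pixels in [0, 1]

in_vars = network.inputVars[0].flatten()
out_vars = network.outputVars[0].flatten()  # older Marabou versions expose a single array

H = W = 28                                  # assumed image size (e.g. MNIST)
k = 3                                       # assumed trigger patch size
patch = {r * W + c for r in range(k) for c in range(k)}  # patch at the top-left

true_cls, target_cls = 3, 7                 # assumed source/target labels
for i, v in enumerate(in_vars):
    if i in patch:
        network.setLowerBound(v, 0.0)       # patch pixels are free variables
        network.setUpperBound(v, 1.0)
    else:
        network.setLowerBound(v, float(x[i]))  # every other pixel is fixed
        network.setUpperBound(v, float(x[i]))

# Require a misclassification margin: out[target] >= out[true] + eps,
# encoded as 1*out[true] - 1*out[target] <= -eps.
eps = 1e-3
network.addInequality([out_vars[true_cls], out_vars[target_cls]], [1.0, -1.0], -eps)

result = network.solve()  # a SAT assignment yields candidate trigger pixel values
```

If the query is satisfiable, the values the solver assigns to the free patch variables form a candidate trigger; if queries of this shape are unsatisfiable for every patch location, no such trigger exists for that image.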
Notes
1. Examples in this paper are made available open source at https://github.com/theyoucheng/vpn.
2. GitHub link: https://github.com/NeuralNetworkVerification/Marabou (commit 54e76b2c027c79d56f14751013fd649c8673dc1b).
3. GitHub link: https://github.com/stanleybak/nnenum (commit fd07f2b6c55ca46387954559f40992ae0c9b06b7).
References
Bak, S.: nnenum: verification of relu neural networks with optimized abstraction refinement. In: Dutle, A., Moscato, M.M., Titolo, L., Muñoz, C.A., Perez, I. (eds.) NFM 2021. LNCS, vol. 12673, pp. 19–36. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-76384-8_2
Bak, S., Liu, C., Johnson, T.: The second international verification of neural networks competition (VNN-COMP 2021): summary and results. arXiv preprint arXiv:2109.00498 (2021)
Biggio, B., et al.: Evasion attacks against machine learning at test time. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds.) ECML PKDD 2013. LNCS (LNAI), vol. 8190, pp. 387–402. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40994-3_25
Brown, T.B., Mané, D., Roy, A., Abadi, M., Gilmer, J.: Adversarial patch. arXiv preprint arXiv:1712.09665 (2017). https://doi.org/10.48550/arXiv.1712.09665
Cheng, S., Liu, Y., Ma, S., Zhang, X.: Deep feature space trojan attack of neural networks by controlled detoxification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 1148–1156 (2021)
Chiang, P.Y., Ni, R., Abdelkader, A., Zhu, C., Studor, C., Goldstein, T.: Certified defenses for adversarial patches. In: International Conference on Learning Representations (2019)
Demontis, A., et al.: Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks. In: USENIX Security Symposium, pp. 321–338 (2019)
Elboher, Y.Y., Gottschlich, J., Katz, G.: An abstraction-based framework for neural network verification. In: Lahiri, S.K., Wang, C. (eds.) CAV 2020. LNCS, vol. 12224, pp. 43–65. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-53288-8_3
Gao, Y., Xu, C., Wang, D., Chen, S., Ranasinghe, D.C., Nepal, S.: STRIP: a defence against trojan attacks on deep neural networks. In: Proceedings of the 35th Annual Computer Security Applications Conference, pp. 113–125 (2019)
Gu, T., Liu, K., Dolan-Gavitt, B., Garg, S.: Badnets: evaluating backdooring attacks on deep neural networks. IEEE Access 7, 47230–47244 (2019). https://doi.org/10.1109/ACCESS.2019.2909068
Guo, W., Wang, L., Xing, X., Du, M., Song, D.: Tabor: a highly accurate approach to inspecting and restoring trojan backdoors in AI systems. arXiv preprint arXiv:1908.01763 (2019)
Houben, S., Stallkamp, J., Salmen, J., Schlipsing, M., Igel, C.: Detection of traffic signs in real-world images: the German Traffic Sign Detection Benchmark. In: International Joint Conference on Neural Networks, No. 1288 (2013)
Huang, X., et al.: A survey of safety and trustworthiness of deep neural networks: verification, testing, adversarial attack and defence, and interpretability. Comput. Sci. Rev. 37, 100270 (2020)
Katz, G., et al.: The Marabou framework for verification and analysis of deep neural networks. In: Dillig, I., Tasiran, S. (eds.) CAV 2019. LNCS, vol. 11561, pp. 443–452. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25540-4_26
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Liu, Y., et al.: Trojaning attack on neural networks. In: NDSS (2018)
Steinhardt, J., Koh, P.W., Liang, P.S.: Certified defenses for data poisoning attacks. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Suciu, O., Marginean, R., Kaya, Y., Daumé III, H., Dumitras, T.: When does machine learning FAIL? Generalized transferability for evasion and poisoning attacks. In: USENIX Security Symposium (2018)
Usman, M., et al.: NNrepair: constraint-based repair of neural network classifiers. In: Silva, A., Leino, K.R.M. (eds.) CAV 2021. LNCS, vol. 12759, pp. 3–25. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-81685-8_1
Wang, B., et al.: Neural cleanse: identifying and mitigating backdoor attacks in neural networks. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 707–723. IEEE (2019)