Abstract
Fuzzing is a technique in which anomalous data is fed into software to find potential bugs. It is mainly used to discover vulnerabilities, including but not limited to buffer overflows, memory leaks, and crashes when handling abnormal inputs. However, ensuring that all inputs are valid during fuzzing is infeasible in practice due to the high instrumentation overhead. Popular fuzzers (e.g., AFL) often generate a large number of invalid mutations, which prevents them from discovering potential paths that lead to new crashes. More importantly, it prevents them from making wise decisions about which fuzzing operators to apply. In this article, we propose a mutation-sensitive fuzzing solution, Seq2Seq-AFL, in which the mutation operator and the mutation position are taken into account simultaneously, and different Seq2Seq models are designed to carry out the optimization scheme. The optimization scheme efficiently trains a function that yields pairs of mutation operator and mutation position, and then uses this function to conduct fuzzing. To verify the effectiveness of our scheme, we construct a dataset of two-dimensional vector data corresponding to the objdump, readelf, and nm programs. The experimental results demonstrate that the proposed scheme significantly improves the performance of the state-of-the-art AFL fuzzing tool, with coverage improvements of 13.7%, 17.6%, and 6.9% on objdump, readelf, and nm, respectively. In particular, Seq2Seq-AFL exposes a total of 21 crashes in objdump.
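To make the idea of operator and position pairs concrete, the following is a minimal sketch of how a list of such pairs could be applied to a seed input. It is not the Seq2Seq-AFL implementation: the operator names (`flip_bit`, `add_one`, `set_zero`), the `apply_pairs` helper, and the hard-coded `predicted_pairs` are hypothetical, and in the proposed scheme the pairs would come from a trained Seq2Seq model rather than a fixed list.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical operators, loosely modeled on simple byte-level mutations.
def flip_bit(data: bytearray, pos: int) -> None:
    data[pos] ^= 0x80  # flip the highest bit of the byte at pos

def add_one(data: bytearray, pos: int) -> None:
    data[pos] = (data[pos] + 1) % 256  # increment with wrap-around

def set_zero(data: bytearray, pos: int) -> None:
    data[pos] = 0  # overwrite the byte with zero

OPERATORS: Dict[str, Callable[[bytearray, int], None]] = {
    "flip_bit": flip_bit,
    "add_one": add_one,
    "set_zero": set_zero,
}

def apply_pairs(seed: bytes, pairs: List[Tuple[str, int]]) -> bytes:
    """Apply (operator, position) pairs to a copy of the seed, in order."""
    data = bytearray(seed)
    for op_name, pos in pairs:
        op = OPERATORS.get(op_name)
        # Skip pairs that name an unknown operator or an out-of-range position.
        if op is None or not 0 <= pos < len(data):
            continue
        op(data, pos)
    return bytes(data)

# Assumption: stand-in for model output; a real run would query the model.
predicted_pairs = [("flip_bit", 0), ("add_one", 1), ("set_zero", 3), ("add_one", 99)]
mutated = apply_pairs(b"\x7fELF", predicted_pairs)
print(mutated)
```

Choosing the operator and the position jointly, rather than sampling each at random, is what the abstract means by a mutation-sensitive scheme; the sketch only shows the final application step, with invalid pairs silently skipped.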
Data availability
The data that support the findings of this study are available on request from the corresponding author.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (Grant Nos. U2333205, 62302025, 62276017), the State Grid Co., Ltd. Technology R&D Project (Project Name: Research on Key Technologies of Data Scenario-based Security Governance and Emergency Blocking in Power Monitoring System, Project No.: 5108-202303439A-3-2-ZN), the 2022 CCF-NSFOCUS Kun-Peng Scientific Research Fund, and the Opening Project of Shanghai Trusted Industrial Control Platform.
Author information
Contributions
Liqun Yang: Conceptualization, Methodology, Software, Funding acquisition. Chaoren Wei: Methodology, Formal analysis, Software, Writing, and Editing. Jian Yang: Review, Writing, Original draft, and Editing. Jinxin Ma: Methodology, Review, Editing, and Revision. Hongcheng Guo: Conceptualization, Methodology, and Review. Long Cheng: Conducted fuzzing tests on his self-developed equipment, and Review. Zhoujun Li: Review and Funding acquisition.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Yang, L., Wei, C., Yang, J. et al. Seq2Seq-AFL: Fuzzing via sequence-to-sequence model. Int. J. Mach. Learn. & Cyber. 15, 4403–4421 (2024). https://doi.org/10.1007/s13042-024-02153-z
DOI: https://doi.org/10.1007/s13042-024-02153-z