Seq2Seq-AFL: Fuzzing via sequence-to-sequence model

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Fuzzing is a technique in which anomalous data is fed into software to find potential bugs. It is mainly used to discover vulnerabilities, including but not limited to buffer overflows, memory leaks, and crashes when handling abnormal inputs. However, ensuring that all inputs are valid during fuzzing is infeasible in practice due to the high instrumentation overhead. Popular fuzzers (e.g., AFL) often generate a large number of invalid mutations, which prevents them from discovering potential paths that lead to new crashes. More importantly, it prevents fuzzers from making well-informed decisions about fuzzing operators. In this article, we propose Seq2Seq-AFL, a mutation-sensitive fuzzing solution in which the mutation operator and the mutation position are taken into account simultaneously, and different Seq2Seq models are designed to carry out the optimization scheme. The optimization scheme efficiently trains a function that produces pairs of mutation operator and mutation position, and then uses that function to conduct fuzzing. To verify the effectiveness of our scheme, we construct a dataset of two-dimensional vector data corresponding to the objdump, readelf, and nm programs. The experimental results demonstrate that our proposed scheme significantly improves the performance of the state-of-the-art AFL fuzzing tool, with coverage improvements of 13.7%, 17.6%, and 6.9% on objdump, readelf, and nm, respectively. Notably, Seq2Seq-AFL exposes a total of 21 crashes in objdump.
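As a rough illustration of what applying pairs of mutation operator and mutation position to a seed input could look like, the following minimal Python sketch may help. It is not the authors' implementation: the operator names, the `OPERATORS` table, and the `apply_pairs` helper are hypothetical, and the list of pairs stands in for whatever a trained Seq2Seq model would output.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical byte-level operators, loosely modelled on the kinds of
# mutations AFL performs (bit flips, byte flips, small arithmetic).
def bitflip(data: bytearray, pos: int) -> None:
    data[pos] ^= 0x80  # flip the most significant bit of one byte

def byteflip(data: bytearray, pos: int) -> None:
    data[pos] ^= 0xFF  # invert every bit of one byte

def arith_inc(data: bytearray, pos: int) -> None:
    data[pos] = (data[pos] + 1) & 0xFF  # add 1 with wrap-around

# Assumed name-to-function table; the paper's actual operator set may differ.
OPERATORS: Dict[str, Callable[[bytearray, int], None]] = {
    "bitflip": bitflip,
    "byteflip": byteflip,
    "arith_inc": arith_inc,
}

def apply_pairs(seed: bytes, pairs: List[Tuple[str, int]]) -> bytes:
    """Apply (operator, position) pairs to a copy of the seed, in order.

    Pairs with an unknown operator or an out-of-range position are skipped,
    so a poor prediction cannot crash the fuzzing loop itself.
    """
    mutated = bytearray(seed)
    for op_name, pos in pairs:
        op = OPERATORS.get(op_name)
        if op is not None and 0 <= pos < len(mutated):
            op(mutated, pos)
    return bytes(mutated)

# Stand-in for model output: in a real setup these pairs would be predicted.
predicted_pairs = [("byteflip", 0), ("arith_inc", 1), ("bitflip", 99)]
mutated_input = apply_pairs(b"\x7fELF", predicted_pairs)
```

In this sketch the third pair is ignored because position 99 lies outside the four-byte seed. Treating operator and position as a single pair, rather than choosing them independently, is the point the abstract emphasizes.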

Data availability

The data that support the findings of this study are available on request from the corresponding author.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Grant Nos. U2333205, 62302025, 62276017), the State Grid Co., Ltd. Technology R&D Project (Project Name: Research on Key Technologies of Data Scenario-based Security Governance and Emergency Blocking in Power Monitoring System, Project No.: 5108-202303439A-3-2-ZN), the 2022 CCF-NSFOCUS Kun-Peng Scientific Research Fund, and the Opening Project of Shanghai Trusted Industrial Control Platform.

Author information

Contributions

Liqun Yang: Conceptualization, Methodology, Software, Funding acquisition. Chaoren Wei: Methodology, Formal analysis, Software, Writing, and Editing. Jian Yang: Review, Writing, Original draft, and Editing. Jinxin Ma: Methodology, Review, Editing, and Revision. Hongcheng Guo: Conceptualization, Methodology, and Review. Long Cheng: Conducted fuzzing tests on his self-developed equipment, and Review. Zhoujun Li: Review and Funding acquisition.

Corresponding author

Correspondence to Jian Yang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yang, L., Wei, C., Yang, J. et al. Seq2Seq-AFL: Fuzzing via sequence-to-sequence model. Int. J. Mach. Learn. & Cyber. 15, 4403–4421 (2024). https://doi.org/10.1007/s13042-024-02153-z

  • DOI: https://doi.org/10.1007/s13042-024-02153-z
