QuickBCC: Quick and Scalable Binary Vulnerable Code Clone Detection

  • Conference paper
ICT Systems Security and Privacy Protection (SEC 2021)

Abstract

Because software packages reuse code, vulnerabilities can propagate from one package to another. Code clone detection techniques help prevent and manage such propagation. When the source code for a software package is not available, as with proprietary or custom software distributions, binary code clone detection can be used to examine the software for flaws. However, existing binary code clone detectors either scale poorly or detect vulnerable code clones with limited accuracy.

In this paper, we introduce QuickBCC, a scalable binary code clone detection framework designed for vulnerability scanning. The framework is built on the idea of extracting semantics from vulnerable binaries both before and after security patches and comparing them to target binaries. To improve performance, we create a signature from the changes between the pre- and post-patch binaries and apply a filtering step when comparing signatures to target binaries. In addition, we use the smallest semantic unit, a strand, to improve accuracy and robustness across compilation environments. QuickBCC is highly optimized: it preprocesses 5,439 target binaries within 111 min and matches those binaries against 6 signatures in 23 s when running as a multi-threaded application. On average, QuickBCC takes 3 ms to match one target binary. Compared with other approaches, QuickBCC was faster at detecting well-known vulnerabilities while maintaining an acceptable level of accuracy.
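The abstract gives no implementation detail, so the following is only a minimal sketch of the general idea of a patch-difference signature with a cheap pre-filter, not QuickBCC's actual algorithm. It assumes strands are already extracted and normalized into strings; the names `strand_hashes`, `Signature`, `make_signature`, and `is_vulnerable_clone`, the SHA-1 hashing, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
import hashlib
from dataclasses import dataclass


def strand_hashes(strands):
    """Hash each normalized strand (assumed here to be a string)."""
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in strands}


@dataclass(frozen=True)
class Signature:
    removed: frozenset  # strands present only in the pre-patch function
    added: frozenset    # strands present only in the post-patch function


def make_signature(pre_patch_strands, post_patch_strands):
    """Keep only what the patch changed, so the signature stays small."""
    pre = strand_hashes(pre_patch_strands)
    post = strand_hashes(post_patch_strands)
    return Signature(frozenset(pre - post), frozenset(post - pre))


def is_vulnerable_clone(sig, target_strands, threshold=0.8):
    """Report a target that still holds the vulnerable strands, unpatched."""
    target = strand_hashes(target_strands)
    # Cheap filter: a target sharing no vulnerable-only strand is skipped.
    # (In this sketch, a patch that only adds strands yields no match.)
    if not sig.removed or sig.removed.isdisjoint(target):
        return False
    vulnerable_ratio = len(sig.removed & target) / len(sig.removed)
    patched = bool(sig.added) and sig.added <= target
    return vulnerable_ratio >= threshold and not patched


# Hypothetical strands for a function before and after a bounds-check patch.
pre = ["t1 = load(buf)", "t2 = add(t1, len)", "memcpy(dst, src, len)"]
post = ["t1 = load(buf)", "t2 = add(t1, len)",
        "t3 = min(len, cap)", "memcpy(dst, src, t3)"]
sig = make_signature(pre, post)

print(is_vulnerable_clone(sig, pre + ["ret"]))  # target still has old code
print(is_vulnerable_clone(sig, post))           # target already patched
```

Storing only the changed strands means most targets can be rejected by a single set-disjointness check, which is one plausible reading of the filtering step described above.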

H. Jang, K. Yang, and G. Lee—Contributed equally to this work.

Acknowledgement

We thank Seongbeom Park for his contribution to the signature generation. This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01697, Development of Automated Vulnerability Discovery Technologies for Blockchain Platform Security), the National Research Foundation (NRF), Korea, under project BK21 FOUR, and the Research Foundation of the City University of New York.

Correspondence to Heejo Lee or Sven Dietrich.

Copyright information

© 2021 IFIP International Federation for Information Processing

Cite this paper

Jang, H. et al. (2021). QuickBCC: Quick and Scalable Binary Vulnerable Code Clone Detection. In: Jøsang, A., Futcher, L., Hagen, J. (eds) ICT Systems Security and Privacy Protection. SEC 2021. IFIP Advances in Information and Communication Technology, vol 625. Springer, Cham. https://doi.org/10.1007/978-3-030-78120-0_5

  • DOI: https://doi.org/10.1007/978-3-030-78120-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78119-4

  • Online ISBN: 978-3-030-78120-0

  • eBook Packages: Computer Science, Computer Science (R0)