Abstract
Due to code reuse among software packages, vulnerabilities can propagate from one software package to another. Current code clone detection techniques are useful for preventing and managing such vulnerability propagation. When the source code for a software package is not available, such as when working with proprietary or custom software distributions, binary code clone detection can be used to examine software for flaws. However, existing binary code clone detectors have scalability issues, or are limited in their accurate detection of vulnerable code clones.
In this paper, we introduce QuickBCC, a scalable binary code clone detection framework designed for vulnerability scanning. The framework was built on the idea of extracting semantics from vulnerable binaries both before and after security patches, and comparing them to target binaries. In order to improve performance, we created a signature based on the changes between the pre- and post-patched binaries, and implemented a filtering process when comparing the signatures to the target binaries. In addition, we leverage the smallest semantic unit, a strand, to improve accuracy and robustness against compile environments. QuickBCC is highly optimized, capable of preprocessing 5,439 target binaries within 111 min, and is able to match those binaries against 6 signatures in 23 s when running as a multi-threaded application. QuickBCC takes, on average, 3 ms to match one target binary. Comparing performance to other approaches, we found that it outperformed other approaches in terms of performance when detecting well known vulnerabilities with acceptable level of accuracy.
H. Jang, K. Yang, and G. Lee—Contributed equally to this work.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Debian Security Tracker (1997–2021). https://security-tracker.debian.org/tracker. Accessed 15 Apr 2021
VEX-IR (2004–2021). https://sourceware.org/git/?p=valgrind.git;f=VEX/pub/libvex_ir.h;a=blob_plain. Accessed 15 Apr 2021
IDA: About (2005–2021). https://www.hex-rays.com/products/ida/. Accessed 15 Apr 2021
Diaphora, the most advanced Free and Open Source program diffing tool (2015–2021). https://github.com/joxeankoret/diaphora. Accessed 15 Apr 2021
radare (2021). https://rada.re/n/radare2.html. Accessed 15 Apr 2021
Alrabaee, S.: Efficient, scalable, and accurate program fingerprinting in binary code. Ph.D. thesis, Concordia University (2018)
Bellon, S., Koschke, R., Antoniol, G., Krinke, J., Merlo, E.: Comparison and evaluation of clone detection tools. IEEE Trans. Software Eng. 33(9), 577–591 (2007). https://doi.org/10.1109/TSE.2007.70725
David, Y., Partush, N., Yahav, E.: Statistical similarity of binaries. In: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2016, pp. 266–280. ACM, New York (2016). https://doi.org/10.1145/2908080.2908126
David, Y., Partush, N., Yahav, E.: FirmUp: precise static detection of common vulnerabilities in firmware. In: ASPLOS 2018, pp. 392–404, May 2018. https://doi.org/10.1109/SP.2017.62
Ding, S.H., Fung, B.C., Charland, P.: Asm2vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. In: 2019 IEEE Symposium on Security and Privacy (SP), pp. 472–489. IEEE (2019)
Duan, Y., Li, X., Wang, J., Yin, H.: Deepbindiff: learning program-wide code representations for binary diffing. In: Proceedings of the Network and Distributed System Security Symposium (2020)
Eschweiler, S., Yakdan, K., Gerhards-Padilla, E.: discovRE: efficient cross-architecture identification of bugs in binary code. In: Proceedings of the 2016 Network and Distributed System Security (NDSS) Symposium (2016)
Feng, Q., Wang, M., Zhang, M., Zhou, R., Henderson, A., Yin, H.: Extracting conditional formulas for cross-platform bug search. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 346–359 (2017)
Gao, D., Reiter, M.K., Song, D.: BinHunt: automatically finding semantic differences in binary programs. In: Chen, L., Ryan, M.D., Wang, G. (eds.) ICICS 2008. LNCS, vol. 5308, pp. 238–255. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88625-9_16
Huang, H., Youssef, A.M., Debbabi, M.: BinSequence: fast, accurate and scalable binary code reuse detection. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 155–166. ACM (2017)
Jang, J., Agrawal, A., Brumley, D.: ReDeBug: finding unpatched code clones in entire OS distributions. In: 2012 IEEE Symposium on Security and Privacy, pp. 48–62, May 2012. https://doi.org/10.1109/SP.2012.13
Kamiya, T., Kusumoto, S., Inoue, K.: CCFinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Trans. Software Eng. 28(7), 654–670 (2002)
Kim, S., Woo, S., Lee, H., Oh, H.: VUDDY: a scalable approach for vulnerable code clone discovery. In: 2017 IEEE Symposium on Security and Privacy (SP), pp. 595–614, May 2017. https://doi.org/10.1109/SP.2017.62
Kondrak, G.: N-Gram similarity and distance. In: Consens, M., Navarro, G. (eds.) SPIRE 2005. LNCS, vol. 3772, pp. 115–126. Springer, Heidelberg (2005). https://doi.org/10.1007/11575832_13
Massarelli, L., Di Luna, G.A., Petroni, F., Baldoni, R., Querzoni, L.: SAFE: self-attentive function embeddings for binary similarity. In: Perdisci, R., Maurice, C., Giacinto, G., Almgren, M. (eds.) DIMVA 2019. LNCS, vol. 11543, pp. 309–329. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22038-9_15
Nethercote, N., Seward, J.: Valgrind: a framework for heavyweight dynamic binary instrumentation. SIGPLAN Not. 42(6), 89–100 (2007). https://doi.org/10.1145/1273442.1250746
Pewny, J., Garmany, B., Gawlik, R., Rossow, C., Holz, T.: Cross-architecture bug search in binary executables. In: Proceedings of the 2015 IEEE Symposium on Security and Privacy, SP 2015, pp. 709–724. IEEE Computer Society, Washington, DC (2015). https://doi.org/10.1109/SP.2015.49
Rahimian, A., Shirani, P., Alrbaee, S., Wang, L., Debbabi, M.: Bincomp: a stratified approach to compiler provenance attribution. Digit. Investig. 14, S146–S155 (2015)
Shirani, P., et al.: BINARM: scalable and efficient detection of vulnerabilities in firmware images of intelligent electronic devices. In: Giuffrida, C., Bardin, S., Blanc, G. (eds.) DIMVA 2018. LNCS, vol. 10885, pp. 114–138. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93411-2_6
Xiao, Y., et al.: \(\{\)MVP\(\}\): detecting vulnerabilities using patch-enhanced vulnerability signatures. In: 29th USENIX Security Symposium (USENIX Security 2020), pp. 1165–1182 (2020)
Xu, X., Liu, C., Feng, Q., Yin, H., Song, L., Song, D.: Neural network-based graph embedding for cross-platform binary code similarity detection. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 363–376 (2017)
Xu, Z., Chen, B., Chandramohan, M., Liu, Y., Song, F.: Spain: security patch analysis for binaries towards understanding the pain and pills. In: 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), pp. 462–472. IEEE (2017)
Zhang, H., Qian, Z.: Precise and accurate patch presence test for binaries. In: 27th USENIX Security Symposium (USENIX Security 2018), pp. 887–902 (2018)
Zuo, F., Li, X., Young, P., Luo, L., Zeng, Q., Zhang, Z.: Neural machine translation inspired binary code similarity comparison beyond function pairs. arXiv preprint arXiv:1808.04706 (2018)
Acknowledgement
We thank Seongbeom Park for his contribution on the signature generation. This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-01697, Development of Automated Vulnerability Discovery Technologies for Blockchain Platform Security), the National Research Foundation (NRF), Korea, under project BK21 FOUR, and the Research Foundation City University of New York.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 IFIP International Federation for Information Processing
About this paper
Cite this paper
Jang, H. et al. (2021). QuickBCC: Quick and Scalable Binary Vulnerable Code Clone Detection. In: Jøsang, A., Futcher, L., Hagen, J. (eds) ICT Systems Security and Privacy Protection. SEC 2021. IFIP Advances in Information and Communication Technology, vol 625. Springer, Cham. https://doi.org/10.1007/978-3-030-78120-0_5
Download citation
DOI: https://doi.org/10.1007/978-3-030-78120-0_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78119-4
Online ISBN: 978-3-030-78120-0
eBook Packages: Computer ScienceComputer Science (R0)