Abstract
Machine learning algorithms can effectively classify malware through dynamic behavior but are susceptible to adversarial attacks. Existing attacks, however, often fail to find an effective solution in both the feature and problem spaces. This issue arises from not addressing the intrinsic nondeterministic nature of malware, namely executing the same sample multiple times may yield significantly different behaviors. Hence, the perturbations computed for a specific behavior may be ineffective for others observed in subsequent executions. In this paper, we show how an attacker can augment their chance of success by leveraging a new and more efficient feature space algorithm for sequential data, which we have named Position Sensitive - Fast Gradient Sign Method, and by adopting two problem space strategies specially tailored to address nondeterminism in the problem space. We implement our novel algorithm and attack strategies in Tarallo, an end-to-end adversarial framework that significantly outperforms previous works in both white and black-box scenarios. Our preliminary analysis in a sandboxed environment and against two Recurrent Neural Network (RNN)-based malware detectors, shows that Tarallo achieves a success rate up to 99% on both feature and problem space attacks while significantly minimizing the number of modifications required for misclassification.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
- 2.
The list of parameters can be found at: https://github.com/necst/Tarallo/blob/main/ChainFramework/config/config_api_args.py.
References
Cuckoo (2024). https://github.com/cuckoosandbox/cuckoo
Afianian, A., Niksefat, S., Sadeghiyan, B., Baptiste, D.: Malware dynamic analysis evasion techniques: a survey. ACM Comput. Surv. 52(6) (2019). https://doi.org/10.1145/3365001
Agrawal, R., Stokes, J.W., Marinescu, M., Selvaraj, K.: Neural sequential malware detection with parameters. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2656–2660. IEEE (2018)
Anderson, H.S., Roth, P.: Ember: an open dataset for training static PE malware machine learning models. arXiv preprint arXiv:1804.04637 (2018)
Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K., Siemens, C.: Drebin: effective and explainable detection of Android malware in your pocket. In: NDSS, vol. 14, pp. 23–26 (2014)
Berman, D.S., Buczak, A.L., Chavis, J.S., Corbett, C.L.: A survey of deep learning methods for cyber security. Information 10(4), 122 (2019)
Catak, F.O., Yazı, A.F., Elezaj, O., Ahmed, J.: Deep learning based sequential model for malware analysis using windows exe API calls. PeerJ Comput. Sci. 6, e285 (2020)
Comparetti, P.M., Salvaneschi, G., Kirda, E., Kolbitsch, C., Kruegel, C., Zanero, S.: Identifying dormant functionality in malware programs. In: 2010 IEEE Symposium on Security and Privacy, pp. 61–76. IEEE (2010)
Creswell, A., White, T., Dumoulin, V., Arulkumaran, K., Sengupta, B., Bharath, A.A.: Generative adversarial networks: an overview. IEEE Signal Process. Mag. 35(1), 53–65 (2018)
D’Onghia, M., Di Cesare, F., Gallo, L., Carminati, M., Polino, M., Zanero, S.: Lookin’out my backdoor! investigating backdooring attacks against DL-driven malware detectors. In: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, pp. 209–220 (2023)
D’Onghia, M., Salvadore, M., Nespoli, B.M., Carminati, M., Polino, M., Zanero, S.: Apícula: static detection of API calls in generic streams of bytes. Comput. Secur. 119, 102775 (2022)
Eykholt, K., et al.: Robust physical-world attacks on deep learning visual classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1625–1634 (2018)
Fang, Y., et al.: A new malware classification approach based on malware dynamic analysis. In: Pieprzyk, J., Suriadi, S. (eds.) ACISP 2017. LNCS, vol. 10343, pp. 173–189. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59870-3_10
Forensics, C.: Virusshare (2023). http://virusshare.com/
Furht, B.: SIMD Single Instruction Multiple Data Processing. In: Furht, B. (ed.) Encyclopedia of Multimedia, pp. 817–819. Springer, Boston (2008). https://doi.org/10.1007/978-0-387-78414-4_220
Galloro, N., Polino, M., Carminati, M., Continella, A., Zanero, S.: A systematical and longitudinal study of evasive behaviors in windows malware. Comput. Secur. 113, 102550 (2022)
Giffin, J.T., Jha, S., Miller, B.P.: Automated discovery of mimicry attacks. In: Zamboni, D., Kruegel, C. (eds.) RAID 2006. LNCS, vol. 4219, pp. 41–60. Springer, Heidelberg (2006). https://doi.org/10.1007/11856214_3
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
Hariom, Handa, A., Kumar, N., Kumar Shukla, S.: Adversaries strike hard: adversarial attacks against malware classifiers using dynamic API calls as features. In: Dolev, S., Margalit, O., Pinkas, B., Schwarzmann, A. (eds.) CSCML 2021. LNCS, vol. 12716, pp. 20–37. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-78086-9_2
Hassen, M., Carvalho, M.M., Chan, P.K.: Malware classification using static analysis based features. In: 2017 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1–7. IEEE (2017)
Hu, W., Tan, Y.: Black-box attacks against RNN based malware detection algorithms. arXiv preprint arXiv:1705.08131 (2017)
Hu, W., Tan, Y.: Generating adversarial malware examples for black-box attacks based on GAN. In: Tan, Y., Shi, Y. (eds.) DMBD 2022. CCIS, vol. 1745, pp. 409–423. Springer, Singapore (2022). https://doi.org/10.1007/978-981-19-8991-9_29
Kolosnjaji, B., et al.: Adversarial malware binaries: evading deep learning for malware detection in executables. In: 2018 26th European Signal Processing Conference (EUSIPCO), pp. 533–537. IEEE (2018)
Kreuk, F., Barak, A., Aviv-Reuven, S., Baruch, M., Pinkas, B., Keshet, J.: Deceiving end-to-end deep learning malware detectors using adversarial examples. arXiv preprint arXiv:1802.04528 (2018)
Li, C., Lv, Q., Li, N., Wang, Y., Sun, D., Qiao, Y.: A novel deep framework for dynamic malware detection based on API sequence intrinsic features. Comput. Secur. 116, 102686 (2022)
Liu, T., Curtsinger, C., Berger, E.D.: DTHREADS: efficient deterministic multithreading. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pp. 327–336 (2011)
Lucas, K., Sharif, M., Bauer, L., Reiter, M.K., Shintre, S.: Malware makeover: breaking ML-based static analysis by modifying executable bytes. In: Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security, pp. 744–758 (2021)
Machado, G.R., Silva, E., Goldschmidt, R.R.: Adversarial machine learning in image classification: a survey toward the defender’s perspective. ACM Comput. Surv. (CSUR) 55(1), 1–38 (2021)
Microsoft: Programming reference for the win32 API (2024). https://learn.microsoft.com/en-us/windows/win32/api/
Ming, J., Xin, Z., Lan, P., Wu, D., Liu, P., Mao, B.: Impeding behavior-based malware analysis via replacement attacks to malware specifications. J. Comput. Virol. Hacking Tech. 13, 193–207 (2017)
Moser, A., Kruegel, C., Kirda, E.: Limits of static analysis for malware detection. In: Twenty-Third Annual Computer Security Applications Conference (ACSAC 2007), pp. 421–430. IEEE (2007)
Or-Meir, O., Nissim, N., Elovici, Y., Rokach, L.: Dynamic malware analysis in the modern era-a state of the art survey. ACM Comput. Surv. (CSUR) 52(5), 1–48 (2019)
Papernot, N., McDaniel, P., Swami, A., Harang, R.: Crafting adversarial input sequences for recurrent neural networks. In: 2016 IEEE Military Communications Conference, MILCOM 2016, pp. 49–54. IEEE (2016)
Pawlowski, A., Contag, M., Holz, T.: Probfuscation: an obfuscation approach using probabilistic control flows. In: Caballero, J., Zurutuza, U., Rodríguez, R.J. (eds.) DIMVA 2016. LNCS, vol. 9721, pp. 165–185. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40667-1_9
Pierazzi, F., Pendlebury, F., Cortellazzi, J., Cavallaro, L.: Intriguing properties of adversarial ml attacks in the problem space. In: 2020 IEEE Symposium on Security and Privacy (SP), pp. 1332–1349. IEEE (2020)
Polino, M., et al.: Measuring and defeating anti-instrumentation-equipped malware. In: Polychronakis, M., Meier, M. (eds.) DIMVA 2017. LNCS, vol. 10327, pp. 73–96. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-60876-1_4
Polino, M., Scorti, A., Maggi, F., Zanero, S.: Jackdaw: towards automatic reverse engineering of large datasets of binaries. In: Almgren, M., Gulisano, V., Maggi, F. (eds.) DIMVA 2015. LNCS, vol. 9148, pp. 121–143. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-20550-2_7
Raff, E., Barker, J., Sylvester, J., Brandon, R., Catanzaro, B., Nicholas, C.K.: Malware detection by eating a whole exe. In: Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence (2018)
Rieck, K., Trinius, P., Willems, C., Holz, T.: Automatic analysis of malware behavior using machine learning. J. Comput. Secur. 19(4), 639–668 (2011)
Rosenberg, I., Meir, S.: Bypassing NGAV for Fun and Pro t (2020)
Rosenberg, I., Meir, S., Berrebi, J., Gordon, I., Sicard, G., David, E.O.: Generating end-to-end adversarial examples for malware classifiers using explainability. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–10. IEEE (2020)
Rosenberg, I., Shabtai, A., Elovici, Y., Rokach, L.: Query-efficient black-box attack against sequence-based malware classifiers. In: Annual Computer Security Applications Conference, pp. 611–626 (2020)
Rosenberg, I., Shabtai, A., Rokach, L., Elovici, Y.: Generic black-box end-to-end attack against state of the art API call based malware classifiers. In: Bailey, M., Holz, T., Stamatogiannakis, M., Ioannidis, S. (eds.) RAID 2018. LNCS, vol. 11050, pp. 490–510. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00470-5_23
Somayaji, A., Forrest, S.: Automated response using \(\{\)System-Call\(\}\) delay. In: 9th USENIX Security Symposium (USENIX Security 2000) (2000)
Suciu, O., Coull, S.E., Johns, J.: Exploring adversarial examples in malware detection. In: 2019 IEEE Security and Privacy Workshops (SPW). IEEE (2019)
Tan, K., Maxion, R.: “Why 6?” defining the operational limits of stide, an anomaly-based intrusion detector. In: Proceedings 2002 IEEE Symposium on Security and Privacy, pp. 188–201 (2002). https://doi.org/10.1109/SECPRI.2002.1004371
Tian, R., Islam, R., Batten, L., Versteeg, S.: Differentiating malware from cleanware using behavioural analysis. In: 2010 5th International Conference on Malicious and Unwanted Software, pp. 23–30. IEEE (2010)
Ucci, D., Aniello, L., Baldoni, R.: Survey of machine learning techniques for malware analysis. Comput. Secur. 81, 123–147 (2019)
Uppal, D., Sinha, R., Mehra, V., Jain, V.: Malware detection and classification based on extraction of API sequences. In: 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI). IEEE (2014)
Wagner, D., Dean, R.: Intrusion detection via static analysis. In: Proceedings 2001 IEEE Symposium on Security and Privacy, S &P 2001, pp. 156–168. IEEE (2000)
Wagner, D., Soto, P.: Mimicry attacks on host-based intrusion detection systems. In: Proceedings of the 9th ACM Conference on Computer and Communications Security, pp. 255–264 (2002)
Warrender, C., Forrest, S., Pearlmutter, B.: Detecting intrusions using system calls: alternative data models. In: Proceedings of the 1999 IEEE Symposium on Security and Privacy (Cat. No. 99CB36344), pp. 133–145. IEEE (1999)
You, I., Yim, K.: Malware obfuscation techniques: a brief survey. In: 2010 International Conference on Broadband, Wireless Computing, Communication and Applications, pp. 297–300 (2010). https://doi.org/10.1109/BWCCA.2010.85
Zhang, Z., Qi, P., Wang, W.: Dynamic malware analysis with feature engineering and feature learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 1210–1217 (2020)
Acknowledgements
This study was carried out within the MICS (Made in Italy - Circular and Sustainable) Extended Partnership and received funding from Next-Generation EU (Italian PNRR - M4 C2, Invest 1.3 - D.D. 1551.11-10-2022, PE00000004). CUP MICS D43C22003120001. Mario D’Onghia acknowledges support from TIM S.p.A. through the PhD scholarship.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Digregorio, G. et al. (2024). Tarallo: Evading Behavioral Malware Detectors in the Problem Space. In: Maggi, F., Egele, M., Payer, M., Carminati, M. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2024. Lecture Notes in Computer Science, vol 14828. Springer, Cham. https://doi.org/10.1007/978-3-031-64171-8_7
Download citation
DOI: https://doi.org/10.1007/978-3-031-64171-8_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-64170-1
Online ISBN: 978-3-031-64171-8
eBook Packages: Computer ScienceComputer Science (R0)