Abstract
Malware continues to evolve and grow in complexity, posing a significant threat to network security. As new malware types emerge in growing numbers and dissemination methods keep changing, identifying malware rapidly and accurately, and thereby supporting precise warning and defense measures, has become a crucial challenge in maintaining network security. This article treats the API call sequences of malware, which characterize its behavior, as text and applies recent text classification techniques to classify malware. It proposes a flexible and lightweight malicious code classification model based on API core semantic information. To address the prolonged training time and low accuracy caused by excessive noise and redundant data in API call sequences, the model adopts an intimacy analysis method based on a self-attention mechanism to extract key information. To better capture the semantic information within malware API call sequences, a feature extraction model based on a self-attention mechanism transforms the unstructured key API sequences into vector representations and extracts core features, which are then passed to a TextCNN model for multi-class classification. On the dataset of the “Alibaba Cloud Security Malicious Program Detection” competition, the model reached an F1 value of 90% on the eight-category classification task. The experimental results show that the proposed model achieves better results in malware detection and multi-class classification.
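The key-information extraction step can be illustrated with a minimal sketch. It is not the intimacy analysis method of the article, whose details the abstract does not give. It assumes that each API call is scored by the total self-attention weight it receives from the rest of the sequence and that only the highest-scoring calls are kept. The helper names (`embed`, `attention_weights`, `select_key_calls`), the random embeddings, and the sample call sequence are all hypothetical, and a trained model would use learned embeddings and projections instead.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def embed(api_names, dim=8, seed=0):
    """Assumption: one fixed random vector per distinct API name.
    A trained model would learn these embeddings."""
    rng = random.Random(seed)
    table = {
        name: [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for name in sorted(set(api_names))
    }
    return [table[name] for name in api_names]


def attention_weights(vectors):
    """Scaled dot-product self-attention weights, using the inputs
    directly as queries and keys (no learned projections)."""
    scale = math.sqrt(len(vectors[0]))
    weights = []
    for query in vectors:
        scores = [
            sum(a * b for a, b in zip(query, key)) / scale for key in vectors
        ]
        weights.append(softmax(scores))
    return weights


def select_key_calls(api_names, keep):
    """Keep the `keep` calls that receive the most attention in total,
    preserving their original order in the sequence."""
    weights = attention_weights(embed(api_names))
    received = [sum(row[j] for row in weights) for j in range(len(api_names))]
    top = sorted(
        range(len(api_names)), key=lambda j: received[j], reverse=True
    )[:keep]
    return [api_names[j] for j in sorted(top)]


# Hypothetical call sequence, for illustration only.
calls = [
    "LdrLoadDll", "NtCreateFile", "NtWriteFile", "NtClose",
    "RegOpenKeyExW", "NtWriteFile", "NtClose",
]
key_calls = select_key_calls(calls, keep=4)
print(key_calls)
```

In a pipeline of the kind the abstract describes, the shortened sequence would then be embedded and passed to a convolutional text classifier such as TextCNN. Keeping the original call order matters here, because convolution filters operate on runs of adjacent calls.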
Supported by Major Scientific and Technological Innovation Projects of Shandong Province (2020CXGC010116) and the National Natural Science Foundation of China (No. 62172042).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhou, Y., Liu, Z., Xue, J., Wang, Y., Zhang, J. (2024). LM-cAPI: A Lite Model Based on API Core Semantic Information for Malware Classification. In: Andreoni, M. (eds) Applied Cryptography and Network Security Workshops. ACNS 2024. Lecture Notes in Computer Science, vol 14586. Springer, Cham. https://doi.org/10.1007/978-3-031-61486-6_3
DOI: https://doi.org/10.1007/978-3-031-61486-6_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-61485-9
Online ISBN: 978-3-031-61486-6
eBook Packages: Computer Science, Computer Science (R0)