Abstract
With the advancement of cyber technology, proactive security methods such as adversary emulation and leveraging Cyber Threat Intelligence (CTI) have become increasingly essential. Some existing methods automatically map unstructured CTI text to attack techniques, which can facilitate proactive security. However, these methods do not consider the semantic relationships between CTI and attack techniques at different abstraction levels, which leads to poor classification performance. In this work, we propose a Hierarchy-aware method for Mapping of CTI to Attack Techniques (HMCAT). Specifically, HMCAT first extracts Indicators of Compromise (IOC) entities from the CTI in two steps, then projects the CTI with IOC entities and the corresponding attack technique into a joint embedding space. Finally, HMCAT captures the semantic relationships among text descriptions, coarse-grained techniques, fine-grained techniques and unrelated techniques through a hierarchy-aware mapping loss. We also propose a data augmentation technique based on in-context learning to address the long-tailed distribution of the Adversarial Tactics, Techniques and Common Knowledge (ATT&CK) datasets, which further improves mapping performance. Experimental results demonstrate that HMCAT significantly outperforms previous ML and DL methods, improving precision, recall and F-Measure by 6.6%, 13.9% and 9.9%, respectively.
References
Alves, P.M.M.R., Geraldo Filho, P.R., Gonçalves, V.P.: Leveraging BERT’s power to classify TTP from unstructured text. In: 2022 Workshop on Communication Networks and Power Systems (WCNPS), pp. 1–7. IEEE (2022)
Ampel, B., Samtani, S., Ullman, S., Chen, H.: Linking common vulnerabilities and exposures to the MITRE ATT&CK framework: a self-distillation approach. arXiv preprint arXiv:2108.01696 (2021)
Antle, A.N.: The CTI framework: informing the design of tangible systems for children. In: Proceedings of the 1st International Conference on Tangible and Embedded Interaction, pp. 195–202 (2007)
Applebaum, A., Miller, D., Strom, B., Korban, C., Wolf, R.: Intelligent, automated red team emulation. In: Proceedings of the 32nd Annual Conference on Computer Security Applications, pp. 363–373 (2016)
Bendovschi, A.: Cyber-attacks-trends, patterns and security countermeasures. Procedia Econ. Financ. 28, 24–31 (2015)
Brown, R., Lee, R.M.: The evolution of cyber threat intelligence (CTI): 2019 sans CTI survey. SANS Institute (2019). https://www.sans.org/white-papers/38790/. Accessed 12 July 2021
Brown, T.B., et al.: Language models are few-shot learners (2020)
Chen, H., Ma, Q., Lin, Z., Yan, J.: Hierarchy-aware label semantics matching network for hierarchical text classification. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 4370–4379 (2021)
MITRE Corporation: MITRE ATT&CK. https://attack.mitre.org/
CTID: FIN6 adversary plan. https://github.com/center-for-threat-informed-defense/adversary_emulation_library/tree/master/fin6
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Dionísio, N., Alves, F., Ferreira, P.M., Bessani, A.: Cyberthreat detection from Twitter using deep neural networks (2019)
Hemberg, E., et al.: Linking threat tactics, techniques, and patterns with defensive weaknesses, vulnerabilities and affected platform configurations for cyber hunting. arXiv preprint arXiv:2010.00533 (2020)
Hutchins, E., Cloppert, M., Amin, R.: Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains. Whitepaper, Lockheed Martin Corp. (2011)
Legoy, V., Caselli, M., Seifert, C., Peter, A.: Automated retrieval of ATT&CK tactics and techniques for cyber threat reports. arXiv preprint arXiv:2004.14322 (2020)
Lewis, D.D., Yang, Y., Russell-Rose, T., Li, F.: RCV1: a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5(Apr), 361–397 (2004)
Liu, C., Wang, J., Chen, X.: Threat intelligence ATT&CK extraction based on the attention transformer hierarchical recurrent neural network. Appl. Soft Comput. 122, 108826 (2022)
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
Mazzini, D., Napoletano, P., Piccoli, F., Schettini, R.: A novel approach to data augmentation for pavement distress segmentation. Comput. Ind. 121, 103225 (2020)
Oosthoek, K., Doerr, C.: SoK: ATT&CK techniques and trends in windows malware. In: Chen, S., Choo, K.-K.R., Fu, X., Lou, W., Mohaisen, A. (eds.) SecureComm 2019. LNICST, vol. 304, pp. 406–425. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-37228-6_20
Orbinato, V., Barbaraci, M., Natella, R., Cotroneo, D.: Automatic mapping of unstructured cyber threat intelligence: an experimental study. arXiv preprint arXiv:2208.12144 (2022)
Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using siamese BERT-networks. arXiv preprint arXiv:1908.10084 (2019)
Tosh, D., Sengupta, S., Kamhoua, C.A., Kwiat, K.A.: Establishing evolutionary game models for cyber security information exchange (CYBEX). J. Comput. Syst. Sci. 98, 27–52 (2018)
Wang, B., Chen, L., Sun, W., Qin, K., Li, K., Zhou, H.: Ranking-based autoencoder for extreme multi-label classification. arXiv preprint arXiv:1904.05937 (2019)
Wu, Y., et al.: Price tag: towards semi-automatically discovery tactics, techniques and procedures of e-commerce cyber threat intelligence. IEEE Trans. Dependable Secure Comput. (2021)
You, Y., et al.: TIM: threat context-enhanced TTP intelligence mining on unstructured threat data. Cybersecurity 5(1), 3 (2022)
Yu, Z., Wang, J.F., Tang, B.H., Li, L.: Tactics and techniques classification in cyber threat intelligence. Comput. J. 66(8), 1870–1881 (2022)
Zhou, J., et al.: Hierarchy-aware global model for hierarchical text classification. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1106–1117 (2020)
Appendices
A The Comparison of Dataset Distributions
The initial distribution, depicted in Fig. 6(a), reveals numerous categories with a scarcity of samples; some categories contain as few as 3–5 samples due to limited data availability. We therefore apply a data augmentation technique to enhance the scarce categories. The augmented distribution is presented in Fig. 6(b). After augmentation, the model recognizes different attack techniques more reliably, and the text representations of different attack techniques are pushed farther apart, reducing classification confusion.
B Experimental Setup
In this appendix, we introduce the datasets used in this paper and present experimental results that demonstrate the advantages of our method from a comparative perspective.
B.1 Datasets and Evaluation Metrics
Mapping Datasets: We follow Orbinato et al.'s method [21] to build the datasets from the public knowledge base of the MITRE ATT&CK framework. Each sample corresponds to a specific malicious technique and is annotated with a label representing a technique from the MITRE ATT&CK taxonomy. A detailed description and the category distribution of the initial datasets are shown in Table 6 and Fig. 6(a), respectively. The initial datasets clearly have a long-tailed distribution: most categories contain only a few samples. To address this problem, we use data augmentation with in-context learning to build category-balanced augmented datasets. We first divide the original datasets into training, validation and test sets with a ratio of 6:2:2 while preserving class balance. We then keep the same validation and test sets across all experiments, which reduces data leakage and ensures fairness. Finally, for the training set, we apply data augmentation to categories with fewer than 50 samples so that the model can better learn their characteristics. Specifically, we iteratively apply in-context learning until the number of samples reaches 50, and filter out low-quality samples through a critic model.
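The iterative augment-then-filter loop described above can be sketched as follows. Here `generate_fn` and `is_valid_fn` are hypothetical placeholders for the ChatGPT-based in-context generator and the critic model (their concrete settings are given in Appendix B.2); they are not names from the paper's code.

```python
import random

def augment_category(samples, generate_fn, is_valid_fn, target=50, k_shot=4,
                     max_tries=1000):
    """Iteratively augment one scarce category until it holds `target` samples.

    generate_fn(demos)      -> one candidate sample from k-shot demonstrations
                               (the paper uses ChatGPT with in-context learning).
    is_valid_fn(c, samples) -> True if the critic model accepts candidate c
                               (the paper filters with SBERT similarity).
    """
    augmented = list(samples)
    tries = 0
    while len(augmented) < target and tries < max_tries:
        tries += 1
        # Randomly pick demonstration examples as the in-context prompt.
        demos = random.sample(samples, min(k_shot, len(samples)))
        candidate = generate_fn(demos)
        if is_valid_fn(candidate, samples):
            augmented.append(candidate)
    return augmented
```

The `max_tries` cap is a safeguard we add so the loop terminates even if the critic rejects most candidates.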
NER Dataset: We use the dataset of Dionísio et al. [12], collected from tweets of manually curated accounts and passed through a filter; the fine-grained NER targets are manually labeled. The dataset contains 11,074 tweets and 12,356 entities covering five types of IOCs: ID, ORG, PRO, VER and VUL, with 5,770, 926, 3,349, 1,445 and 866 entities, respectively. As with the mapping datasets, we divide the NER dataset into training, validation and test sets with a ratio of 6:2:2 while preserving class balance, and use these splits to train and evaluate our NER model.
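The class-balanced 6:2:2 split used for both datasets can be sketched as a per-label partition; the function name and seed are illustrative, not from the paper's code.

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, ratios=(0.6, 0.2, 0.2), seed=42):
    """Split per category so train/val/test keep the 6:2:2 ratio
    within every label, returning lists of (sample, label) pairs."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, items in by_label.items():
        rng.shuffle(items)  # shuffle within each category only
        n_train = round(len(items) * ratios[0])
        n_val = round(len(items) * ratios[1])
        train += [(s, label) for s in items[:n_train]]
        val += [(s, label) for s in items[n_train:n_train + n_val]]
        test += [(s, label) for s in items[n_train + n_val:]]
    return train, val, test
```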
Metrics: To compare the performance of different classification models, we consider three representative metrics: Precision, Recall and F-Measure.
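For reference, the three metrics can be computed per technique label and then averaged. The text above does not state whether macro- or micro-averaging is used, so the macro-averaging below is an assumption for illustration.

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged Precision, Recall and F-Measure over all labels
    present in the gold annotations."""
    labels = sorted(set(y_true))
    precisions, recalls, fscores = [], [], []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        fscores.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(fscores) / n
```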
B.2 Implementation Details
HMCAT: We employ the ChatGPT (gpt-3.5-turbo-0301) model from the OpenAI API, with 175 billion parameters, as our LLM to generate new samples in the data augmentation step, and we fix the temperature to 0 to obtain the most deterministic answers, which greatly reduces the number of low-quality samples. For categories with fewer than 50 samples, we randomly select four demonstration examples as context to elicit ChatGPT to generate new samples until the total number of samples reaches 60. To filter the generated samples further, we use SBERT, which is based on dense sentence encoding, as the critic model. We set the upper and lower thresholds of similarity between generated and original samples to 0.8 and 0.3, respectively, which eliminates duplicate and hallucinated samples. For the text encoder, we use two models: BERT [11] and RoBERTa [18]. For the label encoder in the hierarchical module, we initialize the GCN network with prior knowledge about the relationships between labels in the predefined hierarchy and corpus to accelerate training convergence. For the hierarchy-aware mapping, we set \(\gamma \) to 0.5 to penalize siblings of the target label, and to 0.7 and 1 for non-sibling nodes and higher-level nodes, respectively. For the NER model, we use the BERT-base model, which adopts a 12-layer structure and maps text to 768 dimensions. During training, a fine-tuning strategy is applied: the first four layers of the model are kept fixed, and the weights of the subsequent eight layers are adjusted with a low learning rate of \(2\times 10^{-5}\). This approach preserves training effectiveness while reducing both training time and the amount of training data required. The hidden state dimension of the BiLSTM is set to 256, and a dropout layer with a rate of 0.3 is added to prevent overfitting.
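The partial-freezing scheme (first four encoder layers fixed, the rest fine-tuned) can be expressed as a predicate over parameter names. A minimal sketch: the names follow Hugging Face's BERT naming convention, and freezing the embedding layer together with the first four encoder layers is our assumption, since the text above only says the first four layers are fixed.

```python
def layerwise_trainable(param_names, frozen_layers=4):
    """Decide which BERT parameters to fine-tune: embeddings and the first
    `frozen_layers` encoder layers stay fixed; the remaining layers are
    trained (at a low learning rate, 2e-5 in the setup above)."""
    trainable = {}
    for name in param_names:
        if name.startswith("embeddings."):
            trainable[name] = False  # assumption: embeddings frozen too
        elif name.startswith("encoder.layer."):
            layer_idx = int(name.split(".")[2])
            trainable[name] = layer_idx >= frozen_layers
        else:
            trainable[name] = True  # pooler, task head, etc.
    return trainable
```

In practice the returned flags would be applied via `param.requires_grad = trainable[name]` on the model's named parameters.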
Because the LSTM, BERT and CRF components have different characteristics and functions, we adopt a hierarchical learning-rate scheme to learn better model parameters. The learning rate of BERT is set to \(2\times 10^{-5}\) and that of the LSTM to 0.001; due to the large transition matrix of the CRF, its learning rate is set to the relatively large value of 0.01. The batch size is 16, training runs for 10 epochs, Adam is used as the optimizer, the CRF loss is used as the loss function and CRF accuracy as the evaluation metric. When the validation loss stops decreasing, the overall learning rate is reduced by a factor of 0.1.
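In PyTorch, such per-module learning rates are typically passed to the optimizer as parameter groups; a sketch of the grouping follows, where the `bert`/`lstm`/`crf` name prefixes are assumptions about how the model's sub-modules are named.

```python
def build_param_groups(named_params, lrs=None):
    """Group (name, parameter) pairs by sub-module prefix so each module
    gets its own learning rate, in the parameter-group format accepted
    by torch.optim.Adam."""
    if lrs is None:
        # Rates from the setup above: BERT 2e-5, LSTM 1e-3, CRF 1e-2.
        lrs = {"bert": 2e-5, "lstm": 1e-3, "crf": 1e-2}
    groups = {key: {"params": [], "lr": lr} for key, lr in lrs.items()}
    for name, param in named_params:
        for key in lrs:
            if name.startswith(key):
                groups[key]["params"].append(param)
                break
    return list(groups.values())
```

The result would be passed as `torch.optim.Adam(build_param_groups(model.named_parameters()))`, and a scheduler such as `ReduceLROnPlateau(factor=0.1)` realizes the validation-loss-triggered decay.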
Hardware Environment: The CPU is an Intel Xeon Gold 6248R (14 cores, 2.00 GHz) with 72 GB of memory; the GPU is an NVIDIA A100-PCIE with 40 GB of memory.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Hao, Z., Li, C., Fu, X., Luo, B., Du, X. (2024). Leveraging Hierarchies: HMCAT for Efficiently Mapping CTI to Attack Techniques. In: Garcia-Alfaro, J., Kozik, R., Choraś, M., Katsikas, S. (eds) Computer Security – ESORICS 2024. ESORICS 2024. Lecture Notes in Computer Science, vol 14985. Springer, Cham. https://doi.org/10.1007/978-3-031-70903-6_4
DOI: https://doi.org/10.1007/978-3-031-70903-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70902-9
Online ISBN: 978-3-031-70903-6