Abstract
With the advancement of cyber technology, proactive security methods such as adversary emulation and leveraging Cyber Threat Intelligence (CTI) have become increasingly essential. Some existing methods automatically map unstructured CTI text to attack techniques, which can facilitate proactive security. However, these methods do not consider the semantic relationships between CTI and attack techniques at different abstraction levels, which leads to poor classification performance. In this work, we propose a Hierarchy-aware method for Mapping of CTI to Attack Techniques (HMCAT). Specifically, HMCAT first extracts Indicators of Compromise (IOC) entities from the CTI in two steps, then projects the CTI with IOC entities and the corresponding attack technique into a joint embedding space. Finally, HMCAT captures the semantic relationships among text descriptions, coarse-grained techniques, fine-grained techniques and unrelated techniques through a hierarchy-aware mapping loss. We also propose a data augmentation technique based on in-context learning to address the long-tailed distribution of the Adversarial Tactics, Techniques and Common Knowledge (ATT&CK) datasets, which further improves mapping performance. Experimental results demonstrate that HMCAT significantly outperforms previous ML and DL methods, improving precision, recall and F-Measure by 6.6%, 13.9% and 9.9%, respectively.
References
Alves, P.M.M.R., Geraldo Filho, P.R., Gonçalves, V.P.: Leveraging BERT’s power to classify TTP from unstructured text. In: 2022 Workshop on Communication Networks and Power Systems (WCNPS), pp. 1–7. IEEE (2022)
Ampel, B., Samtani, S., Ullman, S., Chen, H.: Linking common vulnerabilities and exposures to the MITRE ATT&CK framework: a self-distillation approach. arXiv preprint arXiv:2108.01696 (2021)
Antle, A.N.: The CTI framework: informing the design of tangible systems for children. In: Proceedings of the 1st International Conference on Tangible and Embedded Interaction, pp. 195–202 (2007)
Applebaum, A., Miller, D., Strom, B., Korban, C., Wolf, R.: Intelligent, automated red team emulation. In: Proceedings of the 32nd Annual Conference on Computer Security Applications, pp. 363–373 (2016)
Bendovschi, A.: Cyber-attacks-trends, patterns and security countermeasures. Procedia Econ. Financ. 28, 24–31 (2015)
Brown, R., Lee, R.M.: The evolution of cyber threat intelligence (CTI): 2019 sans CTI survey. SANS Institute (2019). https://www.sans.org/white-papers/38790/. Accessed 12 July 2021
Brown, T.B., et al.: Language models are few-shot learners (2020)
Chen, H., Ma, Q., Lin, Z., Yan, J.: Hierarchy-aware label semantics matching network for hierarchical text classification. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 4370–4379 (2021)
MITRE Corporation: MITRE ATT&CK. https://attack.mitre.org/
CTID: FIN6 adversary plan. https://github.com/center-for-threat-informed-defense/adversary_emulation_library/tree/master/fin6
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Dionísio, N., Alves, F., Ferreira, P.M., Bessani, A.: Cyberthreat detection from Twitter using deep neural networks (2019)
Hemberg, E., et al.: Linking threat tactics, techniques, and patterns with defensive weaknesses, vulnerabilities and affected platform configurations for cyber hunting. arXiv preprint arXiv:2010.00533 (2020)
Hutchins, E., Cloppert, M., Amin, R.: Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains. Whitepaper, Lockheed Martin Corp. (2011)
Legoy, V., Caselli, M., Seifert, C., Peter, A.: Automated retrieval of ATT&CK tactics and techniques for cyber threat reports. arXiv preprint arXiv:2004.14322 (2020)
Lewis, D.D., Yang, Y., Russell-Rose, T., Li, F.: RCV1: a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5(Apr), 361–397 (2004)
Liu, C., Wang, J., Chen, X.: Threat intelligence ATT&CK extraction based on the attention transformer hierarchical recurrent neural network. Appl. Soft Comput. 122, 108826 (2022)
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
Mazzini, D., Napoletano, P., Piccoli, F., Schettini, R.: A novel approach to data augmentation for pavement distress segmentation. Comput. Ind. 121, 103225 (2020)
Oosthoek, K., Doerr, C.: SoK: ATT&CK techniques and trends in windows malware. In: Chen, S., Choo, K.-K.R., Fu, X., Lou, W., Mohaisen, A. (eds.) SecureComm 2019. LNICST, vol. 304, pp. 406–425. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-37228-6_20
Orbinato, V., Barbaraci, M., Natella, R., Cotroneo, D.: Automatic mapping of unstructured cyber threat intelligence: an experimental study. arXiv preprint arXiv:2208.12144 (2022)
Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using siamese BERT-networks. arXiv preprint arXiv:1908.10084 (2019)
Tosh, D., Sengupta, S., Kamhoua, C.A., Kwiat, K.A.: Establishing evolutionary game models for cyber security information exchange (CYBEX). J. Comput. Syst. Sci. 98, 27–52 (2018)
Wang, B., Chen, L., Sun, W., Qin, K., Li, K., Zhou, H.: Ranking-based autoencoder for extreme multi-label classification. arXiv preprint arXiv:1904.05937 (2019)
Wu, Y., et al.: Price tag: towards semi-automatically discovery tactics, techniques and procedures of e-commerce cyber threat intelligence. IEEE Trans. Dependable Secure Comput. (2021)
You, Y., et al.: TIM: threat context-enhanced TTP intelligence mining on unstructured threat data. Cybersecurity 5(1), 3 (2022)
Yu, Z., Wang, J.F., Tang, B.H., Li, L.: Tactics and techniques classification in cyber threat intelligence. Comput. J. 66(8), 1870–1881 (2022)
Zhou, J., et al.: Hierarchy-aware global model for hierarchical text classification. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1106–1117 (2020)
Appendices
A The Comparison of Dataset Distributions
The initial distribution, depicted in Fig. 6(a), reveals numerous categories with a scarcity of samples; some categories contain as few as 3–5 samples due to limited data availability. We therefore apply a data augmentation technique to enhance the scarce categories. The augmented distribution is presented in Fig. 6(b). After augmentation, the model recognizes different attack techniques more reliably, and the text representations of different attack techniques are pushed farther apart, reducing classification confusion.
B Experimental Setup
In this appendix, we introduce the datasets used in this paper and present experimental results that demonstrate the advantages of our method from a comparative perspective.
B.1 Datasets and Evaluation Metrics
Mapping Datasets: We follow Orbinato et al.'s method [21] to build the datasets from the public knowledge base of the MITRE ATT&CK framework. Each sample corresponds to a specific malicious technique and is annotated with a label representing a technique from the MITRE ATT&CK taxonomy. A detailed description and the category distribution of the initial datasets are shown in Table 6 and Fig. 6(a), respectively. The initial datasets clearly have a long-tailed distribution: most categories contain only a few samples. To address this problem, we use data augmentation with in-context learning to build category-balanced augmented datasets. We first divide the original datasets into training, validation and test sets with a ratio of 6:2:2 while preserving class balance. We then keep the same validation and test sets across all experiments, which reduces data leakage and ensures fairness. Finally, for the training set, we apply data augmentation to categories with fewer than 50 samples so that the model can better learn their characteristics. Specifically, we iteratively apply in-context learning until the number of samples reaches 50, and filter out low-quality samples through a critic model.
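The iterative augment-then-filter loop described above can be sketched as follows. Here `generate_fn` and `is_valid_fn` are hypothetical placeholders for the ChatGPT-based in-context generator and the critic model (their concrete settings are given in Appendix B.2); they are not names from the paper's code.

```python
import random

def augment_category(samples, generate_fn, is_valid_fn, target=50, k_shot=4,
                     max_tries=1000):
    """Iteratively augment one scarce category until it holds `target` samples.

    generate_fn(demos)      -> one candidate sample from k-shot demonstrations
                               (the paper uses ChatGPT with in-context learning).
    is_valid_fn(c, samples) -> True if the critic model accepts candidate c
                               (the paper filters with SBERT similarity).
    """
    augmented = list(samples)
    tries = 0
    while len(augmented) < target and tries < max_tries:
        tries += 1
        # Randomly pick demonstration examples as the in-context prompt.
        demos = random.sample(samples, min(k_shot, len(samples)))
        candidate = generate_fn(demos)
        if is_valid_fn(candidate, samples):
            augmented.append(candidate)
    return augmented
```

The `max_tries` cap is a safeguard we add so the loop terminates even if the critic rejects most candidates.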
NER Dataset: We use the dataset of Dionísio et al. [12], collected from tweets of manually curated accounts and passed through a filter; the fine-grained NER targets are manually labeled. The dataset contains 11,074 tweets and 12,356 entities covering five types of IOCs: ID, ORG, PRO, VER and VUL, with 5,770, 926, 3,349, 1,445 and 866 entities, respectively. As with the mapping datasets, we divide the NER dataset into training, validation and test sets with a ratio of 6:2:2 while preserving class balance, and use these splits to train and evaluate our NER model.
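The class-balanced 6:2:2 split used for both datasets can be sketched as a per-label partition; the function name and seed are illustrative, not from the paper's code.

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, ratios=(0.6, 0.2, 0.2), seed=42):
    """Split per category so train/val/test keep the 6:2:2 ratio
    within every label, returning lists of (sample, label) pairs."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, items in by_label.items():
        rng.shuffle(items)  # shuffle within each category only
        n_train = round(len(items) * ratios[0])
        n_val = round(len(items) * ratios[1])
        train += [(s, label) for s in items[:n_train]]
        val += [(s, label) for s in items[n_train:n_train + n_val]]
        test += [(s, label) for s in items[n_train + n_val:]]
    return train, val, test
```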
Metrics: To compare the performance of different classification models, we consider three representative metrics: Precision, Recall and F-Measure.
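For reference, the three metrics can be computed per technique label and then averaged. The text above does not state whether macro- or micro-averaging is used, so the macro-averaging below is an assumption for illustration.

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged Precision, Recall and F-Measure over all labels
    present in the gold annotations."""
    labels = sorted(set(y_true))
    precisions, recalls, fscores = [], [], []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        fscores.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(fscores) / n
```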
B.2 Implementation Details
HMCAT: We employ the ChatGPT (gpt-3.5-turbo-0301) model from the OpenAI API, with 175 billion parameters, as our LLM to generate new samples in the data augmentation step, and we fix the temperature to 0 to obtain the most deterministic answers, which greatly reduces the number of low-quality samples. For categories with fewer than 50 samples, we randomly select four demonstration examples as context to elicit ChatGPT to generate new samples until the total number of samples reaches 60. To filter the generated samples further, we use SBERT, which is based on dense sentence encoding, as the critic model. We set the upper and lower thresholds of similarity between generated and original samples to 0.8 and 0.3, respectively, which eliminates duplicate and hallucinated samples. For the text encoder, we use two models: BERT [11] and RoBERTa [18]. For the label encoder in the hierarchical module, we initialize the GCN network with prior knowledge about the relationships between labels in the predefined hierarchy and corpus to accelerate training convergence. For the hierarchy-aware mapping, we set \(\gamma \) to 0.5 to penalize siblings of the target label, and to 0.7 and 1 for non-sibling nodes and higher-level nodes, respectively. For the NER model, we use the BERT-base model, which adopts a 12-layer structure and maps text to 768 dimensions. During training, a fine-tuning strategy is applied: the first four layers of the model are kept fixed, and the weights of the subsequent eight layers are adjusted with a low learning rate of \(2\times 10^{-5}\). This approach preserves training effectiveness while reducing both training time and the amount of training data required. The hidden state dimension of the BiLSTM is set to 256, and a dropout layer with a rate of 0.3 is added to prevent overfitting.
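The partial-freezing scheme (first four encoder layers fixed, the rest fine-tuned) can be expressed as a predicate over parameter names. A minimal sketch: the names follow Hugging Face's BERT naming convention, and freezing the embedding layer together with the first four encoder layers is our assumption, since the text above only says the first four layers are fixed.

```python
def layerwise_trainable(param_names, frozen_layers=4):
    """Decide which BERT parameters to fine-tune: embeddings and the first
    `frozen_layers` encoder layers stay fixed; the remaining layers are
    trained (at a low learning rate, 2e-5 in the setup above)."""
    trainable = {}
    for name in param_names:
        if name.startswith("embeddings."):
            trainable[name] = False  # assumption: embeddings frozen too
        elif name.startswith("encoder.layer."):
            layer_idx = int(name.split(".")[2])
            trainable[name] = layer_idx >= frozen_layers
        else:
            trainable[name] = True  # pooler, task head, etc.
    return trainable
```

In practice the returned flags would be applied via `param.requires_grad = trainable[name]` on the model's named parameters.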
Because the LSTM, BERT and CRF components have different characteristics and functions, we adopt a hierarchical learning-rate scheme to learn better model parameters. The learning rate of BERT is set to \(2\times 10^{-5}\) and that of the LSTM to 0.001; due to the large transition matrix of the CRF, its learning rate is set to the relatively large value of 0.01. The batch size is 16, training runs for 10 epochs, Adam is used as the optimizer, the CRF loss is used as the loss function and CRF accuracy as the evaluation metric. When the validation loss stops decreasing, the overall learning rate is reduced by a factor of 0.1.
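In PyTorch, such per-module learning rates are typically passed to the optimizer as parameter groups; a sketch of the grouping follows, where the `bert`/`lstm`/`crf` name prefixes are assumptions about how the model's sub-modules are named.

```python
def build_param_groups(named_params, lrs=None):
    """Group (name, parameter) pairs by sub-module prefix so each module
    gets its own learning rate, in the parameter-group format accepted
    by torch.optim.Adam."""
    if lrs is None:
        # Rates from the setup above: BERT 2e-5, LSTM 1e-3, CRF 1e-2.
        lrs = {"bert": 2e-5, "lstm": 1e-3, "crf": 1e-2}
    groups = {key: {"params": [], "lr": lr} for key, lr in lrs.items()}
    for name, param in named_params:
        for key in lrs:
            if name.startswith(key):
                groups[key]["params"].append(param)
                break
    return list(groups.values())
```

The result would be passed as `torch.optim.Adam(build_param_groups(model.named_parameters()))`, and a scheduler such as `ReduceLROnPlateau(factor=0.1)` realizes the validation-loss-triggered decay.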
Hardware Environment: The CPU is an Intel Xeon Gold 6248R (14 cores, 2.00 GHz) with 72 GB of memory; the GPU is an NVIDIA A100-PCIE with 40 GB of memory.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Hao, Z., Li, C., Fu, X., Luo, B., Du, X. (2024). Leveraging Hierarchies: HMCAT for Efficiently Mapping CTI to Attack Techniques. In: Garcia-Alfaro, J., Kozik, R., Choraś, M., Katsikas, S. (eds) Computer Security – ESORICS 2024. ESORICS 2024. Lecture Notes in Computer Science, vol 14985. Springer, Cham. https://doi.org/10.1007/978-3-031-70903-6_4
DOI: https://doi.org/10.1007/978-3-031-70903-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70902-9
Online ISBN: 978-3-031-70903-6