
Leveraging Hierarchies: HMCAT for Efficiently Mapping CTI to Attack Techniques

  • Conference paper
Computer Security – ESORICS 2024 (ESORICS 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14985)


Abstract

With the advancement of cyber technology, proactive security methods such as adversary emulation and leveraging Cyber Threat Intelligence (CTI) have become increasingly essential. Several existing methods automatically map unstructured CTI text to attack techniques, which can facilitate proactive security. However, these methods do not consider the semantic relationships between CTI and attack techniques at different abstraction levels, which leads to poor classification performance. In this work, we propose a Hierarchy-aware method for Mapping of CTI to Attack Techniques (HMCAT). Specifically, HMCAT first extracts Indicator of Compromise (IOC) entities from the CTI in two steps, then projects the CTI with its IOC entities and the corresponding attack technique into a joint embedding space. Finally, HMCAT captures the semantic relationships among text descriptions, coarse-grained techniques, fine-grained techniques and unrelated techniques through a hierarchy-aware mapping loss. We also propose a data augmentation technique based on in-context learning to address the long-tailed distribution of the Adversarial Tactics, Techniques and Common Knowledge (ATT&CK) datasets, which further improves mapping performance. Experimental results demonstrate that HMCAT significantly outperforms previous ML and DL methods, improving precision, recall and F-Measure by 6.6%, 13.9% and 9.9% respectively.



References

  1. Alves, P.M.M.R., Geraldo Filho, P.R., Gonçalves, V.P.: Leveraging BERT’s power to classify TTP from unstructured text. In: 2022 Workshop on Communication Networks and Power Systems (WCNPS), pp. 1–7. IEEE (2022)

  2. Ampel, B., Samtani, S., Ullman, S., Chen, H.: Linking common vulnerabilities and exposures to the MITRE ATT&CK framework: a self-distillation approach. arXiv preprint arXiv:2108.01696 (2021)

  3. Antle, A.N.: The CTI framework: informing the design of tangible systems for children. In: Proceedings of the 1st International Conference on Tangible and Embedded Interaction, pp. 195–202 (2007)

  4. Applebaum, A., Miller, D., Strom, B., Korban, C., Wolf, R.: Intelligent, automated red team emulation. In: Proceedings of the 32nd Annual Conference on Computer Security Applications, pp. 363–373 (2016)

  5. Bendovschi, A.: Cyber-attacks – trends, patterns and security countermeasures. Procedia Econ. Financ. 28, 24–31 (2015)

  6. Brown, R., Lee, R.M.: The evolution of cyber threat intelligence (CTI): 2019 SANS CTI survey. SANS Institute (2019). https://www.sans.org/white-papers/38790/. Accessed 12 July 2021

  7. Brown, T.B., et al.: Language models are few-shot learners (2020)

  8. Chen, H., Ma, Q., Lin, Z., Yan, J.: Hierarchy-aware label semantics matching network for hierarchical text classification. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 4370–4379 (2021)

  9. MITRE Corporation: MITRE ATT&CK. https://attack.mitre.org/

  10. CTID: FIN6 adversary plan. https://github.com/center-for-threat-informed-defense/adversary_emulation_library/tree/master/fin6

  11. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  12. Dionísio, N., Alves, F., Ferreira, P.M., Bessani, A.: Cyberthreat detection from Twitter using deep neural networks (2019)

  13. Hemberg, E., et al.: Linking threat tactics, techniques, and patterns with defensive weaknesses, vulnerabilities and affected platform configurations for cyber hunting. arXiv preprint arXiv:2010.00533 (2020)

  14. Hutchins, E., Cloppert, M., Amin, R.: Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains. Whitepaper, Lockheed Martin Corp. (2011)

  15. Legoy, V., Caselli, M., Seifert, C., Peter, A.: Automated retrieval of ATT&CK tactics and techniques for cyber threat reports. arXiv preprint arXiv:2004.14322 (2020)

  16. Lewis, D.D., Yang, Y., Russell-Rose, T., Li, F.: RCV1: a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361–397 (2004)

  17. Liu, C., Wang, J., Chen, X.: Threat intelligence ATT&CK extraction based on the attention transformer hierarchical recurrent neural network. Appl. Soft Comput. 122, 108826 (2022)

  18. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)

  19. Mazzini, D., Napoletano, P., Piccoli, F., Schettini, R.: A novel approach to data augmentation for pavement distress segmentation. Comput. Ind. 121, 103225 (2020)

  20. Oosthoek, K., Doerr, C.: SoK: ATT&CK techniques and trends in Windows malware. In: Chen, S., Choo, K.-K.R., Fu, X., Lou, W., Mohaisen, A. (eds.) SecureComm 2019. LNICST, vol. 304, pp. 406–425. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-37228-6_20

  21. Orbinato, V., Barbaraci, M., Natella, R., Cotroneo, D.: Automatic mapping of unstructured cyber threat intelligence: an experimental study. arXiv preprint arXiv:2208.12144 (2022)

  22. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using siamese BERT-networks. arXiv preprint arXiv:1908.10084 (2019)

  23. Tosh, D., Sengupta, S., Kamhoua, C.A., Kwiat, K.A.: Establishing evolutionary game models for cyber security information exchange (CYBEX). J. Comput. Syst. Sci. 98, 27–52 (2018)

  24. Wang, B., Chen, L., Sun, W., Qin, K., Li, K., Zhou, H.: Ranking-based autoencoder for extreme multi-label classification. arXiv preprint arXiv:1904.05937 (2019)

  25. Wu, Y., et al.: Price tag: towards semi-automatically discovery tactics, techniques and procedures of e-commerce cyber threat intelligence. IEEE Trans. Dependable Secure Comput. (2021)

  26. You, Y., et al.: TIM: threat context-enhanced TTP intelligence mining on unstructured threat data. Cybersecurity 5(1), 3 (2022)

  27. Yu, Z., Wang, J.F., Tang, B.H., Li, L.: Tactics and techniques classification in cyber threat intelligence. Comput. J. 66(8), 1870–1881 (2022)

  28. Zhou, J., et al.: Hierarchy-aware global model for hierarchical text classification. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 1106–1117 (2020)


Author information

Correspondence to Zhiqiang Hao.

Appendices

A The Comparison of Dataset Distributions

The initial distribution, depicted in Fig. 6(a), reveals numerous categories with very few samples; some contain as few as 3–5 samples due to limited data availability. We therefore apply a data augmentation technique to these scarce categories. The augmented distribution is shown in Fig. 6(b). With the additional samples, the model recognizes the different attack techniques more reliably, and the text representations of different attack techniques move farther apart in the embedding space, reducing classification confusion.

B Experimental Setup

In this Appendix, we introduce the datasets used in this paper and present the experimental results that demonstrate the advantages of our methods from a comparative perspective.

Fig. 6. The comparison of distributions between the initial dataset and the augmented dataset.

1.1 B.1 Datasets and Evaluation Metrics

Mapping Datasets: We follow Orbinato’s method [21] to build the datasets from the public knowledge base of the MITRE ATT&CK framework. Each sample in the datasets corresponds to a specific malicious technique and is annotated with a label representing a technique from the MITRE ATT&CK framework taxonomy. A detailed description and the category distribution of the initial datasets are shown in Table 6 and Fig. 6(a), respectively. The initial datasets clearly exhibit a long-tailed distribution: most categories have only a few samples. To address this problem, we use an in-context-learning-based data augmentation method to build category-balanced augmented datasets. We first divide the original datasets into a training set, validation set and test set at a ratio of 6:2:2, while keeping the class distribution balanced across splits. We then keep the same validation and test sets for all experiments, which avoids data leakage and ensures fair comparison. Finally, for the training set, we apply data augmentation to categories with fewer than 50 samples so that the model can better learn their characteristics. Specifically, we iteratively use in-context learning for data augmentation until the final number of samples reaches 50, and filter out low-quality samples through a critic model.
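The augmentation loop described above can be sketched as follows. This is a minimal sketch, not the authors’ implementation: `generate_sample` and `critic_accepts` are hypothetical stand-ins for the in-context-learning call to the LLM and the SBERT-based critic filter detailed in Appendix B.2.

```python
import random

def augment_long_tail(dataset, generate_sample, critic_accepts, target=50):
    """Top up every category below `target` samples, keeping only the
    generated samples that pass the critic filter.

    `dataset` maps a technique label to its list of text samples.
    """
    augmented = {label: list(samples) for label, samples in dataset.items()}
    for label, samples in augmented.items():
        while len(samples) < target:
            # Randomly pick up to four demonstration examples as context
            # for the in-context-learning prompt.
            demos = random.sample(samples, min(4, len(samples)))
            candidate = generate_sample(label, demos)
            # Keep the candidate only if the critic model accepts it
            # (a rejecting critic would make this loop retry).
            if critic_accepts(candidate, samples):
                samples.append(candidate)
    return augmented
```

Categories that already meet the target are left untouched, so the augmented training set becomes category-balanced without discarding any original samples.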

NER Dataset: We use the dataset of Dionísio et al. [12], collected from tweets of manually curated accounts and passed through a filter. The fine-grained NER targets are manually labeled. Specifically, the dataset contains 11074 tweets and 12356 entities covering five IOC types: ID, ORG, PRO, VER and VUL, with 5770, 926, 3349, 1445 and 866 entities respectively. As with the mapping datasets, we divide the NER dataset into a training set, validation set and test set at a ratio of 6:2:2, keeping the class distribution balanced, to train and evaluate our NER model.
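The 6:2:2 split used for both datasets can be sketched as a per-label (stratified) partition; this is an illustrative sketch under the assumption that "ensuring the balance of classification" means each class is split in the same proportions, not the authors’ exact code:

```python
import random

def stratified_split(samples_by_label, ratios=(0.6, 0.2, 0.2), seed=42):
    """Split each label's samples 6:2:2 so every class appears in the
    train/validation/test sets in (roughly) the same proportion."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, samples in samples_by_label.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        n = len(shuffled)
        n_train = int(n * ratios[0])
        n_val = int(n * ratios[1])
        train += [(s, label) for s in shuffled[:n_train]]
        val += [(s, label) for s in shuffled[n_train:n_train + n_val]]
        test += [(s, label) for s in shuffled[n_train + n_val:]]
    return train, val, test
```

Fixing the seed keeps the validation and test sets identical across experiments, which is the property the paper relies on to avoid leakage.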

Table 6. A detailed description of datasets

Metrics: To compare the performance of different classification models, we consider three representative metrics: Precision, Recall and F-Measure.
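For reference, the three metrics follow the standard definitions, with F-Measure taken as the harmonic mean of precision and recall (F1):

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class Precision, Recall and F-Measure from true positives,
    false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```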

1.2 B.2 Implementation Details

HMCAT: We employ the ChatGPT (gpt-3.5-turbo-0301) model from the OpenAI API, with 175 billion parameters, as our LLM to generate new samples in the data augmentation step. We fix the temperature to 0 to obtain the most deterministic answers, which greatly reduces the number of low-quality samples. For categories with fewer than 50 samples, we randomly select four demonstration examples as context to elicit ChatGPT to generate new samples until the total number of samples reaches 60. To further filter the generated samples, we use SBERT, which is based on dense sentence encoding, as the critic model. We set the upper and lower similarity thresholds between generated and original samples to 0.8 and 0.3 respectively, which eliminates duplicate and hallucinated samples. For the text encoder, we use two models: BERT [11] and RoBERTa [18]. For the label encoder in the hierarchical module, we initialize the GCN network with prior knowledge about the relationships between labels in the predefined hierarchy and corpus to accelerate the convergence of training. For the hierarchy-aware mapping, we set \(\gamma \) to 0.5 to penalize siblings of the target label, and to 0.7 and 1 for non-sibling nodes and higher-level nodes respectively. For the NER model, we use the BERT-base model, which adopts a 12-layer structure and maps text to 768 dimensions. During training, a fine-tuning strategy is applied in which the first four layers of the model are frozen and the weights of the remaining eight layers are updated with a low learning rate of \(2\times 10^{-5}\). This keeps training effective while reducing both training time and the amount of training data required. The hidden state dimension of the BiLSTM is set to 256, and a dropout layer with a dropout rate of 0.3 is added to prevent overfitting.
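The critic filtering step with the 0.8 / 0.3 similarity bounds can be sketched as below. This is a minimal sketch, not the authors’ code: `embed` stands for any sentence encoder returning a vector (the paper uses SBERT), and `passes_critic` is a hypothetical name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return dot / norm if norm else 0.0

def passes_critic(candidate, originals, embed, upper=0.8, lower=0.3):
    """Keep a generated sample only if its maximum similarity to the
    original samples lies strictly between `lower` and `upper`:
    above `upper` it is a near-duplicate, below `lower` it is likely
    off-topic or hallucinated."""
    best = max(cosine(embed(candidate), embed(o)) for o in originals)
    return lower < best < upper
```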
Because the LSTM, BERT and CRF components have different characteristics and roles, we adopt a hierarchical learning rate scheme to learn better model parameters. The learning rate of BERT is set to \(2\times 10^{-5}\) and that of the LSTM to 0.001; because of the large CRF transition matrix, the CRF needs a relatively large learning rate of 0.01. The batch size is set to 16, training runs for 10 epochs, Adam is used as the optimizer, the CRF loss is used as the loss function and CRF accuracy is selected as the evaluation metric. When the validation loss stops decreasing, the overall learning rate is reduced by a factor of 0.1.
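Per-component learning rates of this kind are typically expressed as optimizer parameter groups; a sketch in PyTorch style, where the three parameter lists for the BERT, BiLSTM and CRF submodules are assumed inputs:

```python
def build_param_groups(bert_params, lstm_params, crf_params):
    """Hierarchical learning rates as set above: small for BERT,
    larger for the BiLSTM, largest for the CRF transition matrix.

    The returned list has the shape expected by e.g.
    torch.optim.Adam(groups); the plateau schedule would then be
    torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1).
    """
    return [
        {"params": bert_params, "lr": 2e-5},
        {"params": lstm_params, "lr": 1e-3},
        {"params": crf_params, "lr": 1e-2},
    ]
```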

Hardware Environment: The CPU is an Intel Xeon Gold 6248R (14 cores, 2.00 GHz) with 72 GB of memory; the GPU is an A100-PCIE with 40 GB of memory.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Hao, Z., Li, C., Fu, X., Luo, B., Du, X. (2024). Leveraging Hierarchies: HMCAT for Efficiently Mapping CTI to Attack Techniques. In: Garcia-Alfaro, J., Kozik, R., Choraś, M., Katsikas, S. (eds) Computer Security – ESORICS 2024. ESORICS 2024. Lecture Notes in Computer Science, vol 14985. Springer, Cham. https://doi.org/10.1007/978-3-031-70903-6_4


  • DOI: https://doi.org/10.1007/978-3-031-70903-6_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70902-9

  • Online ISBN: 978-3-031-70903-6

