Relation Extraction Techniques in Cyber Threat Intelligence

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2024)

Abstract

Cyber Threat Intelligence (CTI) provides a structured and interconnected model for threat information through Cybersecurity Knowledge Graphs. This allows researchers and practitioners to represent and organize complex relationships and entities in a more coherent form. Above all, the discovery of hidden relationships between different CTI entities, such as threat actors, malware, infrastructure, and attacks, is becoming a crucial task in this domain, facilitating proactive defense measures and helping to identify Tactics, Techniques, and Procedures (TTPs) employed by malicious parties. In this paper, we provide a Systematization of Knowledge (SoK) to analyze the existing literature and give insights into the important CTI task of Relation Extraction. In particular, we design a categorization of the relations used in CTI; we analyze the techniques employed for their extraction, the emerging trends and open issues in this context, and the main future directions. This work provides a novel and fresh perspective that can help the reader understand how relationships among entities can be schematized to provide a better view of the cyber threat landscape.

Notes

  1. https://cve.mitre.org.

  2. https://nvd.nist.gov.

  3. The paper’s publication venue significance is assessed using Scimago and Core.edu rankings for journals and conferences, respectively.

  4. https://demos.explosion.ai/displacy.

  5. https://docs.oasis-open.org/cti/stix/v2.0/stix-v2.0-part1-stix-core.html.

Acknowledgments

This work was partly supported by the HORIZON Europe Framework Programme through the project “OPTIMA - Organization sPecific Threat Intelligence Mining and sharing” (101063107), by the PRIN 2022 Project “HOMEY: a Human-centric IoE-based Framework for Supporting the Transition Towards Industry 5.0” (code: 2022NX7WKE, CUP: F53D23004340006) funded by the European Union - Next Generation EU, and by the SERICS (PE00000014) project under the NRRP MUR program funded by the EU - NGEU. Views and opinions expressed are, however, those of the authors only and do not necessarily reflect those of the European Union or the Italian MUR. Neither the European Union nor the Italian MUR can be held responsible for them.

Author information

Corresponding author

Correspondence to P. Vinod.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Arikkat, D.R., Vinod, P., K. A., R.R., Nicolazzo, S., Nocera, A., Conti, M. (2024). Relation Extraction Techniques in Cyber Threat Intelligence. In: Rapp, A., Di Caro, L., Meziane, F., Sugumaran, V. (eds) Natural Language Processing and Information Systems. NLDB 2024. Lecture Notes in Computer Science, vol 14762. Springer, Cham. https://doi.org/10.1007/978-3-031-70239-6_24

  • DOI: https://doi.org/10.1007/978-3-031-70239-6_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70238-9

  • Online ISBN: 978-3-031-70239-6

  • eBook Packages: Computer Science, Computer Science (R0)
