Abstract
Cyber Threat Intelligence (CTI) provides a structured and interconnected model for threat information through Cybersecurity Knowledge Graphs, which allow researchers and practitioners to represent and organize complex entities and relationships in a coherent form. Discovering hidden relationships between CTI entities, such as threat actors, malware, infrastructure, and attacks, is becoming a crucial task in this domain: it facilitates proactive defense measures and helps identify the Tactics, Techniques, and Procedures (TTPs) employed by malicious parties. In this paper, we provide a Systematization of Knowledge (SoK) that analyzes the existing literature and gives insights into the important CTI task of Relation Extraction. In particular, we design a categorization of the relations used in CTI, and we analyze the techniques employed for their extraction, the emerging trends and open issues in this context, and the main future directions. This work offers a fresh perspective that can help the reader understand how relationships among entities can be schematized to provide a better view of the cyber threat landscape.
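As a purely illustrative sketch, not taken from the paper, the idea of CTI entities linked by relations can be pictured as a set of (subject, relation, object) triples. All entity names, relation labels, and helper functions below are hypothetical, and the two-hop traversal is only a toy stand-in for how an indirect ("hidden") link between a threat actor and a piece of infrastructure might surface once individual relations have been extracted.

```python
from collections import defaultdict

# Hypothetical CTI triples (subject, relation, object); every name is invented.
TRIPLES = [
    ("APT-Example", "uses", "MalwareX"),
    ("MalwareX", "communicates_with", "c2.example.net"),
    ("MalwareX", "exploits", "VULN-PLACEHOLDER"),
    ("OtherGroup", "uses", "MalwareY"),
]


def build_graph(triples):
    """Index triples as subject -> list of (relation, object) pairs."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
    return graph


def two_hop_links(graph, start):
    """Return (start, (rel1, rel2), end) for every entity two hops from start."""
    found = []
    for rel1, mid in graph.get(start, []):
        for rel2, end in graph.get(mid, []):
            found.append((start, (rel1, rel2), end))
    return found


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    for link in two_hop_links(graph, "APT-Example"):
        print(link)
```

In this toy graph, the actor is never directly connected to the command-and-control domain; the link only appears by chaining two extracted relations, which is the kind of indirect association a knowledge graph makes easy to query.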
Notes
- 3. The significance of each paper’s publication venue is assessed using Scimago and Core.edu rankings for journals and conferences, respectively.
Acknowledgments
This work was partly supported by the HORIZON Europe Framework Programme through the project “OPTIMA - Organization sPecific Threat Intelligence Mining and sharing” (101063107), by the PRIN 2022 Project “HOMEY: a Human-centric IoE-based Framework for Supporting the Transition Towards Industry 5.0” (code: 2022NX7WKE, CUP: F53D23004340006) funded by the European Union - Next Generation EU, and by the SERICS (PE00000014) project under the NRRP MUR program funded by the EU - NGEU. Views and opinions expressed are, however, those of the authors only and do not necessarily reflect those of the European Union or the Italian MUR. Neither the European Union nor the Italian MUR can be held responsible for them.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Arikkat, D.R., Vinod, P., K. A., R.R., Nicolazzo, S., Nocera, A., Conti, M. (2024). Relation Extraction Techniques in Cyber Threat Intelligence. In: Rapp, A., Di Caro, L., Meziane, F., Sugumaran, V. (eds) Natural Language Processing and Information Systems. NLDB 2024. Lecture Notes in Computer Science, vol 14762. Springer, Cham. https://doi.org/10.1007/978-3-031-70239-6_24
DOI: https://doi.org/10.1007/978-3-031-70239-6_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70238-9
Online ISBN: 978-3-031-70239-6
eBook Packages: Computer Science (R0)