Abstract
Domain names have become crucial digital assets for any business. However, the media continue to report phishing and identity theft attacks carried out by third parties that rely on domain names to mislead Internet users. For example, Palo Alto Networks identified in its studies 20 heavily cybersquatted domain names targeting popular brands. Based on their behavior, domain names appear in public lists that objectively evaluate their reputation. Blacklists contain domain names that have previously been involved in suspicious activity, whereas whitelists include the most popular and trustworthy domain names. This listing technique has long been used as a reactive approach to counter domain name-based attacks, but it suffers from responding late to attacks. More recent techniques tend to be proactive: they operate before any attack occurs. At the CSNet conference, we published a short paper that describes a range of domain name attacks and the techniques that detect them using lexical features (Hamroun et al. 2022). This paper extends that work: it discusses the same points in more detail and adds background for understanding malicious domain name detection. We provide a literature review of malicious domain name detection techniques that use only the lexical features of domain names. These features are readily available, privacy-preserving, and substantially improve detection results. The review covers recent works that report relevant performance, categorized according to a new taxonomy. Moreover, we introduce a new criterion for comparing existing works based on the targeted type of maliciousness, before discussing the limitations and emerging research directions in this field.
Notes
Also known as context-free, textual, semantic, statistical, or linguistic features.
Example from: https://unit42.paloaltonetworks.com/cybersquatting/.
The process of splitting a domain name into a set of co-occurring character sequences by advancing N characters each time.
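The splitting described in the last note can be illustrated with a minimal Python sketch. It is not taken from the reviewed works: the function name `char_ngrams` and the `step` parameter are hypothetical. It assumes the common sliding-window reading, in which a window of N characters advances one character at a time (`step=1`). Setting `step=n` gives the literal "advance N characters" reading with non-overlapping chunks.

```python
def char_ngrams(domain: str, n: int = 2, step: int = 1) -> list[str]:
    """Split a domain name into character n-grams.

    Assumption: `step` is an illustrative parameter. step=1 yields the usual
    overlapping n-grams; step=n yields non-overlapping chunks of n characters.
    """
    if n <= 0 or step <= 0:
        raise ValueError("n and step must be positive integers")
    # Normalise case and drop a trailing root dot; dots between labels are kept.
    name = domain.lower().rstrip(".")
    # The window starts at every `step`-th position that still leaves n characters.
    return [name[i:i + n] for i in range(0, len(name) - n + 1, step)]


if __name__ == "__main__":
    print(char_ngrams("example", n=3))          # overlapping trigrams
    print(char_ngrams("example", n=3, step=3))  # non-overlapping chunks
```

With `step=1`, a name of length L yields L - N + 1 n-grams. A name shorter than N yields an empty list.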
References
Hamroun C, Amamou A, Haddadou K, Haroun H, Pujolle G (2022) A review on lexical based malicious domain name detection methods. In: 2022 6th Cyber security in networking conference (CSNet), IEEE, pp 1–7
Mockapetris P (1987) Domain names - implementation and specification. RFC 1035, RFC Editor. https://doi.org/10.17487/RFC1035. https://rfc-editor.org/rfc/rfc1035.txt
Zhao H, Chang Z, Bao G, Zeng X (2019) Malicious domain names detection algorithm based on n-gram. J Comput Netw Commun 2019
Zago M, Gil Perez M, Martinez Perez G (2020) Scalable detection of botnets based on DGA. Soft Comput 24(8):5517–5537
Plohmann D, Yakdan K, Klatt M, Bader J, Gerhards-Padilla E (2016) A comprehensive measurement study of domain generating malware. In: 25th USENIX Security Symposium (USENIX Security 16), USENIX Association, Austin, TX, pp 263–278. https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/plohmann
Zhauniarovich Y, Khalil I, Yu T, Dacier M (2018) A survey on malicious domains detection through DNS data analysis. ACM Comput Surv 51(4):1–36
Fasllija E, Enişer HF, Prünster B (2019) Phish-hook: detecting phishing certificates using certificate transparency logs. In: International conference on security and privacy in communication systems, Springer, pp 320–334
Moubayed A, Aqeeli E, Shami A (2021) Detecting DNS typo-squatting using ensemble-based feature selection & classification models. IEEE Can J Electr Comput Eng 44(4):456–466. https://doi.org/10.1109/ICJECE.2021.3072008
Dinaburg A (2011) Bitsquatting: DNS hijacking without exploitation. Proceedings of BlackHat Security
Nikiforakis N, Van Acker S, Meert W, Desmet L, Piessens F, Joosen W (2013) Bitsquatting: exploiting bit-flips for fun, or profit? In: Proceedings of the 22nd international conference on world wide web. WWW ’13, Association for Computing Machinery, New York, NY, USA, pp 989–998. https://doi.org/10.1145/2488388.2488474
Kintis P, Miramirkhani N, Lever C, Chen Y, Romero-Gómez R, Pitropakis N, Nikiforakis N, Antonakakis M (2017) Hiding in plain sight: a longitudinal study of combosquatting abuse. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security. CCS ’17, Association for Computing Machinery, New York, NY, USA, pp 569–586. https://doi.org/10.1145/3133956.3134002
Du K, Yang H, Li Z, Duan H, Hao S, Liu B, Ye Y, Liu M, Su X, Liu G et al (2019) TL;DR hazard: a comprehensive study of levelsquatting scams. In: International conference on security and privacy in communication systems, Springer, pp 3–25
Rossow C, Dietrich CJ, Grier C, Kreibich C, Paxson V, Pohlmann N, Bos H, Steen MV (2012) Prudent practices for designing malware experiments: status quo and outlook. In: 2012 IEEE Symposium on Security and Privacy, pp 65–79. https://doi.org/10.1109/SP.2012.14
Selvi J, Rodriguez RJ, Soria-Olivas E (2019) Detection of algorithmically generated malicious domain names using masked n-grams. Expert Syst Appl 124:156–163
Zago M, Perez MG, Perez GM (2020) UMUDGA: a dataset for profiling DGA-based botnet. Computers & Security 92:101719
Suryotrisongko H (2020) Botnet DGA dataset. https://doi.org/10.21227/rg6z-z622
Le Pochat V, Van Goethem T, Tajalizadehkhoob S, Korczyński M, Joosen W (2019) Tranco: a research-oriented top sites ranking hardened against manipulation. In: Proceedings of the 26th annual network and distributed system security symposium. NDSS 2019. https://doi.org/10.14722/ndss.2019.23386
Vinayakumar R, Soman K, Poornachandran P, Alazab M, Thampi S (2019) Amritadga: a comprehensive data set for domain generation algorithms (DGAs) based domain name detection systems and application of deep learning, 455–485
Yadav S, Reddy AKK, Reddy ALN, Ranjan S (2010) Detecting algorithmically generated malicious domain names. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement. IMC ’10, Association for Computing Machinery, New York, NY, USA, pp 48–61. https://doi.org/10.1145/1879141.1879148
Schiavoni S, Maggi F, Cavallaro L, Zanero S (2014) Phoenix: DGA-based botnet tracking and intelligence. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 192–211
Zhang P, Liu T, Zhang Y, Ya J, Shi J, Wang Y (2017) Domain watcher: detecting malicious domains based on local and global textual features. Procedia Comput Sci 108:2408–2412
Vranken H, Alizadeh H (2022) Detection of DGA-generated domain names with TF-IDF. Electronics 11(3):414
Schüppen S, Teubert D, Herrmann P, Meyer U (2018) FANCI: feature-based automated NXDomain classification and intelligence. In: 27th USENIX Security Symposium (USENIX Security 18), pp 1165–1181
Almashhadani AO, Kaiiali M, Carlin D, Sezer S (2020) Maldomdetector: a system for detecting algorithmically generated domain names with machine learning. Computers & Security 93:101787
GP A, Gladston A (2020) A machine learning framework for domain generating algorithm based malware detection. Secur Priv 3(6):127
Mvula PK, Branco P, Jourdan G-V, Viktor HL (2022) COVID-19 malicious domain names classification. Expert Syst Appl 117553
Cersosimo M, Lara A (2022) Detecting malicious domains using the Splunk machine learning toolkit. In: NOMS 2022-2022 IEEE/IFIP network operations and management symposium, IEEE, pp 1–6
Zhao H, Chen Z, Yan R (2022) Malicious domain names detection algorithm based on statistical features of URLs. In: 2022 IEEE 25th International conference on computer supported cooperative work in design (CSCWD), IEEE, pp 11–16
Sun Y, Jian K, Cui L, Jiang G, Zhang S, Zhang Y, Pei D (2022) Online malicious domain name detection with partial labels for large-scale dependable systems. J Syst Softw 190:111322
Xu C, Shen J, Du X (2019) Detection method of domain names generated by DGAs based on semantic representation and deep neural network. Computers & Security 85:77–88
Qiao Y, Zhang B, Zhang W, Sangaiah AK, Wu H (2019) DGA domain name classification method based on long short-term memory with attention mechanism. Appl Sci 9(20):4205
Yang L, Liu G, Dai Y, Wang J, Zhai J (2020) Detecting stealthy domain generation algorithms using heterogeneous deep neural network framework. IEEE Access 8:82876–82889
Aarthi B, Jeenath Shafana N, Flavia J, Chelliah BJ (2022) A hybrid multiclass classifier approach for the detection of malicious domain names using RNN model, 471–482
Huang X, Li H, Liu J, Liu F, Wang J, Xie B, Chen B, Zhang Q, Xue T (2022) A malicious domain detection model based on improved deep learning. Comput Intell Neurosci 2022
Niu Y, Guan M, Yuan W, Chen Y, Chen L, Yu Q (2022) A Bayesian optimization-based LSTM model for DGA domain name identification approach. In: Journal of Physics: Conference Series, vol. 2303, IOP Publishing, p 012015
Sarojini S, Asha S (2022) Detection for domain generation algorithm (DGA) domain botnet based on neural network with multi-head self-attention mechanisms. Int J Syst Assur Eng Manag 1–16
Zhang W, Gong J, Liu X, Hu X et al (2016) Lightweight domain name detection algorithm based on morpheme features. J Softw 27(9):2348–2364
Buber E, Diri B, Sahingoz OK (2017) NLP based phishing attack detection from URLs. In: International conference on intelligent systems design and applications, Springer, pp 608–618
Yang L, Zhai J, Liu W, Ji X, Bai H, Liu G, Dai Y (2019) Detecting word-based algorithmically generated domains using semantic analysis. Symmetry 11(2):176
Yang L, Liu G, Wang J, Zhai J, Dai Y (2022) A semantic element representation model for malicious domain name detection. J Inf Secur Appl 66:103148
Liang J, Chen S, Wei Z, Zhao S, Zhao W (2022) Hagdetector: heterogeneous DGA domain name detection model. Computers & Security 102803
Wang Z, Guo Y, Montgomery D (2022) Machine learning-based algorithmically generated domain detection. Comput Electr Eng 100:107841
Cucchiarelli A, Morbidoni C, Spalazzi L, Baldi M (2021) Algorithmically generated malicious domain names detection based on n-grams features. Expert Syst Appl 170:114551
Fu Y, Yu L, Hambolu O, Ozcelik I, Husain B, Sun J, Sapra K, Du D, Beasley CT, Brooks RR (2017) Stealthy domain generation algorithms. IEEE Trans Inf Forensics Secur 12(6):1430–1443
Anderson HS, Woodbridge J, Filar B (2016) Deepdga: adversarially-tuned domain generation and detection. In: Proceedings of the 2016 ACM workshop on artificial intelligence and security, pp 13–21
Peck J, Nie C, Sivaguru R, Grumer C, Olumofin F, Yu B, Nascimento A, De Cock M (2019) Charbot: a simple and effective method for evading DGA classifiers. IEEE Access 7:91759–91771
Sidi L, Nadler A, Shabtai A (2020) Maskdga: an evasion attack against DGA classifiers and adversarial defenses. IEEE Access 8:161580–161592
Yun X, Huang J, Wang Y, Zang T, Zhou Y, Zhang Y (2019) Khaos: an adversarial neural network DGA with high anti-detection ability. IEEE Trans Inf Forensics Secur 15:2225–2240
Hunter JD (2007) Matplotlib: a 2D graphics environment. Comput Sci Eng 9(3):90–95
Ethics declarations
Competing interests
The authors declare no competing interests.
About this article
Cite this article
Hamroun, C., Amamou, A., Haddadou, K. et al. A review on lexical based malicious domain name detection methods. Ann. Telecommun. 79, 457–473 (2024). https://doi.org/10.1007/s12243-024-01043-3
DOI: https://doi.org/10.1007/s12243-024-01043-3