A review on lexical based malicious domain name detection methods

Annals of Telecommunications

Abstract

Nowadays, domain names are crucial digital assets for any business. However, the media continually report phishing and identity theft attacks carried out by third parties that rely on domain names to mislead Internet users. For instance, Palo Alto Networks identified in its studies 20 heavily cybersquatted domain names targeting popular brands. Based on their behavior, domain names appear in public lists that reflect their reputation: blacklists contain domain names that have previously engaged in suspicious activity, whereas whitelists include the most popular and trustworthy domain names. For a long time, this listing technique has been used as a reactive defense against domain name-based attacks. However, its main limitation is that it responds late to attacks. More recent techniques tend to be proactive and operate before an attack occurs. As part of the CSNet conference, we published a short paper that describes a range of domain name attacks and the associated detection techniques based on lexical features (Hamroun et al. 2022). In this paper, we present an extended version of that work, which discusses these points in more detail and adds elements that help in understanding malicious domain name detection. Specifically, we provide a literature review of malicious domain name detection techniques that use only the lexical features of domain names. These features are readily available and privacy-preserving, and they substantially improve detection results. The review covers recent works that report relevant performance, categorized according to a new taxonomy. Moreover, we introduce a new criterion for comparing existing works based on the type of maliciousness they target, before discussing the limitations and the emerging research directions in this field.
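
To make the notion of lexical features concrete, the following minimal Python sketch derives a few such features from the domain string alone. The feature set (length, label count, digit and vowel ratios, hyphen count, and Shannon entropy of the characters) and the function name `lexical_features` are hypothetical choices for illustration, not the feature set of this review or of any particular surveyed work. The sketch also naively treats only the last label as the top-level domain, so multi-label suffixes such as `co.uk` are not handled.

```python
import math
from collections import Counter


def lexical_features(domain: str) -> dict:
    """Compute a few illustrative lexical features of a domain name.

    Hypothetical feature set chosen for illustration only.
    """
    domain = domain.lower().rstrip(".")
    labels = domain.split(".")
    # Join all labels except the last one, which is naively assumed to be the
    # top-level domain (multi-label suffixes such as "co.uk" are not handled).
    name = "".join(labels[:-1]) if len(labels) > 1 else labels[0]
    n = len(name)
    if n == 0:
        return {"length": 0, "num_labels": len(labels), "digit_ratio": 0.0,
                "vowel_ratio": 0.0, "hyphen_count": 0, "entropy": 0.0}
    counts = Counter(name)
    # Shannon entropy (bits per character) of the character distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "length": n,
        "num_labels": len(labels),
        "digit_ratio": sum(ch.isdigit() for ch in name) / n,
        "vowel_ratio": sum(ch in "aeiou" for ch in name) / n,
        "hyphen_count": name.count("-"),
        "entropy": entropy,
    }


for d in ("google.com", "xj4k9q2z.net"):
    print(d, lexical_features(d))
```

A random-looking name such as `xj4k9q2z.net` yields a higher digit ratio and character entropy than `google.com`, which is one kind of signal that lexical detectors can exploit without observing any network traffic.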

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5

Notes

  1. Also known as context-free, textual, semantic, statistical, or linguistic features.

  2. Example from: https://unit42.paloaltonetworks.com/cybersquatting/.

  3. https://www.unb.ca/cic/datasets/dns-2021.html

  4. The process of splitting a domain name into a set of overlapping sequences of N consecutive characters, obtained by sliding a window of length N forward one character at a time.

  5. https://github.com/mitchellkrogza/Phishing.Database

  6. https://github.com/no-cmyk/Search-Engine-Spam-Blocklist
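
The n-gram splitting described in note 4 can be sketched in Python as follows, assuming the common overlapping variant in which a window of N characters slides forward one character at a time. The function names `char_ngrams` and `ngram_profile` and the default N = 3 are illustrative choices, not taken from the review or from the surveyed works.

```python
from collections import Counter


def char_ngrams(name: str, n: int = 3) -> list:
    """Return the overlapping character n-grams of `name`.

    A window of length n slides one character at a time, so a name of
    length L yields L - n + 1 n-grams (none if L < n).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [name[i:i + n] for i in range(len(name) - n + 1)]


def ngram_profile(name: str, n: int = 3) -> Counter:
    """Count how often each n-gram occurs, e.g. as input to a classifier."""
    return Counter(char_ngrams(name, n))


print(char_ngrams("google"))  # ['goo', 'oog', 'ogl', 'gle']
```

Counting n-grams rather than only listing them yields a fixed bag-of-n-grams representation, which is one common way to feed such features to a statistical or machine learning classifier.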

References

  1. Hamroun C, Amamou A, Haddadou K, Haroun H, Pujolle G (2022) A review on lexical based malicious domain name detection methods. In: 2022 6th Cyber security in networking conference (CSNet), IEEE, pp 1–7

  2. Mockapetris P (1987) Domain names - implementation and specification. RFC 1035, RFC Editor. https://doi.org/10.17487/RFC1035

  3. Zhao H, Chang Z, Bao G, Zeng X (2019) Malicious domain names detection algorithm based on n-gram. J Comput Netw Commun 2019

  4. Zago M, Gil Perez M, Martinez Perez G (2020) Scalable detection of botnets based on DGA. Soft Comput 24(8):5517–5537

  5. Plohmann D, Yakdan K, Klatt M, Bader J, Gerhards-Padilla E (2016) A comprehensive measurement study of domain generating malware. In: 25th USENIX Security Symposium (USENIX Security 16), USENIX Association, Austin, TX, pp 263–278. https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/plohmann

  6. Zhauniarovich Y, Khalil I, Yu T, Dacier M (2018) A survey on malicious domains detection through DNS data analysis. ACM Comput Surv 51(4):1–36

  7. Fasllija E, Enişer HF, Prünster B (2019) Phish-hook: detecting phishing certificates using certificate transparency logs. In: International conference on security and privacy in communication systems, Springer, pp 320–334

  8. Moubayed A, Aqeeli E, Shami A (2021) Detecting DNS typo-squatting using ensemble-based feature selection & classification models. IEEE Can J Electr Comput Eng 44(4):456–466. https://doi.org/10.1109/ICJECE.2021.3072008

  9. Dinaburg A (2011) Bitsquatting: DNS hijacking without exploitation. Proceedings of BlackHat Security

  10. Nikiforakis N, Van Acker S, Meert W, Desmet L, Piessens F, Joosen W (2013) Bitsquatting: exploiting bit-flips for fun, or profit? In: Proceedings of the 22nd international conference on World Wide Web. WWW ’13, Association for Computing Machinery, New York, NY, USA, pp 989–998. https://doi.org/10.1145/2488388.2488474

  11. Kintis P, Miramirkhani N, Lever C, Chen Y, Romero-Gómez R, Pitropakis N, Nikiforakis N, Antonakakis M (2017) Hiding in plain sight: a longitudinal study of combosquatting abuse. In: Proceedings of the 2017 ACM SIGSAC conference on computer and communications security. CCS ’17, Association for Computing Machinery, New York, NY, USA, pp 569–586. https://doi.org/10.1145/3133956.3134002

  12. Du K, Yang H, Li Z, Duan H, Hao S, Liu B, Ye Y, Liu M, Su X, Liu G et al (2019) TL;DR hazard: a comprehensive study of levelsquatting scams. In: International conference on security and privacy in communication systems, Springer, pp 3–25

  13. Rossow C, Dietrich CJ, Grier C, Kreibich C, Paxson V, Pohlmann N, Bos H, Steen MV (2012) Prudent practices for designing malware experiments: status quo and outlook. In: 2012 IEEE Symposium on Security and Privacy, pp 65–79. https://doi.org/10.1109/SP.2012.14

  14. Selvi J, Rodriguez RJ, Soria-Olivas E (2019) Detection of algorithmically generated malicious domain names using masked n-grams. Expert Syst Appl 124:156–163

  15. Zago M, Perez MG, Perez GM (2020) UMUDGA: a dataset for profiling DGA-based botnet. Computers & Security 92:101719

  16. Suryotrisongko H (2020) Botnet DGA dataset. https://doi.org/10.21227/rg6z-z622

  17. Le Pochat V, Van Goethem T, Tajalizadehkhoob S, Korczyński M, Joosen W (2019) Tranco: a research-oriented top sites ranking hardened against manipulation. In: Proceedings of the 26th annual network and distributed system security symposium. NDSS 2019. https://doi.org/10.14722/ndss.2019.23386

  18. Vinayakumar R, Soman K, Poornachandran P, Alazab M, Thampi S (2019) AmritaDGA: a comprehensive data set for domain generation algorithms (DGAs) based domain name detection systems and application of deep learning, 455–485

  19. Yadav S, Reddy AKK, Reddy ALN, Ranjan S (2010) Detecting algorithmically generated malicious domain names. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement. IMC ’10, Association for Computing Machinery, New York, NY, USA, pp 48–61. https://doi.org/10.1145/1879141.1879148

  20. Schiavoni S, Maggi F, Cavallaro L, Zanero S (2014) Phoenix: DGA-based botnet tracking and intelligence. In: International conference on detection of intrusions and malware, and vulnerability assessment, Springer, pp 192–211

  21. Zhang P, Liu T, Zhang Y, Ya J, Shi J, Wang Y (2017) Domain watcher: detecting malicious domains based on local and global textual features. Procedia Comput Sci 108:2408–2412

  22. Vranken H, Alizadeh H (2022) Detection of DGA-generated domain names with TF-IDF. Electronics 11(3):414

  23. Schüppen S, Teubert D, Herrmann P, Meyer U (2018) FANCI: feature-based automated NXDomain classification and intelligence. In: 27th USENIX Security Symposium (USENIX Security 18), pp 1165–1181

  24. Almashhadani AO, Kaiiali M, Carlin D, Sezer S (2020) MalDomDetector: a system for detecting algorithmically generated domain names with machine learning. Computers & Security 93:101787

  25. GP A, Gladston A (2020) A machine learning framework for domain generating algorithm based malware detection. Secur Priv 3(6):127

  26. Mvula PK, Branco P, Jourdan G-V, Viktor HL (2022) COVID-19 malicious domain names classification. Expert Syst Appl 117553

  27. Cersosimo M, Lara A (2022) Detecting malicious domains using the Splunk machine learning toolkit. In: NOMS 2022 - 2022 IEEE/IFIP network operations and management symposium, IEEE, pp 1–6

  28. Zhao H, Chen Z, Yan R (2022) Malicious domain names detection algorithm based on statistical features of URLs. In: 2022 IEEE 25th International conference on computer supported cooperative work in design (CSCWD), IEEE, pp 11–16

  29. Sun Y, Jian K, Cui L, Jiang G, Zhang S, Zhang Y, Pei D (2022) Online malicious domain name detection with partial labels for large-scale dependable systems. J Syst Softw 190:111322

  30. Xu C, Shen J, Du X (2019) Detection method of domain names generated by DGAs based on semantic representation and deep neural network. Computers & Security 85:77–88

  31. Qiao Y, Zhang B, Zhang W, Sangaiah AK, Wu H (2019) DGA domain name classification method based on long short-term memory with attention mechanism. Appl Sci 9(20):4205

  32. Yang L, Liu G, Dai Y, Wang J, Zhai J (2020) Detecting stealthy domain generation algorithms using heterogeneous deep neural network framework. IEEE Access 8:82876–82889

  33. Aarthi B, Jeenath Shafana N, Flavia J, Chelliah BJ (2022) A hybrid multiclass classifier approach for the detection of malicious domain names using RNN model, 471–482

  34. Huang X, Li H, Liu J, Liu F, Wang J, Xie B, Chen B, Zhang Q, Xue T (2022) A malicious domain detection model based on improved deep learning. Comput Intell Neurosci 2022

  35. Niu Y, Guan M, Yuan W, Chen Y, Chen L, Yu Q (2022) A Bayesian optimization-based LSTM model for DGA domain name identification approach. In: Journal of Physics: Conference Series, vol. 2303, IOP Publishing, p 012015

  36. Sarojini S, Asha S (2022) Detection for domain generation algorithm (DGA) domain botnet based on neural network with multi-head self-attention mechanisms. Int J Syst Assur Eng Manag 1–16

  37. Zhang W, Gong J, Liu X, Hu X et al (2016) Lightweight domain name detection algorithm based on morpheme features. J Softw 27(9):2348–2364

  38. Buber E, Diri B, Sahingoz OK (2017) NLP based phishing attack detection from URLs. In: International conference on intelligent systems design and applications, Springer, pp 608–618

  39. Yang L, Zhai J, Liu W, Ji X, Bai H, Liu G, Dai Y (2019) Detecting word-based algorithmically generated domains using semantic analysis. Symmetry 11(2):176

  40. Yang L, Liu G, Wang J, Zhai J, Dai Y (2022) A semantic element representation model for malicious domain name detection. J Inf Secur Appl 66:103148

  41. Liang J, Chen S, Wei Z, Zhao S, Zhao W (2022) HAGDetector: heterogeneous DGA domain name detection model. Computers & Security 102803

  42. Wang Z, Guo Y, Montgomery D (2022) Machine learning-based algorithmically generated domain detection. Comput Electr Eng 100:107841

  43. Cucchiarelli A, Morbidoni C, Spalazzi L, Baldi M (2021) Algorithmically generated malicious domain names detection based on n-grams features. Expert Syst Appl 170:114551

  44. Fu Y, Yu L, Hambolu O, Ozcelik I, Husain B, Sun J, Sapra K, Du D, Beasley CT, Brooks RR (2017) Stealthy domain generation algorithms. IEEE Trans Inf Forensics Secur 12(6):1430–1443

  46. Anderson HS, Woodbridge J, Filar B (2016) DeepDGA: adversarially-tuned domain generation and detection. In: Proceedings of the 2016 ACM workshop on artificial intelligence and security, pp 13–21

  47. Peck J, Nie C, Sivaguru R, Grumer C, Olumofin F, Yu B, Nascimento A, De Cock M (2019) CharBot: a simple and effective method for evading DGA classifiers. IEEE Access 7:91759–91771

  48. Sidi L, Nadler A, Shabtai A (2020) MaskDGA: an evasion attack against DGA classifiers and adversarial defenses. IEEE Access 8:161580–161592

  49. Yun X, Huang J, Wang Y, Zang T, Zhou Y, Zhang Y (2019) Khaos: an adversarial neural network DGA with high anti-detection ability. IEEE Trans Inf Forensics Secur 15:2225–2240

  50. Hunter JD (2007) Matplotlib: a 2D graphics environment. Comput Sci Eng 9(3):90–95

Author information

Corresponding author

Correspondence to Cherifa Hamroun.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hamroun, C., Amamou, A., Haddadou, K. et al. A review on lexical based malicious domain name detection methods. Ann. Telecommun. 79, 457–473 (2024). https://doi.org/10.1007/s12243-024-01043-3

  • DOI: https://doi.org/10.1007/s12243-024-01043-3
