Abstract
Every website on the Internet is vulnerable to security attacks to some degree. These attacks change constantly, and detecting the latest, previously unknown attacks is challenging. Our goal is to automate attack detection through incremental learning of the latest attack types. We have placed web traps around the Internet so that regular users cannot find or interact with them, while they remain visible to standard hacker tools and methods. Consequently, we obtain continuous information about new types of attacks, in contrast to most datasets in the literature, which were created in artificial settings. In this paper, to detect web attacks effectively without many false positives, we propose an efficient way to create a dataset by combining malicious requests from the traps with benign requests from a regular website. Since our goal is automation, we tested a large number of shallow and deep machine learning models for separating regular from malicious HTTP requests, using only simple features such as character n-grams. In addition to our own dataset, we evaluated all the models on the large, publicly available FWAF dataset. We also tested the models on zero-day attacks, with training and validation requests collected in separate time intervals. One of the biggest problems in machine learning is catastrophic forgetting: when trained on new data, a model forgets the knowledge learned from previous examples. To mitigate this problem, we implemented three incremental learning approaches for web attack detection and obtained good results during testing.
Code Availability
We will make our code available immediately upon acceptance of the paper. Matplotlib and Seaborn were used to create the artwork.
Notes
We debated whether the standard HTTP commands defined by the HTTP protocol (GET, POST, PUT, OPTIONS, etc.) should be used for tokenization. After a thorough analysis of malicious requests, we discovered that attackers use custom-made HTTP commands that are not defined by the HTTP protocol. We therefore decided to include HTTP commands in the requests of our TBWIDD dataset, since this allows detection of remote commands executed by attackers and of their control of web instances.
abbreviation for the minimum number of occurrences
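The notes above concern how requests are turned into features. The following is a minimal sketch, not the authors' implementation, of character n-gram extraction in which the HTTP command stays part of the request string and rare n-grams are dropped. The function names, the parameter `min_count`, the choice of n = 3, the per-request counting of occurrences, and the sample requests are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_vocabulary(requests, n=3, min_count=2):
    """Index the n-grams that occur in at least `min_count` requests.

    Assumption: "minimum number of occurrences" is counted per request
    (document frequency), not over all characters in the corpus.
    """
    doc_freq = Counter()
    for req in requests:
        doc_freq.update(set(char_ngrams(req, n)))
    kept = sorted(g for g, c in doc_freq.items() if c >= min_count)
    return {gram: idx for idx, gram in enumerate(kept)}


def vectorize(request, vocab, n=3):
    """Map a request to a sparse {feature index: count} dictionary."""
    counts = Counter(g for g in char_ngrams(request, n) if g in vocab)
    return {vocab[g]: c for g, c in counts.items()}


# Hypothetical requests. The HTTP command is kept in the string, so a
# non-standard command such as "FOOBAR" still produces n-grams.
requests = [
    "GET /index.php?id=1",
    "GET /index.php?id=2",
    "FOOBAR /shell.php?cmd=id",
]
vocab = build_vocabulary(requests, n=3, min_count=2)
```

With these sample requests, the n-grams of the standard command "GET" pass the threshold because they occur in two requests, while those of the one-off custom command do not. On a realistic corpus, a command that attackers reuse across requests would pass the threshold and become a feature of its own.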
Funding
No funding was received for conducting this study.
Ethics declarations
Conflict of Interests
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Availability of Data and Material
We have made our TBWIDD dataset available; the link is provided in the manuscript.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Stevanović, N., Todorović, B. & Todorović, V. Web attack detection based on traps. Appl Intell 52, 12397–12421 (2022). https://doi.org/10.1007/s10489-021-03077-9
DOI: https://doi.org/10.1007/s10489-021-03077-9