Abstract
Intrusion detection is a primary concern in any modern computer system due to the ever-growing number of intrusions. Machine learning is an effective means of detecting and preventing network intrusions. Many existing intrusion detection approaches rely on machine learning models trained on individual public datasets and achieve detection accuracy close to 1. These high-performing detectors depend strongly on the training data, which may not be representative of real-life production environments. This paper explores this proposition in the context of denial of service attacks. Different intrusion detectors trained on CICIDS2017 (an established public dataset widely used as a benchmark) are tested against an unseen, although closely related, dataset. The test dataset is based on the same mixture of denial of service attacks found in CICIDS2017, plus some additional variants. The results indicate that the perfect detection figures obtained in the context of a public dataset may not transfer in practice.
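The evaluation protocol described above (train on one dataset, then test on a related but unseen one) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the traffic is synthetic, the single feature (a packet rate), the `FLOOD` and `SLOW` attack profiles, and the midpoint-threshold detector are hypothetical and are not the detectors, features, or data used in the paper.

```python
import random
from statistics import mean

def make_flows(rng, n, attack_profiles):
    """Build synthetic (rate, label) pairs; label 1 = DoS, 0 = benign.

    Hypothetical data: benign rates are drawn from N(10, 2); attack rates
    cycle through the given (mean, stddev) profiles.
    """
    flows = [(rng.gauss(10, 2), 0) for _ in range(n)]
    for i in range(n):
        mu, sigma = attack_profiles[i % len(attack_profiles)]
        flows.append((rng.gauss(mu, sigma), 1))
    return flows

def fit_threshold(flows):
    """Toy detector: threshold halfway between the two class means."""
    benign = mean(x for x, y in flows if y == 0)
    attack = mean(x for x, y in flows if y == 1)
    return (benign + attack) / 2

def recall(threshold, flows):
    """Fraction of attack flows flagged (rate above the threshold)."""
    attacks = [x for x, y in flows if y == 1]
    return sum(x > threshold for x in attacks) / len(attacks)

rng = random.Random(0)   # fixed seed so the run is repeatable
FLOOD = (100, 10)        # hypothetical high-rate attack profile
SLOW = (12, 2)           # hypothetical low-rate variant, close to benign

# Training data and the same-dataset test split contain only the attack
# profile seen in training; the unseen dataset adds a variant.
train = make_flows(rng, 200, [FLOOD])
same_dataset_test = make_flows(rng, 200, [FLOOD])
unseen_dataset_test = make_flows(rng, 200, [FLOOD, SLOW])

threshold = fit_threshold(train)
same_recall = recall(threshold, same_dataset_test)
unseen_recall = recall(threshold, unseen_dataset_test)

print(f"recall on same-dataset test split: {same_recall:.2f}")
print(f"recall on unseen dataset:          {unseen_recall:.2f}")
```

In this toy setting the detector flags essentially every attack in the split drawn from the training distribution, but misses the low-rate variant that appears only in the unseen dataset. It shows only the shape of the comparison, not the magnitude of any effect reported in the paper.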
Acknowledgements
Andrea Del Vecchio contributed to this work while he was hosted by the Department of Engineering at the University of Sannio, with the support of the “Orio Carlini” 2020 GARR Consortium Fellowship.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Cite this article
Catillo, M., Del Vecchio, A., Pecchia, A. et al. Transferability of machine learning models learned from public intrusion detection datasets: the CICIDS2017 case study. Software Qual J 30, 955–981 (2022). https://doi.org/10.1007/s11219-022-09587-0
DOI: https://doi.org/10.1007/s11219-022-09587-0