A Critique on the Use of Machine Learning on Public Datasets for Intrusion Detection

  • Conference paper
Quality of Information and Communications Technology (QUATIC 2021)

Abstract

Intrusion detection remains an open challenge in modern ICT systems, driven by the ever-growing need to secure present-day networks. Various machine learning methods have been proposed to detect and prevent network intrusions. Many approaches, tuned and tested on public datasets, rely on well-known classifiers, which often reach detection accuracy close to 1. However, these results strongly depend on the training data, which may not be representative of real production environments and ever-evolving attacks. This paper is an initial exploration of this problem. After learning a detector on a public intrusion detection dataset, we test it against held-out data not used for learning and against additional data gathered by attack emulation in a controlled network. The experiments focus on Denial of Service attacks and are based on the CICIDS2017 dataset. Overall, the figures gathered confirm that results obtained on synthetic datasets may not generalize in practice.
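The evaluation protocol described in the abstract (learn on a public dataset, then score on held-out data and on separately emulated attacks) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the single synthetic feature, the Gaussian parameters, and the one-feature nearest-centroid threshold are hypothetical stand-ins, not the CICIDS2017 features or the classifiers used in the paper.

```python
import random
from statistics import mean


def make_flows(rng, n, attack_mean):
    """Return (feature, label) pairs; label 1 = attack, 0 = benign.

    The feature is a hypothetical per-flow statistic, not a real dataset column.
    """
    benign = [(rng.gauss(1.0, 0.3), 0) for _ in range(n)]
    attack = [(rng.gauss(attack_mean, 0.5), 1) for _ in range(n)]
    return benign + attack


def fit_threshold(flows):
    """Nearest-centroid rule on one feature: midpoint of the two class means."""
    benign_mean = mean(x for x, y in flows if y == 0)
    attack_mean = mean(x for x, y in flows if y == 1)
    return (benign_mean + attack_mean) / 2.0


def recall(flows, threshold):
    """Fraction of attack flows that the threshold flags as attacks."""
    attacks = [x for x, y in flows if y == 1]
    return sum(x > threshold for x in attacks) / len(attacks)


rng = random.Random(0)

# Stand-in for a public dataset: attacks are far from benign traffic.
public = make_flows(rng, 1000, attack_mean=5.0)
rng.shuffle(public)
train, held_out = public[:1400], public[1400:]

# Stand-in for attacks emulated in a controlled network: same attack class,
# but the feature distribution has shifted toward benign traffic.
emulated = make_flows(rng, 300, attack_mean=2.0)

threshold = fit_threshold(train)
held_out_recall = recall(held_out, threshold)
emulated_recall = recall(emulated, threshold)

print(f"held-out recall: {held_out_recall:.2f}")
print(f"emulated recall: {emulated_recall:.2f}")
```

In this sketch the held-out split shares the training distribution, so its recall says little about how the detector behaves once the attack traffic shifts; the second score is the one that exposes the gap.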


Notes

  1. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

  2. https://www.unb.ca/cic/datasets/nsl.html

  3. https://www.unb.ca/cic/datasets/ids-2017.html

  4. https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/

  5. https://www.ixiacom.com/products/perfectstorm

  6. https://cve.mitre.org

  7. https://nesg.ugr.es/nesg-ugr16/

  8. https://www2.hs-fulda.de/NDSec/NDSec-1/Files/

  9. https://www.netresec.com/?page=ACS_MILCOM_2016

  10. https://secplab.ppgia.pucpr.br/?q=trabid

  11. https://httpd.apache.org/docs/2.4/mod/mod_reqtimeout.html

  12. https://github.com/httperf/httperf

  13. http://it.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages

  14. http://idsdata.ding.unisannio.it/

  15. https://github.com/gkbrk/slowloris

  16. https://github.com/ahlashkari/CICFlowMeter

  17. https://scikit-learn.org/stable/

  18. https://pypi.org/project/hypopt/


Acknowledgment

Andrea Del Vecchio gratefully acknowledges support by the “Orio Carlini” 2020 GARR Consortium Fellowship.

Author information

Corresponding author

Correspondence to Marta Catillo.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Catillo, M., Del Vecchio, A., Pecchia, A., Villano, U. (2021). A Critique on the Use of Machine Learning on Public Datasets for Intrusion Detection. In: Paiva, A.C.R., Cavalli, A.R., Ventura Martins, P., Pérez-Castillo, R. (eds) Quality of Information and Communications Technology. QUATIC 2021. Communications in Computer and Information Science, vol 1439. Springer, Cham. https://doi.org/10.1007/978-3-030-85347-1_19

  • DOI: https://doi.org/10.1007/978-3-030-85347-1_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85346-4

  • Online ISBN: 978-3-030-85347-1

  • eBook Packages: Computer Science, Computer Science (R0)
