Abstract
Applying anomaly detection approaches to network intrusion detection in real scenarios is difficult. Techniques such as deep learning can estimate new data representations with higher levels of abstraction, which can be useful for the analysis of network traffic data. For that reason, we compare the performance of different anomaly detection techniques on feature representations obtained by an autoencoder and a variational autoencoder. We employ a variety of well-known anomaly detection algorithms, which address intrusion detection as a semi-supervised problem: patterns that deviate from a baseline model, estimated only from normal traffic, are labelled as anomalous. The assessment is performed on four publicly available benchmarks. The results show that the effect of feature representation on performance is highly dependent on the anomaly detection technique.
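The semi-supervised setting described above can be illustrated with a minimal sketch. It is not one of the algorithms evaluated in the paper: a simple per-feature z-score detector stands in for the baseline model, the feature-learning step (autoencoder or variational autoencoder) is left out, and the two-feature synthetic "traffic" is purely hypothetical. All function names are illustrative assumptions.

```python
import random
import statistics


def fit_baseline(normal_rows):
    """Estimate per-feature mean and standard deviation from normal traffic only."""
    columns = list(zip(*normal_rows))
    means = [statistics.fmean(c) for c in columns]
    # Guard against constant features, whose standard deviation is zero.
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return means, stds


def anomaly_score(row, baseline):
    """Largest absolute z-score over all features (stand-in anomaly score)."""
    means, stds = baseline
    return max(abs(x - m) / s for x, m, s in zip(row, means, stds))


def fit_threshold(normal_rows, baseline, quantile=0.99):
    """Pick the score threshold as a high quantile of the normal training scores."""
    scores = sorted(anomaly_score(r, baseline) for r in normal_rows)
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]


def predict(rows, baseline, threshold):
    """Label a pattern as anomalous when it deviates from the baseline model."""
    return [anomaly_score(r, baseline) > threshold for r in rows]


# Hypothetical synthetic data: normal traffic for training, attacks only at test time.
rng = random.Random(0)
normal_train = [[rng.gauss(0, 1), rng.gauss(5, 2)] for _ in range(500)]
attacks = [[rng.gauss(8, 1), rng.gauss(5, 2)] for _ in range(20)]

baseline = fit_baseline(normal_train)
threshold = fit_threshold(normal_train, baseline)
attack_flags = predict(attacks, baseline, threshold)
train_flags = predict(normal_train, baseline, threshold)
```

In this sketch no attack sample is seen during fitting; the threshold is chosen from normal traffic alone, which is what makes the setting semi-supervised rather than supervised.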




Funding
This study was funded by Junta de Castilla y León (ES) (Grant No. LE045P17).
Cite this article
Pérez, D., Alonso, S., Morán, A. et al. Evaluation of feature learning for anomaly detection in network traffic. Evolving Systems 12, 79–90 (2021). https://doi.org/10.1007/s12530-020-09342-5