Abstract
This chapter presents an approach to cybersecurity data analysis that combines a set of machine learning methods with Big Data technologies for network attack and anomaly detection. The approach comprises several layers of data processing: extraction and decomposition of datasets, compression of feature vectors, training, and classification. Principal component analysis is applied to reduce the dimension of the analyzed feature vectors. Several binary classifiers then analyze the compressed vectors: support vector machine, k-nearest neighbors, Gaussian naïve Bayes, artificial neural network, and decision tree. To increase the precision of attack detection, these classifiers are combined into a single weighted ensemble, which is constructed using weighted voting, soft voting, AdaBoost, and majority voting. Two architectures of a distributed intrusion detection system based on Big Data technologies are used. In the first, parallel data processing is achieved by splitting the data into several non-intersecting subsets and assigning a separate parallel thread to each resulting chunk. In the second, several client-sensors and a server-collector are used, where each sensor contains several network analyzers and a balancer. The efficiency of the approach is evaluated experimentally on two datasets: one with Internet of Things traffic covering several classes of attacks, and one with computer network traffic containing host scanning and DDoS attacks.
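The weighted-voting ensemble and the first (chunk-per-thread) architecture described above can be illustrated with a minimal Python sketch. It is not the chapter's implementation. The threshold-rule classifiers, their weights, the sample vectors, and the names `weighted_vote`, `split_into_chunks`, and `classify_parallel` are all hypothetical stand-ins for trained models, and the PCA compression and training layers are omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = Sequence[float]
Classifier = Callable[[Vector], int]  # returns 1 (attack) or 0 (normal)


def weighted_vote(classifiers: List[Classifier], weights: List[float], x: Vector) -> int:
    """Return 1 if the 'attack' votes carry more than half of the total weight."""
    attack_weight = sum(w for clf, w in zip(classifiers, weights) if clf(x) == 1)
    return 1 if attack_weight > 0.5 * sum(weights) else 0


def split_into_chunks(data: List[Vector], n_chunks: int) -> List[List[Vector]]:
    """Split data into contiguous, non-intersecting subsets of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def classify_parallel(data: List[Vector], classifiers: List[Classifier],
                      weights: List[float], n_chunks: int = 4) -> List[int]:
    """Classify each chunk in its own worker thread and keep the input order."""
    chunks = split_into_chunks(data, n_chunks)

    def classify_chunk(chunk: List[Vector]) -> List[int]:
        return [weighted_vote(classifiers, weights, x) for x in chunk]

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        per_chunk = pool.map(classify_chunk, chunks)  # map preserves chunk order
        return [label for labels in per_chunk for label in labels]


# Hypothetical stand-ins for trained binary classifiers (assumption, not from the chapter).
toy_classifiers: List[Classifier] = [
    lambda x: 1 if x[0] > 0.5 else 0,
    lambda x: 1 if x[1] > 0.5 else 0,
    lambda x: 1 if x[0] + x[1] > 1.2 else 0,
]
toy_weights = [0.4, 0.35, 0.25]  # illustrative weights, not learned values

samples = [(0.9, 0.1), (0.2, 0.9), (0.8, 0.8), (0.1, 0.1), (0.9, 0.4)]
labels = classify_parallel(samples, toy_classifiers, toy_weights, n_chunks=2)
print(labels)
```

Setting all weights equal reduces `weighted_vote` to majority voting. In CPython, threads mainly illustrate the data-partitioning scheme; a production system would more plausibly delegate the chunks to a distributed processing framework.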
Acknowledgements
This research was carried out with the support of the Ministry of Education and Science of the Russian Federation as part of Agreement No. 05.607.21.0322 (identifier RFMEFI60719X0322).
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kotenko, I., Saenko, I., Branitskiy, A. (2020). Machine Learning and Big Data Processing for Cybersecurity Data Analysis. In: Sikos, L., Choo, KK. (eds) Data Science in Cybersecurity and Cyberthreat Intelligence. Intelligent Systems Reference Library, vol 177. Springer, Cham. https://doi.org/10.1007/978-3-030-38788-4_4
DOI: https://doi.org/10.1007/978-3-030-38788-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38787-7
Online ISBN: 978-3-030-38788-4
eBook Packages: Intelligent Technologies and Robotics (R0)