Abstract
Corporate security is usually one of the areas in which companies invest the most resources, since the loss of information translates directly into monetary losses. Security issues may originate in external attacks or internal security failures, but a significant share of security breaches is related to employees' lack of awareness regarding the use of the Web. In this work we focus on the latter problem, describing improvements to a system that detects anomalous and potentially insecure situations that could be dangerous for a company. The system was initially conceived as a better alternative to so-called black/white lists. These lists contain URLs to which access is banned or dangerous (black list) or URLs to which access is permitted (white list). In this chapter, we propose a system that first learns from existing black/white lists and then classifies a new, unknown URL request as either “should be allowed” or “should be denied”. We describe the system and its results, together with the improvements obtained from an initial data pre-processing step that applies Rough Set Theory for feature selection. We show that high accuracy can be obtained even without the pre-processing step, with between 96 and 97 % of patterns correctly classified. We also show that pre-processing the data with Computational Intelligence techniques improves the system's running time while accuracy remains close to 97 %. Finally, the results demonstrate that it is possible to obtain interesting rules, not based solely on the URL string, for classifying new, unknown URL access requests as allowed or denied.
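As a rough illustration of what a Rough Set based feature selection step can look like, the following minimal Python sketch computes the rough-set dependency degree and applies a greedy QuickReduct-style search. It is not the chapter's implementation: the choice of QuickReduct, the feature names (`content_type`, `http_method`, `domain_tld`), the label column `decision`, and the toy records are all assumptions made for this example.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, attrs, decision):
    """Dependency degree: fraction of rows whose equivalence class has one label."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        labels = {rows[i][decision] for i in block}
        if len(labels) == 1:
            positive += len(block)
    return positive / len(rows)

def quickreduct(rows, conditional, decision):
    """Greedily add the attribute that most increases the dependency degree."""
    target = dependency(rows, conditional, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        best = max(
            (a for a in conditional if a not in reduct),
            key=lambda a: dependency(rows, reduct + [a], decision),
        )
        reduct.append(best)
    return reduct

# Hypothetical URL request records; values and labels are invented for illustration.
FEATURES = ["content_type", "http_method", "domain_tld"]
ROWS = [
    {"content_type": "text/html", "http_method": "GET", "domain_tld": "com", "decision": "allow"},
    {"content_type": "text/html", "http_method": "GET", "domain_tld": "org", "decision": "allow"},
    {"content_type": "application/octet-stream", "http_method": "GET", "domain_tld": "com", "decision": "deny"},
    {"content_type": "application/octet-stream", "http_method": "POST", "domain_tld": "org", "decision": "deny"},
    {"content_type": "video/mp4", "http_method": "GET", "domain_tld": "com", "decision": "deny"},
    {"content_type": "text/html", "http_method": "POST", "domain_tld": "com", "decision": "allow"},
]

selected = quickreduct(ROWS, FEATURES, "decision")
print(selected)
```

In this toy table a single non-URL-string attribute is enough to separate allowed from denied requests, so the search stops after one step. A classifier trained on the reduced attribute set then has fewer features to process, which is the kind of running-time saving the abstract refers to.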
Notes
1. Taken from a log file released to us by a Spanish company.
2. The set of rules has been written by the same company, with respect to its employees.
3. Data which was gathered from the real world, and was not artificially generated.
4. Format of Weka files.
5. Trees can be deployed as rules.
Acknowledgments
The authors would like to thank GENIL-SSV’2015 for making possible the visit of Dr. Zeineb Chelly to take part in this project. We thank Dr. Zeineb Chelly, from Institut Supérieur de Gestion, Tunisia, for her technical insight, recommendations and suggestions, and for her assistance during the practical experiments. This work has been funded in part by the European project MUSES (FP7-318508), along with the Spanish National project TIN2011-28627-C04-02 (ANYSELF), project P08-TIC-03903 (EVORQ) awarded by the Andalusian Regional Government, and projects 83 (CANUBE) and GENIL PYR-2014-17, both awarded by the CEI-BioTIC UGR.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
de las Cuevas, P., Chelly, Z., Mora, A.M., Merelo, J.J., Esparcia-Alcázar, A.I. (2016). An Improved Decision System for URL Accesses Based on a Rough Feature Selection Technique. In: Abielmona, R., Falcon, R., Zincir-Heywood, N., Abbass, H. (eds) Recent Advances in Computational Intelligence in Defense and Security. Studies in Computational Intelligence, vol 621. Springer, Cham. https://doi.org/10.1007/978-3-319-26450-9_6
DOI: https://doi.org/10.1007/978-3-319-26450-9_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26448-6
Online ISBN: 978-3-319-26450-9
eBook Packages: Engineering, Engineering (R0)