Abstract
Classification is a major constituent of the data mining toolkit. Well-known classification methods are built either on the principle of logic or on statistical reasoning. For imbalanced and noisy data, however, classification may fail to deliver on a basic data mining goal, namely identifying statistical dependencies in data. In this article, we propose a novel strategy for data mining based on partitioning the feature space through Voronoi tessellation and a Genetic Algorithm, where the latter is applied to solve a combinatorial optimization problem. We apply the suggested methodology to a range of classification problems of varying imbalance and noise and compare its performance with that of well-known classification methods such as SVM, KNN, and ANN. The results indicate that the proposed methodology is well suited for data mining tasks with highly imbalanced classes and significant noise.
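The abstract does not spell out the optimization problem, so the following is only a minimal sketch of the general idea, under stated assumptions: a Voronoi partition is represented by assigning each point to its nearest generator, and a toy genetic algorithm searches over subsets of data points used as generators. The function names, the fitness function (spread of per-cell event rates), and all parameter values are hypothetical and are not taken from the paper.

```python
import random

def nearest_cell(point, generators):
    """Index of the generator closest to `point`, i.e. its Voronoi cell."""
    return min(
        range(len(generators)),
        key=lambda i: sum((p - g) ** 2 for p, g in zip(point, generators[i])),
    )

def cell_event_rates(points, labels, generators):
    """Fraction of rare-event labels (1) among the points in each Voronoi cell."""
    counts = [0] * len(generators)
    events = [0] * len(generators)
    for point, label in zip(points, labels):
        cell = nearest_cell(point, generators)
        counts[cell] += 1
        events[cell] += label
    return [e / c if c else 0.0 for e, c in zip(events, counts)]

def fitness(subset, points, labels):
    """Hypothetical objective: gap between the highest and lowest cell event rate."""
    rates = cell_event_rates(points, labels, [points[i] for i in subset])
    return max(rates) - min(rates)

def evolve(points, labels, k=3, pop_size=20, generations=30, seed=0):
    """Toy genetic algorithm over k-subsets of point indices used as generators."""
    rng = random.Random(seed)
    n = len(points)
    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda s: fitness(s, points, labels), reverse=True)
        parents = population[: pop_size // 2]  # truncation selection, best kept
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = list(set(a) | set(b))       # crossover: draw k indices from both parents
            child = rng.sample(pool, k)
            if rng.random() < 0.2:             # mutation: swap one index for an unused one
                unused = [i for i in range(n) if i not in child]
                child[rng.randrange(k)] = rng.choice(unused)
            children.append(child)
        population = parents + children
    return max(population, key=lambda s: fitness(s, points, labels))

# Synthetic, assumed data: rare events concentrated in one corner of the unit square.
data_rng = random.Random(1)
points = [(data_rng.random(), data_rng.random()) for _ in range(200)]
labels = [1 if x > 0.8 and y > 0.8 and data_rng.random() < 0.7 else 0 for x, y in points]
best = evolve(points, labels)
```

Here `best` holds the indices of the points chosen as generators, and `cell_event_rates` can then be used to inspect how the rare class is distributed over the resulting cells. Choosing generators from the data itself keeps the search combinatorial, which is one plausible reading of the abstract rather than a confirmed detail of the method.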
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Khan, A.R., Schiøler, H., Zaki, M., Kulahci, M. (2018). Rare-Events Classification: An Approach Based on Genetic Algorithm and Voronoi Tessellation. In: Ganji, M., Rashidi, L., Fung, B., Wang, C. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 11154. Springer, Cham. https://doi.org/10.1007/978-3-030-04503-6_26
DOI: https://doi.org/10.1007/978-3-030-04503-6_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04502-9
Online ISBN: 978-3-030-04503-6
eBook Packages: Computer Science (R0)