Abstract
This paper proposes a resampling approach for the classification of unbalanced datasets. The method combines undersampling and oversampling by means of genetic algorithms, according to a set of criteria, and determines the optimal unbalance rate. The method has been tested on industrial datasets and on datasets from the literature. The results show an appreciable increase in the detection rate of rare patterns and an improvement in classification performance.
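The general idea can be illustrated with a small sketch. The abstract does not specify the chromosome encoding, the selection criteria, or the classifier, so everything below is an assumption made for illustration: a two-gene individual (`remove_frac`, the share of majority samples dropped, and `dup_frac`, the number of duplicated minority samples relative to the minority size), a toy one-dimensional threshold classifier, balanced accuracy on a validation set as the single fitness criterion, and synthetic Gaussian data. All function names are hypothetical and this is not the authors' implementation.

```python
import random

GRID = [i / 10 for i in range(-20, 41)]  # candidate thresholds for the toy classifier

def make_data(n_major, n_minor, rng):
    # Hypothetical 1-D toy data: frequent class 0 around 0.0, rare class 1 around 1.5.
    return ([(rng.gauss(0.0, 1.0), 0) for _ in range(n_major)] +
            [(rng.gauss(1.5, 1.0), 1) for _ in range(n_minor)])

def resample(train, remove_frac, dup_frac, rng):
    # Random undersampling of the majority class plus random duplication of the minority class.
    majority = [s for s in train if s[1] == 0]
    minority = [s for s in train if s[1] == 1]
    n_keep = max(1, round(len(majority) * (1.0 - remove_frac)))
    kept = rng.sample(majority, n_keep)
    extra = [rng.choice(minority) for _ in range(round(len(minority) * dup_frac))]
    return kept + minority + extra

def fit_threshold(samples):
    # Toy classifier: predict class 1 when x > t, with t minimising training errors.
    return min(GRID, key=lambda t: sum((x > t) != (y == 1) for x, y in samples))

def balanced_accuracy(t, samples):
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    tpr = sum(x > t for x in pos) / len(pos)   # rare-pattern detection rate
    tnr = sum(x <= t for x in neg) / len(neg)
    return 0.5 * (tpr + tnr)

def fitness(genes, train, valid):
    # Fixed seed so that a given individual always receives the same score.
    resampled = resample(train, genes[0], genes[1], random.Random(0))
    return balanced_accuracy(fit_threshold(resampled), valid)

def clip(genes):
    return (min(max(genes[0], 0.0), 0.9), min(max(genes[1], 0.0), 5.0))

def evolve(train, valid, pop_size=10, generations=10, seed=1):
    rng = random.Random(seed)
    # The "no resampling" individual is included so the result is never worse than it.
    pop = [(0.0, 0.0)] + [(rng.uniform(0, 0.9), rng.uniform(0, 5))
                          for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, train, valid), reverse=True)
        parents = pop[:pop_size // 2]          # truncation selection; parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover takes one gene from each parent; mutation adds Gaussian noise.
            child = (a[0] + rng.gauss(0, 0.05), b[1] + rng.gauss(0, 0.3))
            children.append(clip(child))
        pop = parents + children
    return max(pop, key=lambda g: fitness(g, train, valid))

data_rng = random.Random(42)
train = make_data(200, 20, data_rng)
valid = make_data(200, 20, data_rng)

best = evolve(train, valid)
baseline_fit = fitness((0.0, 0.0), train, valid)
best_fit = fitness(best, train, valid)

# Share of rare samples in the resampled training set for the best individual.
resampled = resample(train, best[0], best[1], random.Random(0))
unbalance_rate = sum(y for _, y in resampled) / len(resampled)

print("best (remove_frac, dup_frac):", best)
print("balanced accuracy, baseline vs. best:", baseline_fit, best_fit)
print("resulting minority share:", unbalance_rate)
```

In this sketch the unbalance rate is not fixed in advance: it emerges from the two evolved fractions, which is one simple way to let the search decide how far to rebalance the training set. A real application would replace the toy classifier and the single fitness criterion with the learner and the criteria of interest.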
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Vannucci, M., Colla, V. (2018). Genetic Algorithms Based Resampling for the Classification of Unbalanced Datasets. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies 2017. IDT 2017. Smart Innovation, Systems and Technologies, vol 73. Springer, Cham. https://doi.org/10.1007/978-3-319-59424-8_3
DOI: https://doi.org/10.1007/978-3-319-59424-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59423-1
Online ISBN: 978-3-319-59424-8
eBook Packages: Engineering, Engineering (R0)