Abstract
The class imbalance problem is receiving increasing attention in the literature. Studies on binary problems are common, but work on the multiclass case remains limited. Solutions for imbalanced domains may be divided into two groups: data-level approaches and algorithmic approaches. The former are more common and change the training data to balance the data set by oversampling the smallest classes, undersampling the largest ones, or combining both. Instance reduction is another approach to the problem: it tries to find the best reduced set of instances that represents the original training set. In this work, we propose a new Prototype Generation method called DCIA. It dynamically inserts new prototypes for each class and then adjusts their positions with a search algorithm. The set of generated prototypes may be used to train any classifier. Experiments showed its potential by enabling a 1NN classifier to perform sometimes as well as, or even better than, some ensemble classifiers created for different multiclass imbalanced domains.
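The general idea of replacing a training set with a small set of prototypes and classifying with 1NN can be illustrated with a minimal sketch. This is not DCIA: the abstract does not specify the insertion rule or the search algorithm, so the sketch assumes the simplest possible stand-in, one fixed prototype per class placed at the class mean, with no position adjustment. The function names `class_mean_prototypes` and `predict_1nn` and the toy data are hypothetical.

```python
import math
import random
from collections import defaultdict


def class_mean_prototypes(X, y):
    """Return {label: mean vector}, i.e. one prototype per class.

    Assumption: a class mean stands in for a generated prototype;
    DCIA's dynamic insertion and search-based adjustment are omitted.
    """
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def predict_1nn(prototypes, x):
    """Return the label of the prototype nearest to x (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(prototypes[label], x))


def blob(cx, cy, n):
    """Draw n two-dimensional points around (cx, cy); illustrative data only."""
    return [[random.gauss(cx, 0.5), random.gauss(cy, 0.5)] for _ in range(n)]


# Toy multiclass imbalanced set: 50, 10 and 3 instances (made-up sizes).
random.seed(0)
X = blob(0, 0, 50) + blob(5, 5, 10) + blob(0, 5, 3)
y = [0] * 50 + [1] * 10 + [2] * 3

prototypes = class_mean_prototypes(X, y)
pred = [predict_1nn(prototypes, x) for x in X]
```

In this sketch every class contributes the same number of prototypes, so the nearest-neighbour decision does not depend on how many training instances each class had. A method such as the one proposed here would instead choose how many prototypes each class receives and where they end up.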
Acknowledgment
The authors would like to thank CNPq and FACEPE (Brazilian research agencies) for financial support.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Silva, E.J.R., Zanchettin, C. (2019). Dynamic Centroid Insertion and Adjustment for Data Sets with Multiple Imbalanced Classes. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Deep Learning. ICANN 2019. Lecture Notes in Computer Science, vol 11728. Springer, Cham. https://doi.org/10.1007/978-3-030-30484-3_60
DOI: https://doi.org/10.1007/978-3-030-30484-3_60
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30483-6
Online ISBN: 978-3-030-30484-3
eBook Packages: Computer Science (R0)