Abstract
Imbalanced data will bring difficulties in data processing, which is very common in data engineering. These data usually have sophisticated distributions. Different resampling methods are required for dealing with data with different distributions, while fixed ones are adopted traditionally. Therefore, to select appropriate resampling methods for data with such characteristics, we propose a novel classification method for Imbalanced Data based on Ant Lion Optimizer, called ALOID. It combines adaptive resampling strategies, feature selection, and ensemble classifiers. The adaptive resampling strategy refers to utilizing roulette wheel selection to choose the most suitable resampling method with a greater probability for each dataset according to the variable probabilities of resampling methods. Then a two-stage approach is further used in feature selection: preprocessing and enhancing. In addition, we adopt an ensemble classifier with dynamic weights. The variable probabilities of resampling methods, features, and the weights of base classifiers are coded in individual solutions. A large number of comprehensive experiments have been carried out in this paper. ALOID is compared with 8 state-of-the-art algorithms on 33 publicly available imbalanced datasets. Using K-nearest neighbor as the base classifier, we have found ALOID outperforms other methods in most cases, especially on high-dimensional imbalanced datasets. Experiment results demonstrate the performance advantage of ALOID over other comparable algorithms.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Guo, H., Li, Y., Jennifer, S., Gu, M., Huang, Y., Gong, B.: Learning from class-imbalanced data: review of methods and applications. Expert Syst. Appl. 73, 220–239 (2017)
Branco, P., Torgo, L., Ribeiro, R.P.: A survey of predictive modeling on imbalanced domains. ACM Comput. Surv. 49(2), 1–50 (2016)
Wang, C., Deng, C., Yu, Z., Hui, D., Gong, X., Luo, R.: Adaptive ensemble of classifiers with regularization for imbalanced data classification. Inf. Fusion 69, 81–102 (2021)
Alkuhlani, A., Nassef, M., Farag, I.: Multistage feature selection approach for high-dimensional cancer data. Soft Comput. 21, 6895–6906 (2017)
Mousavian, M., Chen, J., Greening, S.: Feature selection and imbalanced data handling for depression detection. In: Wang, S., et al. (eds.) BI 2018. LNCS (LNAI), vol. 11309, pp. 349–358. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-05587-5_33
Sun, J., et al.: FDHelper: assist unsupervised fraud detection experts with interactive feature selection and evaluation. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1–12. Association for Computing Machinery (2020)
Al-Mandhari, I., Guan, L., Edirisinghe, E.A.: Impact of the structure of data pre-processing pipelines on the performance of classifiers when applied to imbalanced network intrusion detection system dataset. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) IntelliSys 2019. AISC, vol. 1037, pp. 577–589. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-29516-5_45
Sharma, S., Somayaji, A., Japkowicz, N.: Learning over subconcepts: strategies for 1-class classification. Comput. Intell. 34, 440–467 (2018)
Zhang, X., Hu, B.: A new strategy of cost-free learning in the class imbalance problem. IEEE Trans. Knowl. Data Eng. 26(12), 2872–2885 (2014)
Rodríguez, J.J., Díez-Pastor, J.F., Arnaiz-González, l., Kuncheva, L.I.: Random balance ensembles for multiclass imbalance learning. Knowl.-Based Syst. 193, 105434 (2020)
Liu, Y., Wang, Y., Ren, X., Zhou, H., Diao, X.: A classification method based on feature selection for imbalanced data. IEEE Access 7, 81794–81807 (2019)
He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16(1), 321–357 (2002)
Soltanzadeh, P., Hashemzadeh, M.: RCSMOTE: range-controlled synthetic minority over-sampling technique for handling the class imbalance problem. Inf. Sci. 542, 92–111 (2021)
Turlapati, V.P.K., Prusty, M.R.: Outlier-smote: a refined oversampling technique for improved detection of COVID-19. Intell.-Based Med. 3–4, 100023 (2020)
Hamidzadeh, J., Kashefi, N., Moradi, M.: Combined weighted multi-objective optimizer for instance reduction in two-class imbalanced data problem. Eng. Appl. Artif. Intell. 90, 103500 (2020)
Li, J., Fong, S., Wong, R.K., Chu, V.W.: Adaptive multi-objective swarm fusion for imbalanced data classification. Inf. Fusion 39, 1–24 (2018)
Trittenbach, H., Englhardt, A., Böhm, K.: An overview and a benchmark of active learning for outlier detection with one-class classifiers. Expert Syst. Appl. 168, 114372 (2021)
Almaghrabi, F., Xu, D., Yang, J.: An evidential reasoning rule based feature selection for improving trauma outcome prediction. Appl. Soft Comput. 103, 107112 (2021)
Effrosynidis, D., Arampatzis, A.: An evaluation of feature selection methods for environmental data. Eco. Inform. 61, 101224 (2021)
Mena, L.J., Gonzalez, J.A.: Symbolic one-class learning from imbalanced datasets: application in medical diagnosis. Int. J. Artif. Intell. Tools 18(2), 273–309 (2009)
Tsai, C.F., Lin, W.C.: Feature selection and ensemble learning techniques in one-class classifiers: an empirical study of two-class imbalanced datasets. IEEE Access 9, 13717–13726 (2021)
Lee, J., Lee, Y.C., Kim, J.T.: Fault detection based on one-class deep learning for manufacturing applications limited to an imbalanced database. J. Manuf. Syst. 57, 357–366 (2020)
Gao, L., Zhang, L., Liu, C., Wu, S.: Handling imbalanced medical image data: a deep-learning-based one-class classification approach. Artif. Intell. Med. 108, 101935 (2020)
Li, F., Zhang, X., Zhang, X., Du, C., Xu, Y., Tian, Y.: Cost-sensitive and hybrid-attribute measure multi-decision tree over imbalanced data sets. Inf. Sci. 422, 242–256 (2018)
Wang, Z., Wang, B., Cheng, Y., Li, D., Zhang, J.: Cost-sensitive fuzzy multiple kernel learning for imbalanced problem. Neurocomputing 366, 178–193 (2019)
Chen, Z., Duan, J., Kang, L., Qiu, G.: A hybrid data-level ensemble to enable learning from highly imbalanced dataset. Inf. Sci. 554, 157–176 (2020)
López, V., Fernández, A., García, S., Palade, V., Herrera, F.: An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 250(250), 113–141 (2013)
Guo, L., Boukir, S.: Margin-based ordered aggregation for ensemble pruning. Pattern Recogn. Lett. 34(6), 603–609 (2013)
Seng, Z., Kareem, S.A., Varathan, K.D.: A neighborhood undersampling stacked ensemble (NUS-SE) in imbalanced classification. Expert Syst. Appl. 168, 114246 (2021)
Napierala, K., Stefanowski, J.: Identification of different types of minority class examples in imbalanced data. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, S.-B. (eds.) HAIS 2012. LNCS (LNAI), vol. 7209, pp. 139–150. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28931-6_14
Moayedikia, A., Ong, K.L., Boo, Y.L., Yeoh, W.G., Jensen, R.: Feature selection for high dimensional imbalanced class data using harmony search. Eng. Appl. Artif. Intell. 57, 38–49 (2017)
Mirjalili, S.: The ant lion optimizer. Adv. Eng. Softw. 83, 80–98 (2015)
Wang, S., Yao, X.: Diversity analysis on imbalanced data sets by using ensemble models. In: 2009 IEEE Symposium on Computational Intelligence and Data Mining, pp. 324–331 (2009)
Fernández, A., García, S., Galar, M., Prati, R.C., Krawczyk, B., Herrera, F.: Learning from Imbalanced Data Sets (2018)
Beheshti, Z.: BMNABC: binary multi-neighborhood artificial bee colony for high-dimensional discrete optimization problems. Cybern. Syst. 49, 452–474 (2018)
He, X., Zhang, Q., Sun, N., Dong, Y.: Feature selection with discrete binary differential evolution. In: 2009 International Conference on Artificial Intelligence and Computational Intelligence, vol. 4, pp. 327–330 (2009)
Emary, E., Zawbaa, H.M., Hassanien, A.E.: Binary grey wolf optimization approaches for feature selection. Neurocomputing 172(8), 371–381 (2016)
Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46(1), 389–422 (2002)
Yan, K., Zhang, D.: Feature selection and analysis on correlated gas sensor data with recursive feature elimination. Sens. Actuators B Chem. 212, 353–363 (2015)
Kubat, M., Holte, R.C., Matwin, S.: Machine learning for the detection of oil spills in satellite radar images. Mach. Learn. 30(2), 195–215 (1998)
Viera, A.J., Garrett, J.M.: Understanding interobserver agreement: the kappa statistic. Fam. Med. 37(5), 360–363 (2005)
Chen, Y., Lin, C.: Combining SVMs with various feature selection strategies. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds.) Feature Extraction, pp. 315–324. Springer, Heidelberg (2006). https://doi.org/10.1007/978-3-540-35488-8_13
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, M., Liu, Y., Zheng, Q., Li, X., Qin, W. (2022). A Classification Method for Imbalanced Data Based on Ant Lion Optimizer. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2022. Communications in Computer and Information Science, vol 1744. Springer, Singapore. https://doi.org/10.1007/978-981-19-9297-1_26
Download citation
DOI: https://doi.org/10.1007/978-981-19-9297-1_26
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-9296-4
Online ISBN: 978-981-19-9297-1
eBook Packages: Computer ScienceComputer Science (R0)