Abstract
Classification of imbalanced datasets is a persistent challenge in machine learning and data mining. Traditional classifiers still struggle to handle minority-class instances. In this paper, we propose an effective method that applies a sampling strategy based on ensemble learning. It uses AdaBoost-SVM combined with spectral clustering to boost performance, and it applies over-sampling and under-sampling guided by the instances misclassified by the ensemble. Experimental results show that, compared with previous algorithms, the proposed method is effective for binary classification of imbalanced data.
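The core loop the abstract describes (boost an ensemble, inspect which minority instances it misclassifies, then resample around them and re-train) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it substitutes decision stumps for the SVM base learners, omits the spectral-clustering step, and shows only the over-sampling half of the strategy (duplicating the misclassified minority points). All function names and the toy dataset are hypothetical.

```python
import numpy as np

def stump_fit(X, y, w):
    """Weighted decision stump: pick the (feature, threshold, polarity)
    with the lowest weighted error on labels in {-1, +1}."""
    best = (0, 0.0, 1, np.inf)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (f, t, pol, err)
    return best

def stump_predict(X, f, t, pol):
    return np.where(pol * (X[:, f] - t) >= 0, 1, -1)

def adaboost(X, y, rounds=5):
    """Standard AdaBoost; returns a list of (alpha, feature, threshold, polarity)."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        f, t, pol, err = stump_fit(X, y, w)
        err = max(err, 1e-10)                      # guard against log(0)
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = stump_predict(X, f, t, pol)
        w *= np.exp(-alpha * y * pred)             # up-weight the mistakes
        w /= w.sum()
        ensemble.append((alpha, f, t, pol))
    return ensemble

def predict(ensemble, X):
    agg = sum(a * stump_predict(X, f, t, p) for a, f, t, p in ensemble)
    return np.where(agg >= 0, 1, -1)

# Imbalanced toy data: 200 majority (-1) points vs 20 minority (+1) points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(1.5, 1.0, (20, 2))])
y = np.concatenate([-np.ones(200), np.ones(20)])

ensemble = adaboost(X, y)
miss_min = (predict(ensemble, X) != y) & (y == 1)  # misclassified minority instances

# Over-sample the hard minority points by duplication, then re-boost.
X2 = np.vstack([X, np.repeat(X[miss_min], 5, axis=0)])
y2 = np.concatenate([y, np.ones(int(miss_min.sum()) * 5)])
ensemble2 = adaboost(X2, y2)

recall_before = (predict(ensemble, X)[y == 1] == 1).mean()
recall_after = (predict(ensemble2, X)[y == 1] == 1).mean()
print(f"minority recall: {recall_before:.2f} -> {recall_after:.2f}")
```

The design choice mirrors the abstract: instead of resampling blindly, resampling is targeted at the instances the current ensemble gets wrong, so the second boosting pass concentrates on the minority region the first pass failed to cover.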
Acknowledgement
This work was supported in part by the National Natural Science Foundation of China (grants 61273225, 61273303, and 61373109), the Program for Outstanding Young Science and Technology Innovation Teams in Higher Education Institutions of Hubei Province (No. T201202), the Program of Wuhan Subject Chief Scientist (201150530152), and the National "Twelfth Five-Year" Plan for Science & Technology Support (2012BAC22B01).
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Zhang, C., Zhang, X. (2017). An Effective Sampling Strategy for Ensemble Learning with Imbalanced Data. In: Huang, D.-S., Hussain, A., Han, K., Gromiha, M. (eds.) Intelligent Computing Methodologies. ICIC 2017. Lecture Notes in Computer Science, vol. 10363. Springer, Cham. https://doi.org/10.1007/978-3-319-63315-2_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63314-5
Online ISBN: 978-3-319-63315-2