Abstract
Imbalanced data learning (IDL) is one of the most active and important fields in machine learning research. This paper explores the effectiveness of four SVM ensemble methods integrated with under-sampling for IDL. Experimental results on 20 imbalanced UCI datasets show that the two new ensemble algorithms proposed in this paper, CABagE (bagging-style) and MABstE (boosting-style), produce SVM ensemble classifiers that recognize the minority class better than those produced by the existing ensemble methods. Further analysis of the experimental results indicates that MABstE has the best overall classification performance, which we believe is attributable to its more robust example-weighting mechanism.
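The abstract does not spell out how CABagE or MABstE build their training subsets. The following is therefore only a minimal sketch of the generic bagging-style idea of combining under-sampling with an SVM ensemble. It is not CABagE or MABstE. Each member is trained on all minority examples plus an equally sized random sample of majority examples, and the ensemble predicts by unweighted majority vote. Several assumptions are not from the paper: the minority class is labeled +1, each member is a linear SVM trained by Pegasos-style subgradient descent in pure Python rather than LIBSVM, and the names `train_linear_svm`, `undersampled_bagging`, and `ensemble_predict` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias)


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> Model:
    """Linear soft-margin SVM trained by Pegasos-style stochastic subgradient descent.

    Labels must be +1 / -1. The bias is learned as the weight of an appended
    constant feature 1.0, so it is regularized too (a common simplification).
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            xi = list(X[i]) + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            # Shrink the weights: gradient step on the L2 regularizer.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient step, only if the example violates the margin.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w[:-1], w[-1]


def predict_one(model: Model, x: Sequence[float]) -> int:
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def undersampled_bagging(X, y, n_models: int = 5, seed: int = 0) -> List[Model]:
    """Hypothetical helper: train n_models SVMs, each on all minority (+1) examples
    plus an equally sized random sample, drawn without replacement, of majority (-1) examples."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == -1]
    models = []
    for m in range(n_models):
        picked = minority + rng.sample(majority, min(len(minority), len(majority)))
        models.append(train_linear_svm([X[i] for i in picked],
                                       [y[i] for i in picked], seed=seed + m))
    return models


def ensemble_predict(models: List[Model], x: Sequence[float]) -> int:
    # Unweighted majority vote; a tie (only possible with an even number of
    # members) goes to the minority class.
    votes = sum(predict_one(model, x) for model in models)
    return 1 if votes >= 0 else -1


# Toy imbalanced data: 40 majority points near (-2, -2), 5 minority points near (2, 2).
data_rng = random.Random(42)
X = [(data_rng.gauss(-2, 0.5), data_rng.gauss(-2, 0.5)) for _ in range(40)]
X += [(data_rng.gauss(2, 0.5), data_rng.gauss(2, 0.5)) for _ in range(5)]
y = [-1] * 40 + [1] * 5

models = undersampled_bagging(X, y, n_models=5)
minority_recall = sum(
    ensemble_predict(models, x) == 1 for x, lab in zip(X, y) if lab == 1
) / 5
```

Because every member sees a balanced sample, no single SVM's decision boundary is dominated by the majority class. Different random majority subsets also keep the members diverse. A boosting-style method such as MABstE would instead reweight examples between rounds, and this sketch does not model that.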
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, Z., Hao, Z., Yang, X., Liu, X. (2009). Several SVM Ensemble Methods Integrated with Under-Sampling for Imbalanced Data Learning. In: Huang, R., Yang, Q., Pei, J., Gama, J., Meng, X., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2009. Lecture Notes in Computer Science, vol 5678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03348-3_54
DOI: https://doi.org/10.1007/978-3-642-03348-3_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03347-6
Online ISBN: 978-3-642-03348-3
eBook Packages: Computer Science, Computer Science (R0)