Abstract
Because the standard SVM is biased against the rare class when classifying unbalanced data, a new balancing strategy based on the common approach of undersampling the training data is presented. First, the fuzzy C-means clustering algorithm is used to cluster the unbalanced data set, and the negative class samples whose memberships are greater than a certain threshold are selected (assuming that the positive class has fewer samples than the negative class). The selected samples and the positive class samples of the original data are combined into a new training data set, which is then used to train a support vector machine. Finally, simulations on unbalanced data show that the proposed algorithm can compensate for the bias that arises when support vector machines are applied to unbalanced data classification. Moreover, compared with the traditional support vector machine and some other improved algorithms, the proposed algorithm achieves superior classification performance.
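The undersampling step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say whether clustering is run on the whole data set or only on the negative class, nor which membership is compared with the threshold. Here it is assumed that only the negative samples are clustered and that a sample is kept when its largest membership exceeds the threshold. The function names, the deterministic farthest-point initialisation, and the default values of `c`, `m`, and `threshold` are all hypothetical choices made for the example.

```python
import math


def memberships(points, centers, m):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    rows = []
    for p in points:
        d = [math.dist(p, v) for v in centers]
        if min(d) == 0.0:
            # Point sits exactly on a center: give it full membership there.
            row = [1.0 if x == 0.0 else 0.0 for x in d]
        else:
            # Standard FCM update: u_i is proportional to d_i^(-2/(m-1)).
            row = [x ** (-2.0 / (m - 1.0)) for x in d]
        total = sum(row)
        rows.append([r / total for r in row])
    return rows


def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-6):
    """Plain fuzzy C-means; returns (centers, membership rows)."""
    # Assumed initialisation: first point, then repeatedly the point
    # farthest from the centers chosen so far (deterministic).
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda p: min(math.dist(p, v) for v in centers))
        centers.append(list(far))
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]  # weights u^m
            total = sum(w)
            new_centers.append(
                [sum(wk * p[j] for wk, p in zip(w, points)) / total
                 for j in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)


def undersample_negatives(X, y, threshold=0.8, c=2, m=2.0):
    """Keep all positives (label 1) and only the negatives whose largest
    cluster membership exceeds `threshold` (an assumed selection rule)."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label != 1]
    _, u = fuzzy_c_means(neg, c=c, m=m)
    kept = [x for x, row in zip(neg, u) if max(row) > threshold]
    return pos + kept, [1] * len(pos) + [-1] * len(kept)
```

The reduced set returned by `undersample_negatives` would then be passed to an SVM trainer of the reader's choice; that final step is not shown here.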
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhou, B., Ha, M., Wang, C. (2010). An Improved Algorithm of Unbalanced Data SVM. In: Cao, By., Wang, Gj., Guo, Sz., Chen, Sl. (eds) Fuzzy Information and Engineering 2010. Advances in Intelligent and Soft Computing, vol 78. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14880-4_60
DOI: https://doi.org/10.1007/978-3-642-14880-4_60
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14879-8
Online ISBN: 978-3-642-14880-4
eBook Packages: Engineering, Engineering (R0)