Abstract
The problem of imbalanced data, i.e., data in which the class labels are unequally distributed, is encountered in many real-life applications, e.g., credit scoring and medical diagnostics. Various approaches for dealing with imbalanced data have been proposed. One of the best-known data pre-processing methods is the Synthetic Minority Oversampling Technique (SMOTE). However, SMOTE may generate examples that are artificial in the sense that they could not be drawn from the true distribution. Therefore, in this paper, we propose applying a Restricted Boltzmann Machine (RBM) to learn an intermediate representation that transforms the SMOTE examples into ones approximately drawn from the true distribution. At the end of the paper, we report an experiment on a credit scoring dataset.
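The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy binary data, the helper names (`smote`, `RBM`, `reconstruct`), the hyperparameters, the choice to train the RBM on minority examples only, and the use of one-step contrastive divergence (CD-1) are all assumptions made for illustration. The sketch interpolates between minority examples as SMOTE does, then passes each interpolated point through one up-down pass of the RBM so that the result is a sample from the model's visible distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def smote(X_min, n_new, k=3):
    """Interpolate between a minority example and one of its k nearest
    minority neighbours (the basic SMOTE step)."""
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        x = X_min[rng.integers(len(X_min))]
        dist = np.linalg.norm(X_min - x, axis=1)
        # Skip position 0: the point itself (or an exact duplicate).
        neighbours = np.argsort(dist)[1:k + 1]
        x_nn = X_min[rng.choice(neighbours)]
        synthetic[i] = x + rng.random() * (x_nn - x)
    return synthetic

class RBM:
    """Bernoulli-Bernoulli RBM trained with CD-1 (illustrative only)."""

    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def fit(self, V, epochs=200, lr=0.1):
        n = len(V)
        for _ in range(epochs):
            ph0 = self.hidden_probs(V)                    # positive phase
            h0 = (rng.random(ph0.shape) < ph0).astype(float)
            v1 = self.visible_probs(h0)                   # reconstruction
            ph1 = self.hidden_probs(v1)                   # negative phase
            self.W += lr * (V.T @ ph0 - v1.T @ ph1) / n
            self.b += lr * (V - v1).mean(axis=0)
            self.c += lr * (ph0 - ph1).mean(axis=0)
        return self

    def reconstruct(self, v):
        """One up-down pass: sample hidden units, then sample visible units."""
        ph = self.hidden_probs(v)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = self.visible_probs(h)
        return (rng.random(pv.shape) < pv).astype(float)

# Hypothetical toy minority class: 20 examples, 6 binary features.
feature_probs = np.array([0.9, 0.9, 0.1, 0.1, 0.5, 0.5])
X_min = (rng.random((20, 6)) < feature_probs).astype(float)

X_smote = smote(X_min, n_new=10)             # may contain fractional values
rbm = RBM(n_visible=6, n_hidden=4).fit(X_min)
X_rbm = rbm.reconstruct(X_smote)             # binary again after the RBM pass
```

In this sketch the raw SMOTE output can contain fractional values for binary features, which no real example could have, while the RBM pass returns binary vectors sampled from the learned model.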
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zięba, M., Tomczak, J.M., Gonczarek, A. (2015). RBM-SMOTE: Restricted Boltzmann Machines for Synthetic Minority Oversampling Technique. In: Nguyen, N., Trawiński, B., Kosala, R. (eds) Intelligent Information and Database Systems. ACIIDS 2015. Lecture Notes in Computer Science, vol 9011. Springer, Cham. https://doi.org/10.1007/978-3-319-15702-3_37
DOI: https://doi.org/10.1007/978-3-319-15702-3_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15701-6
Online ISBN: 978-3-319-15702-3
eBook Packages: Computer Science (R0)