Abstract
Several factors may influence the performance of a classifier induced by a Machine Learning system. One of them is the difference between the number of examples belonging to each class. When this difference is large, the learning system may have difficulty learning the concept related to the minority class. In this work, we discuss some methods for decreasing the number of examples belonging to the majority class in order to improve classification performance on the minority class. We also propose the use of the VDM metric to improve the performance of the classification techniques. An experimental application to a real-world dataset confirms the efficiency of the proposed methods.
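As an illustration of the kind of distance the VDM metric provides for symbolic attributes, the following minimal sketch may help. It assumes the commonly used simplified form of the Value Difference Metric, in which the distance between two values a and b of one attribute is the sum over classes c of |P(c|a) - P(c|b)|^q. The abstract does not state which variant or weighting is used in this work, so the formula, the function names `vdm_tables` and `vdm`, and the exponent `q` are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def vdm_tables(values, labels):
    """Estimate P(class | value) for a single nominal attribute.

    `values` and `labels` are parallel sequences: the attribute value and
    the class label of each training example (hypothetical input layout).
    """
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[value][label] += 1
    classes = sorted(set(labels))
    return {
        value: {c: cnt[c] / sum(cnt.values()) for c in classes}
        for value, cnt in counts.items()
    }


def vdm(probs, a, b, q=1):
    """Simplified value difference between two values of one attribute.

    Assumed form: sum over classes of |P(c|a) - P(c|b)| ** q.
    """
    return sum(abs(probs[a][c] - probs[b][c]) ** q for c in probs[a])
```

Under this assumed form, two attribute values are close when they induce similar class distributions, which allows nearest-neighbour style selection methods to be applied to data with symbolic features.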
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Batista, G.E.A.P.A., Carvalho, A.C.P.L.F., Monard, M.C. (2000). Applying One-Sided Selection to Unbalanced Datasets. In: Cairó, O., Sucar, L.E., Cantu, F.J. (eds) MICAI 2000: Advances in Artificial Intelligence. MICAI 2000. Lecture Notes in Computer Science, vol 1793. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10720076_29
DOI: https://doi.org/10.1007/10720076_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67354-5
Online ISBN: 978-3-540-45562-2
eBook Packages: Springer Book Archive