Sampling Approaches for Imbalanced Data Classification Problem in Machine Learning

Tyagi, Shivani; Mittal, Sangeeta

doi:10.1007/978-3-030-29407-6_17

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 597)

Abstract

Real-world datasets in many domains, such as medicine, intrusion detection, fraudulent transactions and bioinformatics, are highly imbalanced. In classification problems, imbalanced datasets degrade the accuracy of class predictions. This skew can be handled either by oversampling minority-class examples or by undersampling the majority class. In this work, popular methods of both categories are evaluated for their ability to improve the imbalance ratio of five highly imbalanced datasets from different application domains. The effect of balancing on classification results is also investigated. The adaptive synthetic oversampling approach was observed to improve both the imbalance ratio and the classification results the most. However, undersampling approaches gave better overall performance on all datasets.
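
To make the two families of methods concrete, the following is a minimal sketch of plain random oversampling and random undersampling on a toy two-class dataset. It is not the chapter's method: the evaluated techniques (for example adaptive synthetic oversampling) generate or select examples more carefully. The helper names and the definition of the imbalance ratio as majority count divided by minority count are assumptions made for illustration only.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes
    are as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (hypothetical): 95 majority examples and 5 minority examples.
rows = [[float(i)] for i in range(100)]
labels = [0] * 95 + [1] * 5

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(imbalance_ratio(labels))        # 19.0
print(imbalance_ratio(over_labels))   # 1.0
print(imbalance_ratio(under_labels))  # 1.0
```

Both calls balance the classes, but at different costs: oversampling grows the toy dataset to 190 rows by repeating minority examples, while undersampling shrinks it to 10 rows by discarding majority examples.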

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Jaypee Institute of Information Technology Noida, Noida, U.P, India
Shivani Tyagi & Sangeeta Mittal

Corresponding author

Correspondence to Shivani Tyagi.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Jaypee University of Information Technology, Waknaghat, Himachal Pradesh, India
Pradeep Kumar Singh
Indian Institute of Technology Delhi, New Delhi, Delhi, India
Arpan Kumar Kar
Central University of Jammu, Jammu, Jammu and Kashmir, India
Yashwant Singh
Indian Institute of Technology Patna, Patna, Bihar, India
Maheshkumar H. Kolekar
Institute of Technology, Nirma University, Ahmedabad, Gujarat, India
Sudeep Tanwar

About this paper

Cite this paper

Tyagi, S., Mittal, S. (2020). Sampling Approaches for Imbalanced Data Classification Problem in Machine Learning. In: Singh, P., Kar, A., Singh, Y., Kolekar, M., Tanwar, S. (eds) Proceedings of ICRIC 2019. Lecture Notes in Electrical Engineering, vol 597. Springer, Cham. https://doi.org/10.1007/978-3-030-29407-6_17

DOI: https://doi.org/10.1007/978-3-030-29407-6_17
Published: 22 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29406-9
Online ISBN: 978-3-030-29407-6
eBook Packages: Engineering, Engineering (R0)
