Genetic Algorithms Based Resampling for the Classification of Unbalanced Datasets

  • Conference paper
Intelligent Decision Technologies 2017 (IDT 2017)

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 73)

Abstract

This paper proposes a resampling approach for the classification of unbalanced datasets. The method uses genetic algorithms to combine undersampling and oversampling according to a set of criteria, and it determines the optimal unbalance rate. The method has been tested on industrial datasets and on datasets from the literature. The results show an appreciable increase in the detection rate of rare patterns and an improvement in classification performance.
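
To make the idea concrete, the following is a minimal sketch of how a genetic algorithm might search over a combination of undersampling and oversampling rates. It is not the authors' method: the abstract does not state the chromosome encoding, the selection criteria, or the genetic operators, so everything here is an assumption made for illustration. In particular, the two-gene chromosome (`keep_major`, `dup_minor`), the 1-D threshold classifier, the balanced-accuracy fitness, and all function names are hypothetical.

```python
import random

GRID = [-2.0 + 0.1 * i for i in range(61)]  # candidate thresholds (assumption)

def make_data(n_major, n_minor, rng):
    """Toy 1-D data: label 0 is the frequent class, label 1 the rare one."""
    return ([(rng.gauss(0.0, 1.0), 0) for _ in range(n_major)] +
            [(rng.gauss(2.0, 1.0), 1) for _ in range(n_minor)])

def resample(data, keep_major, dup_minor, rng):
    """Keep a random fraction of majority samples (undersampling) and add
    random copies of minority samples (oversampling by replication)."""
    major = [d for d in data if d[1] == 0]
    minor = [d for d in data if d[1] == 1]
    kept = rng.sample(major, max(1, round(keep_major * len(major))))
    n_extra = round((dup_minor - 1.0) * len(minor))
    extra = [rng.choice(minor) for _ in range(n_extra)]
    return kept + minor + extra

def fit_threshold(train):
    """Choose the grid threshold with fewest training errors for the
    rule 'x > t means class 1'. Sensitive to class proportions."""
    return min(GRID, key=lambda t: sum((x > t) != (y == 1) for x, y in train))

def balanced_accuracy(t, data):
    """Mean of the detection rates of the two classes."""
    pos = sum(1 for _, y in data if y == 1)
    neg = len(data) - pos
    tp = sum(1 for x, y in data if y == 1 and x > t)
    tn = sum(1 for x, y in data if y == 0 and x <= t)
    return 0.5 * (tp / pos + tn / neg)

def ga_search(train, valid, pop_size=8, generations=6, seed=0):
    rng = random.Random(seed)

    def fitness(gene):
        # Fixed seed inside so the same gene always gets the same score.
        sample = resample(train, gene[0], gene[1], random.Random(seed))
        return balanced_accuracy(fit_threshold(sample), valid)

    # Include the "no resampling" individual so the search cannot end up
    # worse than the baseline on the validation data.
    pop = [(1.0, 1.0)] + [(rng.uniform(0.05, 1.0), rng.uniform(1.0, 5.0))
                          for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # truncation selection, elitist
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            keep = a[0] + rng.gauss(0.0, 0.1)  # crossover plus Gaussian mutation
            dup = b[1] + rng.gauss(0.0, 0.5)
            children.append((min(1.0, max(0.05, keep)),
                             min(5.0, max(1.0, dup))))
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)

data_rng = random.Random(42)
train = make_data(200, 20, data_rng)
valid = make_data(200, 20, data_rng)
baseline = balanced_accuracy(fit_threshold(train), valid)
best_gene, best_score = ga_search(train, valid)
print("baseline balanced accuracy:", round(baseline, 3))
print("best (keep_major, dup_minor):", best_gene, "score:", round(best_score, 3))
```

The pair returned in `best_gene` implicitly fixes the unbalance rate of the resampled training set, which is the quantity the abstract says the method optimizes. A realistic version would replace the toy classifier and fitness with the learner and criteria of interest, and would score candidates on data kept separate from the final test set.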

Author information

Corresponding author

Correspondence to Marco Vannucci.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Vannucci, M., Colla, V. (2018). Genetic Algorithms Based Resampling for the Classification of Unbalanced Datasets. In: Czarnowski, I., Howlett, R., Jain, L. (eds) Intelligent Decision Technologies 2017. IDT 2017. Smart Innovation, Systems and Technologies, vol 73. Springer, Cham. https://doi.org/10.1007/978-3-319-59424-8_3

  • DOI: https://doi.org/10.1007/978-3-319-59424-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59423-1

  • Online ISBN: 978-3-319-59424-8

  • eBook Packages: Engineering, Engineering (R0)
