An Ensemble Learning Approach for Software Defect Prediction in Developing Quality Software Product

  • Conference paper
Advances in Computing and Data Sciences (ICACDS 2021)

Abstract

Software Defect Prediction (SDP) is a major research field in the software development life cycle. Accurate SDP helps software developers and engineers build reliable software products. Several machine learning techniques for SDP have been reported in the literature, but most of these studies fall short in prediction accuracy and other performance metrics. Many of them also evaluate accuracy alone, which is not enough to measure SDP performance. In this research, we propose seven ensemble machine learning models for SDP: CatBoost, Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGBoost), boosted CatBoost, bagged logistic regression, boosted LGBM, and boosted XGBoost. We also used logistic regression as a separate individual base model for the analysis on six datasets. This paper extends the performance metrics beyond accuracy alone: the Area Under the Curve (AUC), precision, recall, F-measure, and Matthews Correlation Coefficient (MCC) were also used. The results showed that the proposed ensemble CatBoost model gave an outstanding performance on all three defect datasets because it was able to decrease overfitting and reduce the training time.
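The point that accuracy alone is not enough can be illustrated with a small sketch. The Python below is not the authors' code or data. It computes the six metrics named in the abstract from scratch for two hypothetical classifiers on a made-up, imbalanced set of ten modules (1 = defective). The function names (`confusion_counts`, `safe_div`, `auc`, `report`) and all labels and scores are assumptions chosen for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = defective, 0 = clean)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def auc(y_true, y_score):
    """AUC as the chance that a defective module outscores a clean one.

    Ties between a defective and a clean module count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))


def report(y_true, y_pred, y_score):
    """Return accuracy, AUC, precision, recall, F-measure, and MCC."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "auc": auc(y_true, y_score),
        "precision": precision,
        "recall": recall,
        "f_measure": safe_div(2 * precision * recall, precision + recall),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical data: 2 defective modules out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Classifier A labels every module as clean and gives all the same score.
always_clean = report(y_true, [0] * 10, [0.0] * 10)

# Classifier B finds both defects at the cost of one false alarm.
y_pred_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_score_b = [0.9, 0.8, 0.7, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1]
better = report(y_true, y_pred_b, y_score_b)

for name, metrics in (("always_clean", always_clean), ("better", better)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

On this toy data the classifier that never flags a defect still reaches 80% accuracy, while its recall, F-measure, and MCC are all zero. This is the kind of gap the additional metrics are meant to expose.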

Author information

Corresponding author

Correspondence to Narasimha Rao Vajjhala.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Saheed, Y.K., Longe, O., Baba, U.A., Rakshit, S., Vajjhala, N.R. (2021). An Ensemble Learning Approach for Software Defect Prediction in Developing Quality Software Product. In: Singh, M., Tyagi, V., Gupta, P.K., Flusser, J., Ören, T., Sonawane, V.R. (eds) Advances in Computing and Data Sciences. ICACDS 2021. Communications in Computer and Information Science, vol 1440. Springer, Cham. https://doi.org/10.1007/978-3-030-81462-5_29

  • DOI: https://doi.org/10.1007/978-3-030-81462-5_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-81461-8

  • Online ISBN: 978-3-030-81462-5

  • eBook Packages: Computer Science, Computer Science (R0)
