Abstract
Software Defect Prediction (SDP) is a major research field in the software development life cycle. Accurate SDP helps software developers and engineers build reliable software products. Several machine learning techniques for SDP have been reported in the literature. However, most of these studies show weaknesses in prediction accuracy and other performance metrics. Many also evaluate their models on accuracy alone, which is not enough to measure SDP performance. In this research, we propose seven ensemble machine learning models for SDP: CatBoost, Light Gradient Boosting Machine (LGBM), Extreme Gradient Boosting (XGBoost), boosted CatBoost, bagged logistic regression, boosted LGBM, and boosted XGBoost. We also used a separate, individual logistic regression base model in the analysis on six datasets. This paper extends the evaluation beyond accuracy by also using the Area Under the Curve (AUC), precision, recall, F-measure, and Matthews Correlation Coefficient (MCC) as performance metrics. The results show that the proposed CatBoost ensemble model delivered outstanding performance on all three defect datasets, owing to its ability to reduce overfitting and training time.
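The abstract argues that accuracy alone is not enough to evaluate SDP models. The following is a minimal, stdlib-only Python sketch of the metrics it lists, assuming binary labels where 1 means "defective" and using the standard confusion-matrix definitions. The function names are hypothetical and this is not the authors' evaluation code, which the abstract does not show. AUC is computed here through its pairwise-ranking (Mann-Whitney) interpretation rather than by integrating a plotted ROC curve, and reporting 0.0 for a zero denominator is an assumed convention.

```python
from __future__ import annotations

import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 as 'defective'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def _safe_div(num: float, den: float) -> float:
    # Assumed convention: a metric whose denominator is zero is reported as 0.0.
    return num / den if den else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall, F-measure and MCC from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return {
        "accuracy": _safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        # F-measure (F1): harmonic mean of precision and recall.
        "f_measure": _safe_div(2 * precision * recall, precision + recall),
        # Matthews Correlation Coefficient: uses all four confusion-matrix cells.
        "mcc": _safe_div(
            tp * tn - fp * fn,
            math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        ),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random defective module outscores a
    random clean one; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Imbalanced toy data: 10 defective modules out of 100, and a trivial
    # model that predicts "clean" for every module.
    y_true = [1] * 10 + [0] * 90
    y_pred = [0] * 100
    print(classification_metrics(y_true, y_pred))
```

On this toy data the trivial model scores 0.9 accuracy while its recall, F-measure and MCC are all 0.0, which illustrates why the abstract reports metrics beyond accuracy on imbalanced defect data.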
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Saheed, Y.K., Longe, O., Baba, U.A., Rakshit, S., Vajjhala, N.R. (2021). An Ensemble Learning Approach for Software Defect Prediction in Developing Quality Software Product. In: Singh, M., Tyagi, V., Gupta, P.K., Flusser, J., Ören, T., Sonawane, V.R. (eds) Advances in Computing and Data Sciences. ICACDS 2021. Communications in Computer and Information Science, vol 1440. Springer, Cham. https://doi.org/10.1007/978-3-030-81462-5_29
DOI: https://doi.org/10.1007/978-3-030-81462-5_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81461-8
Online ISBN: 978-3-030-81462-5
eBook Packages: Computer Science, Computer Science (R0)