Abstract
An effective software fault prediction (SFP) model can help developers detect faults quickly and thus enhance the overall reliability and quality of a software project. Variations in the prediction performance of learning techniques across different software systems make it difficult to select a suitable learning technique for fault prediction modeling. Evaluations of previously presented SFP approaches have shown that no single machine learning-based model provides the best accuracy in every context, highlighting the need to use multiple techniques to build the SFP model. To address this problem, we present and discuss a software fault prediction approach that builds the fault prediction model by selecting the most appropriate learning techniques from a set of competitive and accurate ones. In this work, we apply the discussed SFP approach to five Eclipse project datasets and nine object-oriented (OO) project datasets and report the findings of the experimental study. We used different performance measures, i.e., AUC, accuracy, sensitivity, and specificity, to assess the approach’s performance. Further, we performed a cost-benefit analysis to evaluate the economic viability of the approach. Results showed that the presented approach predicted software faults effectively, with highest achieved values of 0.816, 0.835, 0.98, and 0.903 for AUC, accuracy, sensitivity, and specificity, respectively. The cost-benefit analysis showed that the approach can help reduce the overall software testing cost.
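To make the idea of choosing among multiple learning techniques concrete, the following is a minimal sketch, assuming scikit-learn and a synthetic dataset, of the general dynamic-selection strategy: for each module to be predicted, the base learner with the best accuracy on nearby validation instances is selected, and the resulting predictions are scored with AUC, accuracy, sensitivity, and specificity. It illustrates the general idea only and does not reproduce the authors’ exact algorithm, datasets, or reported numbers.

```python
# Illustrative sketch of per-instance dynamic selection of a base learner,
# evaluated with AUC, accuracy, sensitivity, and specificity.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors, KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score, accuracy_score, confusion_matrix

# Synthetic stand-in for a fault dataset (faulty = 1, non-faulty = 0).
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
# Hold out a validation split that defines each learner's local competence.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.3, stratify=y_train, random_state=42)

learners = [GaussianNB(), LogisticRegression(max_iter=500),
            DecisionTreeClassifier(random_state=42), KNeighborsClassifier()]
for clf in learners:
    clf.fit(X_fit, y_fit)

# Competence of each learner on the validation set (correct = 1, wrong = 0).
val_correct = np.array([clf.predict(X_val) == y_val for clf in learners])
nn = NearestNeighbors(n_neighbors=7).fit(X_val)

y_pred, y_score = [], []
for x in X_test:
    # Neighborhood of the query instance within the validation set.
    _, idx = nn.kneighbors(x.reshape(1, -1))
    # Select the learner with the best local accuracy in that neighborhood.
    local_acc = val_correct[:, idx[0]].mean(axis=1)
    best = learners[int(np.argmax(local_acc))]
    y_pred.append(int(best.predict(x.reshape(1, -1))[0]))
    y_score.append(float(best.predict_proba(x.reshape(1, -1))[0, 1]))

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("AUC        :", roc_auc_score(y_test, y_score))
print("Accuracy   :", accuracy_score(y_test, y_pred))
print("Sensitivity:", tp / (tp + fn))
print("Specificity:", tn / (tn + fp))
```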
Appendix
In this study, we used the Python libraries scikit-learn and imbalanced-learn to implement the base learners and the presented approach. The following parameter values were set for these learning techniques and for the presented approach; a sketch showing how these settings map onto library calls is given after the table.
Technique | Parameter values
---|---
Naïve Bayes | priors=None, var_smoothing=1e-9, epsilon=absolute additive value to variances, sigma=variance of each feature per class, theta=mean of each feature per class
Logistic Regression | penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=500, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None
K-nearest Neighbor | n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, power parameter=2, metric='Euclidean', metric_params=None, n_jobs=None
Decision Tree | criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, class_weight=None, ccp_alpha=0.0
Support Vector Machine | C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None
Multilayer Perceptron | hidden_layer_sizes=100, activation='relu', solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000
SMOTE | sampling_strategy='auto', random_state=None, k_neighbors=5, n_jobs=None
X-means clustering (Weka) | binValue=1.0, cutOffFactor=0.5, debugLevel=0, distanceF=EuclideanDistance -R first-last, maxIterations=100, maxKMeans=1000, maxKMeansForChildren=1000, maxNumClusters=10, minNumClusters=2, seed=10, useKDTree=false
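The following is a minimal sketch, assuming scikit-learn and imbalanced-learn, of how the base learners and SMOTE could be instantiated with the parameter values listed above (most of which are the libraries’ defaults). The X-means clustering step was run in Weka and is therefore not reproduced; the variables `X_train` and `y_train` are placeholders for a training split.

```python
# Sketch (not the authors' exact implementation) instantiating the base
# learners and SMOTE with the parameter values from the table above.
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from imblearn.over_sampling import SMOTE

base_learners = {
    "naive_bayes": GaussianNB(priors=None, var_smoothing=1e-9),
    "logistic_regression": LogisticRegression(penalty="l2", C=1.0,
                                              solver="lbfgs", max_iter=500),
    "knn": KNeighborsClassifier(n_neighbors=5, weights="uniform",
                                algorithm="auto", leaf_size=30,
                                metric="euclidean"),
    "decision_tree": DecisionTreeClassifier(criterion="gini", splitter="best",
                                            min_samples_split=2,
                                            min_samples_leaf=1),
    "svm": SVC(C=1.0, kernel="rbf", degree=3, gamma="scale", tol=0.001),
    "mlp": MLPClassifier(hidden_layer_sizes=(100,), activation="relu",
                         solver="adam", alpha=0.0001,
                         learning_rate_init=0.001, max_iter=200),
}

# Class imbalance is handled by oversampling the training split with SMOTE.
smote = SMOTE(sampling_strategy="auto", k_neighbors=5)

# Typical usage (X_train, y_train are placeholders for a training split):
# X_res, y_res = smote.fit_resample(X_train, y_train)
# for name, clf in base_learners.items():
#     clf.fit(X_res, y_res)
```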
Cite this article
Rathore, S.S., Kumar, S. Software fault prediction based on the dynamic selection of learning technique: findings from the eclipse project study. Appl Intell 51, 8945–8960 (2021). https://doi.org/10.1007/s10489-021-02346-x