Abstract
Software defect prediction (SDP) plays a key role in the timely delivery of good-quality software. Applied in the early development phases, it identifies the error-prone modules that could later cause heavy damage or even failure of the software. This allows testing to be targeted at those modules, which reduces the total development cost while preserving the quality of the end product. Support vector machines (SVMs) are used extensively for SDP, but class imbalance, that is, unequal counts of faulty and non-faulty modules in the dataset, impairs their accuracy. In this work, a novel filtering technique (FILTER) is proposed for effective defect prediction using SVMs. SVM-based classifiers with linear, polynomial and radial basis function kernels are built using the proposed filtering technique over five datasets, and their performance is evaluated. The proposed FILTER improves the performance of the SVM-based SDP model by 16.73%, 16.80% and 7.65% in terms of accuracy, AUC and F-measure, respectively.
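As an illustration of why the abstract reports AUC and F-measure alongside accuracy, the following minimal sketch computes all three measures on a toy imbalanced dataset. It is not the paper's FILTER technique or its experimental setup: the function name `evaluate`, the 0.5 decision threshold, and the ten hand-written module scores are assumptions made only for this example.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Return accuracy, F-measure and AUC for binary labels (1 = faulty)."""
    # Turn continuous scores into hard predictions at the chosen threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)

    # AUC as the probability that a randomly chosen faulty module receives
    # a higher score than a randomly chosen non-faulty one (ties count 0.5).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")

    return {"accuracy": accuracy, "f_measure": f_measure, "auc": auc}


# Hypothetical toy data: 2 faulty modules out of 10 (class imbalance).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A degenerate model that scores every module as non-faulty.
trivial_scores = [0.0] * 10

# A model whose scores rank the faulty modules near the top.
informative_scores = [0.9, 0.4, 0.6, 0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.0]

result_trivial = evaluate(y_true, trivial_scores)
result_informative = evaluate(y_true, informative_scores)

print("trivial:    ", result_trivial)
print("informative:", result_informative)
```

On this toy data both models reach the same accuracy, yet only the second has a non-zero F-measure and an AUC above chance, which is why accuracy alone can be misleading when faulty modules are rare.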
Funding
No funding was received for this work.
Ethics declarations
Conflict of interest
The author has no conflict of interest.
About this article
Cite this article
Goyal, S. Effective software defect prediction using support vector machines (SVMs). Int J Syst Assur Eng Manag 13, 681–696 (2022). https://doi.org/10.1007/s13198-021-01326-1
DOI: https://doi.org/10.1007/s13198-021-01326-1