Abstract
Software testing is a fundamental activity in the software development process that aims to determine the quality of software. To reduce the effort and cost of testing, defect prediction methods use software metrics to identify fault-prone modules so that testing activities can be focused on them. Because classification rules are easy to interpret and easy for programmers and testers to use, some recent studies have built prediction models from them. This study presents a rule-based prediction approach based on the kernel k-means clustering algorithm and Distance Sorting based Multi-objective Particle Swarm Optimization (DSMOPSO). Because the search space is discrete, we modified this algorithm and named the result DSMOPSO-D. We prevent the best global rules from dominating local rules by dividing the search space with the kernel k-means algorithm, and we address the imbalanced data set problem by taking different approaches for imbalanced and balanced clusters. The performance of the presented model was evaluated on four publicly available data sets from the PROMISE repository and compared with other machine learning and rule learning algorithms. The results show that our model performs very well, especially on large data sets.
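As a rough illustration of what "multi-objective" rule learning means here, the sketch below scores a few candidate rules on two objectives and keeps only the non-dominated ones. It is not DSMOPSO-D itself: the two objectives (sensitivity and specificity), the metric names `loc` and `cc`, the thresholds, and the toy data are all assumptions made for the example, since the abstract does not specify them.

```python
from typing import Callable, Dict, List, Tuple

Module = Dict[str, float]        # metrics of one module (hypothetical names)
Rule = Callable[[Module], bool]  # a rule "fires" when it predicts fault-prone
Score = Tuple[float, float]      # (sensitivity, specificity), assumed objectives


def evaluate(rule: Rule, modules: List[Module], labels: List[bool]) -> Score:
    """Return (sensitivity, specificity) of a rule on labelled modules."""
    tp = sum(1 for m, y in zip(modules, labels) if y and rule(m))
    tn = sum(1 for m, y in zip(modules, labels) if not y and not rule(m))
    pos = sum(labels)
    neg = len(labels) - pos
    return (tp / pos if pos else 0.0, tn / neg if neg else 0.0)


def dominates(a: Score, b: Score) -> bool:
    """a dominates b: no worse on every objective, better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Score]) -> List[str]:
    """Names of rules whose scores are not dominated by any other rule."""
    return sorted(
        name for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    )


# Toy data (invented): lines of code and cyclomatic complexity per module.
modules = [
    {"loc": 500, "cc": 25}, {"loc": 320, "cc": 12}, {"loc": 40, "cc": 3},
    {"loc": 60, "cc": 6}, {"loc": 20, "cc": 2}, {"loc": 700, "cc": 8},
]
labels = [True, True, False, True, False, False]  # True = faulty module

# Candidate rules (invented thresholds).
rules: Dict[str, Rule] = {
    "loc>50": lambda m: m["loc"] > 50,
    "loc>300": lambda m: m["loc"] > 300,
    "cc>10": lambda m: m["cc"] > 10,
    "loc>600": lambda m: m["loc"] > 600,
}

scores = {name: evaluate(rule, modules, labels) for name, rule in rules.items()}
front = pareto_front(scores)
print(front)
```

On this toy data, `loc>50` catches every faulty module but raises a false alarm, while `cc>10` raises no false alarms but misses one faulty module, so neither dominates the other and both stay on the front. A multi-objective optimizer would search for such rules instead of enumerating them by hand.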
Acknowledgments
This work was supported by Islamic Azad University of Shabestar (IAUS).
Cite this article
Abdi, Y., Parsa, S. & Seyfari, Y. A hybrid one-class rule learning approach based on swarm intelligence for software fault prediction. Innovations Syst Softw Eng 11, 289–301 (2015). https://doi.org/10.1007/s11334-015-0258-2
DOI: https://doi.org/10.1007/s11334-015-0258-2