A hybrid one-class rule learning approach based on swarm intelligence for software fault prediction

  • Original Paper
Innovations in Systems and Software Engineering

Abstract

Software testing is a fundamental activity in the software development process, aimed at determining the quality of software. To reduce the effort and cost of this process, defect prediction methods can use software metrics to identify fault-prone modules so that testing activities can focus on them. Because rule-based models are easy to interpret and easy for programmers and testers to use, some recent studies have built prediction models from classification rules. This study presents a rule-based prediction approach based on the kernel k-means clustering algorithm and Distance based Multi-objective Particle Swarm Optimization (DSMOPSO). Because the search space is discrete, we modified this algorithm and named it DSMOPSO-D. Dividing the search space with the kernel k-means algorithm prevents the best global rules from dominating local rules, and handling imbalanced and balanced clusters with different approaches addresses the imbalanced data set problem. The model was evaluated on four publicly available data sets from the PROMISE repository and compared with other machine learning and rule learning algorithms. The results show that the model performs very well, especially on large data sets.
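
To make the clustering step mentioned in the abstract more concrete, the following is a minimal sketch of generic kernel k-means in Python with NumPy. It is not the authors' implementation: the RBF kernel, the `gamma` value, the random initialization, and the names `rbf_kernel`, `kernel_kmeans`, and `init_labels` are all assumptions made for illustration. The sketch only shows how points (for example, modules described by software metrics) can be partitioned using distances computed in a kernel-induced feature space.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gram matrix with K[i, j] = exp(-gamma * ||x_i - x_j||^2).

    The RBF kernel and gamma are illustrative assumptions.
    """
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kernel_kmeans(K, n_clusters, init_labels=None, n_iter=50, seed=0):
    """Cluster points given only their Gram matrix K.

    init_labels is a hypothetical convenience parameter; when it is
    omitted, labels are initialized uniformly at random.
    """
    n = K.shape[0]
    if init_labels is None:
        rng = np.random.default_rng(seed)
        labels = rng.integers(0, n_clusters, size=n)
    else:
        labels = np.asarray(init_labels).copy()

    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue  # an empty cluster keeps infinite distance
            # Squared feature-space distance to the cluster mean:
            # K_ii - (2/|c|) * sum_j K_ij + (1/|c|^2) * sum_{j,l} K_jl
            dist[:, c] = (
                diag
                - 2.0 * K[:, members].sum(axis=1) / size
                + K[np.ix_(members, members)].sum() / size ** 2
            )
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Two small, well-separated groups of made-up 2-D points.
    X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
    print(kernel_kmeans(rbf_kernel(X), n_clusters=2,
                        init_labels=[0, 0, 1, 0, 1, 1]))
```

Because the algorithm only touches the Gram matrix, a different kernel can be swapped in without changing `kernel_kmeans`. Like ordinary k-means, the result depends on the initial assignment, so in practice several restarts are usually compared.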

Notes

  1. http://promisedata.org/repository/.

Acknowledgments

This work was supported by Islamic Azad University of Shabestar (IAUS).

Author information

Corresponding author

Correspondence to Yousef Abdi.

About this article

Cite this article

Abdi, Y., Parsa, S. & Seyfari, Y. A hybrid one-class rule learning approach based on swarm intelligence for software fault prediction. Innovations Syst Softw Eng 11, 289–301 (2015). https://doi.org/10.1007/s11334-015-0258-2

  • DOI: https://doi.org/10.1007/s11334-015-0258-2