An Empirical Validation of Learning Schemes Using an Automated Genetic Defect Prediction Framework

  • Conference paper
Advances in Artificial Intelligence - IBERAMIA 2016 (IBERAMIA 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10022)

Abstract

Today, it is common for software projects to collect measurement data throughout their development processes. With these data, defect prediction software can estimate the defect proneness of a software module, with the objective of assisting and guiding software practitioners. With timely and accurate defect predictions, practitioners can focus their limited testing resources on higher-risk areas. This paper reports a benchmarking study in which a genetic algorithm automatically generates and compares different learning schemes, each combining a data preprocessor, an attribute selector, and a learning algorithm. The performance of the defect prediction models, measured by AUC (Area Under the Curve), was validated on twelve data sets from NASA-MDP (8) and PROMISE (4) projects using \(M\times N\)-fold cross-validation. The genetic algorithm selected the components of each learning scheme automatically, then evaluated the schemes and reported those with the best performance. In all, 864 learning schemes were studied. The most common learning schemes combined the data preprocessors Log and CoxBox; the attribute selectors Backward Elimination, BestFirst, and LinearForwardSelection; and the learning algorithms NaiveBayes, NaiveBayesSimple, SimpleLogistic, MultilayerPerceptron, Logistic, LogitBoost, BayesNet, and OneR. According to statistical analysis, the genetic algorithm showed steady performance and runtime across data sets.
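
The idea of a genetic algorithm searching over learning schemes can be illustrated with a minimal Python sketch. It is not the authors' implementation. The component names are taken from the abstract, but the size of the search space, the genetic operators (single-point crossover, per-gene mutation, truncation selection with elitism), and all parameter values are assumptions made for illustration. The function `placeholder_fitness` is hypothetical and has no empirical meaning; in a real framework it would be replaced by the mean AUC of the scheme under \(M\times N\)-fold cross-validation.

```python
import random

# Component names come from the abstract; treating these lists as the
# whole search space is an assumption made for this sketch.
PREPROCESSORS = ["Log", "CoxBox"]
SELECTORS = ["Backward Elimination", "BestFirst", "LinearForwardSelection"]
LEARNERS = ["NaiveBayes", "SimpleLogistic", "Logistic", "OneR"]
GENES = [PREPROCESSORS, SELECTORS, LEARNERS]


def random_scheme(rng):
    """A chromosome holds one choice per gene: (preprocessor, selector, learner)."""
    return tuple(rng.choice(options) for options in GENES)


def crossover(a, b, rng):
    """Single-point crossover between two schemes (assumed operator)."""
    point = rng.randrange(1, len(GENES))
    return a[:point] + b[point:]


def mutate(scheme, rng, rate=0.1):
    """Resample each gene with probability `rate` (assumed operator)."""
    return tuple(
        rng.choice(options) if rng.random() < rate else gene
        for gene, options in zip(scheme, GENES)
    )


def evolve(fitness, generations=20, pop_size=10, seed=0):
    """Return the best scheme found and its fitness value."""
    rng = random.Random(seed)
    population = [random_scheme(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # the better half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


def placeholder_fitness(scheme):
    """Hypothetical stand-in for a cross-validated AUC; values are arbitrary."""
    return sum(options.index(gene) for gene, options in zip(scheme, GENES))


if __name__ == "__main__":
    scheme, score = evolve(placeholder_fitness)
    print(scheme, score)
```

Passing the fitness function as an argument keeps the search loop independent of how a scheme is scored, so an AUC-based evaluation could be substituted without changing the genetic operators.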

Acknowledgments

This research was supported by the University of Costa Rica, the National University of Costa Rica, and the Ministry of Science, Technology and Telecommunications (MICITT).

Author information

Corresponding author

Correspondence to Juan Murillo-Morera.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Murillo-Morera, J., Castro-Herrera, C., Arroyo, J., Fuentes-Fernández, R. (2016). An Empirical Validation of Learning Schemes Using an Automated Genetic Defect Prediction Framework. In: Montes y Gómez, M., Escalante, H., Segura, A., Murillo, J. (eds) Advances in Artificial Intelligence - IBERAMIA 2016. IBERAMIA 2016. Lecture Notes in Computer Science, vol 10022. Springer, Cham. https://doi.org/10.1007/978-3-319-47955-2_19

  • DOI: https://doi.org/10.1007/978-3-319-47955-2_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47954-5

  • Online ISBN: 978-3-319-47955-2

  • eBook Packages: Computer Science, Computer Science (R0)
