An Empirical Validation of Learning Schemes Using an Automated Genetic Defect Prediction Framework

Murillo-Morera, Juan; Castro-Herrera, Carlos; Arroyo, Javier; Fuentes-Fernández, Rubén

doi:10.1007/978-3-319-47955-2_19


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10022)

Included in the following conference series:

Ibero-American Conference on Artificial Intelligence


Abstract

Today, it is common for software projects to collect measurement data throughout the development process. With these data, defect prediction software can estimate the defect proneness of a software module, with the objective of assisting and guiding software practitioners. With timely and accurate defect predictions, practitioners can focus their limited testing resources on higher-risk areas. This paper reports a benchmarking study in which a genetic algorithm automatically generates and compares different learning schemes (preprocessing + attribute selection + learning algorithm). The performance of the defect prediction models, measured by AUC (Area Under the Curve), was validated on NASA-MDP and PROMISE data sets. Twelve data sets from NASA-MDP (8) and PROMISE (4) projects were analyzed using an \(M\times N\)-fold cross-validation. We used the genetic algorithm to select the components of the learning schemes automatically, and to evaluate and report those with the best performance. In all, 864 learning schemes were studied. The most common learning schemes combined the data preprocessors Log and CoxBox; the attribute selectors Backward Elimination, BestFirst, and LinearForwardSelection; and the learning algorithms NaiveBayes, NaiveBayesSimple, SimpleLogistic, MultilayerPerceptron, Logistic, LogitBoost, BayesNet, and OneR. According to statistical analysis, the genetic algorithm showed stable performance and runtime across data sets.
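As a rough illustration of the ideas named in the abstract, the sketch below shows a rank-based AUC, an M×N-fold split generator, and a small genetic search over (preprocessor, attribute selector, learner) triples. It is a minimal sketch under stated assumptions, not the authors' framework: the function names, the selection, crossover, and mutation operators, and all parameter defaults are hypothetical, and the component labels are copied from the abstract purely as strings with no implementation behind them.

```python
import random

# Component labels taken from the abstract; they are plain strings here.
SCHEME_SPACE = [
    ["Log", "CoxBox"],                                           # data preprocessors
    ["BackwardElimination", "BestFirst", "LinearForwardSelection"],  # attribute selectors
    ["NaiveBayes", "SimpleLogistic", "Logistic", "OneR"],       # learning algorithms
]

def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def m_by_n_folds(n_rows, m, n, seed=0):
    """Yield (train_idx, test_idx) for M repetitions of N-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(m):
        idx = list(range(n_rows))
        rng.shuffle(idx)                      # new shuffle for each repetition
        folds = [idx[k::n] for k in range(n)]
        for k in range(n):
            train = [i for j, f in enumerate(folds) if j != k for i in f]
            yield train, folds[k]

def genetic_search(space, fitness, pop_size=8, generations=10,
                   mut_rate=0.2, seed=0):
    """Search for a high-fitness tuple with one gene per list in `space`.
    Operators (assumed, not from the paper): truncation selection that keeps
    the top half, uniform crossover, and per-gene random-reset mutation."""
    rng = random.Random(seed)
    cache = {}

    def score(scheme):
        if scheme not in cache:               # each scheme is evaluated once
            cache[scheme] = fitness(scheme)
        return cache[scheme]

    pop = [tuple(rng.choice(opts) for opts in space) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = tuple(rng.choice(pair) for pair in zip(a, b))
            child = tuple(rng.choice(opts) if rng.random() < mut_rate else gene
                          for gene, opts in zip(child, space))
            children.append(child)
        pop = parents + children
    return max(pop, key=score)
```

In a real experiment, `fitness` would train the scheme on each training split from `m_by_n_folds`, score the held-out split with `auc`, and return the mean; that step depends on the learning library and is left out here.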



Acknowledgments

This research was supported by the University of Costa Rica, the National University of Costa Rica, and the Ministry of Science, Technology and Telecommunications (MICITT).

Author information

Authors and Affiliations

Doctoral Program in Computer Science, University of Costa Rica, San José, Costa Rica
Juan Murillo-Morera & Carlos Castro-Herrera
Department of Software Engineering and Artificial Intelligence, University Complutense of Madrid, Madrid, Spain
Javier Arroyo & Rubén Fuentes-Fernández


Corresponding author

Correspondence to Juan Murillo-Morera.

Editor information

Editors and Affiliations

INAOE, Tonantzintla, Mexico
Manuel Montes y Gómez
Astrofisica Optica y Electronica, INAOE, Puebla, Mexico
Hugo Jair Escalante
Universidad Nacional de Costa Rica, Heredia, Costa Rica
Alberto Segura
Universidad Nacional de Costa Rica, Heredia, Costa Rica
Juan de Dios Murillo


About this paper

Cite this paper

Murillo-Morera, J., Castro-Herrera, C., Arroyo, J., Fuentes-Fernández, R. (2016). An Empirical Validation of Learning Schemes Using an Automated Genetic Defect Prediction Framework. In: Montes y Gómez, M., Escalante, H., Segura, A., Murillo, J. (eds) Advances in Artificial Intelligence - IBERAMIA 2016. IBERAMIA 2016. Lecture Notes in Computer Science, vol 10022. Springer, Cham. https://doi.org/10.1007/978-3-319-47955-2_19


DOI: https://doi.org/10.1007/978-3-319-47955-2_19
Published: 14 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47954-5
Online ISBN: 978-3-319-47955-2
eBook Packages: Computer Science; Computer Science (R0)
