Abstract
Mining source code and software metrics has been used successfully to assist software quality evaluation. This paper presents a data mining approach that clusters Java classes and then classifies the extracted clusters in order to assess internal software quality. We use Java classes as entities and static metrics as attributes for data mining. We identify outliers and apply K-means clustering to establish clusters of classes. Outliers indicate potentially fault-prone classes, whereas clusters are examined to establish the characteristics their members have in common. We then apply C4.5 to build classification trees that identify the metrics determining cluster membership. We evaluate the proposed approach on two well-known open-source software systems, jEdit and Apache Geronimo. The results consolidate key findings from previous work and indicate that combining clustering with classification produces better results than stand-alone clustering.
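The abstract describes a three-stage pipeline: flag outliers, cluster the remaining classes with K-means, then use C4.5 to find which metrics decide cluster membership. The sketch below illustrates that flow on hypothetical data and is not the authors' implementation. Everything concrete in it is an assumption made for illustration: the metric columns (WMC, CBO, LOC), the eight toy classes A to H, the z-score outlier rule with a threshold of 2.0, min-max normalization, deterministic farthest-first seeding for K-means, and a single gain-ratio split standing in for a full C4.5 tree.

```python
"""Illustrative sketch only (not the authors' code): z-score outliers,
K-means on normalized metrics, then a C4.5-style gain-ratio split."""
import math
from collections import Counter
from statistics import mean, pstdev

# Hypothetical metric columns and toy per-class values, invented for illustration.
METRICS = ["WMC", "CBO", "LOC"]
NAMES = ["A", "B", "C", "D", "E", "F", "G", "H"]
ROWS = [(2, 1, 40), (3, 2, 50), (2, 2, 45), (3, 1, 55),
        (10, 8, 300), (11, 9, 320), (12, 8, 310), (40, 30, 2000)]


def zscore_outliers(names, rows, threshold=2.0):
    """Flag a class if any metric lies more than `threshold` std devs from the mean."""
    flagged = set()
    for m in range(len(METRICS)):
        col = [r[m] for r in rows]
        mu, sd = mean(col), pstdev(col)
        for name, x in zip(names, col):
            if sd > 0 and abs(x - mu) / sd > threshold:
                flagged.add(name)
    return sorted(flagged)


def min_max(rows):
    """Rescale each metric to [0, 1] so no single metric dominates distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1 for c in cols]  # avoid dividing by zero
    return [[(x - lo) / s for x, lo, s in zip(r, lows, spans)] for r in rows]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-first seeding (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new == labels:
            break  # assignments stable: converged
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def entropy(items):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(items)
    return -sum(c / n * math.log2(c / n) for c in Counter(items).values())


def best_split(rows, labels):
    """Rank every (metric, midpoint threshold) binary split by gain ratio."""
    best = (None, None, -1.0)
    base, n = entropy(labels), len(labels)
    for m, metric in enumerate(METRICS):
        values = sorted({r[m] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            goes_left = [r[m] <= t for r in rows]
            left = [lab for lab, g in zip(labels, goes_left) if g]
            right = [lab for lab, g in zip(labels, goes_left) if not g]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / n
            ratio = gain / entropy(goes_left)  # both sides non-empty, so > 0
            if ratio > best[2]:  # strict '>' keeps the first metric on ties
                best = (metric, t, ratio)
    return best


outliers = zscore_outliers(NAMES, ROWS)  # candidate fault-prone classes
kept = [(n, r) for n, r in zip(NAMES, ROWS) if n not in outliers]
kept_names = [n for n, _ in kept]
kept_rows = [r for _, r in kept]
labels = kmeans(min_max(kept_rows), k=2)
split = best_split(kept_rows, labels)

print("outliers:", outliers)
print("clusters:", dict(zip(kept_names, labels)))
print("best split: %s <= %.1f (gain ratio %.2f)" % split)
```

On this toy data the sketch flags H as an outlier, groups A to D apart from E to G, and finds a perfect split. Because the two toy groups differ on every metric, all three metrics separate them equally well and the first one examined (WMC) is reported. On real metric data a full C4.5 tree (for example Weka's J48) would be needed to show which metrics actually drive cluster membership.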
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Papas, D., Tjortjis, C. (2014). Combining Clustering and Classification for Software Quality Evaluation. In: Likas, A., Blekas, K., Kalles, D. (eds) Artificial Intelligence: Methods and Applications. SETN 2014. Lecture Notes in Computer Science, vol 8445. Springer, Cham. https://doi.org/10.1007/978-3-319-07064-3_22
DOI: https://doi.org/10.1007/978-3-319-07064-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07063-6
Online ISBN: 978-3-319-07064-3
eBook Packages: Computer Science, Computer Science (R0)