Abstract
The classification of software artifacts, in particular source code files, is currently performed by the administrators of a repository. Although automated classification exists for these repositories, existing approaches focus on semantic analysis of keywords found in the artifact. This paper presents the use of structural information, namely software metrics, to determine the appropriate application domain for a particular artifact. Results from the study show that the trends in the metrics differ between files of different application domains. The results also show that k-nearest neighbor outperformed both the C4.5 decision tree and a classifier generated using Discriminant Analysis in classifying files from the database and graphics domains.
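To make the idea concrete, the following is a minimal sketch of k-nearest neighbor classification over software-metric vectors. It is not the authors' implementation: the choice of metrics (lines of code, cyclomatic complexity, comment lines), the function name knn_classify, and every numeric value are assumptions invented for illustration only.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority domain label among the k training files
    whose metric vectors are closest (Euclidean) to the query vector.

    train: list of (metric_vector, domain_label) pairs
    query: metric vector of the file to classify
    """
    # Rank training files by distance to the query and keep the k nearest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data, NOT taken from the paper:
# (lines of code, cyclomatic complexity, comment lines) -> domain
TRAIN = [
    ((120.0, 14.0, 30.0), "database"),
    ((150.0, 18.0, 25.0), "database"),
    ((135.0, 16.0, 28.0), "database"),
    ((400.0, 45.0, 60.0), "graphics"),
    ((380.0, 50.0, 55.0), "graphics"),
    ((420.0, 48.0, 70.0), "graphics"),
]

predicted = knn_classify(TRAIN, (140.0, 15.0, 27.0), k=3)
print(predicted)  # database
```

In practice, metrics measured on very different scales would normally be normalised before computing distances, otherwise the largest-valued metric dominates the vote.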
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yusof, Y., Rana, O.F. (2010). Classification of Software Artifacts Based on Structural Information. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based and Intelligent Information and Engineering Systems. KES 2010. Lecture Notes in Computer Science, vol 6279. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15384-6_58
DOI: https://doi.org/10.1007/978-3-642-15384-6_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15383-9
Online ISBN: 978-3-642-15384-6
eBook Packages: Computer Science, Computer Science (R0)