doi: 10.17706/jsw.15.6.147-162
A Novel Approach for Converting N-Dimensional Dataset into Two Dimensions to Improve Accuracy in Software Defect Prediction
2Institute of Information Technology, University of Dhaka, Dhaka, Bangladesh.
3Jashore University of Science and Technology, Jashore, Bangladesh.
Abstract—Software defect prediction model is trained using code metrics and historical defect information to identify probable software defects. The accuracy and performance of a prediction model largely depend on the training dataset. In order to provide proper training dataset, it is required to make the dataset clustered with less variabilities using clustering algorithms. However, clustering process is hampered due to multiple attributes of dataset such as Coupling between Objects, Response for Class, Lines of Code, etc. This research will aim to predict software defects through reducing code metrics dimensions to two latent variables. It will finally help the clustering algorithms to group data properly for the defect prediction model. In this paper, the dataset similarities are analyzed by reducing code metrics’ attributes into two latent variables based on their impacts to defects. Their impacts to defects can be analyzed using regression analysis because it identifies the relationship among a set of dependent and independent variables. Then, the code metrics are merged into two variables - PosImpactValue and NegImpactValue based on their positive or negative impact, respectively. As a result, multi-dimensional dataset is mapped into two-dimensional dataset. Plotting those dimensions reduced datasets enable distance-based clustering algorithms to group those datasets based on their similarities. Experiments have been performed on 18 releases of 6 open source software datasets such as jEdit, Ant, Xalan, Synapse, Tomcat and Camel. For comparative analysis, one of the most commonly used dimension reduction techniques named Principle Component Analysis (PCA) and two popular clustering techniques in defect prediction – DBSCAN and WHERE have been used in the experiment. First, the dimensions of the experimental datasets have been reduced using the proposed technique and PCA separately. Then, the reduced datasets have been clustered using DBSCAN and WHERE independently for identifying number of defects accurately. The comparative result analysis shows that the defect prediction models based on the clustering algorithms are more accurate for the dataset reduced by the proposed technique than PCA.
Index Terms—Software defect prediction, principal component analysis, DBSCAN, WHERE clustering, code metrics’ dimension reduction technique, dataset pre-processing.
Cite: Rayhanul Islam, Abdus Satter, Atish Kumar Dipongkor, Md. Saeed Siddik, Kazi Sakib, "A Novel Approach for Converting N-Dimensional Dataset into Two Dimensions to Improve Accuracy in Software Defect Prediction," Journal of Software vol. 15, no. 6, pp. 147-162, 2020.
Copyright © 2020 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
General Information
ISSN: 1796-217X (Online)
Abbreviated Title: J. Softw.
Frequency: Quarterly
APC: 500USD
DOI: 10.17706/JSW
Editor-in-Chief: Prof. Antanas Verikas
Executive Editor: Ms. Cecilia Xie
Abstracting/ Indexing: DBLP, EBSCO,
CNKI, Google Scholar, ProQuest,
INSPEC(IET), ULRICH's Periodicals
Directory, WorldCat, etcE-mail: jsweditorialoffice@gmail.com
-
Oct 22, 2024 News!
Vol 19, No 3 has been published with online version [Click]
-
Jan 04, 2024 News!
JSW will adopt Article-by-Article Work Flow
-
Apr 01, 2024 News!
Vol 14, No 4- Vol 14, No 12 has been indexed by IET-(Inspec) [Click]
-
Apr 01, 2024 News!
Papers published in JSW Vol 18, No 1- Vol 18, No 6 have been indexed by DBLP [Click]
-
Jun 12, 2024 News!
Vol 19, No 2 has been published with online version [Click]