Abstract
This paper aims to empirically measure the effect of clone refactoring on the size of unit test cases in object-oriented software. We investigated research questions related to: (1) the impact of clone refactoring on source code attributes (particularly size, complexity and coupling) that are related to the testability of classes, (2) the impact of clone refactoring on the size of unit test cases, (3) correlations between the variations observed after clone refactoring in the source code attributes and in the size of unit test cases, and (4) which variations in the source code attributes after clone refactoring are more associated with the size of unit test cases. We used different metrics to quantify the considered source code attributes and the size of unit test cases. To investigate the research questions and to develop predictive and explanatory models, we used various data analysis and modeling techniques, particularly linear regression analysis and five machine learning algorithms (C4.5, KNN, Naïve Bayes, Random Forest and Support Vector Machine). We conducted an empirical study using data collected from two open-source Java software systems (ANT and ARCHIVA) that have been clone refactored. Overall, the paper's contributions can be summarized as follows: (1) the results revealed a strong and positive correlation between code clone refactoring and the reduction in the size of unit test cases, (2) we showed how code quality attributes that are related to the testability of classes are significantly improved when clones are refactored, (3) we observed that the size of unit test cases can be significantly reduced when clone refactoring is applied, and (4) complexity/size measures are more commonly associated with the variations in the size of unit test cases than coupling measures are.

Acknowledgements
This work was partially supported by an NSERC (Natural Sciences and Engineering Research Council of Canada) grant.
Cite this article
Badri, M., Badri, L., Hachemane, O. et al. Measuring the effect of clone refactoring on the size of unit test cases in object-oriented software: an empirical study. Innovations Syst Softw Eng 15, 117–137 (2019). https://doi.org/10.1007/s11334-019-00334-6