Reusing software by copying and pasting remains a persistent plague in software development, even though it creates serious maintenance problems. Various techniques have been proposed to find duplicated code (also known as software clones). A recent study compared these techniques and showed that token-based clone detection based on suffix trees is fast but yields clone candidates that are often not syntactic units. Current techniques based on abstract syntax trees, on the other hand, find syntactic clones but are considerably less efficient. This paper describes how suffix trees can be used to find syntactic clones in abstract syntax trees. The new approach finds syntactic clones in linear time and space. The paper reports the results of a large case study in which we empirically compare the new technique to other techniques using the Bellon benchmark for clone detectors. The Bellon benchmark consists of clone pairs validated by humans for eight software systems written in C or Java from different application domains. Compared with the earlier conference paper, the new contributions of this paper are the additional analysis of Java programs, the exploration of an alternative path that uses parse trees instead of abstract syntax trees, and the investigation of the impact on recall and precision when clone analyses insist on consistent parameter renaming.
A trie is an ordered tree data structure that is used to store an associative array where the keys are strings.
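To make the definition concrete, the following is a minimal sketch of a trie used as an associative array with string keys. The names `Trie`, `TrieNode`, `insert`, and `get` are illustrative assumptions and are not taken from the paper.

```python
class TrieNode:
    """One node of the trie; each outgoing edge is labelled with a character."""

    def __init__(self):
        self.children = {}      # character -> TrieNode
        self.has_value = False  # True if a key ends at this node
        self.value = None       # value stored for that key


class Trie:
    """Associative array keyed by strings (hypothetical helper class)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        # Walk down from the root, creating missing nodes along the way.
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.has_value = True
        node.value = value

    def get(self, key, default=None):
        # Follow the key character by character; stop if a branch is missing.
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return default
        return node.value if node.has_value else default


trie = Trie()
trie.insert("for", 1)
trie.insert("foreach", 2)
print(trie.get("for"), trie.get("foreach"), trie.get("fo"))  # 1 2 None
```

Keys that share a prefix, such as `for` and `foreach`, share the path for that prefix. A suffix tree builds on the same idea by storing all suffixes of one sequence in compressed form.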
Suffix arrays are an alternative to suffix trees; they consume less space (Manber and Myers 1991), but at the cost of more runtime.
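As a rough illustration of the idea, the sketch below builds a suffix array over a token sequence and uses it to find the longest repeated token run. It is a deliberately naive construction that sorts whole suffixes. It is not the algorithm of Manber and Myers, and it is not the paper's implementation. The function names are assumptions made for this example, and token indices are taken to be 0-based.

```python
def suffix_array(tokens):
    """Return the start indices of all suffixes in lexicographic order.

    Naive construction: comparing whole suffixes makes this far slower
    than specialised algorithms, but it shows what the array contains.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def common_prefix_len(tokens, i, j):
    """Length of the longest common prefix of the suffixes starting at i and j."""
    n = len(tokens)
    k = 0
    while i + k < n and j + k < n and tokens[i + k] == tokens[j + k]:
        k += 1
    return k


def longest_repeat(tokens):
    """Return (a, b, l): two start indices and the length of the longest repeat.

    Suffixes that share a long prefix are neighbours in the suffix array,
    so it suffices to compare adjacent entries. Overlapping occurrences
    are not filtered out in this sketch.
    """
    sa = suffix_array(tokens)
    best = (0, 0, 0)
    for x, y in zip(sa, sa[1:]):
        length = common_prefix_len(tokens, x, y)
        if length > best[2]:
            best = (min(x, y), max(x, y), length)
    return best


tokens = list("banana")
print(suffix_array(tokens))    # [5, 3, 1, 0, 4, 2]
print(longest_repeat(tokens))  # (1, 3, 3)
```

The array stores one integer per suffix rather than a tree of nodes and edges, which is where the space saving comes from.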
Recall that (a, b, l) denotes a clone pair whose two fragments start at token indices a and b, respectively, and each have length l.
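The triple can be represented directly as a small record. The sketch below assumes 0-based token indices, which the text does not specify. The class name `ClonePair` and its helper methods are hypothetical.

```python
from typing import NamedTuple


class ClonePair(NamedTuple):
    """Clone pair (a, b, l): fragments start at tokens a and b, each of length l."""

    a: int
    b: int
    l: int

    def fragments(self, tokens):
        """Return the two token fragments this pair refers to."""
        first = tokens[self.a:self.a + self.l]
        second = tokens[self.b:self.b + self.l]
        return first, second

    def overlaps(self):
        """True if the two fragments share at least one token position."""
        lo, hi = sorted((self.a, self.b))
        return lo + self.l > hi


tokens = "x = a + b ; y = a + b ;".split()
pair = ClonePair(a=1, b=7, l=5)
first, second = pair.fragments(tokens)
print(first == second)   # True
print(pair.overlaps())   # False
```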
Personal communication; March 2007.
Personal communication at Dagstuhl, July 2006.
We would like to thank Stefan Bellon for providing us with his benchmark and ccdiml, his support in the evaluation, and for comments on this paper.
Editors: Massimiliano Di Penta and Susan Sim
Falke, R., Frenzel, P. & Koschke, R. Empirical evaluation of clone detection using syntax suffix trees. Empir Software Eng 13, 601–643 (2008). https://doi.org/10.1007/s10664-008-9073-9