Computer Science, 2021, Vol. 48, Issue (11A): 509-522. doi: 10.11896/jsjkx.210300310
LE Qiao-yi (乐乔艺), LIU Jian-xun (刘建勋), SUN Xiao-ping (孙晓平), ZHANG Xiang-ping (张祥平)
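Abstract: Two or more similar code fragments in a software system are called code clones. Studies have shown that code clones are widespread in software systems and that their number keeps growing over time. As open-sourcing code has become a trend, the proportion of cloned code has risen further. Existing work has found that code clones in software systems are harmful: they reduce system stability, introduce redundancy into the code base, and propagate software defects. To improve code quality, academia and industry have proposed a variety of code clone detection methods. According to how much information they extract from the code, these methods fall into five categories: text-based, lexical, syntactic, semantic, and metric-based. The categories differ in performance and in the scenarios they suit. This paper analyzes the causes of software cloning and its advantages and disadvantages, classifies the code clone problems found in software systems, evaluates the respective strengths of the five types of detection methods, and describes in detail the technical characteristics of selected methods, including their core ideas, target languages, validation datasets, and detection results. Finally, it summarizes the application scenarios for which clone detection techniques are suited and gives an outlook on the future development of code clone detection methods and their applications.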
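To make the lexical category above concrete, here is a minimal sketch of a token-based similarity check. It is not taken from any tool surveyed in the paper: the keyword set, the regular expression, the function names (`normalize_tokens`, `ngrams`, `token_similarity`), and the choice of Jaccard similarity over token 3-grams are all illustrative assumptions.

```python
import re

# Hypothetical keyword set for a toy C-like language (an assumption for this
# sketch); a real lexical detector would use a language-specific lexer.
KEYWORDS = {"int", "for", "return", "if", "else", "while"}

# Identifiers/keywords, integer literals, or any single other non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_0-9]")


def normalize_tokens(source):
    """Tokenize source; map identifiers to ID and numbers to NUM."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)       # keep keywords as they are
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")      # hide identifier names
        elif tok[0].isdigit():
            tokens.append("NUM")     # hide literal values
        else:
            tokens.append(tok)       # operators and punctuation
    return tokens


def ngrams(tokens, n=3):
    """Return the set of consecutive n-token windows."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def token_similarity(a, b, n=3):
    """Jaccard similarity of the normalized token n-gram sets of a and b."""
    ga = ngrams(normalize_tokens(a), n)
    gb = ngrams(normalize_tokens(b), n)
    if not ga and not gb:
        return 1.0
    return len(ga & gb) / len(ga | gb)


original = "int sum = 0; for (int i = 0; i < n; i++) sum += a[i];"
renamed = "int total = 0; for (int k = 0; k < len; k++) total += v[k];"
print(token_similarity(original, renamed))
```

Because identifiers and literals are abstracted away before comparison, two fragments that differ only in variable names produce the same token sequence, which a plain text comparison would miss. A detector built this way would flag a pair as a clone candidate when the score exceeds some chosen threshold.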