Abstract
R is a programming language and software environment for statistical computing and data analysis that is steadily gaining popularity among practitioners and scientists. In this paper we present a preliminary version of a system that detects pairs of similar R code blocks within a given set of routines. The system is based on an appropriate aggregation of the outputs of three different [0,1]-valued (fuzzy) proximity degree estimation algorithms. Its analysis on empirical data indicates that the system may in the future be successfully applied in practice, e.g., to detect plagiarism among students’ homework submissions or to analyze code recycling and code cloning in R’s open source package repositories.
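The idea of aggregating several [0,1]-valued proximity degrees can be illustrated with a minimal Python sketch. The abstract does not name the three algorithms or the aggregation function, so everything below is an assumption made for illustration only: the crude tokenizer, the three stand-in measures (token-sequence matching, Jaccard overlap of token sets, and character-level matching, the first and last via the standard `difflib` module), and the arithmetic mean as the default aggregation.

```python
import difflib
import re
from statistics import mean


def tokens(code):
    # Crude tokenizer (an assumption): identifiers, digit runs, and
    # single non-whitespace symbols. Not a real R lexer.
    return re.findall(r"[A-Za-z_.][A-Za-z0-9_.]*|\d+|\S", code)


def token_sequence_similarity(a, b):
    # Proximity of the two token sequences, in [0, 1].
    return difflib.SequenceMatcher(None, tokens(a), tokens(b)).ratio()


def token_set_similarity(a, b):
    # Jaccard index of the two token sets, in [0, 1].
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def character_similarity(a, b):
    # Character-level proximity after removing all whitespace, in [0, 1].
    ca, cb = "".join(a.split()), "".join(b.split())
    return difflib.SequenceMatcher(None, ca, cb).ratio()


def aggregated_proximity(a, b, agg=mean):
    # Combine the three proximity degrees with an aggregation function.
    # The arithmetic mean is a placeholder choice; min, max, or a
    # weighted mean could be passed instead.
    scores = [
        token_sequence_similarity(a, b),
        token_set_similarity(a, b),
        character_similarity(a, b),
    ]
    return agg(scores)


if __name__ == "__main__":
    original = "f <- function(x) { sum(x) / length(x) }"
    renamed = "g <- function(y) { sum(y) / length(y) }"
    unrelated = "print('hello world')"
    print(aggregated_proximity(original, renamed))
    print(aggregated_proximity(original, unrelated))
```

Pairs whose aggregated score exceeds some threshold would then be flagged for manual inspection; both the threshold and the choice of aggregation function would have to be calibrated on real data.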
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Bartoszuk, M., Gagolewski, M. (2014). A Fuzzy R Code Similarity Detection Algorithm. In: Laurent, A., Strauss, O., Bouchon-Meunier, B., Yager, R.R. (eds) Information Processing and Management of Uncertainty in Knowledge-Based Systems. IPMU 2014. Communications in Computer and Information Science, vol 444. Springer, Cham. https://doi.org/10.1007/978-3-319-08852-5_3
DOI: https://doi.org/10.1007/978-3-319-08852-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08851-8
Online ISBN: 978-3-319-08852-5
eBook Packages: Computer Science (R0)