{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,15]],"date-time":"2024-09-15T00:50:45Z","timestamp":1726361445544},"reference-count":66,"publisher":"IGI Global","isbn-type":[{"value":"9781605668369","type":"print"},{"value":"9781605668376","type":"electronic"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":[],"published-print":{"date-parts":[[2010]]},"abstract":"<jats:p>Nowadays, in a wide variety of situations, source code authorship identification has become an issue of major concern. Such situations include authorship disputes, proof of authorship in court, cyber attacks in the form of viruses, trojan horses, logic bombs, fraud, and credit card cloning. Source code author identification deals with the task of identifying the most likely author of a computer program, given a set of predefined author candidates. We present a new approach, called the SCAP (Source Code Author Profiles) approach, which uses byte-level n-grams to represent a source code author\u2019s style. Experiments on data sets of different programming languages (Java, C++, and Common Lisp) and varying difficulty (6 to 30 candidate authors) demonstrate the effectiveness of the proposed approach. A comparison with a previous source code authorship identification study based on more complicated information shows that the SCAP approach is language independent and that n-gram author profiles are better able to capture the idiosyncrasies of the source code authors. 
It is also demonstrated that the effectiveness of the proposed model is not affected by the absence of comments in the source code, a condition usually met in cyber-crime cases.<\/jats:p>","DOI":"10.4018\/978-1-60566-836-9.ch020","type":"book-chapter","created":{"date-parts":[[2010,5,25]],"date-time":"2010-05-25T21:29:11Z","timestamp":1274822951000},"page":"470-495","source":"Crossref","is-referenced-by-count":7,"title":["Source Code Authorship Analysis For Supporting the Cybercrime Investigation Process"],"prefix":"10.4018","author":[{"given":"Georgia","family":"Frantzeskou","sequence":"first","affiliation":[{"name":"University of the Aegean, Greece"}]},{"given":"Stephen G.","family":"MacDonell","sequence":"additional","affiliation":[{"name":"Auckland University of Technology, New Zealand"}]},{"given":"Efstathios","family":"Stamatatos","sequence":"additional","affiliation":[{"name":"University of the Aegean, Greece"}]}],"member":"2432","reference":[{"key":"978-1-60566-836-9.ch020.-1","doi-asserted-by":"publisher","DOI":"10.1109\/MIS.2005.81"},{"key":"978-1-60566-836-9.ch020.-2","unstructured":"Abelson, H., & Sussman, G. J. (1996). Structure and interpretation of computer programs (2nd ed.). Cambridge, MA: MIT Press."},{"key":"978-1-60566-836-9.ch020.-3","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/11.3.121"},{"key":"978-1-60566-836-9.ch020.-4","doi-asserted-by":"publisher","DOI":"10.1103\/PhysRevLett.88.048702"},{"key":"978-1-60566-836-9.ch020.-5","doi-asserted-by":"publisher","DOI":"10.1007\/BF02176636"},{"key":"978-1-60566-836-9.ch020.-6","unstructured":"Cavnar, W. B., & Trenkle, J. M. (1994). N-Gram-based text categorization. In Proceedings of the 1994 Symposium on Document Analysis and Information Retrieval."},{"key":"978-1-60566-836-9.ch020.-7","unstructured":"Chaski, C. (1998). A Daubert-inspired assessment of current techniques for language-based author identification. US National Institute of Justice. 
Retrieved from http:\/\/www.ncjrs.org."},{"key":"978-1-60566-836-9.ch020.-8","doi-asserted-by":"crossref","unstructured":"Chaski, C. (2001). Empirical evaluations of language-based author identification techniques. Journal of Forensic Linguistics.","DOI":"10.1558\/sll.2001.8.1.1"},{"issue":"1","key":"978-1-60566-836-9.ch020.-9","article-title":"Who\u2019s at the keyboard? Recent results in authorship attribution.","volume":"4","author":"C. E.Chaski","year":"2005","journal-title":"International Journal of Digital Evidence"},{"key":"978-1-60566-836-9.ch020.-10","doi-asserted-by":"publisher","DOI":"10.1002\/prot.20373"},{"key":"978-1-60566-836-9.ch020.-11","doi-asserted-by":"publisher","DOI":"10.1016\/S0164-1212(03)00049-9"},{"key":"978-1-60566-836-9.ch020.-12","unstructured":"Downie, J. S. (1999). Evaluating a simple approach to musical information retrieval: conceiving melodic n-grams as text. Doctoral thesis, University of Western Ontario."},{"key":"978-1-60566-836-9.ch020.-13","first-page":"501","article-title":"Was the Earl of Oxford the true Shakespeare? A computer aided analysis.","volume":"236","author":"W. E. Y.Elliott","year":"1991","journal-title":"Notes and Queries"},{"key":"978-1-60566-836-9.ch020.-14","doi-asserted-by":"crossref","unstructured":"Frank, E., Chui, C., & Witten, I. H. (2000). Text categorization using compression models. In Proceedings of DCC-00, IEEE Data Compression Conference (2000) (pp. 200\u2013209).","DOI":"10.1109\/DCC.2000.838202"},{"key":"978-1-60566-836-9.ch020.-15","unstructured":"Frantzeskou, G., Gritzalis, S., & MacDonell, S. (2004). Source code authorship analysis for supporting the cybercrime investigation process. In Proceedings of the ICETE\u20192004 International Conference on eBusiness and Telecommunication Networks \u2013 Security and Reliability in Information Systems and Networks Track, 2, 85-92. 
New York: Springer."},{"key":"978-1-60566-836-9.ch020.-16","doi-asserted-by":"publisher","DOI":"10.1016\/j.jss.2007.03.004"},{"key":"978-1-60566-836-9.ch020.-17","unstructured":"Frantzeskou, G., Stamatatos, E., & Gritzalis, S. (2005a). Supporting the digital crime investigation process: effective discrimination of source code authors based on byte-level information. In Proceedings of the ICETE\u20192005 International Conference on eBusiness and Telecommunication Networks \u2013 Security and Reliability in Information Systems and Networks Track. Berlin, Germany: Springer."},{"key":"978-1-60566-836-9.ch020.-18","unstructured":"Frantzeskou, G., Stamatatos, E., & Gritzalis, S. (2005b, July). Source code authorship analysis using n-grams. In Proceedings of the 7th Biennial Conference on Forensic Linguistics, Cardiff, UK."},{"issue":"1","key":"978-1-60566-836-9.ch020.-19","article-title":"Identifying authorship by byte-level n-grams: The source code author profile method.","volume":"6","author":"G.Frantzeskou","year":"2007","journal-title":"International Journal of Digital Evidence"},{"key":"978-1-60566-836-9.ch020.-20","doi-asserted-by":"crossref","unstructured":"Frantzeskou, G., Stamatatos, E., Gritzalis, S., & Katsikas, S. (2006a). Effective identification of source code authors using byte-level information. In B. Cheng & B. Shen (Eds.), Proceedings of the 28th International Conference on Software Engineering ICSE 2006 - Emerging Results Track, Shanghai, China. New York: ACM Press.","DOI":"10.1145\/1134285.1134445"},{"key":"978-1-60566-836-9.ch020.-21","doi-asserted-by":"crossref","unstructured":"Frantzeskou, G., Stamatatos, E., Gritzalis, S., & Katsikas, S. (2006b). Source code author identification based on n-gram author profiles. In Proceedings of 3rd IFIP Conference on Artificial Intelligence Applications & Innovations (AIAI'06) (pp. 508-515). 
Berlin, Germany: Springer.","DOI":"10.1007\/0-387-34224-9_59"},{"key":"978-1-60566-836-9.ch020.-22","doi-asserted-by":"crossref","unstructured":"Ganapathiraju, M., Weisser, D., Rosenfeld, R., Carbonell, J., Reddy, R., & Klein-Seetharaman, J. (2002). Comparative n-gram analysis of whole-genome protein sequences. In Proceedings of the Human Language Technologies Conference (HLT\u201902), San Diego.","DOI":"10.3115\/1289189.1289259"},{"key":"978-1-60566-836-9.ch020.-23","unstructured":"Gray, A., Sallis, P., & MacDonell, S. (1997). Software forensics: Extending authorship analysis techniques to computer programs. In Proc. 3rd Biannual Conf. Int. Assoc. of Forensic Linguists (IAFL'97) (pp. 1-8)."},{"key":"978-1-60566-836-9.ch020.-24","unstructured":"Gray, A., Sallis, P., & MacDonell, S. (1998). Identified: A dictionary-based system for extracting source code metrics for software forensics. In Proceedings of SE:E&P\u201998 (pp. 252\u2013259). Washington, DC: IEEE Computer Society Press."},{"key":"978-1-60566-836-9.ch020.-25","doi-asserted-by":"publisher","DOI":"10.1016\/0020-0271(74)90015-1"},{"key":"978-1-60566-836-9.ch020.-26","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/10.2.111"},{"key":"978-1-60566-836-9.ch020.-27","doi-asserted-by":"publisher","DOI":"10.2307\/2982671"},{"key":"978-1-60566-836-9.ch020.-28","doi-asserted-by":"crossref","unstructured":"Juola, P. (2006). Authorship attribution for electronic documents. In M. Olivier & S. Shenoi (Eds.), Advances in Digital Forensics II (pp. 119-130). New York: Springer.","DOI":"10.1007\/0-387-36891-4_10"},{"key":"978-1-60566-836-9.ch020.-29","unstructured":"Keselj, V., Peng, F., Cercone, N., & Thomas, C. (2003). N-gram based author profiles for authorship attribution. In Proceedings of Pacific Association for Computational Linguistics."},{"key":"978-1-60566-836-9.ch020.-30","doi-asserted-by":"crossref","unstructured":"Khmelev, D., & Teahan, W. (2003). 
A repetition based measure for verification of text collections and for text categorization. In Proceedings of the 26th ACM SIGIR 2003 (pp. 104-110).","DOI":"10.1145\/860435.860456"},{"key":"978-1-60566-836-9.ch020.-31","first-page":"865","article-title":"A fuzzy logic approach to computer software source code authorship analysis. In","volume":"97","author":"R. I.Kilgour","year":"1998","journal-title":"Proceedings of ICONIP"},{"key":"978-1-60566-836-9.ch020.-32","unstructured":"Knuth, D. E. (1997). The art of computer programming, vol. 1 (3rd ed.). Boston: Addison-Wesley."},{"key":"978-1-60566-836-9.ch020.-33","doi-asserted-by":"publisher","DOI":"10.1002\/(SICI)1097-4571(1999)50:14<1295::AID-ASI4>3.0.CO;2-5"},{"key":"978-1-60566-836-9.ch020.-34","doi-asserted-by":"crossref","unstructured":"Kothari, J., Shevertalov, M., Stehle, E., & Mancoridis, S. (2007). A probabilistic approach to source code authorship identification. In Proceedings of Third International Conference on Information Technology New Generations (ITNG 2007).","DOI":"10.1109\/ITNG.2007.17"},{"key":"978-1-60566-836-9.ch020.-35","unstructured":"Krsul, I., & Spafford, E. H. (1995). Authorship analysis: Identifying the author of a program. In Proceedings of 8th National Information Systems Security Conference, National Institute of Standards and Technology, 514-524."},{"key":"978-1-60566-836-9.ch020.-36","unstructured":"Lamkins, D. (2004). Successful Lisp: How to understand and use common Lisp. Retrieved from http:\/\/psg.com\/~dlamkins\/sl\/"},{"key":"978-1-60566-836-9.ch020.-37","doi-asserted-by":"crossref","unstructured":"Lange, R., & Mancoridis, S. (2007). Using code metric histograms and genetic algorithms to perform author identification for software forensics. 
In Proceedings of Genetic and Evolutionary Computation Conference (GECCO 2007), Track Real-World Applications 5.","DOI":"10.1145\/1276958.1277364"},{"key":"978-1-60566-836-9.ch020.-38","doi-asserted-by":"publisher","DOI":"10.1016\/0167-4048(93)90013-U"},{"key":"978-1-60566-836-9.ch020.-39","doi-asserted-by":"publisher","DOI":"10.1007\/BF01829876"},{"issue":"1","key":"978-1-60566-836-9.ch020.-40","first-page":"34","article-title":"Software forensics: Extending authorship analysis techniques to computer programs.","volume":"13","author":"S. G.MacDonell","year":"2002","journal-title":"Journal of Law and Information Science"},{"key":"978-1-60566-836-9.ch020.-41","first-page":"113","article-title":"Software forensics applied to the task of discriminating between program authors.","volume":"10","author":"S. G.MacDonell","year":"2001","journal-title":"Journal of Systems Research and Information Systems"},{"key":"978-1-60566-836-9.ch020.-42","doi-asserted-by":"crossref","unstructured":"Marceau, C. (2000). Characterizing the behaviour of a program using multiple-length n-grams. In Proceedings of the 2000 Workshop on New Security Paradigms (pp. 101-110).","DOI":"10.1145\/366173.366197"},{"key":"978-1-60566-836-9.ch020.-43","doi-asserted-by":"publisher","DOI":"10.1126\/science.ns-9.214S.237"},{"key":"978-1-60566-836-9.ch020.-44","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/11.1.19"},{"key":"978-1-60566-836-9.ch020.-45","unstructured":"Merriam-Webster. (1992). Webster's 7th collegiate dictionary. Springfield, MA: Merriam-Webster."},{"key":"978-1-60566-836-9.ch020.-46","unstructured":"Miller, G. A. 1991. The science of words. 
New York: Scientific American Library."},{"issue":"1","key":"978-1-60566-836-9.ch020.-47","doi-asserted-by":"crossref","first-page":"54","DOI":"10.1109\/TPC.1975.6593963","article-title":"Computer detection of typographical errors.","volume":"18","author":"A.Morris","year":"1975","journal-title":"IEEE Transactions on Professional Communication"},{"key":"978-1-60566-836-9.ch020.-48","unstructured":"Mosteller, F., & Wallace, D. L. (1964). Inference and disputed authorship: The Federalist. Reading, MA: Addison-Wesley."},{"key":"978-1-60566-836-9.ch020.-49","unstructured":"Norvig, P., & Pitman, K. (1993). Tutorial on good Lisp programming style. In Proceedings of Lisp users and Vendors conference."},{"key":"978-1-60566-836-9.ch020.-50","doi-asserted-by":"crossref","unstructured":"Oman, P., & Cook, C. (1989). Programming style authorship analysis. In Seventeenth Annual ACM Science Conference Proceedings. New York: ACM.","DOI":"10.1145\/75427.75469"},{"key":"978-1-60566-836-9.ch020.-51","doi-asserted-by":"publisher","DOI":"10.1023\/B:INRT.0000011209.19643.e2"},{"key":"978-1-60566-836-9.ch020.-52","unstructured":"Sallis, P., Aakjaer, A., & MacDonell, S. (1996). Software forensics: Old methods for a new science. In Proceedings of SE:E&P\u201996, Dunedin, New Zealand (pp. 367-371). Washington, DC: IEEE Computer Society Press."},{"key":"978-1-60566-836-9.ch020.-53","unstructured":"Schank, R. (1982). Dynamic memory: A theory of reminding and learning in computers and people. Cambridge, UK: Cambridge University Press."},{"key":"978-1-60566-836-9.ch020.-54","doi-asserted-by":"publisher","DOI":"10.1142\/S0218348X93000083"},{"key":"978-1-60566-836-9.ch020.-55","doi-asserted-by":"crossref","unstructured":"Seibel, P. (2005). Practical common Lisp. 
Retrieved from http:\/\/www.gigamonkeys.com\/book\/","DOI":"10.1007\/978-1-4302-0017-8"},{"key":"978-1-60566-836-9.ch020.-56","doi-asserted-by":"publisher","DOI":"10.1109\/32.637387"},{"key":"978-1-60566-836-9.ch020.-57","doi-asserted-by":"publisher","DOI":"10.1145\/66093.66095"},{"key":"978-1-60566-836-9.ch020.-58","doi-asserted-by":"publisher","DOI":"10.1016\/0167-4048(93)90055-A"},{"key":"978-1-60566-836-9.ch020.-59","doi-asserted-by":"publisher","DOI":"10.1162\/089120100750105920"},{"key":"978-1-60566-836-9.ch020.-60","doi-asserted-by":"publisher","DOI":"10.1023\/A:1002681919510"},{"key":"978-1-60566-836-9.ch020.-61","doi-asserted-by":"publisher","DOI":"10.1023\/A:1001749303137"},{"key":"978-1-60566-836-9.ch020.-62","doi-asserted-by":"crossref","unstructured":"Vel, O., Anderson, A., Corney, M., & Mohay, G. (2001). Mining E-mail content for author identification forensics. Proceedings of ACM SIGMOD Record, 30(4).","DOI":"10.1145\/604264.604272"},{"key":"978-1-60566-836-9.ch020.-63","first-page":"363","article-title":"On sentence-length as a statistical characteristic of style in prose, with applications to two cases of disputed authorship.","volume":"30","author":"G. U.Yule","year":"1938","journal-title":"Biometrika"},{"key":"978-1-60566-836-9.ch020.-64","unstructured":"Yule, G. U. (1944). The statistical study of literary vocabulary. Cambridge, UK: Cambridge University Press."},{"key":"978-1-60566-836-9.ch020.-65","doi-asserted-by":"crossref","unstructured":"Zheng, R., Qin, Y., Huang, Z., & Chen, H. (2003). Authorship analysis in cybercrime investigation. In NSF\/NIJ Symposium on Intelligence and Security Informatics (ISI'03), Tucson, Arizona. Berlin, Germany: Springer-Verlag.","DOI":"10.1007\/3-540-44853-5_5"},{"key":"978-1-60566-836-9.ch020.-66","doi-asserted-by":"crossref","unstructured":"Zipf, G. K. (1932). Selected studies of the principle of relative frequency in language. 
Cambridge, MA: Harvard University Press.","DOI":"10.4159\/harvard.9780674434929"}],"container-title":["Advances in Digital Crime, Forensics, and Cyber Terrorism","Handbook of Research on Computational Forensics, Digital Crime, and Investigation"],"original-title":[],"link":[{"URL":"https:\/\/www.igi-global.com\/viewtitle.aspx?TitleId=39230","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2021,9,15]],"date-time":"2021-09-15T23:08:28Z","timestamp":1631747308000},"score":1,"resource":{"primary":{"URL":"http:\/\/services.igi-global.com\/resolvedoi\/resolve.aspx?doi=10.4018\/978-1-60566-836-9.ch020"}},"subtitle":[""],"short-title":[],"issued":{"date-parts":[[2010]]},"ISBN":["9781605668369","9781605668376"],"references-count":66,"URL":"https:\/\/doi.org\/10.4018\/978-1-60566-836-9.ch020","relation":{},"ISSN":["2327-0381","2327-0373"],"issn-type":[{"value":"2327-0381","type":"print"},{"value":"2327-0373","type":"electronic"}],"subject":[],"published":{"date-parts":[[2010]]}}}