{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,6]],"date-time":"2024-09-06T08:46:22Z","timestamp":1725612382573},"reference-count":48,"publisher":"MDPI AG","issue":"12","license":[{"start":{"date-parts":[[2020,12,10]],"date-time":"2020-12-10T00:00:00Z","timestamp":1607558400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"name":"the Foundation for Assistance to Small Innovative Enterprises","award":["#334 GUTSES8-D3\/56686","FEWM-2020-0037"]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Symmetry"],"abstract":"<jats:p>Many open-source projects are developed by the community and have a common basis. The more source code is open, the more the project is open to contributors. The possibility of accidental or deliberate use of someone else\u2019s source code as a closed functionality in another project (even a commercial) is not excluded. This situation could create copyright disputes. Adding a plagiarism check to the project lifecycle during software engineering solves this problem. However, not all code samples for comparing can be found in the public domain. In this case, the methods of identifying the source code author can be useful. Therefore, identifying the source code author is an important problem in software engineering, and it is also a research area in symmetry. This article discusses the problem of identifying the source code author and modern methods of solving this problem. Based on the experience of researchers in the field of natural language processing (NLP), the authors propose their technique based on a hybrid neural network and demonstrate its results both for simple cases of determining the authorship of the code and for those complicated by obfuscation and using of coding standards. 
The results show that the author\u2019s technique successfully solves the essential problems of analogs and can be effective even in cases where there are no obvious signs indicating authorship. The average accuracy obtained for all programming languages was 95% in the simple case and exceeded 80% in the complicated ones.<\/jats:p>","DOI":"10.3390\/sym12122044","type":"journal-article","created":{"date-parts":[[2020,12,11]],"date-time":"2020-12-11T03:15:36Z","timestamp":1607656536000},"page":"2044","source":"Crossref","is-referenced-by-count":18,"title":["Source Code Authorship Identification Using Deep Neural Networks"],"prefix":"10.3390","volume":"12","author":[{"given":"Anna","family":"Kurtukova","sequence":"first","affiliation":[{"name":"Faculty of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-2587-2222","authenticated-orcid":false,"given":"Aleksandr","family":"Romanov","sequence":"additional","affiliation":[{"name":"Faculty of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia"}]},{"given":"Alexander","family":"Shelupanov","sequence":"additional","affiliation":[{"name":"Faculty of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia"}]}],"member":"1968","published-online":{"date-parts":[[2020,12,10]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Kurtukova, A., Romanov, A., and Fedotova, A. (2019, January 25\u201327). De-Anonymization of the Author of the Source Code Using Machine Learning Algorithms. 
Proceedings of the 2019 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON), Yekaterinburg, Russia.","DOI":"10.1109\/SIBIRCON48586.2019.8958026"},{"key":"ref_2","first-page":"741","article-title":"Identification author of source code by machine learning methods","volume":"18","author":"Kurtukova","year":"2019","journal-title":"Trudy SPIIRAN"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"596","DOI":"10.18287\/2412-6179-CO-621","article-title":"Automatic text-independent speaker verification using convolutional deep belief network","volume":"44","author":"Rakhmanenko","year":"2020","journal-title":"Comput. Opt."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Kostyuchenko, E.Y., Viktorovich, I., Renko, B., and Shelupanov, A.A. (2018, January 18\u201325). User Identification by the Free-Text Keystroke Dynamics. Proceedings of the 3rd Russian-Pacific Conference on Computer Technology and Applications (RPC), Vladivostok, Russia.","DOI":"10.1109\/RPC.2018.8482190"},{"key":"ref_5","first-page":"82","article-title":"Crimes in the field of high technologies in modern Russia","volume":"2","author":"Nikerov","year":"2019","journal-title":"Bull. East-Sib. Inst. MIA Russ."},{"key":"ref_6","doi-asserted-by":"crossref","unstructured":"Yang, X., Li, Q., Guo, Y., and Zhang, M. (2017). Authorship attribution of source code by using backpropagation neural network based on particle swarm optimization. PLoS ONE, 12.","DOI":"10.1371\/journal.pone.0187204"},{"key":"ref_7","doi-asserted-by":"crossref","unstructured":"Alsulami, B., Dauber, E., Harang, R., Mancoridis, S., and Greenstadt, R. (2017, January 11\u201315). Source Code Authorship Attribution using Long Short-Term Memory Based Networks. 
Proceedings of the 22nd European Symposium on Research in Computer Security 2017, Oslo, Norway.","DOI":"10.1007\/978-3-319-66402-6_6"},{"key":"ref_8","first-page":"1","article-title":"Identifying authorship by byte-level n-grams: The source code author profile (SCAP) method","volume":"1","author":"Frantzeskou","year":"2007","journal-title":"Int. J. Digit. Evid."},{"key":"ref_9","doi-asserted-by":"crossref","first-page":"61","DOI":"10.1016\/j.diin.2015.09.001","article-title":"Scripting DNA: Identifying the JavaScript Programmer","volume":"15","author":"Wisse","year":"2015","journal-title":"Digit. Investig."},{"key":"ref_10","first-page":"167","article-title":"Determining the authorship of malicious code using the data compression method","volume":"3","author":"Osovetskiy","year":"2013","journal-title":"Softw. Prod. Syst."},{"key":"ref_11","first-page":"27","article-title":"Source Code Author Attribution Using Author\u2019s Programming Style and Code Smells","volume":"5","author":"Zia","year":"2017","journal-title":"Intel. Syst. Appl."},{"key":"ref_12","unstructured":"Caliskan-Islam, A., Harang, R., and Liu, A. (2015, January 12\u201314). Deanonymizing programmers via code stylometry. Proceedings of the 24th USENIX Security Symposium 2015, Washington, DC, USA."},{"key":"ref_13","unstructured":"Caliskan-Islam, A., Dauber, E., and Harang, R. (2017). Git blame who?. arXiv."},{"key":"ref_14","doi-asserted-by":"crossref","unstructured":"Burrows, S., Uitdenbogerd, A., and Turpin, A. (2009, January 21\u201323). Application of information retrieval techniques for source code authorship attribution. Proceedings of the 14th International Conference on Database Systems for Advanced Applications 2009, Brisbane, Australia.","DOI":"10.1007\/978-3-642-00887-0_61"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Wang, N., and Ji, S. (2018, January 19). Integration of Static and Dynamic Code Stylometry Analysis for Programmer De-anonymization. 
Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security 2018, Toronto, ON, Canada.","DOI":"10.1145\/3270101.3270110"},{"key":"ref_16","doi-asserted-by":"crossref","unstructured":"Abuhamad, M., AbuHmed, T., Mohaisen, A., and Nyang, D. (2018, January 15\u201319). Large-Scale and Language-Oblivious Code Authorship Identification. Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada.","DOI":"10.1145\/3243734.3243738"},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. arXiv.","DOI":"10.3115\/v1\/D14-1181"},{"key":"ref_18","unstructured":"Zhang, X., Zhao, J., and LeCun, Y. (2016). Character-level Convolutional Networks for Text Classification. arXiv."},{"key":"ref_19","doi-asserted-by":"crossref","unstructured":"Jin, Y., Wu, D., and Guo, W. (2020). Attention-Based LSTM with Filter Mechanism for Entity Relation Classification. Symmetry, 12.","DOI":"10.3390\/sym12101729"},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Nowak, J., Taspinar, A., and Scherer, R. (2017, January 11\u201315). LSTM Recurrent Neural Networks for Short Text and Sentiment Classification. Proceedings of the International Conference on Artificial Intelligence and Soft Computing 2017, Zakopane, Poland.","DOI":"10.1007\/978-3-319-59060-8_50"},{"key":"ref_21","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., Kaiser, L., and Polosukhin, I. (2017). Attention Is All You Need. arXiv."},{"key":"ref_22","doi-asserted-by":"crossref","unstructured":"Lai, S., Xu, L., Liu, K., and Zhao, J. (2015, January 25\u201330). Recurrent Convolutional Neural Networks for Text Classification. 
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence 2015 (AAAI\u201915), Austin, TX, USA.","DOI":"10.1609\/aaai.v29i1.9513"},{"key":"ref_23","doi-asserted-by":"crossref","unstructured":"Apaydin, H., Feizi, H., Sattari, M.T., Colak, M.S., Shamshirband, S., and Chau, K.-W. (2020). Comparative Analysis of Recurrent Neural Network Architectures for Reservoir Inflow Forecasting. Water, 12.","DOI":"10.3390\/w12051500"},{"key":"ref_24","unstructured":"Mangal, S., Joshi, P., and Modak, R. (2020). LSTM vs. GRU vs. Bidirectional RNN for script generation. arXiv."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Xue, X., Feng, J., Gao, Y., Liu, M., Zhang, W., Sun, X., Zhao, A., and Guo, S. (2019). Convolutional Recurrent Neural Networks with a Self-Attention Mechanism for Personnel Performance Prediction. Entropy, 21.","DOI":"10.3390\/e21121227"},{"key":"ref_26","unstructured":"(2020, November 09). Github. Available online: https:\/\/github.com\/."},{"key":"ref_27","doi-asserted-by":"crossref","unstructured":"Szegedy, C., Liu, W., and Jia, Y. (2014). Going Deeper with Convolutions. arXiv.","DOI":"10.1109\/CVPR.2015.7298594"},{"key":"ref_28","unstructured":"Zeiler, M.D. (2012). Adadelta: An adaptive learning rate. arXiv."},{"key":"ref_29","unstructured":"Nwankpa, C., Ijomah, W., Gachagan, A., and Marshall, S. (2018). Activation Functions: Comparison of trends in Practice and Research for Deep Learning. arXiv."},{"key":"ref_30","first-page":"205","article-title":"Techniques of Program Code Obfuscation for Secure Software","volume":"3","author":"Popa","year":"2011","journal-title":"J. Mob. Embed. Distrib. Syst."},{"key":"ref_31","first-page":"38","article-title":"Analysis of the use of obfuscating transformations for software","volume":"3","author":"Buintsev","year":"2015","journal-title":"Inform. Secur. 
Is."},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Ceccato, M., Di Penta, M., Nagra, J., Falcarin, P., Ricca, F., Torchiano, M., and Tonella, P. (2009, January 17\u201319). The Effectiveness of Source Code Obfuscation: An Experimental Assessment. Proceedings of the IEEE 17th International Conference on Program Comprehension 2009, Vancouver, BC, Canada.","DOI":"10.1109\/ICPC.2009.5090041"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Anckaert, B., Madou, M., Sutter, B., Bus, B., Bosschere, K., and Preneel, B. (2007, January 29). Program Obfuscation: A Quantitative Approach. Proceedings of the 2007 ACM Workshop on Quality of Protection (QoP 2007), Alexandria, VA, USA.","DOI":"10.1145\/1314257.1314263"},{"key":"ref_34","unstructured":"(2020, November 09). The Tigress Diversifying C Virtualizer. Available online: http:\/\/tigress.cs.arizona.edu."},{"key":"ref_35","unstructured":"(2020, November 09). JS Obfuscator Tool. Available online: https:\/\/obfuscator.io\/."},{"key":"ref_36","unstructured":"(2020, November 09). JS-Obfuscator. Available online: https:\/\/github.com\/caiguanhao\/js-obfuscator."},{"key":"ref_37","unstructured":"(2020, November 09). Pyarmor. Available online: https:\/\/github.com\/dashingsoft\/pyarmor."},{"key":"ref_38","unstructured":"(2020, November 09). Opy. Available online: https:\/\/github.com\/QQuick\/Opy."},{"key":"ref_39","unstructured":"(2020, November 09). Yakpro-po. Available online: https:\/\/github.com\/pk-fr\/yakpro-po."},{"key":"ref_40","unstructured":"(2020, November 09). PHP Obfuscator. Available online: https:\/\/github.com\/naneau\/php-obfuscator."},{"key":"ref_41","unstructured":"(2020, November 09). Cpp Guard. Available online: https:\/\/github.com\/techtocore\/Cpp-Guard."},{"key":"ref_42","unstructured":"(2020, November 09). AnalyseC. Available online: https:\/\/github.com\/ryarnyah\/AnalyseC."},{"key":"ref_43","unstructured":"Martin, R.C. (2009). 
Clean Code: A Handbook of Agile Software Craftsmanship, Prentice Hall."},{"key":"ref_44","doi-asserted-by":"crossref","first-page":"88","DOI":"10.4236\/jsea.2008.11013","article-title":"Complying with Coding Standards or Retaining Programming Style: A Quality Outlook at Source Code Level","volume":"1","author":"Wang","year":"2008","journal-title":"JSEA"},{"key":"ref_45","unstructured":"(2020, November 09). Linux Kernel. Available online: https:\/\/github.com\/torvalds\/linux."},{"key":"ref_46","doi-asserted-by":"crossref","unstructured":"Li, X., and Prasad, C. (2005, January 20\u201322). Effectively teaching coding standards in programming. Proceedings of the 6th Conference on Information Technology Education\u2014SIGITE 2005, Newark, NJ, USA.","DOI":"10.1145\/1095714.1095770"},{"key":"ref_47","first-page":"2307","article-title":"Using Machine Learning Methods to Establish Program Authorship","volume":"7","author":"Gorshkov","year":"2019","journal-title":"Int. J. Open Inf. Technol."},{"key":"ref_48","doi-asserted-by":"crossref","unstructured":"Fourment, M., and Gillings, M.R. (2008). A comparison of common programming languages used in bioinformatics. 
BMC Bioinf., 9.","DOI":"10.1186\/1471-2105-9-82"}],"container-title":["Symmetry"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/2073-8994\/12\/12\/2044\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,6]],"date-time":"2024-07-06T09:31:38Z","timestamp":1720258298000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/2073-8994\/12\/12\/2044"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,10]]},"references-count":48,"journal-issue":{"issue":"12","published-online":{"date-parts":[[2020,12]]}},"alternative-id":["sym12122044"],"URL":"https:\/\/doi.org\/10.3390\/sym12122044","relation":{},"ISSN":["2073-8994"],"issn-type":[{"value":"2073-8994","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,10]]}}}