{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,6]],"date-time":"2024-09-06T10:59:25Z","timestamp":1725620365064},"reference-count":35,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2020,12,25]],"date-time":"2020-12-25T00:00:00Z","timestamp":1608854400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100012190","name":"Ministry of Science and Higher Education of the Russian Federation","doi-asserted-by":"publisher","award":["Government Order for 2020\u20132022, project no. FEWM-2020-0037 (TUSUR)"],"id":[{"id":"10.13039\/501100012190","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Future Internet"],"abstract":"<jats:p>The article explores approaches to determining the author of a natural language text and the advantages and disadvantages of these approaches. The importance of the problem stems from the active digitalization of society and the shift of most everyday activities online. Text authorship methods are particularly useful for information security and forensics. For example, such methods can be used to identify the authors of suicide notes and of other texts subjected to forensic examination. Another area of application is plagiarism detection. Plagiarism detection is a relevant issue both for the field of intellectual property protection in the digital space and for the educational process. The article describes identifying the author of a Russian-language text using support vector machine (SVM) and deep neural network architectures (long short-term memory (LSTM), convolutional neural networks (CNN) with attention, and Transformer). The results show that all the considered algorithms are suitable for solving the authorship identification problem, but SVM shows the best accuracy. The average accuracy of SVM reaches 96%. This is due to carefully chosen parameters and a feature space that includes statistical and semantic features (including those extracted as a result of an aspect analysis). Deep neural networks are inferior to SVM in accuracy and reach only 93%. The study also evaluates how attacks on the method affect the models\u2019 accuracy. Experiments show that the SVM-based methods are not robust to deliberate text anonymization. In comparison, the loss in accuracy of deep neural networks does not exceed 20%. The Transformer architecture is the most effective for anonymized texts, achieving 81% accuracy.<\/jats:p>","DOI":"10.3390\/fi13010003","type":"journal-article","created":{"date-parts":[[2020,12,25]],"date-time":"2020-12-25T14:30:19Z","timestamp":1608906619000},"page":"3","source":"Crossref","is-referenced-by-count":17,"title":["Authorship Identification of a Russian-Language Text Using Support Vector Machine and Deep Neural Networks"],"prefix":"10.3390","volume":"13","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-2587-2222","authenticated-orcid":false,"given":"Aleksandr","family":"Romanov","sequence":"first","affiliation":[{"name":"Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia"}]},{"given":"Anna","family":"Kurtukova","sequence":"additional","affiliation":[{"name":"Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia"}]},{"given":"Alexander","family":"Shelupanov","sequence":"additional","affiliation":[{"name":"Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia"}]},{"ORCID":"http:\/\/orcid.org\/0000-0001-7844-4363","authenticated-orcid":false,"given":"Anastasia","family":"Fedotova","sequence":"additional","affiliation":[{"name":"Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia"}]},{"given":"Valery","family":"Goncharov","sequence":"additional","affiliation":[{"name":"Department of Automation and Robotics, The National Research Tomsk Polytechnic University, 634050 Tomsk, Russia"}]}],"member":"1968","published-online":{"date-parts":[[2020,12,25]]},"reference":[{"key":"ref_1","unstructured":"Romanov, A.S., Shelupanov, A.A., and Meshcheryakov, R.V. Development and Research of Mathematical Models, Methods and Software Tools of Information Processes in the Identification of the Author of the Text, Tomsk: V-Spektr, 2011."},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Kurtukova, A., Romanov, A., and Fedotova, A. (2019, January 21\u201327). De-Anonymization of the Author of the Source Code Using Machine Learning Algorithms. Proceedings of the 2019 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON), Novosibirsk, Russia.","DOI":"10.1109\/SIBIRCON48586.2019.8958026"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"741","DOI":"10.15622\/sp.2019.18.3.741-765","article-title":"Identification author of source code by machine learning methods","volume":"18","author":"Kurtukova","year":"2019","journal-title":"SPIIRAS Proc."},{"key":"ref_4","doi-asserted-by":"crossref","first-page":"596","DOI":"10.18287\/2412-6179-CO-621","article-title":"Automatic text-independent speaker verification using convolutional deep belief network","volume":"44","author":"Rakhmanenko","year":"2020","journal-title":"Comput. Opt."},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Kostyuchenko, E.Y., Viktorovich, I., Renko, B., and Shelupanov, A.A. (2018, January 18\u201325). User Identification by the Free-Text Keystroke Dynamics. Proceedings of the 3rd Russian-Pacific Conference on Computer Technology and Applications (RPC), Vladivostok, Russia.","DOI":"10.1109\/RPC.2018.8482190"},{"key":"ref_6","unstructured":"(2020, November 18). PAN: Shared Tasks. Available online: https:\/\/pan.webis.de\/shared-tasks.html."},{"key":"ref_7","unstructured":"Halvani, O., Graner, L., and Regev, R. (2020, January 22\u201325). Cross-domain authorship verification based on topic agnostic features. Proceedings of the Working Notes of CLEF, Thessaloniki, Greece."},{"key":"ref_8","unstructured":"(2020, November 18). Feature Vector Difference Based Neural Network and Logistic Regression Models for Authorship Verification. Available online: https:\/\/pan.webis.de\/downloads\/publications\/slides\/weerasinghe_2020.pdf."},{"key":"ref_9","unstructured":"Boenninghoff, B. (2020). Deep bayes factor scoring for authorship verification. arXiv."},{"key":"ref_10","doi-asserted-by":"crossref","unstructured":"Boenninghoff, B., Hessler, S., Kolossa, D., and Nickel, R.M. (2019, January 9\u201312). Explainable Authorship Verification in Social Media via Attention-based Similarity Learning. Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA.","DOI":"10.1109\/BigData47090.2019.9005650"},{"key":"ref_11","unstructured":"Jafariakinabad, F., and Hua, K.A. (2020). A Self-Supervised Representation Learning of Sentence Structure for Authorship Attribution. arXiv."},{"key":"ref_12","doi-asserted-by":"crossref","unstructured":"Mamgain, S., Balabantaray, R.C., and Das, A.K. (2019, January 19\u201321). Author Profiling: Prediction of Gender and Language Variety from Document. Proceedings of the 2019 International Conference on Information Technology (ICIT), Bhubaneswar, India.","DOI":"10.1109\/ICIT48102.2019.00089"},{"key":"ref_13","doi-asserted-by":"crossref","unstructured":"Barlas, G., and Stamatatos, E. (2020, January 5\u20137). Cross-Domain Authorship Attribution Using Pre-Trained Language Models. Proceedings of the IFIP International Conference on Artificial Intelligence Applications and Innovations, Neos Marmaras, Greece.","DOI":"10.1007\/978-3-030-49161-1_22"},{"key":"ref_14","doi-asserted-by":"crossref","first-page":"741","DOI":"10.1007\/s00607-018-0587-8","article-title":"Document embeddings learned on various types of n-grams for cross-topic authorship attribution","volume":"100","year":"2018","journal-title":"Computing"},{"key":"ref_15","doi-asserted-by":"crossref","unstructured":"Custodio, J.E., and Paraboni, I. (2019, January 9\u201312). An ensemble approach to cross-domain authorship attribution. Proceedings of the International Conference of the Cross-Language Evaluation Forum for European Languages, Lugano, Switzerland.","DOI":"10.1007\/978-3-030-28577-7_17"},{"key":"ref_16","unstructured":"Bartelds, M., and de Vries, W. (2019, January 9\u201312). Improving Cross-domain Authorship Attribution by Combining Lexical and Syntactic Features. Proceedings of the CLEF (Working Notes), Lugano, Switzerland."},{"key":"ref_17","first-page":"29","article-title":"System of analysis and visualization for cross-language identification of authors of scientific publications","volume":"16","author":"Isachenko","year":"2018","journal-title":"NSU Vestnik Inf. Technol."},{"key":"ref_18","first-page":"143","article-title":"Using Ontology for Revealing Authorship Attribution of Arabic Text","volume":"4","author":"Darwish","year":"2020","journal-title":"Int. J. Eng. Adv. Technol. (IJEAT)"},{"key":"ref_19","unstructured":"Iskhakova, A.O. (2020, November 18). Method and Software for Determining Artificially Created Texts. Available online: https:\/\/tusur.ru\/ru\/nauka-i-innovatsii\/podgotovka-kadrov-vysshey-nauchnoy-kvalifikatsii\/ob-yavleniya-o-zaschitah-dissertatsiy\/dissertatsiya-metod-i-programmnoe-sredstvo-opredeleniya-iskusstvenno-sozdannyh-tekstov."},{"key":"ref_20","doi-asserted-by":"crossref","unstructured":"Uchendu, A. Authorship Attribution for Neural Text Generation. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Available online: http:\/\/www.cs.iit.edu\/~kshu\/files\/emnlp20.pdf.","DOI":"10.18653\/v1\/2020.emnlp-main.673"},{"key":"ref_21","first-page":"139","article-title":"Application of \u201csupervised\u201d machine learning methods for text attribution: Individual approaches and intermediate results in identifying authors of Russian-language texts","volume":"1","author":"Chashchin","year":"2018","journal-title":"Probl. Criminol. Forensic Sci. Forensic Exam."},{"key":"ref_22","first-page":"29","article-title":"Automatic determination of the stylistic affiliation of texts by their statistical parameters","volume":"1","author":"Dubovik","year":"2017","journal-title":"Comput. Linguist. Comput. Ontol."},{"key":"ref_23","unstructured":"Dmitrin, Y.V. Comparison of deep neural network architectures for authorship attribution of Russian social media texts. Proceedings of the Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference Dialogue, 2018, Available online: http:\/\/www.dialog-21.ru\/media\/4560\/_-dialog2018scopus.pdf."},{"key":"ref_24","first-page":"121","article-title":"Attribution of texts using mathematical methods and computer technologies","volume":"3","author":"Kulakov","year":"2019","journal-title":"Digit. Technol. Educ. Sci. Soc."},{"key":"ref_25","doi-asserted-by":"crossref","unstructured":"Huang, W., Su, R., and Iwaihara, M. (2020, January 12\u201314). Contribution of Improved Character Embedding and Latent Posting Styles to Authorship Attribution of Short Texts. Proceedings of the Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data, Tianjin, China.","DOI":"10.1007\/978-3-030-60290-1_20"},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"G\u00f3mez-Adorno, H., Sidorov, G., Pinto, D., Vilari\u00f1o, D., and Gelbukh, A. (2016). Automatic authorship detection using textual patterns extracted from integrated syntactic graphs. Sensors, 16.","DOI":"10.3390\/s16091374"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"3224","DOI":"10.1109\/ACCESS.2018.2885011","article-title":"An empirical study on forensic analysis of urdu text using LDA-based authorship attribution","volume":"7","author":"Anwar","year":"2018","journal-title":"IEEE Access"},{"key":"ref_28","doi-asserted-by":"crossref","unstructured":"Zhang, R., Hu, Z., Guo, H., and Mao, Y. (2018, October 31\u2013November 4). Syntax encoding with application in authorship attribution. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium.","DOI":"10.18653\/v1\/D18-1294"},{"key":"ref_29","unstructured":"Keyrouz, Y., Fonlupt, C., Robilliard, D., and Mezher, D. (2018, January 29\u201330). Evolving a Weighted Combination of Text Similarities for Authorship Attribution. Proceedings of the International Conference on Artificial Evolution (Evolution Artificielle), Mulhouse, France."},{"key":"ref_30","unstructured":"Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017). Attention Is All You Need. arXiv."},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Chang, W.-C., Yu, H.-F., Zhong, K., Yang, Y., and Dhillon, I. (2019). Taming Pretrained Transformers for Extreme Multi-label Text Classification. arXiv.","DOI":"10.1145\/3394486.3403368"},{"key":"ref_32","doi-asserted-by":"crossref","unstructured":"Kurtukova, A., Romanov, A., and Shelupanov, A. (2020). Source Code Authorship Identification Using Deep Neural Networks. Symmetry, 12.","DOI":"10.3390\/sym12122044"},{"key":"ref_33","doi-asserted-by":"crossref","unstructured":"Romanov, A.S., Kurtukova, A.V., Sobolev, A.A., Shelupanov, A.A., and Fedotova, A.M. (2020). Determining the Age of the Author of the Text Based on Deep Neural Network Models. Information, 11.","DOI":"10.3390\/info11120589"},{"key":"ref_34","unstructured":"(2020, November 18). Moshkov\u2019s Library. Available online: http:\/\/lib.ru\/."},{"key":"ref_35","unstructured":"Romanov, A., Kurtukova, A., Fedotova, A., and Meshcheryakov, R. (2019, January 27). Natural Text Anonymization Using Universal Transformer with a Self-attention. Proceedings of the III International Conference on Language Engineering and Applied Linguistics, Saint Petersburg, Russia."}],"container-title":["Future Internet"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-5903\/13\/1\/3\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,7,6]],"date-time":"2024-07-06T19:43:33Z","timestamp":1720295013000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-5903\/13\/1\/3"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,12,25]]},"references-count":35,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2021,1]]}},"alternative-id":["fi13010003"],"URL":"https:\/\/doi.org\/10.3390\/fi13010003","relation":{},"ISSN":["1999-5903"],"issn-type":[{"value":"1999-5903","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,12,25]]}}}