{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,16]],"date-time":"2024-09-16T08:40:22Z","timestamp":1726476022886},"reference-count":45,"publisher":"MDPI AG","issue":"1","license":[{"start":{"date-parts":[[2022,12,26]],"date-time":"2022-12-26T00:00:00Z","timestamp":1672012800000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Algorithms"],"abstract":"This article is the third paper in a series aimed at the establishment of the authorship of Russian-language texts. This paper considers methods for determining the authorship of classical Russian literary texts, as well as fanfiction texts. The process of determining the author was first considered in the classical version of classification experiments using a closed set of authors, and experiments were also completed for a complicated modification of the problem using an open set of authors. The use of methods to identify the author of the text is justified by the conclusions about the effectiveness of the fastText and Support Vector Machine (SVM) methods with the selection of informative features discussed in our past studies. In the case of open attribution, the proposed methods are based on the author\u2019s combination of fastText and One-Class SVM as well as statistical estimates of a vector\u2019s similarity measures. The feature selection algorithm for a closed set of authors is chosen based on a comparison of five different selection methods, including the previously considered genetic algorithm as a baseline. The regularization-based algorithm (RbFS) was found to be the most efficient method, while methods based on a complete enumeration (FFS and SFS) are found to be ineffective for any set of authors. 
The accuracy of the RbFS and SVM methods in the case of classical literary texts averaged 83%, which outperforms other selection methods by 3 to 10% for an identical number of features, and the average accuracy of fastText was 84%. For the open attribution in cross-topic classification, the average accuracy of the method based on the combination of One-Class SVM with RbFS and fastText was 85%, and for in-group classification, it was 75 to 78%, depending on the group, which is the best result among the open attribution methods considered.","DOI":"10.3390\/a16010013","type":"journal-article","created":{"date-parts":[[2022,12,27]],"date-time":"2022-12-27T07:53:11Z","timestamp":1672127591000},"page":"13","source":"Crossref","is-referenced-by-count":5,"title":["Digital Authorship Attribution in Russian-Language Fanfiction and Classical Literature"],"prefix":"10.3390","volume":"16","author":[{"ORCID":"http:\/\/orcid.org\/0000-0001-7844-4363","authenticated-orcid":false,"given":"Anastasia","family":"Fedotova","sequence":"first","affiliation":[{"name":"Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia"}]},{"ORCID":"http:\/\/orcid.org\/0000-0002-2587-2222","authenticated-orcid":false,"given":"Aleksandr","family":"Romanov","sequence":"additional","affiliation":[{"name":"Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia"}]},{"given":"Anna","family":"Kurtukova","sequence":"additional","affiliation":[{"name":"Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, Russia"}]},{"given":"Alexander","family":"Shelupanov","sequence":"additional","affiliation":[{"name":"Department of Security, Tomsk State University of Control Systems and Radioelectronics, 634050 Tomsk, 
Russia"}]}],"member":"1968","published-online":{"date-parts":[[2022,12,26]]},"reference":[{"key":"ref_1","doi-asserted-by":"crossref","unstructured":"Romanov, A., Kurtukova, A., Shelupanov, A., Fedotova, A., and Goncharov, V. (2021). Authorship Identification of a Russian-Language Text Using Support Vector Machine and Deep Neural Networks. Future Internet, 13.","DOI":"10.3390\/fi13010003"},{"key":"ref_2","doi-asserted-by":"crossref","unstructured":"Romanov, A.S., Kurtukova, A.V., Sobolev, A.A., Shelupanov, A.A., and Fedotova, A.M. (2020). Determining the Age of the Author of the Text Based on Deep Neural Network Models. Information, 11.","DOI":"10.3390\/info11120589"},{"key":"ref_3","doi-asserted-by":"crossref","first-page":"481","DOI":"10.1007\/s42979-021-00911-2","article-title":"Unifying Lexical, Syntactic, and Structural Representations of Written Language for Authorship Attribution","volume":"2","author":"Jafariakinabad","year":"2021","journal-title":"SN Comput. Sci."},{"key":"ref_4","doi-asserted-by":"crossref","unstructured":"Mahor, U., and Kumar, A. (2021). A Comparative Study of Stylometric Characteristics in Authorship Attribution. Information and Communication Technology for Competitive Strategies, ICTCS Springer.","DOI":"10.1007\/978-981-19-0095-2_8"},{"key":"ref_5","doi-asserted-by":"crossref","unstructured":"Fedotova, A., Romanov, A., Kurtukova, A., and Shelupanov, A. (2022). Authorship Attribution of Social Media and Literary Russian-Language Texts Using Machine Learning Methods and Feature Selection. Future Internet, 14.","DOI":"10.3390\/fi14010004"},{"key":"ref_6","unstructured":"(2022, October 19). Russian GPT-2 Model. Available online: https:\/\/github.com\/vlarine\/ruGPT2."},{"key":"ref_7","unstructured":"(2022, October 19). Russian GPT-3 Model. Available online: https:\/\/developers.sber.ru\/portal\/products\/rugpt-3?attempt=1."},{"key":"ref_8","unstructured":"(2022, October 20). 
PAN: Series of Scientific Events and Shared Tasks on Digital Text Forensics and Stylometry. Available online: https:\/\/pan.webis.de\/."},{"key":"ref_9","unstructured":"(2022, October 20). The 100 Idiolectic Project. Available online: https:\/\/fold.aston.ac.uk\/handle\/123456789\/17."},{"key":"ref_10","unstructured":"Najafi, M., and Tavan, E. (2021, January 5\u20138). Text-to-Text Transformer in Authorship Verification Via Stylistic and Semantical Analysis. Proceedings of the CLEF 2022\u2014Conference and Labs of the Evaluation Forum, Bologna, Italy. Available online: https:\/\/ceur-ws.org\/Vol-3180\/paper-215.pdf."},{"key":"ref_11","unstructured":"(2022, October 25). PAN at CLEF 2021. Available online: https:\/\/pan.webis.de\/clef21\/pan21-web\/index.html."},{"key":"ref_12","unstructured":"Boenninghoff, B., Nickel, R.M., and Kolossa, D. (2021). O2D2: Out-of-distribution detector to capture undecidable trials in authorship verification. arXiv."},{"key":"ref_13","unstructured":"Weerasinghe, J., Singh, R., and Greenstadt, R. (2021, January 21\u201324). Feature Vector Difference based Authorship Verification for Open-World Settings. Proceedings of the CLEF 2021\u2014Conference and Labs of the Evaluation Forum, Bucharest, Romania."},{"key":"ref_14","first-page":"89","article-title":"Modern Classic in the Web Environment: Narrative Variations of V. Nabokov\u2019s in Fanfiction. Acta Universitatis Sapientiae","volume":"18","author":"Drozdova","year":"2020","journal-title":"Film Media Stud."},{"key":"ref_15","doi-asserted-by":"crossref","first-page":"531","DOI":"10.1080\/14708477.2020.1812621","article-title":"Transcultural literacies in online collaboration: A case study of fanfiction translation from Russian into English","volume":"20","author":"Shafirova","year":"2020","journal-title":"Lang. Intercult. 
Commun."},{"key":"ref_16","first-page":"348","article-title":"Deep neural network and model-based clustering technique for forensic electronic mail author attribution","volume":"3","author":"Apoorva","year":"2021","journal-title":"Appl. Sci."},{"key":"ref_17","doi-asserted-by":"crossref","unstructured":"Wang, H., Riddell, A., and Juola, P. (2021, January 19\u201323). Mode effects\u2019 challenge to authorship attribution. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online.","DOI":"10.18653\/v1\/2021.eacl-main.97"},{"key":"ref_18","doi-asserted-by":"crossref","unstructured":"Swain, S., Mishra, G., and Sindhu, C. (2017, January 20\u201322). Recent approaches on authorship attribution techniques\u2014An overview. Proceedings of the 2017 International Conference of Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India.","DOI":"10.1109\/ICECA.2017.8203599"},{"key":"ref_19","unstructured":"Hedegaard, S., and Simonsen, J.G. (2011, January 19\u201324). Lost in translation: Authorship attribution using frame semantics. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, USA."},{"key":"ref_20","doi-asserted-by":"crossref","first-page":"107815","DOI":"10.1016\/j.asoc.2021.107815","article-title":"Exploring syntactic and semantic features for authorship attribution","volume":"111","author":"Wu","year":"2021","journal-title":"Appl. Soft Comput."},{"key":"ref_21","unstructured":"Alharthi, H., Inkpen, D., and Szpakowicz, S. (2018, January 20\u201326). Authorship identification for literary book recommendations. Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA."},{"key":"ref_22","unstructured":"(2022, November 02). The Litrec Dataset. 
Available online: https:\/\/www.inesc-id.pt\/publications\/8386\/pdf."},{"key":"ref_23","doi-asserted-by":"crossref","first-page":"72","DOI":"10.21681\/2311-3456-2019-4-72-79","article-title":"Methods for identifying the psychological characteristics of the author in the text (on the example of aggressiveness)","volume":"4","author":"Kovalev","year":"2019","journal-title":"Cyber Secur. Issues"},{"key":"ref_24","first-page":"49","article-title":"Analysis and visualization system for cross-language identification of authors of scientific publications. Bulletin of the Novosibirsk State University","volume":"16","author":"Isachenko","year":"2018","journal-title":"Ser. Inf. Technol."},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"67","DOI":"10.17803\/2311-5998.2022.90.2.067-076","article-title":"Problems of expert identification in forensic autonomy","volume":"2","author":"Sokolova","year":"2022","journal-title":"Bull. O.E. Kutafin Univ."},{"key":"ref_26","doi-asserted-by":"crossref","unstructured":"Bardamova, M., and Hodashinsky, I. (2021, January 13\u201314). Hybrid Algorithm for Tuning Feature Weights in a Fuzzy Classifier. Proceedings of the 2021 Ural Symposium on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT), Yekaterinburg, Russia.","DOI":"10.1109\/USBEREIT51232.2021.9455030"},{"key":"ref_27","doi-asserted-by":"crossref","first-page":"12316","DOI":"10.1007\/s10489-021-03076-w","article-title":"Wrapper feature selection with partially labeled data","volume":"52","author":"Feofanov","year":"2022","journal-title":"Appl. 
Intell."},{"key":"ref_28","doi-asserted-by":"crossref","first-page":"3224","DOI":"10.1109\/ACCESS.2018.2885011","article-title":"An empirical study on forensic analysis of Urdu text using LDA-based authorship attribution","volume":"7","author":"Anwar","year":"2018","journal-title":"IEEE Access"},{"key":"ref_29","doi-asserted-by":"crossref","unstructured":"Morales S\u00e1nchez, D., Moreno, A., and Jim\u00e9nez L\u00f3pez, M.D. (2022). A White-Box Sociolinguistic Model for Gender Detection. Appl. Sci., 12.","DOI":"10.3390\/app12052676"},{"key":"ref_30","first-page":"1","article-title":"Overview of the 8th author profiling task at pan 2020: Profiling fake news spreaders on twitter","volume":"Volume 2696","author":"Rangel","year":"2020","journal-title":"CEUR Workshop Proceedings"},{"key":"ref_31","doi-asserted-by":"crossref","unstructured":"Bevendorff, J., Chulvi, B., Fersini, E., Heini, A., Kestemont, M., Kredens, K., and Zangerle, E. (2022, January 5\u20138). Overview of PAN 2022: Authorship Verification, Profiling Irony and Stereotype Spreaders, and Style Change Detection. Proceedings of the International Conference of the Cross-Language Evaluation Forum for European Languages, Bologna, Italy.","DOI":"10.1007\/978-3-031-13643-6_24"},{"key":"ref_32","first-page":"19","article-title":"Gender profiling of the author of the subprime text","volume":"11","author":"Krassa","year":"2014","journal-title":"Bull. South Ural State Univ. Ser. Linguist."},{"key":"ref_33","first-page":"22","article-title":"Automatic determination of the gender of the author of the text: The phenomenon of Russian women\u2019s prose. Bulletin of the Novosibirsk State University","volume":"18","author":"Khazova","year":"2020","journal-title":"Ser. Linguist. Intercult. Commun."},{"key":"ref_34","unstructured":"Kov\u00e1cs, G., Balogh, V., Mehta, P., Shridhar, K., Alonso, P., and Liwicki, M. (2022, December 21). Author Profiling Using Semantic and Syntactic Features: Notebook for PAN at CLEF 2019. 
Available online: https:\/\/core.ac.uk\/download\/pdf\/287813157.pdf."},{"key":"ref_35","doi-asserted-by":"crossref","first-page":"4857","DOI":"10.3233\/JIFS-179033","article-title":"A comparative analysis of distributional term representations for author profiling in social media","volume":"36","year":"2019","journal-title":"J. Intell. Fuzzy Syst."},{"key":"ref_36","unstructured":"Nguyen, D., Trieschnigg, D., Do\u011fru\u00f6z, A.S., Gravel, R., Theune, M., Meder, T., and de Jong, F. (2014, January 23\u201329). Why gender and age prediction from tweets is hard: Lessons from a crowdsourcing experiment. Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), Dublin, Ireland."},{"key":"ref_37","unstructured":"(2022, December 21). PAN Data. Available online: https:\/\/pan.webis.de\/data.html."},{"key":"ref_38","unstructured":"(2022, December 21). Victorian Era Authorship Attribution Data Set. Available online: https:\/\/archive.ics.uci.edu\/ml\/datasets\/Victorian+Era+Authorship+Attribution."},{"key":"ref_39","unstructured":"(2022, December 21). Blog Authorship Corpus. Available online: https:\/\/www.kaggle.com\/datasets\/rtatman\/blog-authorship-corpus."},{"key":"ref_40","unstructured":"(2022, December 21). Russian Literature. Available online: https:\/\/www.kaggle.com\/datasets\/d0rj3228\/russian-literature."},{"key":"ref_41","unstructured":"(2022, December 21). Authorship Attribution for Russian Literature. Available online: https:\/\/www.kaggle.com\/code\/d0rj3228\/authorship-attribution-for-russian-literature."},{"key":"ref_42","unstructured":"(2022, November 19). Ficbook: Fanfiction Book. Available online: https:\/\/ficbook.net\/."},{"key":"ref_43","doi-asserted-by":"crossref","first-page":"2833","DOI":"10.1109\/TKDE.2019.2960251","article-title":"A recursive regularization based feature selection framework for hierarchical classification","volume":"33","author":"Zhao","year":"2019","journal-title":"IEEE Trans. Knowl. 
Data Eng."},{"key":"ref_44","unstructured":"Ren, J., Qiu, Z., Fan, W., Cheng, H., and Yu, P.S. (2008, January 20\u201323). Forward semi-supervised feature selection. Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Osaka, Japan."},{"key":"ref_45","doi-asserted-by":"crossref","unstructured":"Marc\u00edlio, W.E., and Eler, D.M. (2020, January 7\u201310). From explanations to feature selection: Assessing shap values as feature selection mechanism. Proceedings of the 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Porto de Galinhas, Brazil.","DOI":"10.1109\/SIBGRAPI51738.2020.00053"}],"container-title":["Algorithms"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/1\/13\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,8,11]],"date-time":"2024-08-11T16:38:34Z","timestamp":1723394314000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1999-4893\/16\/1\/13"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,12,26]]},"references-count":45,"journal-issue":{"issue":"1","published-online":{"date-parts":[[2023,1]]}},"alternative-id":["a16010013"],"URL":"https:\/\/doi.org\/10.3390\/a16010013","relation":{},"ISSN":["1999-4893"],"issn-type":[{"type":"electronic","value":"1999-4893"}],"subject":[],"published":{"date-parts":[[2022,12,26]]}}}