Abstract
Author verification is a fundamental problem in authorship attribution, and it suits most practical applications, where it is not possible to predefine a closed set of suspects. So far, the most successful approaches sample the non-target class (all documents by all other authors) and thereby transform author verification into a binary classification task. Moreover, they follow the instance-based paradigm (all documents of known authorship are treated separately). In this paper, we propose two algorithms, one instance-based and one profile-based (all known documents are treated cumulatively), that outperform state-of-the-art methods on several benchmark datasets. We demonstrate that the proposed methods take advantage of the availability of multiple documents of known authorship and that they remain robust when text length is reduced.
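To make the distinction between the two paradigms concrete, the following minimal Python sketch contrasts an instance-based score (each known document compared separately, then averaged) with a profile-based score (all known documents concatenated into one profile). It is not the algorithms proposed in the paper: the character 3-gram features, the cosine similarity, the averaging rule, and every function name (`char_ngrams`, `cosine`, `instance_based_score`, `profile_based_score`, `extrinsic_verdict`) are assumptions chosen for illustration only. The last function hints at the extrinsic idea of comparing against sampled documents by other authors.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (an assumed feature choice)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def instance_based_score(known_docs, unknown, n=3):
    """Instance-based: treat each known document separately, then average."""
    u = char_ngrams(unknown, n)
    sims = [cosine(char_ngrams(doc, n), u) for doc in known_docs]
    return sum(sims) / len(sims)


def profile_based_score(known_docs, unknown, n=3):
    """Profile-based: treat all known documents cumulatively as one profile."""
    profile = char_ngrams("\n".join(known_docs), n)
    return cosine(profile, char_ngrams(unknown, n))


def extrinsic_verdict(score_fn, known_docs, unknown, impostor_docs):
    """Hypothetical decision rule: accept only if the known author scores
    higher than every sampled document from the non-target class."""
    target = score_fn(known_docs, unknown)
    return all(target > score_fn([imp], unknown) for imp in impostor_docs)
```

Either scoring function can be passed to `extrinsic_verdict`, for example `extrinsic_verdict(profile_based_score, known_docs, unknown, impostor_docs)`, where `impostor_docs` stands for a hypothetical sample of texts written by other authors.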
Notes
An early-view version of [22] has been available since 2013.
References
Abbasi A, Chen H (2005) Applying authorship analysis to extremist-group web forum messages. IEEE Intell Syst 20(5):67–75
Aggarwal CC, Zhai C (2012) A survey of text classification algorithms. In: Aggarwal CC, Zhai C (eds) Mining text data. Springer, Berlin, pp 163–222
Bagnall D (2015) Author identification using multi-headed recurrent neural networks. In: Cappellato L, Ferro N, Jones GJF, San Juan E (eds) Working notes papers of the CLEF 2015 evaluation labs
Bartoli A, Dagri A, Lorenzo AD, Medvet E, Tarlao F (2015) An author verification approach based on differential features. In: Cappellato L, Ferro N, Jones GJF, San Juan E (eds) Working notes papers of the CLEF 2015 evaluation labs
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38
Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30
Escalante HJ, Montes-y-Gómez M, Villaseñor-Pineda L (2009) Particle swarm model selection for authorship verification. In: Proceedings of the 14th Iberoamerican conference on pattern recognition, pp 563–570
Fréry J, Largeron C, Juganaru-Mathieu M (2014) UJM at CLEF in author identification. In: Proceedings CLEF-2014, working notes, pp 1042–1048
Gutierrez J, Casillas J, Ledesma P, Fuentes G, Meza I (2015) Homotopy based classification for author verification task. Working Notes Papers of the CLEF
Halvani O, Winter C, Pflug A (2016) Authorship verification for different languages, genres and topics. Dig Investig 16:S33–S43
Hürlimann M, Weck B, van den Berg E, Šuster S, Nissim M (2015) GLAD: Groningen lightweight authorship detection. In: Cappellato L, Ferro N, Jones GJF, San Juan E (eds) CLEF 2015 evaluation labs and workshop—working notes papers. CEUR-WS.org
Jankowska M, Milios EE, Keselj V (2014) Author verification using common n-gram profiles of text documents. In: Proceedings of COLING, 25th international conference on computational linguistics, pp 387–397
Juola P (2008) Authorship attribution. Found Trends Inf Retr 1:234–334
Juola P (2013) How a computer program helped reveal J. K. Rowling as author of A Cuckoo’s Calling. Scientific American
Juola P, Stamatatos E (2013) Overview of the author identification task at PAN 2013. In: Working notes for CLEF 2013 conference
Kestemont M, Luyckx K, Daelemans WTC (2012) Cross-genre authorship verification using unmasking. Engl Stud 93(3):340–356
Khonji M, Iraqi Y (2014) A slightly-modified GI-based author-verifier with lots of features (ASGALF). In: CLEF 2014 labs and workshops, notebook papers. CLEF and CEUR-WS.org
Kocher M, Savoy J (2015) UniNE at CLEF 2015: author identification. In: Cappellato L, Ferro N, Jones GJF, San Juan E (eds) Working notes papers of the CLEF 2015 evaluation labs
Kocher M, Savoy J (2016) A simple and efficient algorithm for authorship verification. J Assoc Inf Sci Technol 68(1):259–269
Koppel M, Schler J, Argamon S (2011) Authorship attribution in the wild. Lang Resour Eval 45(1):83–94
Koppel M, Schler J, Bonchek-Dokow E (2007) Measuring differentiability: unmasking pseudonymous authors. J Mach Learn Res 8:1261–1276
Koppel M, Winter Y (2014) Determining if two documents are written by the same author. J Am Soc Inf Sci Technol 65(1):178–187
Layton R, Watters PA, Dazeley R (2015) Authorship analysis of aliases: Does topic influence accuracy? Nat Lang Eng 21(4):497–518
Luyckx K, Daelemans W (2008) Authorship attribution and verification with many authors and limited data. In: Proceedings of the conference of COLING 2008, 22nd international conference on computational linguistics, pp 513–520
Moreau E, Jayapal A, Lynch G, Vogel C (2015) Author verification: basic stacked generalization applied to predictions from a set of heterogeneous learners. In: Cappellato L, Ferro N, Jones GJF, San Juan E (eds) Working notes papers of the CLEF 2015 evaluation labs
Pacheco M, Fernandes K, Porco A (2015) Random forest with increased generalization: a universal background approach for authorship verification. In: Cappellato L, Ferro N, Jones GJF, San Juan E (eds) CLEF 2015 evaluation labs and workshop—working notes papers. CEUR-WS.org
Peñas A, Rodrigo A (2011) A simple measure to assess non-response. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, ACL, vol 1, pp 1415–1424
Potha N, Stamatatos E (2014) A profile-based method for authorship verification. In: Artificial intelligence: methods and applications—proceedings of the 8th Hellenic conference on AI, SETN, pp 313–326
Potha N, Stamatatos E (2017) An improved impostors method for authorship verification. In: International conference of the cross-language evaluation forum for European languages, pp 138–144
Samdani R, Chang K, Roth D (2014) A discriminative latent variable model for online clustering. In: Proceedings of the 31st international conference on machine learning, ICML 2014, pp 1–9
Sanderson C, Guenter S (2006) Short text authorship attribution via sequence kernels, Markov chains and author unmasking: an investigation. In: Proceedings of the international conference on empirical methods in natural language engineering, pp 482–491
Seidman S (2013) Authorship verification using the impostors method. In: Forner P, Navigli R, Tufis D (eds) CLEF 2013 evaluation labs and workshop—working notes papers
Stamatatos E (2009) A survey of modern authorship attribution methods. J Am Soc Inf Sci Technol 60:538–556
Stamatatos E, Daelemans W, Verhoeven B, Juola P, López-López A, Potthast M, Stein B (2015) Overview of the author identification task at PAN 2015. In: Working notes of CLEF 2015—conference and labs of the evaluation forum
Stamatatos E, Daelemans W, Verhoeven B, Stein B, Potthast M, Juola P, Sánchez-Pérez MA, Barrón-Cedeño A (2014) Overview of the author identification task at PAN 2014. In: Working notes for CLEF 2014 conference, pp 877–897
Stamatatos E, Fakotakis N, Kokkinakis G (2000) Automatic text categorization in terms of genre and author. Comput Linguist 26(4):471–495
Stover JA, Winter Y, Koppel M, Kestemont M (2016) Computational authorship verification method attributes a new work to a major 2nd century African author. J Am Soc Inf Sci Technol 67(1):239–242
Sun J, Yang Z, Liu S, Wang P (2012) Applying stylometric analysis techniques to counter anonymity in cyberspace. J Netw 7(2):259–266
About this article
Cite this article
Potha, N., Stamatatos, E. Improved algorithms for extrinsic author verification. Knowl Inf Syst 62, 1903–1921 (2020). https://doi.org/10.1007/s10115-019-01408-4
DOI: https://doi.org/10.1007/s10115-019-01408-4