Improved algorithms for extrinsic author verification

  • Regular Paper
Knowledge and Information Systems

Abstract

Author verification is a fundamental problem in authorship attribution, and it suits most relevant applications, where a closed set of suspects cannot be predefined. So far, the most successful approaches sample the non-target class (all documents by all other authors) and thereby transform author verification into a binary classification task. Moreover, they follow the instance-based paradigm (all documents of known authorship are treated separately). In this paper, we propose two algorithms, one instance-based and one profile-based (all known documents are treated cumulatively), that outperform state-of-the-art methods on several benchmark datasets. We demonstrate that the proposed methods can take advantage of multiple documents of known authorship and that they remain robust when text length is reduced.
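
The two paradigms named in the abstract can be illustrated with a minimal Python sketch. It is not the algorithms proposed in the paper: the character 3-gram representation, the cosine similarity, the scoring rule, and every function name below are assumptions chosen only to show how an extrinsic verifier might compare the known documents against documents sampled from other authors, either cumulatively (profile-based) or one by one (instance-based).

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (an assumed, generic representation)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def profile_based_score(known_docs, unknown_doc):
    """Profile-based: treat all known documents cumulatively as one text."""
    profile = char_ngrams(" ".join(known_docs))
    return cosine(profile, char_ngrams(unknown_doc))


def instance_based_score(known_docs, unknown_doc):
    """Instance-based: score each known document separately, then average."""
    unknown = char_ngrams(unknown_doc)
    scores = [cosine(char_ngrams(doc), unknown) for doc in known_docs]
    return sum(scores) / len(scores)


def extrinsic_verify(known_docs, unknown_doc, external_docs,
                     score=profile_based_score):
    """Fraction of external (other-author) documents that are less similar
    to the unknown document than the known documents are."""
    target = score(known_docs, unknown_doc)
    unknown = char_ngrams(unknown_doc)
    beaten = sum(cosine(char_ngrams(doc), unknown) < target
                 for doc in external_docs)
    return beaten / len(external_docs)


known = ["the cat sat on the mat", "the cat ate the rat"]
unknown = "the cat sat on the hat"
external = ["quantum flux yields zero", "big dogs bark loudly"]

profile_result = extrinsic_verify(known, unknown, external)
instance_result = extrinsic_verify(known, unknown, external,
                                   score=instance_based_score)
print(profile_result, instance_result)
```

A value close to 1 suggests the unknown document resembles the known documents more than it resembles the sampled external ones; a real system would need a calibrated threshold and a far larger external sample.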

Notes

  1. An early-view version of [22] has been available since 2013.

  2. https://pan.webis.de/clef14/pan14-web/author-identification.html.

  3. https://pan.webis.de/clef15/pan15-web/author-identification.html.

References

  1. Abbasi A, Chen H (2005) Applying authorship analysis to extremist-group web forum messages. IEEE Intell Syst 20(5):67–75

  2. Aggarwal CC, Zhai C (2012) A survey of text classification algorithms. In: Aggarwal CC, Zhai C (eds) Mining text data. Springer, Berlin, pp 163–222

  3. Bagnall D (2015) Author identification using multi-headed recurrent neural networks. In: Cappellato L, Ferro N, Jones GJF, San Juan E (eds) Working notes papers of the CLEF 2015 evaluation labs

  4. Bartoli A, Dagri A, Lorenzo AD, Medvet E, Tarlao F (2015) An author verification approach based on differential features. In: Cappellato L, Ferro N, Jones GJF, San Juan E (eds) Working notes papers of the CLEF 2015 evaluation labs

  5. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38

  6. Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7:1–30

  7. Escalante HJ, Montes-y-Gómez M, Villaseñor-Pineda L (2009) Particle swarm model selection for authorship verification. In: Proceedings of the 14th Iberoamerican conference on pattern recognition, pp 563–570

  8. Fréry J, Largeron C, Juganaru-Mathieu M (2014) UJM at CLEF in Author Identification. In: Proceedings CLEF-2014, Working Notes, pp 1042–1048

  9. Gutierrez J, Casillas J, Ledesma P, Fuentes G, Meza I (2015) Homotopy based classification for author verification task. Working Notes Papers of the CLEF

  10. Halvani O, Winter C, Pflug A (2016) Authorship verification for different languages, genres and topics. Dig Investig 16:S33–S43

  11. Hürlimann M, Weck B, van den Berg E, Šuster S, Nissim M (2015) GLAD: Groningen lightweight authorship detection. In: Cappellato L, Ferro N, Jones GJF, San Juan E (eds) CLEF 2015 evaluation labs and workshop: working notes papers. CEUR-WS.org

  12. Jankowska M, Milios EE, Keselj V (2014) Author verification using common n-gram profiles of text documents. In: Proceedings of COLING, 25th international conference on computational linguistics, pp 387–397

  13. Juola P (2008) Authorship attribution. Found Trends Inf Retr 1:234–334

  14. Juola P (2013) How a computer program helped reveal J. K. Rowling as author of A Cuckoo’s Calling. Scientific American

  15. Juola P, Stamatatos E (2013) Overview of the author identification task at PAN 2013. In: Working notes for CLEF 2013 conference

  16. Kestemont M, Luyckx K, Daelemans W, Crombez T (2012) Cross-genre authorship verification using unmasking. Engl Stud 93(3):340–356

  17. Khonji M, Iraqi Y (2014) A slightly-modified GI-based author-verifier with lots of features (ASGALF). In: CLEF 2014 labs and workshops, notebook papers. CLEF and CEUR-WS.org

  18. Kocher M, Savoy J (2015) UniNE at CLEF 2015: author identification. In: Cappellato L, Ferro N, Jones GJF, San Juan E (eds) Working notes papers of the CLEF 2015 evaluation labs

  19. Kocher M, Savoy J (2016) A simple and efficient algorithm for authorship verification. J Assoc Inf Sci Technol 68(1):259–269

  20. Koppel M, Schler J, Argamon S (2011) Authorship attribution in the wild. Lang Resour Eval 45(1):83–94

  21. Koppel M, Schler J, Bonchek-Dokow E (2007) Measuring differentiability: unmasking pseudonymous authors. J Mach Learn Res 8:1261–1276

  22. Koppel M, Winter Y (2014) Determining if two documents are written by the same author. J Am Soc Inf Sci Technol 65(1):178–187

  23. Layton R, Watters PA, Dazeley R (2015) Authorship analysis of aliases: Does topic influence accuracy? Nat Lang Eng 21(4):497–518

  24. Luyckx K, Daelemans W (2008) Authorship attribution and verification with many authors and limited data. In: Proceedings of the conference of COLING 2008, 22nd international conference on computational linguistics, pp 513–520

  25. Moreau E, Jayapal A, Lynch G, Vogel C (2015) Author verification: basic stacked generalization applied to predictions from a set of heterogeneous learners. In: Cappellato L, Ferro N, Jones GJF, San Juan E (eds) Working notes papers of the CLEF 2015 evaluation labs

  26. Pacheco M, Fernandes K, Porco A (2015) Random forest with increased generalization: a universal background approach for authorship verification. In: Cappellato L, Ferro N, Jones GJF, San Juan E (eds) CLEF 2015 evaluation labs and workshop: working notes papers. CEUR-WS.org

  27. Peñas A, Rodrigo A (2011) A simple measure to assess non-response. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, ACL, vol 1, pp 1415–1424

  28. Potha N, Stamatatos E (2014) A profile-based method for authorship verification. In: Artificial intelligence: methods and applications—proceedings of the 8th Hellenic conference on AI, SETN, pp 313–326

  29. Potha N, Stamatatos E (2017) An improved impostors method for authorship verification. In: International conference of the cross-language evaluation forum for European languages, pp 138–144

  30. Samdani R, Chang K, Roth D (2014) A discriminative latent variable model for online clustering. In: Proceedings of the 31th international conference on machine learning, ICML 2014, pp 1–9

  31. Sanderson C, Guenter S (2006) Short text authorship attribution via sequence kernels, Markov chains and author unmasking: an investigation. In: Proceedings of the international conference on empirical methods in natural language engineering, pp 482–491

  32. Seidman S (2013) Authorship verification using the impostors method. In: Forner P, Navigli R, Tufis D (eds) CLEF 2013 evaluation labs and workshop—working notes papers

  33. Stamatatos E (2009) A survey of modern authorship attribution methods. J Am Soc Inf Sci Technol 60:538–556

  34. Stamatatos E, Daelemans W, Verhoeven B, Juola P, López-López A, Potthast M, Stein B (2015) Overview of the author identification task at PAN 2015. In: Working notes of CLEF 2015—conference and labs of the evaluation forum

  35. Stamatatos E, Daelemans W, Verhoeven B, Stein B, Potthast M, Juola P, Sánchez-Pérez MA, Barrón-Cedeño A (2014) Overview of the author identification task at PAN 2014. In: Working notes for CLEF 2014 conference, pp 877–897

  36. Stamatatos E, Fakotakis N, Kokkinakis G (2000) Automatic text categorization in terms of genre and author. Comput Linguist 26(4):471–495

  37. Stover JA, Winter Y, Koppel M, Kestemont M (2016) Computational authorship verification method attributes a new work to a major 2nd century African author. J Am Soc Inf Sci Technol 67(1):239–242

  38. Sun J, Yang Z, Liu S, Wang P (2012) Applying stylometric analysis techniques to counter anonymity in cyberspace. J Netw 7(2):259–266

Author information

Corresponding author

Correspondence to Efstathios Stamatatos.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Potha, N., Stamatatos, E. Improved algorithms for extrinsic author verification. Knowl Inf Syst 62, 1903–1921 (2020). https://doi.org/10.1007/s10115-019-01408-4

  • DOI: https://doi.org/10.1007/s10115-019-01408-4
