Investigating the Effects of Recency and Size of Training Text on Author Recognition Problem

  • Conference paper
Computer and Information Sciences - ISCIS 2004 (ISCIS 2004)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3280))


Abstract

Prediction by partial match (PPM) is an effective tool for addressing the author recognition problem. In this study, we successfully applied the trained PPM technique to author recognition on Turkish texts. We also investigated how the recency and the size of the training text affect the performance of the PPM approach. The results show that more recent and larger training texts help decrease the compression rate, which in turn leads to greater success in author recognition. Comparing the two factors, we see that the size of the training text plays a more dominant role in performance than its recency.
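
The idea behind compression-based author recognition can be illustrated with a minimal sketch: train one character model per candidate author, then attribute an unseen text to the author whose model encodes it in the fewest bits per character. The code below is not the authors' implementation and is not PPM itself. As a simplifying assumption it substitutes a fixed-order character model with add-one smoothing for PPM's escape mechanism, and the names `train`, `bits_per_char`, `recognize`, and the `alphabet_size` default are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(text, order=2):
    """Count next-character frequencies for every context of length `order`."""
    counts = defaultdict(Counter)
    for i in range(order, len(text)):
        counts[text[i - order:i]][text[i]] += 1
    return counts


def bits_per_char(model, text, order=2, alphabet_size=256):
    """Average code length of `text` under `model`, in bits per character.

    Assumption: add-one smoothing over a nominal alphabet stands in for
    PPM's escape-and-back-off handling of unseen symbols.
    """
    total_bits = 0.0
    n = 0
    for i in range(order, len(text)):
        ctx = model.get(text[i - order:i], Counter())
        p = (ctx[text[i]] + 1) / (sum(ctx.values()) + alphabet_size)
        total_bits -= math.log2(p)
        n += 1
    return total_bits / n if n else float("inf")


def recognize(models, text, order=2):
    """Return the author whose model gives the lowest bits per character."""
    return min(models, key=lambda author: bits_per_char(models[author], text, order))
```

In this sketch, `models` would be a dictionary mapping each author name to the result of `train` on that author's training text. Larger training texts give each context more counts, which is one plausible reason a size effect like the one reported above would appear in a model of this kind.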

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Celikel, E., Dalkılıç, M.E. (2004). Investigating the Effects of Recency and Size of Training Text on Author Recognition Problem. In: Aykanat, C., Dayar, T., Körpeoğlu, İ. (eds) Computer and Information Sciences - ISCIS 2004. ISCIS 2004. Lecture Notes in Computer Science, vol 3280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30182-0_3

  • DOI: https://doi.org/10.1007/978-3-540-30182-0_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23526-2

  • Online ISBN: 978-3-540-30182-0

  • eBook Packages: Springer Book Archive
