Investigation of Text Attribution Methods Based on Frequency Author Profile

  • Conference paper
Databases and Information Systems (DB&IS 2018)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 838))

Abstract

Determining the author of a text is a problem that has engaged researchers since the last century. With the growth of social networks and platforms for publishing web posts and articles on the Internet, authorship identification has become even more pressing. Specialists in journalism and law are particularly interested in more accurate approaches for resolving disputes over texts of doubtful authorship. In this article, the authors compare the applicability of eight modern machine learning algorithms (Support Vector Machine, Naive Bayes, Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest, Multilayer Perceptron, and Gradient Boosting Classifier) for classifying a collection of Russian web posts. The best results were achieved with Logistic Regression, Multilayer Perceptron, and Support Vector Machine with a linear kernel, using a combination of part-of-speech and word n-grams as features.
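
As a rough illustration of what a word n-gram frequency profile might look like, the following minimal Python sketch builds a relative-frequency profile per author and attributes a new text to the most similar one. It is not the authors' pipeline: the nearest-profile rule with cosine similarity stands in for the eight classifiers compared in the paper, part-of-speech n-grams are omitted, and the function names (`word_ngrams`, `frequency_profile`, `attribute`) and the toy English corpus are hypothetical.

```python
from collections import Counter
from math import sqrt


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def frequency_profile(texts, n_max=2):
    """Relative frequencies of word n-grams over a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(word_ngrams(text, n_max))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(w * q.get(gram, 0.0) for gram, w in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def attribute(text, profiles, n_max=2):
    """Pick the author whose profile is most similar to the text's profile."""
    query = frequency_profile([text], n_max)
    return max(profiles, key=lambda author: cosine(query, profiles[author]))


# Hypothetical toy corpus (assumption): two authors, two short texts each.
corpus = {
    "author_a": ["the cat sat on the mat", "the cat likes the warm mat"],
    "author_b": ["stock prices rose sharply today", "markets and stock prices fell"],
}
profiles = {author: frequency_profile(texts) for author, texts in corpus.items()}
predicted = attribute("the cat sat on the warm mat", profiles)
```

In a setup closer to the one described, the same n-gram counts would be passed as feature vectors to a trained classifier such as Logistic Regression rather than compared directly.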

Author information

Correspondence to Polina Diurdeva or Elena Mikhailova.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Diurdeva, P., Mikhailova, E. (2018). Investigation of Text Attribution Methods Based on Frequency Author Profile. In: Lupeikiene, A., Vasilecas, O., Dzemyda, G. (eds) Databases and Information Systems. DB&IS 2018. Communications in Computer and Information Science, vol 838. Springer, Cham. https://doi.org/10.1007/978-3-319-97571-9_25

  • DOI: https://doi.org/10.1007/978-3-319-97571-9_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-97570-2

  • Online ISBN: 978-3-319-97571-9

  • eBook Packages: Computer Science, Computer Science (R0)
