Author-Profile System Development Based on Software Reuse of Open Source Components

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 328)

Abstract

This paper demonstrates how simple open source tools can contribute to the development of a highly efficient author profiling system, which determines the age and gender of an author from the authored text itself. With the rapid growth of the Web, the number of social websites has increased twofold. It has thus become necessary for security agencies and intelligence experts to keep track of malicious activity on the Web (such as that of pedophiles, or security attacks) by monitoring user profiles and flagging them if necessary. Rather than requiring the system to be built from scratch, software engineering provides a Component Based Methodology (CBM) that permits the reuse of various components, which helps in achieving better-quality software quickly and free of cost. Significant differences exist in the way males and females, and younger and older people, write. We illustrate in detail how the system, developed on the architecture of the CBM, exploits these differences.

Author information

Corresponding author

Correspondence to Derrick Nazareth.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nazareth, D., Asnani, K., Rodrigues, O. (2015). Author-Profile System Development Based on Software Reuse of Open Source Components. In: Satapathy, S., Biswal, B., Udgata, S., Mandal, J. (eds) Proceedings of the 3rd International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2014. Advances in Intelligent Systems and Computing, vol 328. Springer, Cham. https://doi.org/10.1007/978-3-319-12012-6_69

  • DOI: https://doi.org/10.1007/978-3-319-12012-6_69

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12011-9

  • Online ISBN: 978-3-319-12012-6

  • eBook Packages: Engineering, Engineering (R0)
