Abstract
This paper demonstrates how simple open-source tools contribute to the development of a highly efficient author profiling system, which determines the age and gender of an author from the authored text itself. With the rapid growth of the Web, the number of social websites has increased twofold. It has therefore become necessary for security agencies and intelligence experts to track malicious activity by users on the Web (such as pedophilia or security attacks) by monitoring user profiles and flagging them when necessary. Rather than building the system from scratch, we rely on the Component Based Methodology (CBM) from software engineering, which permits the reuse of existing components and helps achieve better-quality software quickly and free of cost. Significant differences exist in the way males and females, and younger and older people, write. We illustrate in detail how the system exploits these differences and how its development follows the architecture of the CBM.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nazareth, D., Asnani, K., Rodrigues, O. (2015). Author-Profile System Development Based on Software Reuse of Open Source Components. In: Satapathy, S., Biswal, B., Udgata, S., Mandal, J. (eds) Proceedings of the 3rd International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA) 2014. Advances in Intelligent Systems and Computing, vol 328. Springer, Cham. https://doi.org/10.1007/978-3-319-12012-6_69
DOI: https://doi.org/10.1007/978-3-319-12012-6_69
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12011-9
Online ISBN: 978-3-319-12012-6
eBook Packages: Engineering (R0)