Abstract
Organizations need to identify automatically the gender of participants in product discussion forums. However, they may find existing gender classification methods hard to adopt, because the associations between gender and the linguistic features used in classification models usually vary with context. This paper proposes and validates a more "data-driven" framework for developing gender classification models. The framework continuously extracts content-specific features from the discussions and automatically adjusts which features are selected as the context changes, in order to achieve better classification accuracy. Because it requires no manual effort to adjust the model, organizations can adopt it more easily.
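The abstract describes the feature-adjustment step only at a high level. The following is a minimal sketch under stated assumptions and is not the authors' method. It assumes that posts are grouped into time windows, that each post has already been reduced to a set of terms (for example after Chinese word segmentation), and that each post carries a binary gender label. In every window, the features are re-ranked with the chi-square statistic, a standard text-categorization feature-selection score used here for illustration. The function names `chi_square`, `select_features`, and `reselect_per_window` are hypothetical.

```python
from collections import Counter


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: posts of class 1 containing the term, b: posts of class 0 containing it,
    c: posts of class 1 without it,          d: posts of class 0 without it.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs: list[set[str]], labels: list[int], k: int) -> list[str]:
    """Return the k terms most associated with the binary label in one window.

    Assumption for illustration: label 1 = one gender, label 0 = the other.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_df: Counter = Counter()  # document frequency of each term in class-1 posts
    neg_df: Counter = Counter()  # document frequency of each term in class-0 posts
    for terms, y in zip(docs, labels):
        (pos_df if y == 1 else neg_df).update(terms)
    vocab = set(pos_df) | set(neg_df)
    scores = {}
    for t in vocab:
        a, b = pos_df[t], neg_df[t]
        scores[t] = chi_square(a, b, n_pos - a, n_neg - b)
    # Highest score first; break ties alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


def reselect_per_window(
    windows: list[tuple[list[set[str]], list[int]]], k: int
) -> list[list[str]]:
    """Re-run feature selection on each window so the features track topic drift."""
    return [select_features(docs, labels, k) for docs, labels in windows]


if __name__ == "__main__":
    # Toy data (invented for illustration): the discussion topic shifts between windows.
    w1 = ([{"lipstick", "price"}, {"lipstick", "color"},
           {"engine", "price"}, {"engine", "torque"}], [1, 1, 0, 0])
    w2 = ([{"skincare", "battery"}, {"skincare", "screen"},
           {"gpu", "battery"}, {"gpu", "screen"}], [1, 1, 0, 0])
    print(reselect_per_window([w1, w2], k=2))
    # -> [['engine', 'lipstick'], ['gpu', 'skincare']]
```

In this example, terms that are shared evenly across both labels (such as "price" or "battery") score zero. The selected features change when the discussion topic changes, and no one has to edit a feature list by hand. This is the behaviour the abstract attributes to the framework, although the paper's actual selection criterion may differ.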
Acknowledgments
This work is partly supported by the National Natural Science Foundation of PRC (Nos. 71531013, 71490720, and 71401047).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Wang, J., Yan, X., Zhu, B. (2017). Behavior Theory Enabled Gender Classification Method. In: Fan, M., Heikkilä, J., Li, H., Shaw, M., Zhang, H. (eds) Internetworked World. WEB 2016. Lecture Notes in Business Information Processing, vol 296. Springer, Cham. https://doi.org/10.1007/978-3-319-69644-7_20
DOI: https://doi.org/10.1007/978-3-319-69644-7_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69643-0
Online ISBN: 978-3-319-69644-7
eBook Packages: Computer Science, Computer Science (R0)