Abstract
We apply machine learning to occupation coding, the task of categorizing answers to open-ended questions about a respondent’s occupation. Specifically, we use Support Vector Machines (SVMs) and their combination with hand-crafted rules. Manual occupation coding is expensive and can produce inconsistent results when the coders are not experts in the task. For this reason, a rule-based automatic method has been developed and used. However, its categorization performance is not satisfactory. We therefore adopt SVMs, which perform well in various fields, and compare them with the rule-based method. We also investigate effective ways of combining SVMs with the rule-based method. In our methods, the output of the rule-based method is used as features for the SVMs. We show empirically that SVMs outperform the rule-based method in occupation coding and that combining the two methods yields even better accuracy.
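The idea of using the rule-based output as features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the keyword rules, the codes (`EDU`, `MED`, `TRN`), and the function names `rule_based_code` and `featurize` are hypothetical and are not taken from the paper or from the actual rule-based system. The sketch covers feature construction only, not SVM training.

```python
# Hypothetical sketch: the rules, codes, and names below are illustrative
# assumptions, not the paper's actual rule-based system or feature set.

def rule_based_code(answer):
    """Toy stand-in for a hand-crafted rule system: return a code or None."""
    rules = [("teacher", "EDU"), ("nurse", "MED"), ("driver", "TRN")]
    lowered = answer.lower()
    for keyword, code in rules:
        if keyword in lowered:
            return code
    return None  # no rule fired for this answer


def featurize(answer):
    """Build bag-of-words features plus one feature carrying the rule output."""
    features = {}
    for word in answer.lower().split():
        features["word=" + word] = 1
    code = rule_based_code(answer)
    if code is not None:
        # The rule-based prediction becomes an extra input feature.
        features["rule=" + code] = 1
    return features


if __name__ == "__main__":
    print(featurize("High school teacher"))
    print(featurize("Rice farmer"))
```

Feature dictionaries of this kind could then be converted to vectors and passed to any multiclass SVM implementation, so the classifier can learn when to trust the rule output and when to override it.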
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Takahashi, K., Takamura, H., Okumura, M. (2005). Automatic Occupation Coding with Combination of Machine Learning and Hand-Crafted Rules. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_34
DOI: https://doi.org/10.1007/11430919_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26076-9
Online ISBN: 978-3-540-31935-1
eBook Packages: Computer Science, Computer Science (R0)