Abstract
Automatic text classification is the task of assigning unseen documents to a predefined set of classes. Text representation for classification has traditionally relied on the vector space model because of its simplicity and good performance. Multi-label automatic text classification, in turn, has typically been addressed either by transforming the problem so that binary techniques can be applied or by adapting binary algorithms to work with multiple labels. In this paper we present two new representations for text documents, based on label-dependent term weighting, for multi-label classification. We focus on modifying the input representation. Performance was tested on a well-known dataset and compared with alternative techniques. Experimental results based on Hamming loss analysis show an improvement over the alternative approaches.
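As a rough illustration of the evaluation measure named in the abstract, the sketch below computes Hamming loss using its standard definition: the fraction of document-label decisions that are wrong, averaged over all documents and labels. The function name `hamming_loss` and the 0/1 indicator encoding of the label sets are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Fraction of (document, label) pairs where prediction and truth differ.

    Each row is a 0/1 indicator vector over the same fixed label set
    (an assumed encoding for this sketch).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n_labels = len(y_true[0])
    if n_labels == 0:
        raise ValueError("the label set must not be empty")
    mismatches = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != n_labels or len(pred_row) != n_labels:
            raise ValueError("every row must cover the same label set")
        # Count label positions where the predicted bit differs from the truth.
        mismatches += sum(t != p for t, p in zip(true_row, pred_row))
    return mismatches / (len(y_true) * n_labels)


# Two documents, three labels; two of the six label decisions are wrong.
truth = [[1, 0, 1], [0, 1, 0]]
guess = [[1, 1, 1], [0, 1, 1]]
loss = hamming_loss(truth, guess)
```

Lower values are better: a loss of 0 means every label decision matched, which is why an improvement is reported as a reduction in this measure.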
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alfaro, R., Allende, H. (2011). Text Representation in Multi-label Classification: Two New Input Representations. In: Dobnikar, A., Lotrič, U., Šter, B. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2011. Lecture Notes in Computer Science, vol 6594. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20267-4_7
DOI: https://doi.org/10.1007/978-3-642-20267-4_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20266-7
Online ISBN: 978-3-642-20267-4
eBook Packages: Computer Science, Computer Science (R0)