Abstract
We introduce a new stacking-like approach for multi-value classification. We apply this classification scheme, using Naive Bayes, Rocchio and kNN classifiers, to the well-known Reuters dataset. We use part-of-speech tagging for stopword removal. We show that our setup performs almost as well as other approaches that use the full article text, even though we classify only headlines. Finally, we apply a Rocchio classifier to a dataset from a Web 2.0 site and show that it is suitable for semi-automated labelling (often called tagging) of short texts and is faster than other approaches.
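As an illustration of the kind of Rocchio-based tag suggestion the abstract mentions, the following is a minimal sketch and not the authors' implementation. It assumes plain whitespace tokenization, raw term counts normalized to unit length, and cosine similarity against one centroid per label. The function names (`tokenize`, `train_rocchio`, `suggest_labels`) and the cut-off parameter `k` are hypothetical, and the stacking-like scheme and the part-of-speech-based stopword removal are not reproduced here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption made for this sketch)."""
    return text.lower().split()


def normalize(vec):
    """Scale a sparse term-weight dict to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def train_rocchio(examples):
    """Build one unit-length centroid per label from (text, labels) pairs.

    A text carrying several labels contributes to each of their centroids.
    """
    sums = defaultdict(lambda: defaultdict(float))
    for text, labels in examples:
        doc = normalize(Counter(tokenize(text)))
        for label in labels:
            for term, weight in doc.items():
                sums[label][term] += weight
    return {label: normalize(vec) for label, vec in sums.items()}


def suggest_labels(centroids, text, k=2):
    """Return up to k labels whose centroids are most similar to the text."""
    doc = normalize(Counter(tokenize(text)))
    scores = {
        label: sum(w * centroid.get(t, 0.0) for t, w in doc.items())
        for label, centroid in centroids.items()
    }
    # Rank by descending cosine similarity; break ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, score in ranked[:k] if score > 0]
```

In a semi-automated setting, the ranked list returned by `suggest_labels` would be shown to a user as suggestions to confirm or reject rather than applied automatically. Training and scoring both reduce to sparse vector sums and dot products, which is one reason centroid classifiers are cheap to run on short texts.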
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Heß, A., Dopichaj, P., Maaß, C. (2008). Multi-value Classification of Very Short Texts. In: Dengel, A.R., Berns, K., Breuel, T.M., Bomarius, F., Roth-Berghofer, T.R. (eds) KI 2008: Advances in Artificial Intelligence. KI 2008. Lecture Notes in Computer Science, vol 5243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85845-4_9
DOI: https://doi.org/10.1007/978-3-540-85845-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85844-7
Online ISBN: 978-3-540-85845-4
eBook Packages: Computer Science (R0)