Multi-value Classification of Very Short Texts

  • Conference paper
KI 2008: Advances in Artificial Intelligence (KI 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5243)

Abstract

We introduce a new stacking-like approach for multi-value classification. We apply this classification scheme using Naive Bayes, Rocchio and kNN classifiers on the well-known Reuters dataset. We use part-of-speech tagging for stopword removal. We show that our setup performs almost as well as other approaches that use the full article text even though we only classify headlines. Finally, we apply a Rocchio classifier on a dataset from a Web 2.0 site and show that it is suitable for semi-automated labelling (often called tagging) of short texts and is faster than other approaches.
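To make the semi-automated labelling idea concrete, here is a minimal sketch of a centroid-based (positive-only Rocchio) classifier that ranks candidate labels for a short text and suggests the top few. It is not the authors' implementation: the toy headlines, the label names, the whitespace tokenizer, the plain term-frequency weighting and the function names `train_centroids` and `suggest_labels` are all assumptions made for illustration. The part-of-speech-based stopword removal and the stacking-like scheme mentioned in the abstract are not reproduced.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for real preprocessing.
    return text.lower().split()


def unit_vector(weights):
    # Scale a sparse term -> weight mapping to unit Euclidean length.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def train_centroids(examples):
    # examples: iterable of (text, labels); a text may carry several labels.
    # Each label's prototype is the normalized sum of its documents' vectors.
    sums = defaultdict(Counter)
    for text, labels in examples:
        vec = unit_vector(Counter(tokenize(text)))
        for label in labels:
            for term, weight in vec.items():
                sums[label][term] += weight
    return {label: unit_vector(vec) for label, vec in sums.items()}


def suggest_labels(text, centroids, k=2):
    # Rank labels by cosine similarity to the text and return at most k of them.
    # Labels with zero similarity are never suggested.
    vec = unit_vector(Counter(tokenize(text)))
    scores = {
        label: sum(w * centroid.get(t, 0.0) for t, w in vec.items())
        for label, centroid in centroids.items()
    }
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, score in ranked[:k] if score > 0]


# Hypothetical training headlines, each with one or more labels.
TRAIN = [
    ("oil prices rise", ["crude"]),
    ("opec cuts oil output", ["crude"]),
    ("wheat harvest falls", ["grain", "wheat"]),
    ("grain exports rise", ["grain"]),
]

centroids = train_centroids(TRAIN)
suggestions = suggest_labels("oil output falls", centroids, k=2)
```

In a semi-automated setting, the ranked suggestions would be shown to a user who confirms or rejects them, rather than being assigned automatically. A full Rocchio classifier would also subtract a weighted prototype of the negative examples; that term is omitted here to keep the sketch short.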

Editor information

Andreas R. Dengel, Karsten Berns, Thomas M. Breuel, Frank Bomarius, Thomas R. Roth-Berghofer

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Heß, A., Dopichaj, P., Maaß, C. (2008). Multi-value Classification of Very Short Texts. In: Dengel, A.R., Berns, K., Breuel, T.M., Bomarius, F., Roth-Berghofer, T.R. (eds) KI 2008: Advances in Artificial Intelligence. KI 2008. Lecture Notes in Computer Science, vol 5243. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85845-4_9

  • DOI: https://doi.org/10.1007/978-3-540-85845-4_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85844-7

  • Online ISBN: 978-3-540-85845-4

  • eBook Packages: Computer Science, Computer Science (R0)
