Abstract
In this paper, we present techniques for avoiding typical errors of state-of-the-art POS-taggers and for constructing high-quality POS-taggers with extremely low error rates. Such taggers are very helpful, if not necessary, for many NLP applications organized in a pipeline architecture. The appropriateness of the proposed solutions is demonstrated in several experiments. Although these experiments were performed only with German data, the proposed modular architecture is applicable to many other languages as well.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Klatt, S., Oliva, K. (2005). On the Road to High-Quality POS-Tagging. In: Furbach, U. (ed.) KI 2005: Advances in Artificial Intelligence. KI 2005. Lecture Notes in Computer Science, vol 3698. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551263_31
DOI: https://doi.org/10.1007/11551263_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28761-2
Online ISBN: 978-3-540-31818-7
eBook Packages: Computer Science, Computer Science (R0)