Abstract
To classify Korean named entities, we adopt a co-training method called DL-CoTrain. To extract the contextual features of a named entity, we use only a part-of-speech tagger and a simple noun phrase chunker rather than a full parser. We discuss the linguistic features of Korean that are valuable for named entity classification, and we show experimentally how large a labeled corpus, and which unlabeled corpus, is needed to improve the performance and portability of a named entity classifier. With only about a quarter of the labeled corpus, our method is competitive with its supervised counterpart.
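The co-training idea named in the abstract can be illustrated with a minimal sketch. It assumes the general decision-list co-training scheme: each entity has two feature views (its spelling and its context), a few seed rules label some entities, and rules learned in one view are used to label data for learning rules in the other. Everything concrete here is a hypothetical illustration rather than the paper's implementation: the function names, the English toy features, the smoothing constant `alpha`, the precision threshold `p_min`, and the fixed `top_n` cap. In the paper, the contextual features would come from the part-of-speech tagger and noun phrase chunker.

```python
from collections import Counter, defaultdict

def induce_rules(examples, labels, view, n_labels, alpha=0.1, p_min=0.95, top_n=5):
    """Learn feature -> (label, confidence) rules in one view from current labels."""
    feat_count, pair_count = Counter(), Counter()
    for ex, y in zip(examples, labels):
        if y is None:          # unlabeled examples contribute nothing
            continue
        for f in ex[view]:
            feat_count[f] += 1
            pair_count[(f, y)] += 1
    by_label = defaultdict(list)
    for (f, y), c in pair_count.items():
        # Smoothed precision of the rule "feature f implies label y".
        p = (c + alpha) / (feat_count[f] + n_labels * alpha)
        if p >= p_min:
            by_label[y].append((p, c, f))
    rules = {}
    for y, cands in by_label.items():
        # Keep only the top_n most confident rules for each label.
        for p, _, f in sorted(cands, reverse=True)[:top_n]:
            rules[f] = (y, p)
    return rules

def apply_rules(example, rules, view):
    """Label an example with its most confident matching rule, or None."""
    best = None
    for f in example[view]:
        if f in rules and (best is None or rules[f][1] > best[1]):
            best = rules[f]
    return best[0] if best else None

def dl_cotrain(examples, seed_rules, n_labels, rounds=3, **kw):
    """Alternate between the two views, each one labeling data for the other."""
    seeds = {f: (y, 1.0) for f, y in seed_rules.items()}
    rules = {"spelling": dict(seeds), "context": {}}
    for _ in range(rounds):
        # Spelling rules label the data; context rules are learned from it.
        labels = [apply_rules(ex, rules["spelling"], "spelling") for ex in examples]
        rules["context"] = induce_rules(examples, labels, "context", n_labels, **kw)
        # Context rules label the data; spelling rules are relearned, seeds kept.
        labels = [apply_rules(ex, rules["context"], "context") for ex in examples]
        learned = induce_rules(examples, labels, "spelling", n_labels, **kw)
        rules["spelling"] = {**learned, **seeds}
    return rules

# Hypothetical toy data: English placeholders, not the paper's Korean features.
examples = [
    {"spelling": {"full=Seoul"}, "context": {"prev=in"}},
    {"spelling": {"full=Busan"}, "context": {"prev=in"}},
    {"spelling": {"full=Kim"}, "context": {"next=said"}},
    {"spelling": {"full=Lee"}, "context": {"next=said"}},
    {"spelling": {"full=Busan"}, "context": {"next=port"}},
]
seed_rules = {"full=Seoul": "LOC", "full=Kim": "PER"}
rules = dl_cotrain(examples, seed_rules, n_labels=2, p_min=0.9)
```

In this toy run the two seed rules are enough to reach entities they never mention: the context `prev=in` is learned from "Seoul", which labels "Busan", which in turn makes the context `next=port` a location cue. The threshold is lowered to 0.9 only because the toy counts are tiny. A fuller version would typically raise the number of rules admitted per round instead of using a fixed `top_n`.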
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kwak, BK., Cha, JW. (2005). Named Entity Tagging for Korean Using DL-CoTrain Algorithm. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_55
DOI: https://doi.org/10.1007/11562382_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2
eBook Packages: Computer Science, Computer Science (R0)