Abstract
In a stochastic word-spacing model, data sparseness is difficult to cope with unless the size of the dictionary can be increased. To resolve both the data-sparseness problem and the dictionary-memory-size problem, this paper describes a method that dynamically generates candidate words from morpheme unigrams and their categories in order to identify the correct words. The probability of each candidate word was estimated from the probabilities of its morphemes, each weighted according to its category. The category weights were trained to minimize the mean error between the observed probability of a word and the probability estimated from the word's individual morpheme probabilities, each weighted by the power of its category in the category pattern that produces the given word.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kang, My., Jung, Sw., Kwon, Hc. (2006). Category-Pattern-Based Korean Word-Spacing. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_30
DOI: https://doi.org/10.1007/11940098_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49667-0
Online ISBN: 978-3-540-49668-7
eBook Packages: Computer Science, Computer Science (R0)