Abstract
This paper proposes a lexicon-constrained character model that combines word and character features to address difficult problems in Chinese morphological analysis. A character-based model, constrained by a lexicon, is built to acquire word-building rules. The model assigns a tag to each character in a Chinese sentence, and the word segmentation and part-of-speech tagging results are then generated from the character tags. The proposed method addresses unknown word identification, data sparseness, and estimation bias within a single unified framework. Preliminary experiments indicate that it outperforms the best SIGHAN word segmentation systems in the open track on 3 of the 4 test corpora. The method can also be integrated with any other Chinese morphological analysis system as a post-processing module, leading to significant improvement in performance.
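To make the step from character tags to segmentation and part-of-speech output concrete, the following minimal sketch groups tagged characters into (word, POS) pairs. The abstract does not state the tag set, so the `B`/`M`/`E`/`S` position labels joined to a POS label with a hyphen (for example `B-v`) are an assumption made for illustration, and the function name `decode_char_tags` is hypothetical. It shows only the decoding step, not the lexicon-constrained model that predicts the tags.

```python
from typing import List, Tuple


def decode_char_tags(chars: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group characters into (word, pos) pairs from per-character tags.

    Assumed tag format (not taken from the paper): "<position>-<pos>",
    where position is B (begin), M (middle), E (end) or S (single-character word).
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")

    words: List[Tuple[str, str]] = []
    buffer = ""      # characters of the word currently being built
    last_pos = ""    # POS label of the most recent character

    for ch, tag in zip(chars, tags):
        position, pos = tag.split("-", 1)
        # A new word starts while another is unfinished: flush the old one.
        if position in ("B", "S") and buffer:
            words.append((buffer, last_pos))
            buffer = ""
        buffer += ch
        last_pos = pos
        # E and S both close the current word.
        if position in ("E", "S"):
            words.append((buffer, pos))
            buffer = ""

    # Flush a trailing word that never received an E tag.
    if buffer:
        words.append((buffer, last_pos))
    return words


if __name__ == "__main__":
    sentence = list("我喜欢北京")
    predicted = ["S-r", "B-v", "E-v", "B-ns", "E-ns"]
    for word, pos in decode_char_tags(sentence, predicted):
        print(f"{word}/{pos}")
```

Because each tag carries both a position and a POS label, a single pass over the characters yields segmentation and tagging together, which is the sense in which the two tasks share one framework in this sketch.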
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Meng, Y., Yu, H., Nishino, F. (2005). A Lexicon-Constrained Character Model for Chinese Morphological Analysis. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_48
DOI: https://doi.org/10.1007/11562214_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)