Abstract
This paper presents our recent work on period disambiguation, the core problem in sentence boundary identification, using the maximum entropy (Maxent) model. We conduct a number of experiments on the PTB-II WSJ corpus to investigate how the context window, the feature space, and lexical information such as abbreviations and sentence-initial words affect learning performance. Such lexical information can be acquired automatically from a training corpus by the learner. Our experimental results show that extending the feature space to integrate these two kinds of lexical information eliminates 93.52% of the errors remaining in the baseline Maxent model, achieving an F-score of 99.8227%.
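To make the idea of a context window combined with automatically acquired lexical information more concrete, the following is a minimal sketch of how such features might be extracted for one candidate period. It is not the authors' implementation: the function names `acquire_lexicons` and `period_features`, the feature templates, and the heuristics for deciding what counts as an abbreviation or a sentence-initial word are all assumptions made for illustration.

```python
from collections import Counter


def acquire_lexicons(sentences):
    """Collect two lexicons from tokenized training sentences.

    Assumption (not from the paper): a token ending in '.' that is not the
    last token of its sentence counts as an abbreviation, and the first
    token of every sentence counts as a sentence-initial word.
    """
    abbreviations, initials = Counter(), Counter()
    for tokens in sentences:
        if not tokens:
            continue
        initials[tokens[0]] += 1
        for tok in tokens[:-1]:
            if tok.endswith(".") and len(tok) > 1:
                abbreviations[tok.lower()] += 1
    return abbreviations, initials


def period_features(tokens, i, abbreviations, initials, window=2):
    """Build binary feature names for the period-bearing token at index i."""
    feats = []
    # Context-window features: the words around the candidate period.
    for offset in range(-window, window + 1):
        j = i + offset
        word = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        feats.append(f"w[{offset}]={word}")
    # Lexical features drawn from the two acquired lexicons.
    if tokens[i].lower() in abbreviations:
        feats.append("cur_is_known_abbrev")
    if i + 1 < len(tokens) and tokens[i + 1] in initials:
        feats.append("next_is_known_initial")
    return feats


# Hypothetical toy data, for illustration only.
train = [["Mr.", "Smith", "left", "."], ["He", "met", "Dr.", "Lee", "."]]
abbr, init = acquire_lexicons(train)
feats = period_features(["Then", "Mr.", "Smith", "spoke", "."], 1, abbr, init, window=1)
print(feats)
```

Feature lists of this kind would then be passed to a Maxent (multinomial logistic regression) learner, which assigns a weight to each feature. The `window` parameter corresponds to the context-window size whose effect the experiments examine.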
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kit, C., Liu, X. (2005). Period Disambiguation with Maxent Model. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_20
DOI: https://doi.org/10.1007/11562214_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)