Abstract
The merit of phrase-based statistical machine translation is often reduced by the complexity of constructing it. In this paper, we address three issues in phrase-based statistical machine translation: the size of the phrase translation table, the use of the underlying translation model probability, and the length of the phrase unit. We present the Level-Of-Detail (LOD) approach, an agglomerative approach for learning phrase-level alignment. Our experiments show that the LOD approach significantly improves on the performance of the word-based approach. LOD also has a clear advantage: its phrase translation table grows only sub-linearly with the maximum phrase length, while its performance remains comparable to that of other phrase-based approaches.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Setiawan, H., Li, H., Zhang, M., Ooi, B.C. (2005). Phrase-Based Statistical Machine Translation: A Level of Detail Approach. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_51
DOI: https://doi.org/10.1007/11562214_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science (R0)