Abstract
This paper describes an example-based machine translation (EBMT) method based on tree–string correspondence (TSC) and statistical generation. In this method, each translation example is represented as a TSC, which is a triple consisting of a parse tree in the source language, a string in the target language, and the correspondence between the leaf nodes of the source-language tree and substrings of the target-language string. An input sentence is first parsed into a tree. The system then searches for the TSC forest that best matches the input tree. Finally, a statistical generation model combines the target-language strings of the TSCs to produce the translation. The generation model uses three features: the semantic similarity between the tree in the TSC and the input tree, the probability of translating the source word into the target word, and the language-model probability of the target-language string. Using this method, we built an English-to-Chinese MT system. Experimental results indicate that the performance of our system is comparable to that of phrase-based statistical MT systems.
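To make the TSC triple and the three-feature model concrete, the following is a minimal sketch and not the authors' implementation. The class names `Node` and `TSC`, the helper `target_of`, the example sentence pair, and the function `generation_score` are all hypothetical. The abstract does not say how the three features are combined; the sketch assumes a weighted sum of log feature values, which is one common choice, and requires every feature value to be positive.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """A parse-tree node; a node without children is a leaf whose label is a word."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the words at the leaves, left to right."""
        if not self.children:
            return [self.label]
        return [word for child in self.children for word in child.leaves()]


@dataclass
class TSC:
    """Tree-string correspondence: (source tree, target string, alignment)."""
    tree: Node
    target: List[str]
    # Maps a source leaf index to a half-open [start, end) span of `target`.
    alignment: Dict[int, Tuple[int, int]]

    def target_of(self, leaf_index: int) -> List[str]:
        """Return the target substring aligned to the given source leaf."""
        start, end = self.alignment[leaf_index]
        return self.target[start:end]


def generation_score(similarity: float,
                     translation_prob: float,
                     lm_prob: float,
                     weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Assumed combination: weighted sum of the logs of the three features.

    All three feature values must be strictly positive.
    """
    features = (similarity, translation_prob, lm_prob)
    return sum(w * math.log(f) for w, f in zip(weights, features))


# Hypothetical English-to-Chinese example: "I like books" -> "我 喜欢 书".
example = TSC(
    tree=Node("S", [
        Node("NP", [Node("I")]),
        Node("VP", [Node("like"), Node("NP", [Node("books")])]),
    ]),
    target=["我", "喜欢", "书"],
    alignment={0: (0, 1), 1: (1, 2), 2: (2, 3)},
)
```

Under these assumptions, a candidate translation with higher tree similarity, translation probability, or language-model probability receives a higher score, so a search procedure could rank competing TSC combinations by `generation_score`. The weights are placeholders; how they are actually set is not stated in the abstract.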
Cite this article
Liu, Z., Wang, H. & Wu, H. Example-based machine translation based on tree–string correspondence and statistical generation. Machine Translation 20, 25–41 (2006). https://doi.org/10.1007/s10590-006-9016-4
DOI: https://doi.org/10.1007/s10590-006-9016-4