Abstract.
This paper presents a methodology for evaluating Arabic Machine Translation (MT) systems. We are specifically interested in evaluating lexical coverage, grammatical coverage, semantic correctness, and the correctness of pronoun resolution. The methodology is statistical and builds on earlier work on evaluating MT lexicons, which introduced the idea that the importance of a specific word sense to a given application domain determines how strongly its presence or absence in the lexicon affects the MT system's lexical quality, and hence the quality of the overall system output. The same idea is generalized here so as to apply to grammatical coverage, semantic correctness, and the correctness of pronoun resolution. The approach has been implemented and applied to the evaluation of four commercial English-Arabic MT systems, and we present the results of this evaluation for the domain of the Internet and Arabization.
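As a rough illustration of the weighting idea described above (a sketch in our own notation, assuming that the importance of a word sense is operationalized as a corpus-derived weight; the measure actually used in the paper may differ), a domain-weighted lexical coverage score can be written as the weighted proportion of domain word senses that the system's lexicon handles:

% Illustrative notation only, not the paper's exact formulation.
\[
  \mathrm{Coverage}(L, S) \;=\;
  \frac{\sum_{s \in S \cap L} w(s)}{\sum_{s \in S} w(s)},
\]

where $S$ is the set of word senses observed in a sample from the application domain, $w(s)$ is the importance weight of sense $s$ (for example, its relative frequency in a domain corpus), and $L$ is the set of senses the MT lexicon covers. In the same spirit, analogous scores could be defined for grammatical coverage, semantic correctness, and pronoun resolution by replacing word senses with grammatical constructions, sentence meanings, or pronoun occurrences and weighting them by their importance to the domain.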
Cite this article
Guessoum, A., Zantout, R. A Methodology for Evaluating Arabic Machine Translation Systems. Mach Translat 18, 299–335 (2004). https://doi.org/10.1007/s10590-005-2412-3