{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,4,2]],"date-time":"2025-04-02T21:03:32Z","timestamp":1743627812949},"reference-count":65,"publisher":"Cambridge University Press (CUP)","issue":"3","license":[{"start":{"date-parts":[[2020,4,3]],"date-time":"2020-04-03T00:00:00Z","timestamp":1585872000000},"content-version":"unspecified","delay-in-days":0,"URL":"https:\/\/www.cambridge.org\/core\/terms"}],"content-domain":{"domain":["cambridge.org"],"crossmark-restriction":true},"short-container-title":["Nat. Lang. Eng."],"published-print":{"date-parts":[[2021,5]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>Translations are generally assumed to share universal features that distinguish them from texts that are originally written in the same language. Thus, we can argue that these translations constitute their own variety of a language, often called translationese. However, translations are also influenced by their source languages and thus show different characteristics depending on the source language. Consequently, we argue that these variants constitute different \u201cdialects\u201d of translations into the same target language. Studies using machine learning techniques on Indo-European languages have investigated the universal characteristics of translationese and how translations from various source languages differ. However, for typologically very different languages such as Chinese, there are only few corpus studies that tap into the intricate relation between translations and the originals, as well as into the relations among translations themselves. In this contribution, we investigate the following questions: (1) What are the characteristics of Chinese translationese, both in general and with respect to different source languages? (2) Can we find differences not only at the lexical but also on the syntactic level? 
and (3) Based on the characteristics found in the previous questions, which of the proposed laws and universals can we corroborate based on our evidence from Chinese? We use machine learning to operationalize determining the importance of different characteristics and comparing their importance for our Chinese dataset with characteristics previously reported in studies on English. In addition, our methodology allows us to add syntactic features, which have rarely been used to study translations into Chinese. Our results show that Chinese translations as a whole can be reliably distinguished from non-translations, even based on only five features. More interestingly, typological traces from the source languages can often be found in their translations, therefore creating what we call dialects of translationese. For instance, translations from two Altaic languages exhibit more noun repetition and less frequent use of pronouns. Additionally, some characteristics that are not discriminative for English work well for Chinese, possibly because the distance between Chinese and the source languages is greater than that in English studies.<\/jats:p>","DOI":"10.1017\/s1351324920000182","type":"journal-article","created":{"date-parts":[[2020,4,3]],"date-time":"2020-04-03T05:45:34Z","timestamp":1585892734000},"page":"339-372","update-policy":"http:\/\/dx.doi.org\/10.1017\/policypage","source":"Crossref","is-referenced-by-count":5,"title":["Investigating translated Chinese and its variants using machine 
learning"],"prefix":"10.1017","volume":"27","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-2289-9008","authenticated-orcid":false,"given":"Hai","family":"Hu","sequence":"first","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0003-0885-5436","authenticated-orcid":false,"given":"Sandra","family":"K\u00fcbler","sequence":"additional","affiliation":[]}],"member":"56","published-online":{"date-parts":[[2020,4,3]]},"reference":[{"key":"S1351324920000182_ref18","doi-asserted-by":"crossref","first-page":"157","DOI":"10.1515\/9783110459586-006","volume-title":"Empirical Translation Studies: New Theoretical and Methodological Traditions","author":"Ferraresi","year":"2017"},{"key":"S1351324920000182_ref39","doi-asserted-by":"publisher","DOI":"10.1353\/lan.2018.0053"},{"key":"S1351324920000182_ref25","volume-title":"A Study of Grammatical Features in Europeanized Chinese","author":"He","year":"2008"},{"key":"S1351324920000182_ref62","doi-asserted-by":"publisher","DOI":"10.1075\/ijcl.15.1.01xia"},{"key":"S1351324920000182_ref38","unstructured":"Lin, C.-J.C. (2011). Chinese and English relative clauses: Processing constraints and typological consequences. In Proceedings of the 23rd North American Conference on Chinese Linguistics (NACCL-23), Eugene, OR."},{"key":"S1351324920000182_ref24","doi-asserted-by":"publisher","DOI":"10.1145\/1656274.1656278"},{"key":"S1351324920000182_ref52","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/N16-1110"},{"key":"S1351324920000182_ref59","volume-title":"Contemporary Grammar of Chinese","author":"Wang","year":"1943"},{"key":"S1351324920000182_ref49","doi-asserted-by":"publisher","DOI":"10.1075\/btl.48.13puu"},{"key":"S1351324920000182_ref42","volume-title":"A Sketch of Chinese Grammar","author":"Lv","year":"1942"},{"key":"S1351324920000182_ref10","unstructured":"Cartoni, B. , Zufferey, S. , Meyer, T. and Popescu-Belis, A. (2011). How comparable are parallel corpora? Measuring the distribution of general vocabulary and connectives. 
In Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web, Portland, OR, pp. 78\u201386."},{"key":"S1351324920000182_ref11","unstructured":"Chen, J.W. (2006). Explicitation Through the Use of Connectives in Translated Chinese: A Corpus-Based Study . PhD Thesis, The University of Manchester."},{"key":"S1351324920000182_ref47","doi-asserted-by":"publisher","DOI":"10.1556\/Acr.1.2000.2.1"},{"key":"S1351324920000182_ref57","doi-asserted-by":"publisher","DOI":"10.1075\/btl.4"},{"key":"S1351324920000182_ref30","first-page":"303","article-title":"Fanyi zhong de xian he yin (implicitation and explicitation in translations)","volume":"37","author":"Ke","year":"2005","journal-title":"Foreign Language Teaching and Research"},{"key":"S1351324920000182_ref33","first-page":"75","volume-title":"New Perspectives on Cohesion and Coherence","author":"Kunilovskaya","year":"2017"},{"key":"S1351324920000182_ref27","doi-asserted-by":"publisher","DOI":"10.1515\/cllt-2014-0047"},{"key":"S1351324920000182_ref2","doi-asserted-by":"publisher","DOI":"10.1075\/target.7.2.03bak"},{"key":"S1351324920000182_ref9","doi-asserted-by":"crossref","first-page":"235","DOI":"10.1515\/9783110459586-009","volume-title":"Empirical Translation Studies: New Theoretical and Methodological Traditions","author":"Cappelle","year":"2017"},{"key":"S1351324920000182_ref43","doi-asserted-by":"publisher","DOI":"10.1162\/coli_a_00323"},{"key":"S1351324920000182_ref8","unstructured":"Bykh, S. and Meurers, D. (2014). Exploring syntactic features for native language identification: A variationist perspective on feature encoding and ensemble optimization. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics, Dublin, Ireland, pp. 1962\u20131973."},{"key":"S1351324920000182_ref35","unstructured":"Laviosa-Braithwaite, S. (1996). The English Comparable Corpus (ECC): A Resource and a Methodology for the Empirical Study of Translation. 
PhD Thesis, University of Manchester."},{"key":"S1351324920000182_ref12","first-page":"363","article-title":"Discourse analysis of zero anaphora in Chinese","volume":"5","author":"Chen","year":"1987","journal-title":"Chinese Philology"},{"key":"S1351324920000182_ref50","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/P16-1176"},{"key":"S1351324920000182_ref53","doi-asserted-by":"publisher","DOI":"10.1002\/j.1538-7305.1948.tb01338.x"},{"key":"S1351324920000182_ref4","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqi039"},{"key":"S1351324920000182_ref20","first-page":"88","volume-title":"Translation Studies in Scandinavia","author":"Gellerstam","year":"1986"},{"key":"S1351324920000182_ref46","unstructured":"Meyer, T. and Webber, B. (2013). Implicitation of discourse connectives in (machine) translation. In Proceedings of the Workshop on Discourse in Machine Translation, pp. 19\u201326."},{"key":"S1351324920000182_ref19","first-page":"159","volume-title":"Translation: Literary, Linguistic and Philosophical Perspectives","author":"Frawley","year":"1984"},{"key":"S1351324920000182_ref28","first-page":"319","article-title":"Translationese traits in Romanian newspapers: A machine learning approach","volume":"2","author":"Ilisei","year":"2011","journal-title":"International Journal of Computational Linguistics and Applications"},{"key":"S1351324920000182_ref45","doi-asserted-by":"publisher","DOI":"10.1075\/btl.48"},{"key":"S1351324920000182_ref37","unstructured":"Levy, R. and Andrew, G. (2006). Tregex and Tsurgeon: Tools for querying and manipulating tree data structures. In Proceedings of the Fifth International Conference on Language Resources and Evaluation, Genoa, Italy, pp. 2231\u20132234."},{"key":"S1351324920000182_ref61","unstructured":"Wang, L. (1958). History of the Chinese Language. 
Zhonghua Book Company (In Chinese) Beijing."},{"key":"S1351324920000182_ref6","doi-asserted-by":"publisher","DOI":"10.7202\/002054ar"},{"key":"S1351324920000182_ref21","unstructured":"Graff, D. (2007). Chinese Gigaword, 3rd Edn. LDC Catalog No.: LDC2007T38, ISBN: 1-58563-455-7."},{"key":"S1351324920000182_ref16","doi-asserted-by":"publisher","DOI":"10.1515\/9783110459586"},{"key":"S1351324920000182_ref7","first-page":"17","volume-title":"Interlingual and Intercultural Communication: Discourse and Cognition in Translation and Second Language Acquisition Studies","author":"Blum-Kulka","year":"1986"},{"key":"S1351324920000182_ref31","unstructured":"Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. In Proceedings of the 10th Machine Translation Summit, Phuket, Thailand, pp. 79\u201386."},{"key":"S1351324920000182_ref3","first-page":"175","author":"Baker","year":"1996"},{"key":"S1351324920000182_ref5","unstructured":"Becher, V. (2011). Explicitation and Implicitation in Translation. A Corpus-Based Study of English-German and German-English Translations of Business Texts. PhD Thesis, University of Hamburg."},{"key":"S1351324920000182_ref13","unstructured":"Chen, Z. , Boston, M.F. and Hale, J.T. (2009). Using entropy to evaluate child language performance. 
In The 22nd CUNY Conference on Human Sentence Processing, Davis, CA."},{"key":"S1351324920000182_ref22","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqm020"},{"key":"S1351324920000182_ref14","first-page":"22","article-title":"Word association norms, mutual information, and lexicography","volume":"16","author":"Church","year":"1990","journal-title":"Computational Linguistics"},{"key":"S1351324920000182_ref1","first-page":"233","author":"Baker","year":"1993"},{"key":"S1351324920000182_ref63","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-41363-6"},{"key":"S1351324920000182_ref44","doi-asserted-by":"publisher","DOI":"10.3115\/v1\/P14-5010"},{"key":"S1351324920000182_ref55","doi-asserted-by":"publisher","DOI":"10.1515\/9783110896541"},{"key":"S1351324920000182_ref41","doi-asserted-by":"publisher","DOI":"10.1075\/ijcl.15.4.02lu"},{"key":"S1351324920000182_ref34","doi-asserted-by":"publisher","DOI":"10.1353\/lan.2013.0044"},{"key":"S1351324920000182_ref58","doi-asserted-by":"publisher","DOI":"10.1093\/llc\/fqt031"},{"key":"S1351324920000182_ref48","doi-asserted-by":"publisher","DOI":"10.1075\/btl.48.12pap"},{"key":"S1351324920000182_ref51","doi-asserted-by":"publisher","DOI":"10.1162\/tacl_a_00148"},{"key":"S1351324920000182_ref40","unstructured":"Lin, C.-J.C. and Hu, H. (2018). Syntactic complexity as a measure of linguistic authenticity in modern Chinese. 
In 26th Annual Conference of International Association of Chinese Linguistics and the 20th International Conference on Chinese Language and Culture, Madison, WI."},{"key":"S1351324920000182_ref17","doi-asserted-by":"crossref","first-page":"47","DOI":"10.1515\/9783110459586-003","volume-title":"Empirical Translation Studies: New Theoretical and Methodological Traditions","author":"Evert","year":"2017"},{"key":"S1351324920000182_ref29","doi-asserted-by":"publisher","DOI":"10.1007\/978-3-642-12116-6_43"},{"key":"S1351324920000182_ref65","volume-title":"Dialogues in Grammar","author":"Zhu","year":"1985"},{"key":"S1351324920000182_ref60","volume-title":"Theory of Chinese Grammar","author":"Wang","year":"1944"},{"key":"S1351324920000182_ref64","doi-asserted-by":"publisher","DOI":"10.1017\/S135132490400364X"},{"key":"S1351324920000182_ref36","doi-asserted-by":"publisher","DOI":"10.1162\/COLI_a_00111"},{"key":"S1351324920000182_ref56","unstructured":"Toury, G. (1978). The nature and role of norms in translation. In Holmes, J. , Lambert, J. and van den Broeck, R. (eds), Literature and Translation: New Perspectives in Literary Studies. Acco, Leuven."},{"key":"S1351324920000182_ref26","doi-asserted-by":"publisher","DOI":"10.18653\/v1\/W18-1603"},{"key":"S1351324920000182_ref54","unstructured":"Swanson, B. and Charniak, E. (2012). Native language detection with tree substitution grammars. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju Island, South Korea pp. 193\u2013197."},{"key":"S1351324920000182_ref15","unstructured":"Da, J. (2004). A corpus-based study of character and bigram frequencies in Chinese e-texts and its implications for Chinese language instruction. In Proceedings of the Fourth International Conference on New Technologies in Teaching and Learning Chinese, Beijing, China, pp. 501\u2013511."},{"key":"S1351324920000182_ref32","unstructured":"Koppel, M. and Ordan, N. (2011). Translationese and its dialects. 
In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, pp. 1318\u20131326."},{"key":"S1351324920000182_ref23","doi-asserted-by":"publisher","DOI":"10.1111\/lnc3.12196"}],"container-title":["Natural Language Engineering"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.cambridge.org\/core\/services\/aop-cambridge-core\/content\/view\/S1351324920000182","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,20]],"date-time":"2022-10-20T12:48:29Z","timestamp":1666270109000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.cambridge.org\/core\/product\/identifier\/S1351324920000182\/type\/journal_article"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,4,3]]},"references-count":65,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2021,5]]}},"alternative-id":["S1351324920000182"],"URL":"https:\/\/doi.org\/10.1017\/s1351324920000182","relation":{},"ISSN":["1351-3249","1469-8110"],"issn-type":[{"value":"1351-3249","type":"print"},{"value":"1469-8110","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,4,3]]},"assertion":[{"value":"\u00a9 Cambridge University Press 2020","name":"copyright","label":"Copyright","group":{"name":"copyright_and_licensing","label":"Copyright and Licensing"}}]}}