Abstract
The emergence of neural machine translation techniques has opened a new era in the development of translation systems. However, these techniques require very large parallel corpora, which are scarce for many under-resourced languages, e.g., Bangla. Currently, there is a lack of publicly available collaborative systems for developing such corpora. In this paper, we report an online collaborative system for the development of parallel corpora. The system is designed to support any language; however, we evaluated it only for developing a Bangla–English parallel corpus. In a task completion evaluation experiment, the system outperforms OmegaT, a widely used offline system.
Notes
- 1.
- 2.
- 3.
- 4.
- 5.
TM (translation memory) is a database of source- and target-language segment pairs. The pairs are usually stored in the database as translators translate the text corpus.
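To make the definition concrete, here is a minimal Python sketch of a TM as a key-value store of source/target pairs. The class name `TranslationMemory`, its methods, and the whitespace normalization are illustrative assumptions, not the system described in this paper; a real TM would typically persist the pairs in a database, as the note says.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TranslationMemory:
    """Hypothetical in-memory TM: source segment -> stored translation."""

    # Maps a normalized source segment to the translator's target segment.
    entries: dict[str, str] = field(default_factory=dict)

    @staticmethod
    def _normalize(segment: str) -> str:
        # Collapse runs of whitespace so spacing differences still match.
        return " ".join(segment.split())

    def add(self, source: str, target: str) -> None:
        """Store a source/target pair once the translator confirms it."""
        self.entries[self._normalize(source)] = target

    def lookup(self, source: str) -> str | None:
        """Return the stored translation for an exact match, else None."""
        return self.entries.get(self._normalize(source))


# Usage: store one English-Bangla pair, then look it up again.
tm = TranslationMemory()
tm.add("How are you?", "আপনি কেমন আছেন?")
print(tm.lookup("How  are you?"))
print(tm.lookup("Good morning"))
```

This lookup only handles exact matches after normalization; retrieving translations for sentences that are merely similar requires fuzzy matching.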
- 6.
Fuzzy matching is an approximate matching approach that retrieves a translation for a segment by comparing the segment with previously translated sentences and accepting close, not only exact, matches. The segment can be a phrase or a whole sentence.
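One common way to score how close a new segment is to a stored one is normalized edit distance. The following Python sketch assumes character-level Levenshtein distance, a plain dictionary as the TM, case-insensitive comparison, and a hypothetical similarity threshold of 0.7; the paper does not state which metric or threshold its system uses.

```python
from __future__ import annotations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def fuzzy_match(segment: str, memory: dict[str, str],
                threshold: float = 0.7) -> tuple[str, str, float] | None:
    """Return (source, target, score) of the best entry at or above threshold."""
    best = None
    for source, target in memory.items():
        score = similarity(segment.lower(), source.lower())
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best


# Usage: the query lacks the question mark but still matches (score 11/12).
memory = {
    "How are you?": "আপনি কেমন আছেন?",
    "Where do you live?": "আপনি কোথায় থাকেন?",
}
print(fuzzy_match("How are you", memory))
```

Character-level distance keeps the sketch short; word-level distance or a tokenized comparison may suit sentence-length segments better, especially for a morphologically rich language such as Bangla.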
Acknowledgements
We would like to extend our sincere thanks to A S M Humayun Morshed from Daffodil International University and to students from the Department of English of the same university for helping us with data collection. We would also like to thank our anonymous reviewers for their detailed and constructive comments.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hasan, M.A., Alam, F., Noori, S.R.H. (2020). A Collaborative Platform to Collect Data for Developing Machine Translation Systems. In: Uddin, M.S., Bansal, J.C. (eds) Proceedings of International Joint Conference on Computational Intelligence. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-13-7564-4_35
DOI: https://doi.org/10.1007/978-981-13-7564-4_35
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-7563-7
Online ISBN: 978-981-13-7564-4
eBook Packages: Intelligent Technologies and Robotics (R0)