Abstract
We describe novel approaches to tackling the problem of natural language processing for low-resource languages. The approaches are embodied in systems for name tagging and machine translation (MT) that we constructed to participate in the NIST LoReHLT evaluation in 2016. Our methods include universal tools, rapid resource and knowledge acquisition, rapid language projection, and joint methods for MT and name tagging.
Acknowledgements
We would like to thank the other ELISA team members who contributed to resource construction and system preparation before the evaluation: Chris Callison-Burch (UPenn), Aliya Deri (USC) and Ashish Vaswani (Google). We thank Billy Wagner from Next Century for running the LTDE to produce name tagging runs. This work was supported by the U.S. Defense Advanced Research Projects Agency (DARPA) LORELEI Program No. HR0011-15-C-0115 and ARL/ARO MURI W911NF-10-1-0533. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Cite this article
Hermjakob, U., Li, Q., Marcu, D. et al. Incident-Driven Machine Translation and Name Tagging for Low-resource Languages. Machine Translation 32, 59–89 (2018). https://doi.org/10.1007/s10590-017-9207-1