ISCA Archive - Named entity translation using anchor texts
ISCA Archive IWSLT 2011
ISCA Archive IWSLT 2011

Named entity translation using anchor texts

Wang Ling, Pável Calado, Bruno Martins, Isabel Trancoso, Alan Black, Luísa Coheur

This work describes a process to extract Named Entity (NE) translations from the text available in web links (anchor texts). It translates a NE by retrieving a list of web documents in the target language, extracting the anchor texts from the links to those documents and finding the best translation from the anchor texts, using a combination of features, some of which, are specific to anchor texts. Experiments performed on a manually built corpora, suggest that over 70% of the NEs, ranging from unpopular to popular entities, can be translated correctly using sorely anchor texts. Tests on a Machine Translation task indicate that the system can be used to improve the quality of the translations of state-of-the-art statistical machine translation systems.