{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,12]],"date-time":"2024-07-12T12:37:59Z","timestamp":1720787879225},"reference-count":32,"publisher":"Association for Computing Machinery (ACM)","issue":"2","content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Transactions on Asian Language Information Processing"],"published-print":{"date-parts":[[2012,6]]},"abstract":"Bilingual dictionaries can be automatically extended by new translations using comparable corpora. The general idea is based on the assumption that similar words have similar contexts across languages. However, previous studies have mainly focused on Indo-European languages, or use only a bag-of-words model to describe the context. Furthermore, we argue that it is helpful to extract only the statistically significant context, instead of using all context. The present approach addresses these issues in the following manner. First, based on the context of a word with an unknown translation (query word), we extract salient pivot words. Pivot words are words for which a translation is already available in a bilingual dictionary. For the extraction of salient pivot words, we use a Bayesian estimation of the point-wise mutual information to measure statistical significance. In the second step, we match these pivot words across languages to identify translation candidates for the query word. We therefore calculate a similarity score between the query word and a translation candidate using the probability that the same pivots will be extracted for both the query word and the translation candidate. The proposed method uses several context positions, namely, a bag-of-words of one sentence, and the successors, predecessors, and siblings with respect to the dependency parse tree of the sentence. 
In order to make these context positions comparable across Japanese and English, which are unrelated languages, we use several heuristics to adjust the dependency trees appropriately. We demonstrate that the proposed method significantly increases the accuracy of word translations, as compared to previous methods.","DOI":"10.1145\/2184436.2184439","type":"journal-article","created":{"date-parts":[[2012,6,11]],"date-time":"2012-06-11T13:03:21Z","timestamp":1339419801000},"page":"1-31","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":6,"title":["Statistical Extraction and Comparison of Pivot Words for Bilingual Lexicon Extension"],"prefix":"10.1145","volume":"11","author":[{"given":"Daniel","family":"Andrade","sequence":"first","affiliation":[{"name":"University of Tokyo"}]},{"given":"Takuya","family":"Matsuzaki","sequence":"additional","affiliation":[{"name":"University of Tokyo"}]},{"given":"Jun\u2019ichi","family":"Tsujii","sequence":"additional","affiliation":[{"name":"University of Tokyo"}]}],"member":"320","published-online":{"date-parts":[[2012,6]]},"reference":[{"key":"e_1_2_1_1_1","doi-asserted-by":"publisher","DOI":"10.1214\/ss\/1177011454"},{"key":"e_1_2_1_2_1","volume-title":"Proceedings of the International Conference on Computational Linguistics (CL\u201910)","author":"Andrade D.","unstructured":"Andrade , D. , Nasukawa , T. , and Tsujii , J . 2010. Robust measurement and comparison of context similarity for finding translation pairs . In Proceedings of the International Conference on Computational Linguistics (CL\u201910) . 19--27. Andrade, D., Nasukawa, T., and Tsujii, J. 2010. Robust measurement and comparison of context similarity for finding translation pairs. In Proceedings of the International Conference on Computational Linguistics (CL\u201910). 
19--27."},{"key":"e_1_2_1_3_1","volume-title":"Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing (CICLing\u201911)","author":"Andrade D.","unstructured":"Andrade , D. , Matsuzaki , T. , and Tsujii , J . 2011. Effective use of dependency structure for bilingual lexicon creation . In Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing (CICLing\u201911) . Lecture Notes in Computer Science, Springer Verlag, 80--92. Andrade, D., Matsuzaki, T., and Tsujii, J. 2011. Effective use of dependency structure for bilingual lexicon creation. In Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing (CICLing\u201911). Lecture Notes in Computer Science, Springer Verlag, 80--92."},{"key":"e_1_2_1_4_1","unstructured":"Bach F. and Jordan M. 2005. A probabilistic interpretation of canonical correlation analysis. Tech. rep. 688 Department of Statistics University of California Berkeley CA. Bach F. and Jordan M. 2005. A probabilistic interpretation of canonical correlation analysis. Tech. rep. 688 Department of Statistics University of California Berkeley CA."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.3115\/1071884.1071904"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.3115\/1072228.1072394"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.5555\/972450.972454"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.5555\/648179.749226"},{"key":"e_1_2_1_10_1","volume-title":"Proceedings of the Conference on Computational Natural Language Learning (CoNLL\u201909)","author":"Garera N.","unstructured":"Garera , N. , Callison-Burch , C. , and Yarowsky , D . 2009. Improving translation lexicon induction from monolingual corpora via dependency contexts and part-of-speech equivalences . In Proceedings of the Conference on Computational Natural Language Learning (CoNLL\u201909) . 
129--137. Garera, N., Callison-Burch, C., and Yarowsky, D. 2009. Improving translation lexicon induction from monolingual corpora via dependency contexts and part-of-speech equivalences. In Proceedings of the Conference on Computational Natural Language Learning (CoNLL\u201909). 129--137."},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.3115\/1218955.1219022"},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL\u201908)","author":"Haghighi A.","unstructured":"Haghighi , A. , Liang , P. , Berg-Kirkpatrick , T. , and Klein , D . 2008. Learning bilingual lexicons from monolingual corpora . In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL\u201908) . 771--779. Haghighi, A., Liang, P., Berg-Kirkpatrick, T., and Klein, D. 2008. Learning bilingual lexicons from monolingual corpora. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL\u201908). 771--779."},{"key":"e_1_2_1_13_1","volume-title":"Proceedings of the International Conference on Computational Linguistics (CL\u201910)","author":"Ismail A.","unstructured":"Ismail , A. and Manandhar , S . 2010. Bilingual lexicon extraction from comparable corpora using indomain terms . In Proceedings of the International Conference on Computational Linguistics (CL\u201910) . 481--489. Ismail, A. and Manandhar, S. 2010. Bilingual lexicon extraction from comparable corpora using indomain terms. In Proceedings of the International Conference on Computational Linguistics (CL\u201910). 481--489."},{"key":"e_1_2_1_14_1","volume-title":"Trading recall for precision with confidence-sets. Tech. rep","author":"Johnson M.","unstructured":"Johnson , M. 2001. Trading recall for precision with confidence-sets. Tech. rep ., Brown University . Johnson, M. 2001. Trading recall for precision with confidence-sets. Tech. 
rep., Brown University."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.3115\/1118627.1118629"},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of the International Conference on Computational Linguistics (CL\u201910)","author":"Laroche A.","unstructured":"Laroche , A. and Langlais , P . 2010. Revisiting context-based projection methods for term-translation spotting in comparable corpora . In Proceedings of the International Conference on Computational Linguistics (CL\u201910) . 617--625. Laroche, A. and Langlais, P. 2010. Revisiting context-based projection methods for term-translation spotting in comparable corpora. In Proceedings of the International Conference on Computational Linguistics (CL\u201910). 617--625."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of the International Conference on Computational Linguistics (CL\u201910)","author":"Laws F.","unstructured":"Laws , F. , Michelbacher , L. , Dorow , B. , Scheible , C. , Heid , U. , and Sch\u00fctze , H . 2010. A linguistically grounded graph model for bilingual lexicon extraction . In Proceedings of the International Conference on Computational Linguistics (CL\u201910) . 614--622. Laws, F., Michelbacher, L., Dorow, B., Scheible, C., Heid, U., and Sch\u00fctze, H. 2010. A linguistically grounded graph model for bilingual lexicon extraction. In Proceedings of the International Conference on Computational Linguistics (CL\u201910). 614--622."},{"key":"e_1_2_1_18_1","unstructured":"Manning C. and Sch\u00fctze H. 2002. Foundations of Statistical Natural Language Processing. MIT Press. Manning C. and Sch\u00fctze H. 2002. Foundations of Statistical Natural Language Processing . 
MIT Press."},{"key":"e_1_2_1_19_1","doi-asserted-by":"publisher","DOI":"10.3115\/1219840.1219852"},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.3115\/1220575.1220586"},{"key":"e_1_2_1_21_1","volume-title":"Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL\u201907)","author":"Morin E.","unstructured":"Morin , E. , Daille , B. , Takeuchi , K. , and Kageura , K . 2007. Bilingual terminology mining-using brain, not brawn comparable corpora . In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL\u201907) . 45, 664--671. Morin, E., Daille, B., Takeuchi, K., and Kageura, K. 2007. Bilingual terminology mining-using brain, not brawn comparable corpora. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL\u201907). 45, 664--671."},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201908)","author":"Okazaki N.","unstructured":"Okazaki , N. , Tsuruoka , Y. , Ananiadou , S. , and Tsujii , J . 2008. A discriminative candidate generator for string transformations . In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201908) . 447--456. Okazaki, N., Tsuruoka, Y., Ananiadou, S., and Tsujii, J. 2008. A discriminative candidate generator for string transformations. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP\u201908). 447--456."},{"key":"e_1_2_1_23_1","volume-title":"Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing (CICLing\u201908)","author":"Otero P.","unstructured":"Otero , P. and Campos , J . 2008. Learning Spanish-Galician translation equivalents using a comparable corpus and a bilingual dictionary . In Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing (CICLing\u201908) . 
423--433. Otero, P. and Campos, J. 2008. Learning Spanish-Galician translation equivalents using a comparable corpus and a bilingual dictionary. In Proceedings of the International Conference on Computational Linguistics and Intelligent Text Processing (CICLing\u201908). 423--433."},{"key":"e_1_2_1_24_1","doi-asserted-by":"publisher","DOI":"10.1007\/s10590-007-9029-7"},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the AAAI Symposium on Cross-Language Text and Speech Retrieval (CLTSR\u201997)","author":"Peters C.","unstructured":"Peters , C. and Picchi , E . 1997. Using linguistic tools and resources in cross-language retrieval . In Proceedings of the AAAI Symposium on Cross-Language Text and Speech Retrieval (CLTSR\u201997) . 179--188. Peters, C. and Picchi, E. 1997. Using linguistic tools and resources in cross-language retrieval. In Proceedings of the AAAI Symposium on Cross-Language Text and Speech Retrieval (CLTSR\u201997). 179--188."},{"key":"e_1_2_1_26_1","doi-asserted-by":"publisher","DOI":"10.5555\/646171.678755"},{"key":"e_1_2_1_27_1","doi-asserted-by":"publisher","DOI":"10.1080\/03610920008832632"},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.3115\/1034678.1034756"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1016\/S0010-4825(03)00019-2"},{"key":"e_1_2_1_30_1","volume-title":"Proceedings of the Annual Conference on Language Resources and Evaluation (LREC\u201908)","author":"Saralegi X.","unstructured":"Saralegi , X. , San Vicente , I. , and Gurrutxaga , A . 2008. Automatic extraction of bilingual terms from comparable corpora in a popular science domain . In Proceedings of the Annual Conference on Language Resources and Evaluation (LREC\u201908) . 27--32. Saralegi, X., San Vicente, I., and Gurrutxaga, A. 2008. Automatic extraction of bilingual terms from comparable corpora in a popular science domain. In Proceedings of the Annual Conference on Language Resources and Evaluation (LREC\u201908). 
27--32."},{"key":"e_1_2_1_31_1","doi-asserted-by":"publisher","DOI":"10.1007\/11573036_36"},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.3115\/1067807.1067854"},{"key":"e_1_2_1_33_1","doi-asserted-by":"crossref","DOI":"10.1093\/oso\/9780195315103.001.0001","volume-title":"Basic Statistics: Understanding Conventional Methods and Modern Insights","author":"Wilcox R.","year":"2009","unstructured":"Wilcox , R. 2009 . Basic Statistics: Understanding Conventional Methods and Modern Insights . Oxford University Press. Wilcox, R. 2009. Basic Statistics: Understanding Conventional Methods and Modern Insights. Oxford University Press."}],"container-title":["ACM Transactions on Asian Language Information Processing"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2184436.2184439","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2024,4,25]],"date-time":"2024-04-25T06:43:21Z","timestamp":1714027401000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2184436.2184439"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2012,6]]},"references-count":32,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2012,6]]}},"alternative-id":["10.1145\/2184436.2184439"],"URL":"https:\/\/doi.org\/10.1145\/2184436.2184439","relation":{},"ISSN":["1530-0226","1558-3430"],"issn-type":[{"value":"1530-0226","type":"print"},{"value":"1558-3430","type":"electronic"}],"subject":[],"published":{"date-parts":[[2012,6]]},"assertion":[{"value":"2011-03-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2011-08-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication 
History"}},{"value":"2012-06-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}