Abstract
Paraphrasing plays a critical role in many Natural Language Processing (NLP) applications. Traditional pivot-based methods of extracting paraphrases require a large-scale bilingual parallel corpus, and the quality of the extracted paraphrases depends on the quality of that corpus and of the word alignment. In this paper, we propose a method for extracting Chinese paraphrases. An online translation system is used to obtain candidate paraphrases of a word, and a deep neural network model combined with cosine similarity is then used to filter the candidates by computing the similarity between the word vectors of the word and each candidate paraphrase. Experiments are conducted in two ways: (1) random sampling is employed to manually verify the correctness of the extracted paraphrases, and the results show a significant improvement; (2) we build two Question Answering (QA) systems on the NLPCC 2016 Document-Based Question Answering (DBQA) corpus, one using the BM25 model to retrieve candidate answer sentences and the other using a Convolutional Neural Network (CNN) model. The extracted paraphrases are quite effective for question reformulation, raising the MRR from 56.33% to 60.21% (BM25) and from 63.82% to 66.60% (CNN) on the questions of the NLPCC 2016 DBQA corpus.
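A minimal sketch of the filtering step described above, assuming pre-trained Chinese word vectors loaded with gensim's KeyedVectors; the function names, the 0.6 threshold, and the file path are illustrative assumptions, not the authors' released code. A small helper for the Mean Reciprocal Rank (MRR) metric reported in the QA evaluation is included as well.

# Sketch of the cosine-similarity filter over translation-derived candidate paraphrases.
# Assumptions (not from the paper): gensim word2vec vectors, a 0.6 threshold,
# and the file name "zh_word_vectors.bin" are all placeholders for illustration.

import numpy as np
from gensim.models import KeyedVectors


def cosine(u, v):
    """Cosine similarity between two dense word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def filter_candidates(word, candidates, vectors, threshold=0.6):
    """Keep candidate paraphrases whose vectors are close enough to the source word's vector."""
    if word not in vectors:
        return []
    source_vec = vectors[word]
    return [c for c in candidates
            if c in vectors and cosine(source_vec, vectors[c]) >= threshold]


def mean_reciprocal_rank(first_correct_ranks):
    """MRR = (1/|Q|) * sum over questions of 1 / rank of the first correct answer."""
    return sum(1.0 / r for r in first_correct_ranks) / len(first_correct_ranks)


# Illustrative usage (path and words are placeholders):
# vectors = KeyedVectors.load_word2vec_format("zh_word_vectors.bin", binary=True)
# kept = filter_candidates("美丽", ["漂亮", "好看", "美", "丑"], vectors)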
Acknowledgments
This work was supported by the National Basic Research Program (973 Program) (Grant No. 2014CB340503) and the National Natural Science Foundation of China (Grant Nos. 61472105 and 61502120).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, Y., Qi, L., Wang, L., Yu, L., Liu, T. (2018). Beyond Pivot for Extracting Chinese Paraphrases. In: Zhang, S., Liu, T.Y., Li, X., Guo, J., Li, C. (eds.) Information Retrieval. CCIR 2018. Lecture Notes in Computer Science, vol. 11168. Springer, Cham. https://doi.org/10.1007/978-3-030-01012-6_11
DOI: https://doi.org/10.1007/978-3-030-01012-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01011-9
Online ISBN: 978-3-030-01012-6
eBook Packages: Computer Science (R0)