Abstract
Typical Lexical Simplification systems replace single words with simpler alternatives. We introduce the task of Phrase-Level Simplification, a variant of Lexical Simplification in which sequences of words are replaced as a whole, allowing for the substitution of compositional expressions. We tackle this task with a novel pipeline approach: we generate candidate replacements with lexicon-retrofitted, POS-aware phrase embedding models, select among them with an unsupervised comparison-based method, and rank them with rankers trained on features that capture phrase simplicity more effectively than other commonly used feature sets. We train and evaluate this approach on BenchPS, a new dataset we created for the task whose annotations focus on the needs of non-native English speakers. Our methods and resources result in a state-of-the-art phrase simplifier that correctly simplifies complex phrases 61% of the time.
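The generate, select, rank structure described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy embedding vectors, the frequency counts, the similarity threshold, and the function names are hypothetical. A fixed cosine threshold stands in for the unsupervised comparison-based selector, and a frequency sort stands in for the trained rankers, so this shows only the shape of such a pipeline, not the paper's actual models.

```python
import math

# Toy phrase embeddings (hypothetical values, not taken from the paper's models).
EMBEDDINGS = {
    "put up with": [0.9, 0.1, 0.2],
    "tolerate": [0.88, 0.15, 0.18],
    "endure": [0.8, 0.2, 0.3],
    "accept": [0.7, 0.4, 0.1],
    "build": [0.1, 0.9, 0.3],
}

# Toy corpus frequencies used as a crude simplicity signal (hypothetical values).
FREQUENCY = {
    "put up with": 200,
    "tolerate": 120,
    "endure": 300,
    "accept": 5000,
    "build": 4000,
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def generate(target, k=3):
    """Generation: the k nearest neighbours of the target in embedding space."""
    scored = [
        (cosine(EMBEDDINGS[target], vector), phrase)
        for phrase, vector in EMBEDDINGS.items()
        if phrase != target
    ]
    scored.sort(reverse=True)
    return [phrase for _, phrase in scored[:k]]


def select(target, candidates, min_similarity=0.95):
    """Selection: keep candidates close enough to the target.

    A fixed threshold is an assumed stand-in for the paper's selector.
    """
    return [
        c for c in candidates
        if cosine(EMBEDDINGS[target], EMBEDDINGS[c]) >= min_similarity
    ]


def rank(candidates):
    """Ranking: most frequent first (assumed stand-in for a trained ranker)."""
    return sorted(candidates, key=lambda c: FREQUENCY.get(c, 0), reverse=True)


def simplify(target):
    """Run the three stages; fall back to the target if nothing survives."""
    ranked = rank(select(target, generate(target)))
    return ranked[0] if ranked else target
```

Calling `simplify("put up with")` runs all three stages on the toy data. Keeping the stages as separate functions mirrors the pipeline design, so any one stage could be swapped for a stronger model without touching the others.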
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Paetzold, G.H., Specia, L. (2023). Phrase-Level Simplification for Non-native Speakers. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_30
DOI: https://doi.org/10.1007/978-3-031-24337-0_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24336-3
Online ISBN: 978-3-031-24337-0
eBook Packages: Computer Science (R0)