Phrase-Level Simplification for Non-native Speakers

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2019)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13451)

Abstract

Typical Lexical Simplification systems replace single words with simpler alternatives. We introduce the task of Phrase-Level Simplification, a variant of Lexical Simplification in which sequences of words are replaced as a whole, allowing for the substitution of compositional expressions. We tackle this task with a novel pipeline approach: we generate candidate replacements with lexicon-retrofitted, POS-aware phrase embedding models, select among them with an unsupervised comparison-based method, and rank them with rankers trained on features that capture phrase simplicity more effectively than other popularly used feature sets. We train and evaluate this approach using BenchPS, a new dataset we created for the task, whose annotations focus on the needs of non-native English speakers. Our methods and resources result in a state-of-the-art phrase simplifier that correctly simplifies complex phrases 61% of the time.
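The three stages named in the abstract (generation, selection, ranking) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy vectors and counts are invented, the function names are hypothetical, cosine nearest neighbours stand in for the lexicon-retrofitted POS-aware phrase embedding models, a similarity threshold stands in for the unsupervised comparison-based selector (whose details the abstract does not give), and a frequency sort stands in for the trained rankers. It is not the authors' implementation.

```python
import math

# Hypothetical toy phrase vectors; the paper's models are lexicon-retrofitted,
# POS-aware phrase embeddings, which this table does not reproduce.
PHRASE_VECTORS = {
    "hazardous substance": [0.9, 0.1, 0.0],
    "dangerous chemical": [0.8, 0.2, 0.1],
    "toxic material": [0.7, 0.3, 0.0],
    "noxious compound": [0.6, 0.4, 0.2],
    "red car": [0.0, 0.1, 0.9],
}

# Hypothetical corpus counts, used here as a crude simplicity signal.
PHRASE_FREQ = {
    "dangerous chemical": 300,
    "toxic material": 120,
    "noxious compound": 5,
    "red car": 900,
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def generate(target, k=4):
    """Generation: the k phrases closest to the target in vector space."""
    tv = PHRASE_VECTORS[target]
    others = [p for p in PHRASE_VECTORS if p != target]
    others.sort(key=lambda p: cosine(tv, PHRASE_VECTORS[p]), reverse=True)
    return others[:k]


def select(target, candidates, min_similarity=0.5):
    """Selection: drop candidates too dissimilar to the target.

    Assumption: a plain threshold, standing in for the paper's selector.
    """
    tv = PHRASE_VECTORS[target]
    return [c for c in candidates
            if cosine(tv, PHRASE_VECTORS[c]) >= min_similarity]


def rank(candidates):
    """Ranking: most frequent first, a stand-in for a trained ranker."""
    return sorted(candidates, key=lambda c: PHRASE_FREQ.get(c, 0), reverse=True)


def simplify(target):
    """Run the three stages in order and return ranked replacements."""
    return rank(select(target, generate(target)))


if __name__ == "__main__":
    print(simplify("hazardous substance"))
```

In this toy data, "red car" has the highest frequency but is unrelated to the target, so it only stays out of the final ranking because the selection stage runs before the ranker. That ordering is the reason for keeping the stages separate in the sketch.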

Notes

  1. http://ebiquity.umbc.edu/resource/html/id/351.

  2. http://www.statmt.org/wmt11/translation-task.html.

  3. https://www.google.co.uk/forms/about.

  4. http://ghpaetzold.github.io/data/BenchPS.zip.

Author information

Correspondence to Gustavo H. Paetzold or Lucia Specia.

Copyright information

© 2023 Springer Nature Switzerland AG

About this paper

Cite this paper

Paetzold, G.H., Specia, L. (2023). Phrase-Level Simplification for Non-native Speakers. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2019. Lecture Notes in Computer Science, vol 13451. Springer, Cham. https://doi.org/10.1007/978-3-031-24337-0_30

  • DOI: https://doi.org/10.1007/978-3-031-24337-0_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24336-3

  • Online ISBN: 978-3-031-24337-0

  • eBook Packages: Computer Science, Computer Science (R0)
