Abstract
Vocabulary knowledge is essential for both native and foreign language learning. Classifying words by difficulty helps students progress through the different stages of study and gives teachers a standard to follow when preparing tutorials. However, classifying word difficulty by hand is time-consuming and labor-intensive. In this paper, we propose to classify and compare word difficulty by analyzing multi-faceted features, including intra-word, syntactic, and semantic features. The results show that our method is robust across different language environments.
Notes
- 1. The Corpus of Contemporary American English: https://www.english-corpora.org/coca/.
- 2. CEFR defines six difficulty levels {A1, A2, B1, B2, C1, C2}, where A1 represents the lowest difficulty and C2 the highest.
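The ordering of the six CEFR levels described in note 2 can be sketched as an ordered enumeration, which makes "compare two words by difficulty" a plain integer comparison. This is a minimal illustration, not the paper's method: the names `CEFRLevel`, `harder`, and `example_levels` are hypothetical, and the two word labels are invented for the example.

```python
from enum import IntEnum


class CEFRLevel(IntEnum):
    """The six CEFR levels, ordered from lowest (A1) to highest (C2) difficulty."""
    A1 = 1
    A2 = 2
    B1 = 3
    B2 = 4
    C1 = 5
    C2 = 6


def harder(word_a, word_b, levels):
    """Return the word with the higher CEFR level, or None if both share a level."""
    a, b = levels[word_a], levels[word_b]
    if a == b:
        return None
    return word_a if a > b else word_b


# Hypothetical labels, invented for illustration only (not taken from the paper).
example_levels = {"cat": CEFRLevel.A1, "ubiquitous": CEFRLevel.C2}

if __name__ == "__main__":
    print(harder("cat", "ubiquitous", example_levels))
```

Using `IntEnum` keeps the level names readable while still allowing `<`, `>`, and sorting, which is all a pairwise difficulty comparison needs.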
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, S., Jia, Q., Shen, L., Zhao, Y. (2020). Automatic Classification and Comparison of Words by Difficulty. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_72
DOI: https://doi.org/10.1007/978-3-030-63820-7_72
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63819-1
Online ISBN: 978-3-030-63820-7
eBook Packages: Computer Science, Computer Science (R0)