Automated language essay scoring systems: A literature review
- Published
- Accepted
- Subject Areas
- Artificial Intelligence, Computer Education
- Keywords
- AES, Automated Essay Scoring, Essay Grading, handcrafted features, Automatic Features Extraction
- Copyright
- © 2019 Hussein et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2019. Automated language essay scoring systems: A literature review. PeerJ Preprints 7:e27715v1 https://doi.org/10.7287/peerj.preprints.27715v1
Abstract
Background. Writing composition is a significant factor in measuring test-takers’ ability in any language exam. However, the assessment (scoring) of these compositions or essays is a very challenging process in terms of reliability and time. The demand for objective and quick scores has created the need for a computer system that can automatically grade essay questions targeting a specific prompt. Automated Essay Scoring (AES) systems are used to overcome the challenges of scoring writing tasks by using Natural Language Processing and Machine Learning techniques. The purpose of this paper is to review the literature on AES systems used for grading essay questions.
Methodology. We reviewed the existing literature using Google Scholar, EBSCO and ERIC, searching for the terms “AES”, “Automated Essay Scoring”, “Automated Essay Grading”, or “Automatic Essay”. Two categories of systems were identified: handcrafted features AES systems and automatic featuring AES systems. Systems in the first category depend closely on the quality of the designed features. Systems in the second category, by contrast, automatically learn the features and the relations between an essay and its score, without any handcrafted features. We reviewed the systems in both categories in terms of primary focus, technique(s) used, training data (y/n), instructional application (feedback system), and the correlation between e-scores and human scores. The paper is composed of three main sections. First, we present a structured literature review of the available Handcrafted Features AES systems. Second, we present a structured literature review of the available Automatic Featuring AES systems. Finally, we present a discussion and conclusions.
Results. AES models have been found to utilize a broad range of manually-tuned shallow and deep linguistic features. AES systems have many strengths: they reduce labour-intensive marking activities, ensure a consistent application of marking criteria, and facilitate equity in scoring. Although many techniques have been implemented to improve AES systems, three primary challenges have been identified: the systems lack the sense of the rater as a person, they can be tricked into assigning a lower or higher score to an essay than it deserves, and they cannot assess the creativity of ideas and propositions or evaluate their practicality. Many techniques have been used, but they address only the first two challenges.
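Illustration. The Methodology above names two ideas that a short sketch may make concrete: handcrafted features, and the correlation between e-scores and human scores. The Python below is a minimal sketch under stated assumptions, not the method of any reviewed system. The function names, the three shallow features, and the score lists are all hypothetical, and Pearson correlation is used here as just one possible agreement measure.

```python
import math
import re


def handcrafted_features(essay: str) -> dict:
    """Compute a few shallow, manually designed features (hypothetical choice)."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0
    return {
        "word_count": len(words),
        "avg_word_length": avg_len,
        "sentence_count": len(sentences),
    }


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up scores for illustration only; not data from any reviewed study.
    human_scores = [2, 3, 4, 4, 5]
    e_scores = [2, 3, 3, 4, 5]
    print(handcrafted_features("This is short. It has two sentences!"))
    print(round(pearson(human_scores, e_scores), 3))
```

In a handcrafted-features system, values such as these would be fed to a separately trained scoring model; an automatic featuring system would instead learn its representation directly from essay and score pairs.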
Author Comment
This is a submission to PeerJ Computer Science for review.