Publication Date
In 2025: 1
Since 2024: 7
Since 2021 (last 5 years): 22
Since 2016 (last 10 years): 58
Since 2006 (last 20 years): 107
Author
Attali, Yigal: 4
Kantor, Robert: 3
Lee, Yong-Won: 3
Ben-Simon, Anat: 2
Coniam, David: 2
Crossley, Scott A.: 2
Deane, Paul: 2
Gentile, Claudia: 2
Johnson, Robert L.: 2
Lenhard, Wolfgang: 2
McNamara, Danielle S.: 2
Publication Type
Journal Articles: 134
Reports - Research: 102
Reports - Evaluative: 18
Tests/Questionnaires: 13
Reports - Descriptive: 8
Opinion Papers: 6
Speeches/Meeting Papers: 2
Guides - General: 1
Information Analyses: 1
Non-Print Media: 1
Education Level
Higher Education: 55
Postsecondary Education: 42
Secondary Education: 13
Elementary Education: 7
Middle Schools: 7
High Schools: 6
Grade 8: 5
Junior High Schools: 4
Elementary Secondary Education: 3
Grade 11: 2
Grade 12: 2
Audience
Practitioners: 3
Teachers: 3
Location
China: 8
Iran: 7
Turkey: 5
Germany: 3
Hong Kong: 3
United Kingdom (England): 3
Australia: 2
Delaware: 2
Indonesia: 2
New Jersey: 2
Taiwan: 2
Abbas, Mohsin; van Rosmalen, Peter; Kalz, Marco – IEEE Transactions on Learning Technologies, 2023
For predicting and improving the quality of essays, text analytic metrics (surface, syntactic, morphological, and semantic features) can be used to provide formative feedback to the students in higher education. In this study, the goal was to identify a sufficient number of features that exhibit a fair proxy of the scores given by the human raters…
Descriptors: Feedback (Response), Automation, Essays, Scoring
Dadi Ramesh; Suresh Kumar Sanampudi – European Journal of Education, 2024
Automatic essay scoring (AES) is an essential educational application of natural language processing. Automating this process alleviates the grading burden while increasing the reliability and consistency of assessment. With advances in text embedding libraries and neural network models, AES systems have achieved good results in terms of accuracy.…
Descriptors: Scoring, Essays, Writing Evaluation, Memory
Peer Overmarking and Insufficient Diagnosticity: The Impact of the Rating Method for Peer Assessment
Van Meenen, Florence; Coertjens, Liesje; Van Nes, Marie-Claire; Verschuren, Franck – Advances in Health Sciences Education, 2022
The present study explores two rating methods for peer assessment (analytical rating using criteria and comparative judgement) in light of concurrent validity, reliability and insufficient diagnosticity (i.e. the degree to which substandard work is recognised by the peer raters). During a second-year undergraduate course, students wrote a one-page…
Descriptors: Evaluation Methods, Peer Evaluation, Accuracy, Evaluation Criteria
Wendler, Cathy; Glazer, Nancy; Cline, Frederick – ETS Research Report Series, 2019
One of the challenges in scoring constructed-response (CR) items and tasks is ensuring that rater drift does not occur during or across scoring windows. Rater drift reflects changes in how raters interpret and use established scoring criteria to assign essay scores. Calibration is a process used to help control rater drift and, as such, serves as…
Descriptors: College Entrance Examinations, Graduate Study, Accuracy, Test Reliability
Beseiso, Majdi; Alzubi, Omar A.; Rashaideh, Hasan – Journal of Computing in Higher Education, 2021
E-learning is gradually gaining prominence in higher education, with universities enlarging provision and more students enrolling. Automated essay scoring (AES) thus holds strong appeal for universities seeking to manage growing learning interest and reduce the costs associated with human raters. The growth in…
Descriptors: Automation, Scoring, Essays, Writing Tests
Elif Sari – International Journal of Assessment Tools in Education, 2024
Employing G-theory and rater interviews, the study investigated how a high-stakes writing assessment procedure (i.e., a single-task, single-rater, and holistic scoring procedure) impacted the variability and reliability of its scores within the Turkish higher education context. Thirty-two essays written on two different writing tasks (i.e.,…
Descriptors: Foreign Countries, High Stakes Tests, Writing Evaluation, Scores
Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025
This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics
Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024
In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…
Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)
Kumar, Vivekanandan S.; Boulanger, David – International Journal of Artificial Intelligence in Education, 2021
This article investigates the feasibility of using automated scoring methods to evaluate the quality of student-written essays. In 2012, Kaggle hosted an Automated Student Assessment Prize contest to find effective solutions to automated testing and grading. This article: a) analyzes the datasets from the contest -- which contained hand-graded…
Descriptors: Automation, Scoring, Essays, Writing Evaluation
Ramon-Casas, Marta; Nuño, Neus; Pons, Ferran; Cunillera, Toni – Assessment & Evaluation in Higher Education, 2019
This article presents an empirical evaluation of the validity and reliability of a peer-assessment activity to improve academic writing competences. Specifically, we explored a large group of psychology undergraduate students with different initial writing skills. Participants (n = 365) produced two different essays, which were evaluated by their…
Descriptors: Peer Evaluation, Validity, Reliability, Writing Skills
Pruchnic, Jeff; Barton, Ellen; Primeau, Sarah; Trimble, Thomas; Varty, Nicole; Foster, Tanina – Composition Forum, 2021
Over the past two decades, reflective writing has occupied an increasingly prominent position in composition theory, pedagogy, and assessment as researchers have described the value of reflection and reflective writing in college students' development of higher-order writing skills, such as genre conventions (Yancey, "Reflection";…
Descriptors: Reflection, Correlation, Essays, Freshman Composition
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Walland, Emma – Research Matters, 2022
In this article, I report on examiners' views and experiences of using Pairwise Comparative Judgement (PCJ) and Rank Ordering (RO) as alternatives to traditional analytical marking for GCSE English Language essays. Fifteen GCSE English Language examiners took part in the study. After each had judged 100 pairs of essays using PCJ and eight packs of…
Descriptors: Essays, Grading, Writing Evaluation, Evaluators
Shabani, Enayat A.; Panahi, Jaleh – Language Testing in Asia, 2020
The literature on using scoring rubrics in writing assessment highlights their significance as practical and useful means of assessing the quality of writing tasks. This study investigates the agreement among the rubrics endorsed and used for assessing essay writing tasks by internationally recognized tests of English language…
Descriptors: Writing Evaluation, Scoring Rubrics, Scores, Interrater Reliability
Jiyeo Yun – English Teaching, 2023
Studies on automatic scoring systems in writing assessment have evaluated the relationship between human and machine scores to establish the reliability of automated essay scoring systems. This study investigated the magnitudes of inter-rater agreement and discrepancy indices, especially for human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
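Several of the entries above (e.g., Shin & Gierl, 2021; Jiyeo Yun, 2023) report human-machine agreement for automated essay scoring. A common index in that literature is quadratic weighted kappa, which penalizes large score disagreements more heavily than small ones. The following is a minimal illustrative sketch; the function name and toy ratings are hypothetical, not drawn from any of the studies listed:

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Agreement between two raters on an ordinal score scale."""
    n = max_rating - min_rating + 1
    # Observed confusion matrix of the two raters' scores.
    observed = [[0.0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1.0
    total = float(len(rater_a))
    # Marginal histograms and the chance-expected matrix.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    expected = [[hist_a[i] * hist_b[j] / total for j in range(n)]
                for i in range(n)]
    # Quadratic disagreement weights: 0 on the diagonal, 1 at the corners.
    weight = [[(i - j) ** 2 / (n - 1) ** 2 for j in range(n)]
              for i in range(n)]
    num = sum(weight[i][j] * observed[i][j]
              for i in range(n) for j in range(n))
    den = sum(weight[i][j] * expected[i][j]
              for i in range(n) for j in range(n))
    return 1.0 - num / den

print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4))  # 1.0
```

Perfect agreement yields 1.0, chance-level agreement yields roughly 0, and systematic disagreement can be negative; reported values in the AES studies above are typically interpreted against such benchmarks.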