Publication Date
In 2025: 1
Since 2024: 4
Since 2021 (last 5 years): 11
Since 2016 (last 10 years): 26
Since 2006 (last 20 years): 46
Descriptor
Essays: 64
Reliability: 64
Writing Evaluation: 27
Validity: 26
Foreign Countries: 23
Scoring: 23
Comparative Analysis: 15
Second Language Learning: 15
Scores: 14
English (Second Language): 13
Higher Education: 13
Publication Type
Journal Articles: 51
Reports - Research: 44
Reports - Descriptive: 9
Speeches/Meeting Papers: 5
Reports - Evaluative: 4
Tests/Questionnaires: 4
Opinion Papers: 3
Guides - General: 1
Non-Print Media: 1
Location
United Kingdom (England): 4
Australia: 3
Turkey: 3
China: 2
Connecticut: 2
Iran: 2
New Hampshire: 2
New York: 2
Rhode Island: 2
Vermont: 2
California: 1
Laws, Policies, & Programs
Every Student Succeeds Act…: 2
Dadi Ramesh; Suresh Kumar Sanampudi – European Journal of Education, 2024
Automatic essay scoring (AES) is an essential educational application of natural language processing. This automation alleviates the grading burden while increasing the reliability and consistency of assessment. With advances in text-embedding libraries and neural network models, AES systems have achieved good accuracy.…
Descriptors: Scoring, Essays, Writing Evaluation, Memory
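For context on the embedding-plus-neural-network pipeline this abstract describes, here is a minimal Python sketch, not the authors' system; the encoder name and toy data are illustrative assumptions:

from sentence_transformers import SentenceTransformer  # assumed installed
from sklearn.linear_model import Ridge

essays = ["First practice essay ...", "Second practice essay ...", "Third ..."]
scores = [3.0, 4.0, 2.0]  # human-assigned holistic scores (toy data)

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # a publicly available encoder
X = encoder.encode(essays)           # one dense vector per essay

model = Ridge().fit(X, scores)       # simple stand-in for a neural scorer
print(model.predict(encoder.encode(["A new, unseen essay ..."])))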
Peer Overmarking and Insufficient Diagnosticity: The Impact of the Rating Method for Peer Assessment
Van Meenen, Florence; Coertjens, Liesje; Van Nes, Marie-Claire; Verschuren, Franck – Advances in Health Sciences Education, 2022
The present study explores two rating methods for peer assessment (analytical rating using criteria and comparative judgement) in light of concurrent validity, reliability and insufficient diagnosticity (i.e. the degree to which substandard work is recognised by the peer raters). During a second-year undergraduate course, students wrote a one-page…
Descriptors: Evaluation Methods, Peer Evaluation, Accuracy, Evaluation Criteria
Elif Sari – International Journal of Assessment Tools in Education, 2024
Employing G-theory and rater interviews, the study investigated how a high-stakes writing assessment procedure (i.e., a single-task, single-rater, and holistic scoring procedure) impacted the variability and reliability of its scores within the Turkish higher education context. Thirty-two essays written on two different writing tasks (i.e.,…
Descriptors: Foreign Countries, High Stakes Tests, Writing Evaluation, Scores
Doewes, Afrizal; Pechenizkiy, Mykola – International Educational Data Mining Society, 2021
Scoring essays is generally an exhausting and time-consuming task for teachers. Automated Essay Scoring (AES) makes the scoring process faster and more consistent. The most logical way to assess the performance of an automated scorer is to measure its score agreement with human raters. However, we provide empirical evidence that…
Descriptors: Man Machine Systems, Automation, Computer Assisted Testing, Scoring
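The "score agreement with human raters" mentioned here is conventionally reported as quadratically weighted kappa (QWK). A minimal sketch with invented ratings, using scikit-learn:

from sklearn.metrics import cohen_kappa_score

human_scores   = [2, 3, 4, 4, 1, 3, 5, 2]   # illustrative ratings only
machine_scores = [2, 3, 3, 4, 2, 3, 5, 3]

qwk = cohen_kappa_score(human_scores, machine_scores, weights="quadratic")
print(f"QWK agreement: {qwk:.3f}")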
Fatih Yavuz; Özgür Çelik; Gamze Yavas Çelik – British Journal of Educational Technology, 2025
This study investigates the validity and reliability of generative large language models (LLMs), specifically ChatGPT and Google's Bard, in grading student essays in higher education based on an analytical grading rubric. A total of 15 experienced English as a foreign language (EFL) instructors and two LLMs were asked to evaluate three student…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Computational Linguistics
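Rubric-based grading with an LLM, as studied here, can be reproduced in outline through a chat-completion API. A hedged sketch only: the study used ChatGPT and Bard directly, and the model name, rubric text, and API choice below are assumptions:

from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment

rubric = "Score the essay 1-5 on content, organization, and language use."
essay = "..."  # student essay text

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; any capable chat model would do
    messages=[
        {"role": "system", "content": f"You are an EFL essay rater. Rubric: {rubric}"},
        {"role": "user", "content": essay},
    ],
)
print(resp.choices[0].message.content)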
Ramon-Casas, Marta; Nuño, Neus; Pons, Ferran; Cunillera, Toni – Assessment & Evaluation in Higher Education, 2019
This article presents an empirical evaluation of the validity and reliability of a peer-assessment activity to improve academic writing competences. Specifically, we explored a large group of psychology undergraduate students with different initial writing skills. Participants (n = 365) produced two different essays, which were evaluated by their…
Descriptors: Peer Evaluation, Validity, Reliability, Writing Skills
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or even sole marker for many high-stakes educational assessments, in both native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Walland, Emma – Research Matters, 2022
In this article, I report on examiners' views and experiences of using Pairwise Comparative Judgement (PCJ) and Rank Ordering (RO) as alternatives to traditional analytical marking for GCSE English Language essays. Fifteen GCSE English Language examiners took part in the study. After each had judged 100 pairs of essays using PCJ and eight packs of…
Descriptors: Essays, Grading, Writing Evaluation, Evaluators
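Pairwise Comparative Judgement data of this kind are usually scaled with a Bradley-Terry model: each script i gets a latent quality theta_i, and P(i beats j) = exp(theta_i) / (exp(theta_i) + exp(theta_j)). A minimal NumPy sketch with invented pair outcomes, not the study's data or software:

import numpy as np

pairs = [(0, 1), (0, 2), (1, 2), (2, 3), (0, 3), (1, 3)]  # (winner, loser)
n_items = 4
theta = np.zeros(n_items)

for _ in range(500):                      # plain gradient ascent on the log-likelihood
    grad = np.zeros(n_items)
    for w, l in pairs:
        p_w = np.exp(theta[w]) / (np.exp(theta[w]) + np.exp(theta[l]))
        grad[w] += 1 - p_w                # winner: push quality up
        grad[l] -= 1 - p_w                # loser: push quality down
    theta += 0.1 * grad
    theta -= theta.mean()                 # fix the scale's origin (identifiability)

print(np.argsort(-theta))                 # scripts ranked from best to worst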
Ullmann, Thomas Daniel – International Journal of Artificial Intelligence in Education, 2019
Reflective writing is an important educational practice for training reflective thinking. Currently, researchers must analyze these writings manually, which limits both practice and research because the analysis is time- and resource-consuming. This study evaluates whether machine learning can be used to automate this manual analysis. The study investigates…
Descriptors: Reflection, Writing (Composition), Writing Evaluation, Automation
Chen, Dandan; Hebert, Michael; Wilson, Joshua – American Educational Research Journal, 2022
We used multivariate generalizability theory to examine the reliability of hand-scoring and automated essay scoring (AES) and to identify how these scoring methods could be used in conjunction to optimize writing assessment. Students (n = 113) included subsamples of struggling writers and non-struggling writers in Grades 3-5 drawn from a larger…
Descriptors: Reliability, Scoring, Essays, Automation
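In the simplest crossed persons-by-raters (p x r) design from generalizability theory, the relative G coefficient is sigma^2_p / (sigma^2_p + sigma^2_pr,e / n_r). A small sketch with assumed variance components; the study's multivariate design is more elaborate:

def g_coefficient(var_person: float, var_residual: float, n_raters: int) -> float:
    """Relative G coefficient for a crossed p x r design."""
    return var_person / (var_person + var_residual / n_raters)

# Illustrative components: adding a second rater raises dependability.
print(g_coefficient(var_person=0.50, var_residual=0.30, n_raters=1))  # 0.625
print(g_coefficient(var_person=0.50, var_residual=0.30, n_raters=2))  # ~0.769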
Husain Abdulhay; Moussa Ahmadian – rEFLections, 2024
This study attempted to discern the factor structure of the achievement goal orientation and goal structure constructs across the domain-specific task of essay writing in an Iranian EFL context. A convenience sample of 116 public university learners participated in a single-session, in-class study of an essay writing sampling and an immediate…
Descriptors: Foreign Countries, Factor Structure, Goal Orientation, Factor Analysis
Zhang, Xiuyuan – AERA Online Paper Repository, 2019
The main purpose of the study is to evaluate the qualities of human essay ratings for a large-scale assessment using Rasch measurement theory. Specifically, Many-Facet Rasch Measurement (MFRM) was utilized to examine the rating scale category structure and provide important information about interpretations of ratings in the large-scale…
Descriptors: Essays, Evaluators, Writing Evaluation, Reliability
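The Many-Facet Rasch Measurement model used in analyses like this, in its usual rating-scale form, models the log-odds of adjacent rating categories as

\log \frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \alpha_j - \tau_k

where \theta_n is examinee proficiency, \delta_i is task difficulty, \alpha_j is rater severity, and \tau_k is the step threshold for category k.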
Song, Yi; Deane, Paul; Beigman Klebanov, Beata – ETS Research Report Series, 2017
This project focuses on laying the foundations for automated analysis of argumentation schemes, supporting identification and classification of the arguments being made in a text, for the purpose of scoring the quality of written analyses of arguments. We developed annotation protocols for 20 argument prompts from a college-level test under the…
Descriptors: Scoring, Automation, Persuasive Discourse, Documentation
Ghanbari, Nasim; Barati, Hossein – Language Testing in Asia, 2020
The present study reports the process of development and validation of a rating scale in the Iranian EFL academic writing assessment context. To achieve this goal, the study was conducted in three distinct phases. Early in the study, the researcher interviewed a number of raters in different universities. Next, a questionnaire was developed based…
Descriptors: Rating Scales, Writing Evaluation, English for Academic Purposes, Second Language Learning
Sanders, Joe Sutliff – Children's Literature in Education, 2015
A recent surge of conversation about children's nonfiction reveals a conflict between two positions that do not at first appear to be opposed: modeling inquiry and presenting authoritative facts. Tanya Lee Stone, the author of the Sibert Award-winning "Almost Astronauts" (2009), has recently alluded to that tension and expressed a…
Descriptors: Childrens Literature, Nonfiction, Authors, Inquiry