Publication Date
In 2025: 0
Since 2024: 2
Since 2021 (last 5 years): 9
Since 2016 (last 10 years): 31
Since 2006 (last 20 years): 65
Descriptor
Essays: 86
Interrater Reliability: 86
Writing Evaluation: 49
Scoring: 37
English (Second Language): 29
Foreign Countries: 28
Second Language Learning: 24
Scoring Rubrics: 21
Comparative Analysis: 19
Evaluators: 19
Correlation: 18
Author
Ben-Simon, Anat: 2
Coniam, David: 2
Crossley, Scott A.: 2
Hixson, Nate: 2
Johnson, Robert L.: 2
McNamara, Danielle S.: 2
Powers, Donald E.: 2
Rhudy, Vaughn: 2
Abbas, Mohsin: 1
Abedi, Jamal: 1
Alexander, R. Curby: 1
Education Level
Higher Education: 28
Postsecondary Education: 21
Secondary Education: 7
High Schools: 5
Elementary Secondary Education: 4
Elementary Education: 3
Grade 11: 2
Grade 8: 2
Middle Schools: 2
Adult Education: 1
Grade 10: 1
Audience
Practitioners: 4
Teachers: 4
Researchers: 1
Location
China: 6
Iran: 3
Germany: 2
Hong Kong: 2
Taiwan: 2
United Kingdom: 2
West Virginia: 2
Australia: 1
California: 1
Delaware: 1
Israel: 1
Abbas, Mohsin; van Rosmalen, Peter; Kalz, Marco – IEEE Transactions on Learning Technologies, 2023
For predicting and improving the quality of essays, text analytic metrics (surface, syntactic, morphological, and semantic features) can be used to provide formative feedback to the students in higher education. In this study, the goal was to identify a sufficient number of features that exhibit a fair proxy of the scores given by the human raters…
Descriptors: Feedback (Response), Automation, Essays, Scoring
Wendler, Cathy; Glazer, Nancy; Cline, Frederick – ETS Research Report Series, 2019
One of the challenges in scoring constructed-response (CR) items and tasks is ensuring that rater drift does not occur during or across scoring windows. Rater drift reflects changes in how raters interpret and use established scoring criteria to assign essay scores. Calibration is a process used to help control rater drift and, as such, serves as…
Descriptors: College Entrance Examinations, Graduate Study, Accuracy, Test Reliability
Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023
Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In the existing research work on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…
Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy
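Several entries above, including Doewes et al., treat human-automated score agreement as the benchmark for AES accuracy. The abstract does not name a specific metric, but quadratic weighted kappa (QWK) is the agreement index most commonly reported for ordinal essay scores, so a minimal pure-Python sketch of it is shown here for illustration only:

```python
# Illustrative sketch: quadratic weighted kappa (QWK), a standard
# human-machine agreement index in automated essay scoring.
from collections import Counter

def quadratic_weighted_kappa(a, b, min_rating, max_rating):
    """QWK between two integer rating lists on the same scale."""
    n = max_rating - min_rating + 1
    total = len(a)
    # Observed rating-pair matrix
    observed = [[0.0] * n for _ in range(n)]
    for x, y in zip(a, b):
        observed[x - min_rating][y - min_rating] += 1
    # Expected matrix from the two marginal histograms
    ha, hb = Counter(a), Counter(b)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = ((i - j) ** 2) / ((n - 1) ** 2)  # quadratic penalty
            e = ha.get(i + min_rating, 0) * hb.get(j + min_rating, 0) / total
            num += w * observed[i][j]
            den += w * e
    return 1.0 - num / den

human = [1, 2, 3, 4, 4, 2]    # hypothetical human scores (scale 1-4)
machine = [1, 2, 3, 4, 3, 2]  # hypothetical machine scores
print(round(quadratic_weighted_kappa(human, machine, 1, 4), 3))  # 0.923
```

QWK penalizes disagreements by the square of their distance on the score scale, so a one-point discrepancy costs far less than a three-point one, which suits ordinal essay rubrics.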
Kumar, Vivekanandan S.; Boulanger, David – International Journal of Artificial Intelligence in Education, 2021
This article investigates the feasibility of using automated scoring methods to evaluate the quality of student-written essays. In 2012, Kaggle hosted an Automated Student Assessment Prize contest to find effective solutions to automated testing and grading. This article: a) analyzes the datasets from the contest -- which contained hand-graded…
Descriptors: Automation, Scoring, Essays, Writing Evaluation
Ramon-Casas, Marta; Nuño, Neus; Pons, Ferran; Cunillera, Toni – Assessment & Evaluation in Higher Education, 2019
This article presents an empirical evaluation of the validity and reliability of a peer-assessment activity to improve academic writing competences. Specifically, we explored a large group of psychology undergraduate students with different initial writing skills. Participants (n = 365) produced two different essays, which were evaluated by their…
Descriptors: Peer Evaluation, Validity, Reliability, Writing Skills
Pruchnic, Jeff; Barton, Ellen; Primeau, Sarah; Trimble, Thomas; Varty, Nicole; Foster, Tanina – Composition Forum, 2021
Over the past two decades, reflective writing has occupied an increasingly prominent position in composition theory, pedagogy, and assessment as researchers have described the value of reflection and reflective writing in college students' development of higher-order writing skills, such as genre conventions (Yancey, "Reflection";…
Descriptors: Reflection, Correlation, Essays, Freshman Composition
Shabani, Enayat A.; Panahi, Jaleh – Language Testing in Asia, 2020
The literature on using scoring rubrics in writing assessment denotes the significance of rubrics as practical and useful means to assess the quality of writing tasks. This study tries to investigate the agreement among rubrics endorsed and used for assessing the essay writing tasks by the internationally recognized tests of English language…
Descriptors: Writing Evaluation, Scoring Rubrics, Scores, Interrater Reliability
Jiyeo Yun – English Teaching, 2023
Studies on automatic scoring systems in writing assessments have also evaluated the relationship between human and machine scores for the reliability of automated essay scoring systems. This study investigated the magnitudes of indices for inter-rater agreement and discrepancy, especially regarding human and machine scoring, in writing assessment.…
Descriptors: Meta Analysis, Interrater Reliability, Essays, Scoring
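The Yun meta-analysis above concerns indices of inter-rater agreement and discrepancy between human and machine scoring. The snippet does not list the specific indices analyzed; as an illustration, three indices commonly reported in writing assessment (exact agreement, adjacent agreement within one score point, and Pearson correlation) can be computed as follows, using hypothetical rater data:

```python
# Illustrative sketch: common inter-rater agreement indices reported
# in writing assessment studies (hypothetical data, not from the study).
import math

def agreement_indices(r1, r2):
    """Return exact agreement, adjacent agreement, and Pearson r."""
    n = len(r1)
    exact = sum(x == y for x, y in zip(r1, r2)) / n
    adjacent = sum(abs(x - y) <= 1 for x, y in zip(r1, r2)) / n
    m1, m2 = sum(r1) / n, sum(r2) / n
    cov = sum((x - m1) * (y - m2) for x, y in zip(r1, r2))
    sd1 = math.sqrt(sum((x - m1) ** 2 for x in r1))
    sd2 = math.sqrt(sum((y - m2) ** 2 for y in r2))
    return exact, adjacent, cov / (sd1 * sd2)

rater_a = [3, 4, 2, 5, 3, 4]  # hypothetical scores from rater A
rater_b = [3, 3, 2, 5, 4, 2]  # hypothetical scores from rater B
exact, adjacent, r = agreement_indices(rater_a, rater_b)
print(round(exact, 3), round(adjacent, 3), round(r, 3))  # 0.5 0.833 0.571
```

Exact and adjacent agreement are easy to interpret but inflate with short score scales, which is one reason meta-analyses also examine chance-corrected and correlational indices.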
Lian Li; Jiehui Hu; Yu Dai; Ping Zhou; Wanhong Zhang – Reading & Writing Quarterly, 2024
This paper proposes to use depth perception to represent raters' decision in holistic evaluation of ESL essays, as an alternative medium to conventional form of numerical scores. The researchers verified the new method's accuracy and inter/intra-rater reliability by inviting 24 ESL teachers to perform different representations when rating 60…
Descriptors: Essays, Holistic Approach, Writing Evaluation, Accuracy
Ballard, Laura – ProQuest LLC, 2017
Rater scoring has an impact on writing test reliability and validity. Thus, there has been a continued call for researchers to investigate issues related to rating (Crusan, 2015). Investigating the scoring process and understanding how raters arrive at particular scores are critical "because the score is ultimately what will be used in making…
Descriptors: Evaluators, Schemata (Cognition), Eye Movements, Scoring Rubrics
Chung-You Tsai; Yi-Ti Lin; Iain Kelsall Brown – Education and Information Technologies, 2024
To determine the impacts of using ChatGPT to assist English as a foreign language (EFL) English college majors in revising essays and the possibility of leading to higher scores and potentially causing unfairness. A prospective, double-blinded, paired-comparison study was conducted in Feb. 2023. A total of 44 students provided 44 original essays…
Descriptors: Artificial Intelligence, Computer Software, Technology Uses in Education, English (Second Language)
Humphry, Stephen Mark; Heldsinger, Sandy – Journal of Educational Measurement, 2019
To capitalize on professional expertise in educational assessment, it is desirable to develop and test methods of rater-mediated assessment that enable classroom teachers to make reliable and informative judgments. Accordingly, this article investigates the reliability of a two-stage method used by classroom teachers to assess primary school…
Descriptors: Essays, Elementary School Students, Writing (Composition), Writing Evaluation
Ke, Xiaohua; Zeng, Yongqiang; Luo, Haijiao – Journal of Educational Measurement, 2016
This article presents a novel method, the Complex Dynamics Essay Scorer (CDES), for automated essay scoring using complex network features. Texts produced by college students in China were represented as scale-free networks (e.g., a word adjacency model) from which typical network features, such as the in-/out-degrees, clustering coefficient (CC),…
Descriptors: Scoring, Automation, Essays, Networks
International Journal of Testing, 2019
These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…
Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage
He, Tung-hsien – SAGE Open, 2019
This study employed a mixed-design approach and the Many-Facet Rasch Measurement (MFRM) framework to investigate whether rater bias occurred between the onscreen scoring (OSS) mode and the paper-based scoring (PBS) mode. Nine human raters analytically marked scanned scripts and paper scripts using a six-category (i.e., six-criterion) rating…
Descriptors: Computer Assisted Testing, Scoring, Item Response Theory, Essays