ERIC - Search Results

Publication Date

In 2025	1
Since 2024	4
Since 2021 (last 5 years)	10
Since 2016 (last 10 years)	20

Descriptor

Evaluators	20
Writing Tests	20
Second Language Learning	10
Writing Evaluation	10
Language Tests	9
English (Second Language)	8
Scoring	7
Comparative Analysis	6
Essays	6
Foreign Countries	6
Item Response Theory	6
Interrater Reliability	5
Performance Based Assessment	5
Scores	5
Accuracy	4
Language Proficiency	4
Goodness of Fit	3
Rating Scales	3
Reliability	3
Bias	2
Certification	2
College Entrance Examinations	2
College Students	2
Computer Assisted Testing	2
Correlation	2
More ▼

Source

Language Testing	9
Applied Measurement in…	2
Educational and Psychological…	2
Educational Measurement:…	1
International Journal of…	1
JALT CALL Journal	1
Language Assessment Quarterly	1
Language Teaching Research…	1
Scandinavian Journal of…	1
Studies in Applied…	1

Publication Type

Journal Articles	20
Reports - Research	20
Tests/Questionnaires	1

Education Level

Higher Education	5
Postsecondary Education	5
Secondary Education	3
Elementary Education	1
High Schools	1

Audience

Location

China	1
Colombia	1
Europe	1
Germany	1
Hawaii	1
Illinois (Urbana)	1
Iran	1
New York (New York)	1
Norway	1

Laws, Policies, & Programs

Assessments and Surveys

Graduate Record Examinations	1
International English…	1
Test of English as a Foreign…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 20 results Save | Export

Do Source Use Features Impact Raters' Judgment of Argumentation? An Experimental Study

Peer reviewed

Direct link

Ping-Lin Chuang – Language Testing, 2025

This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…

Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources

Assessing the Content Quality of Essays in Content and Language Integrated Learning: Exploring the Construct from Subject Specialists' Perspectives

Peer reviewed

Direct link

Takanori Sato – Language Testing, 2024

Assessing the content of learners' compositions is a common practice in second language (L2) writing assessment. However, the construct definition of content in L2 writing assessment potentially underrepresents the target competence in content and language integrated learning (CLIL), which aims to foster not only L2 proficiency but also critical…

Descriptors: Language Tests, Content and Language Integrated Learning, Writing Evaluation, Writing Tests

Exploring the Impersonal Judgments and Personal Preferences of Raters in Rater-Mediated Assessments with Unfolding Models

Peer reviewed

Direct link

Wang, Jue; Engelhard, George, Jr. – Educational and Psychological Measurement, 2019

The purpose of this study is to explore the use of unfolding models for evaluating the quality of ratings obtained in rater-mediated assessments. Two different judgmental processes can be used to conceptualize ratings: impersonal judgments and personal preferences. Impersonal judgments are typically expected in rater-mediated assessments, and…

Descriptors: Evaluative Thinking, Preferences, Evaluators, Models

Making Each Point Count: Revising a Local Adaptation of the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE Rubric

Peer reviewed

Direct link

Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024

In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…

Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)

Operationalizing the Reading-into-Writing Construct in Analytic Rating Scales: Effects of Different Approaches on Rating

Peer reviewed

Direct link

Lestari, Santi B.; Brunfaut, Tineke – Language Testing, 2023

Assessing integrated reading-into-writing task performances is known to be challenging, and analytic rating scales have been found to better facilitate the scoring of these performances than other common types of rating scales. However, little is known about how specific operationalizations of the reading-into-writing construct in analytic rating…

Descriptors: Reading Writing Relationship, Writing Tests, Rating Scales, Writing Processes

"How Do Raters Learn to Rate?" Many-Facet Rasch Modeling of Rater Performance over the Course of a Rater Certification Program

Peer reviewed

Direct link

Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023

This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…

Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification

The Intersection of AI and Language Assessment: A Study on the Reliability of ChatGPT in Grading IELTS Writing Task 2

Peer reviewed
PDF on ERIC

Download full text

Osama Koraishi – Language Teaching Research Quarterly, 2024

This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…

Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence

A Sequential Approach to Detecting Differential Rater Functioning in Sparse Rater-Mediated Assessment Networks

Peer reviewed

Direct link

Wind, Stefanie A. – Language Testing, 2023

Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement levels. However, researchers have observed that methods for detecting…

Descriptors: Evaluators, Decision Making, Student Characteristics, Performance Based Assessment

The Longitudinal Stability of Rating Characteristics in an EFL Examination: Methodological and Substantive Considerations

Peer reviewed

Direct link

Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021

This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…

Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation

Examining Severity and Centrality Effects in TestDaF Writing and Speaking Assessments: An Extended Bayesian Many-Facet Rasch Analysis

Peer reviewed

Direct link

Eckes, Thomas; Jin, Kuan-Yu – International Journal of Testing, 2021

Severity and centrality are two main kinds of rater effects posing threats to the validity and fairness of performance assessments. Adopting Jin and Wang's (2018) extended facets modeling approach, we separately estimated the magnitude of rater severity and centrality effects in the web-based TestDaF (Test of German as a Foreign Language) writing…

Descriptors: Language Tests, German, Second Languages, Writing Tests

Generalizability of Writing Scores and Language Program Placement Decisions: Score Dependability, Task Variability, and Score Profiles on an ESL Placement Test

Peer reviewed
PDF on ERIC

Download full text

Eskin, Daniel – Studies in Applied Linguistics & TESOL, 2022

For agencies that deliver high-stakes Second Language (L2) proficiency exams, a research agenda has been undertaken for years to examine the role of rater, task, and rubric as sources of variability into their performance assessments (Lee, 2006; Sawaki & Sinharay, 2013; Xi, 2007; Xi & Mollaun, 2006). However, these challenges are more…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Student Placement

Detecting Measurement Disturbances in Rater-Mediated Assessments

Peer reviewed

Direct link

Wind, Stefanie A.; Schumacker, Randall E. – Educational Measurement: Issues and Practice, 2017

The term measurement disturbance has been used to describe systematic conditions that affect a measurement process, resulting in a compromised interpretation of person or item estimates. Measurement disturbances have been discussed in relation to systematic response patterns associated with items and persons, such as start-up, plodding, boredom,…

Descriptors: Measurement, Testing Problems, Writing Tests, Performance Based Assessment

Evaluating Rater Accuracy in Rater-Mediated Assessments Using an Unfolding Model

Peer reviewed

Direct link

Wang, Jue; Engelhard, George, Jr.; Wolfe, Edward W. – Educational and Psychological Measurement, 2016

The number of performance assessments continues to increase around the world, and it is important to explore new methods for evaluating the quality of ratings obtained from raters. This study describes an unfolding model for examining rater accuracy. Accuracy is defined as the difference between observed and expert ratings. Dichotomous accuracy…

Descriptors: Evaluators, Accuracy, Performance Based Assessment, Models

"Digging for Gold" or "Sticking to the Criteria": Teachers' Rationales When Serving as Professional Raters

Peer reviewed

Direct link

Jølle, Lennart; Skar, Gustaf B. – Scandinavian Journal of Educational Research, 2020

This paper reports findings from a project called "The National Panel of Raters" (NPR) that took place within a writing test programme in Norway (2010-2016). A recent research project found individual differences between the raters in the NPR. This paper reports results from an explorative follow up-study where 63 NPR members were…

Descriptors: Foreign Countries, Validity, Scoring, Program Descriptions

Evaluating Comparative Judgment as an Approach to Essay Scoring

Peer reviewed

Direct link

Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016

As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…

Descriptors: Essays, Scoring, Comparative Analysis, Evaluators

Previous Page | Next Page »

Pages: 1 | 2

Attali, Yigal	2
Engelhard, George, Jr.	2
Wang, Jue	2
Wind, Stefanie A.	2
Ann Tai Choe	1
Brunfaut, Tineke	1
Buzick, Heather	1
Chuang, Ping-Lin	1
Daniel Holden	1
Daniel R. Isbell	1
Eckes, Thomas	1
Eskin, Daniel	1
Ferrara, Steve	1
Flor, Michael	1
Heidari, Jamshid	1
Janssen, Gerriet	1
Jin, Kuan-Yu	1
Jølle, Lennart	1
Khodabandeh, Farzaneh	1
Kyriakou, Nansia	1
Lamprianou, Iasonas	1
Lestari, Santi B.	1
Li, Jiuliang	1
Meier, Valerie	1
Oliveri, Maria Elena	1
More ▼