ERIC - Search Results

Publication Date

In 2025	1
Since 2024	2
Since 2021 (last 5 years)	5
Since 2016 (last 10 years)	9
Since 2006 (last 20 years)	12

Descriptor

Writing Tests	12
Second Language Learning	9
Evaluators	8
English (Second Language)	7
Interrater Reliability	7
Writing Evaluation	6
Item Response Theory	5
Language Tests	5
Foreign Countries	4
Rating Scales	4
Scores	4
Test Reliability	4
Essays	3
Certification	2
Computer Assisted Testing	2
Evaluation Criteria	2
Factor Analysis	2
Feedback (Response)	2
Internet	2
Listening Comprehension Tests	2
Placement Tests	2
Program Effectiveness	2
Scaling	2
Scoring	2
Scoring Rubrics	2
More ▼

Source

Language Testing

Publication Type

Journal Articles	12
Reports - Research	9
Reports - Evaluative	3
Tests/Questionnaires	1

Education Level

Higher Education	3
Postsecondary Education	1
Secondary Education	1

Audience

Location

Canada	1
Colombia	1
Europe	1
France	1
Georgia	1
Germany	1
Hawaii	1
Illinois (Urbana)	1
Italy	1
Japan	1

Laws, Policies, & Programs

Assessments and Surveys

Test of English as a Foreign…

What Works Clearinghouse Rating

Showing all 12 results Save | Export

Do Source Use Features Impact Raters' Judgment of Argumentation? An Experimental Study

Peer reviewed

Direct link

Ping-Lin Chuang – Language Testing, 2025

This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…

Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources

Making Each Point Count: Revising a Local Adaptation of the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE Rubric

Peer reviewed

Direct link

Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024

In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…

Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)

Operationalizing the Reading-into-Writing Construct in Analytic Rating Scales: Effects of Different Approaches on Rating

Peer reviewed

Direct link

Lestari, Santi B.; Brunfaut, Tineke – Language Testing, 2023

Assessing integrated reading-into-writing task performances is known to be challenging, and analytic rating scales have been found to better facilitate the scoring of these performances than other common types of rating scales. However, little is known about how specific operationalizations of the reading-into-writing construct in analytic rating…

Descriptors: Reading Writing Relationship, Writing Tests, Rating Scales, Writing Processes

"How Do Raters Learn to Rate?" Many-Facet Rasch Modeling of Rater Performance over the Course of a Rater Certification Program

Peer reviewed

Direct link

Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023

This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…

Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification

A Nonparametric Procedure for Exploring Differences in Rating Quality across Test-Taker Subgroups in Rater-Mediated Writing Assessments

Peer reviewed

Direct link

Wind, Stefanie A. – Language Testing, 2019

Differences in rater judgments that are systematically related to construct-irrelevant characteristics threaten the fairness of rater-mediated writing assessments. Accordingly, it is essential that researchers and practitioners examine the degree to which the psychometric quality of rater judgments is comparable across test-taker subgroups.…

Descriptors: Nonparametric Statistics, Interrater Reliability, Differences, Writing Tests

The Longitudinal Stability of Rating Characteristics in an EFL Examination: Methodological and Substantive Considerations

Peer reviewed

Direct link

Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021

This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…

Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation

Do the TOEFL iBT® Section Scores Provide Value-Added Information to Stakeholders

Peer reviewed

Direct link

Sawaki, Yasuyo; Sinharay, Sandip – Language Testing, 2018

The present study examined the reliability of the reading, listening, speaking, and writing section scores for the TOEFL iBT® test and their interrelationship in order to collect empirical evidence to support, respectively, the "generalization" inference and the "explanation" inference in the TOEFL iBT validity argument…

Descriptors: English (Second Language), Language Tests, Second Language Learning, Computer Assisted Testing

Measuring the Impact of Rater Negotiation in Writing Performance Assessment

Peer reviewed

Direct link

Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017

Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…

Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators

A Comparison of Newly-Trained and Experienced Raters on a Standardized Writing Assessment

Peer reviewed

Direct link

Attali, Yigal – Language Testing, 2016

A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session and 14 trainees who passed an…

Descriptors: Writing Evaluation, Writing Tests, Standardized Tests, Evaluators

Test Review: Test of English as a Foreign Language[TM]--Internet-Based Test (TOEFL iBT[R])

Peer reviewed

Direct link

Alderson, J. Charles – Language Testing, 2009

In this article, the author reviews the TOEFL iBT which is the latest version of the TOEFL, whose history stretches back to 1961. The TOEFL iBT was introduced in the USA, Canada, France, Germany and Italy in late 2005. Currently the TOEFL test is offered in two testing formats: (1) Internet-based testing (iBT); and (2) paper-based testing (PBT).…

Descriptors: Oral Language, Writing Tests, Listening Comprehension Tests, Test Reviews

Rater Bias Patterns in an EFL Writing Assessment

Peer reviewed

Direct link

Schaefer, Edward – Language Testing, 2008

The present study employed multi-faceted Rasch measurement (MFRM) to explore the rater bias patterns of native English-speaker (NES) raters when they rate EFL essays. Forty NES raters rated 40 essays written by female Japanese university students on a single topic adapted from the TOEFL Test of Written English (TWE). The essays were assessed using…

Descriptors: Writing Evaluation, Writing Tests, Program Effectiveness, Essays

Evaluating Rater Responses to an Online Training Program for L2 Writing Assessment

Peer reviewed

Direct link

Elder, Catherine; Barkhuizen, Gary; Knoch, Ute; von Randow, Janet – Language Testing, 2007

The use of online rater self-training is growing in popularity and has obvious practical benefits, facilitating access to training materials and rating samples and allowing raters to reorient themselves to the rating scale and self monitor their behaviour at their own convenience. However there has thus far been little research into rater…

Descriptors: Writing Evaluation, Writing Tests, Scoring Rubrics, Rating Scales

Alderson, J. Charles	1
Ann Tai Choe	1
Attali, Yigal	1
Barkhuizen, Gary	1
Brunfaut, Tineke	1
Chuang, Ping-Lin	1
Daniel Holden	1
Daniel R. Isbell	1
Elder, Catherine	1
Janssen, Gerriet	1
Knoch, Ute	1
Kyriakou, Nansia	1
Lamprianou, Iasonas	1
Lestari, Santi B.	1
Meier, Valerie	1
Ping-Lin Chuang	1
Sawaki, Yasuyo	1
Schaefer, Edward	1
Sinharay, Sandip	1
Trace, Jonathan	1
Tsagari, Dina	1
Wind, Stefanie A.	1
Yan, Xun	1
Yu-Tzu Chang	1
von Randow, Janet	1
More ▼