Takanori Sato – Language Testing, 2024
Assessing the content of learners' compositions is a common practice in second language (L2) writing assessment. However, the construct definition of content in L2 writing assessment potentially underrepresents the target competence in content and language integrated learning (CLIL), which aims to foster not only L2 proficiency but also critical…
Descriptors: Language Tests, Content and Language Integrated Learning, Writing Evaluation, Writing Tests
Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024
In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…
Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)
Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023
This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…
Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification
Osama Koraishi – Language Teaching Research Quarterly, 2024
This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…
Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence
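The alignment check described in this abstract can be illustrated with a short, self-contained sketch: compare means, correlation, and a chance-corrected agreement index between machine and human band scores. The scores below are invented for illustration; only the general approach is implied by the abstract, not the study's exact analysis.

```python
# Sketch: agreement between machine and human essay scores.
# Illustrative values only; not data from the study.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

human = np.array([6.0, 6.5, 7.0, 5.5, 6.0, 7.5, 6.5, 5.0])
model = np.array([6.0, 6.0, 7.0, 6.0, 6.5, 7.0, 6.5, 5.5])

# Mean difference: does the model grade systematically higher or lower?
print("mean diff:", (model - human).mean())

# Pearson correlation: do the two score sets rank essays similarly?
r, _ = pearsonr(human, model)
print("Pearson r:", round(r, 3))

# Quadratic weighted kappa on half-band scores mapped to integers,
# a common chance-corrected agreement index for ordinal ratings.
qwk = cohen_kappa_score((human * 2).astype(int), (model * 2).astype(int),
                        weights="quadratic")
print("QWK:", round(qwk, 3))
```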
Wind, Stefanie A. – Language Testing, 2023
Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement levels. However, researchers have observed that methods for detecting…
Descriptors: Evaluators, Decision Making, Student Characteristics, Performance Based Assessment
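As a rough illustration of the DRF idea, one can residualize ratings on an achievement proxy and then look for raters whose leftover severity differs by a construct-irrelevant subgroup. This is a toy sketch of the logic, not one of the detection methods the article examines; all names and data are hypothetical.

```python
# Sketch: flag raters whose severity differs across a student subgroup
# after partialling out achievement. Illustrative, not the paper's method.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "rater": rng.choice(["R1", "R2", "R3"], n),
    "group": rng.choice(["A", "B"], n),      # construct-irrelevant characteristic
    "ability": rng.normal(0, 1, n),          # proxy for achievement level
})
# Build in DRF for R1: harsher toward group B, independent of ability.
df["rating"] = (5 + 1.5 * df["ability"]
                - 0.6 * ((df["rater"] == "R1") & (df["group"] == "B"))
                + rng.normal(0, 0.5, n))

# Residualize ratings on ability (control for achievement level).
b1, b0 = np.polyfit(df["ability"], df["rating"], 1)
df["resid"] = df["rating"] - (b0 + b1 * df["ability"])

# A rater shows potential DRF if mean residual severity differs by group.
print(df.groupby(["rater", "group"])["resid"].mean().unstack())
```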
Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021
This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…
Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation
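A minimal sketch of the two kinds of indices involved, computed on simulated data: severity as a rater's mean signed deviation from the panel mean, and consistency as the within-year correlation between a rater's scores and the panel means. These simplified definitions are assumptions for illustration, not the study's exact measures.

```python
# Sketch: one severity index and one consistency index per rater per year.
# Simplified indices on made-up data; the study's measures may differ.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
rows = []
for year in (2002, 2008, 2014):
    truth = rng.normal(0, 1, 50)             # 50 scripts per year
    for rater, sev in (("R1", 0.3), ("R2", -0.2), ("R3", 0.0)):
        for i, t in enumerate(truth):
            rows.append({"year": year, "rater": rater, "script": i,
                         "score": t + sev + rng.normal(0, 0.4)})
df = pd.DataFrame(rows)

# Panel mean per script serves as the reference point.
df["panel"] = df.groupby(["year", "script"])["score"].transform("mean")

# Severity: mean signed deviation from the panel.
severity = (df["score"] - df["panel"]).groupby([df["year"], df["rater"]]).mean()
# Consistency: correlation between a rater's scores and panel means.
consistency = df.groupby(["year", "rater"]).apply(
    lambda g: np.corrcoef(g["score"], g["panel"])[0, 1])
print(severity.unstack())
print(consistency.unstack())
```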
Wang, Jue; Engelhard, George, Jr.; Wolfe, Edward W. – Educational and Psychological Measurement, 2016
The number of performance assessments continues to increase around the world, and it is important to explore new methods for evaluating the quality of ratings obtained from raters. This study describes an unfolding model for examining rater accuracy. Accuracy is defined as the difference between observed and expert ratings. Dichotomous accuracy…
Descriptors: Evaluators, Accuracy, Performance Based Assessment, Models
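The accuracy definition quoted above is directly computable: subtract expert (criterion) ratings from observed ratings, and optionally dichotomize exact matches. A minimal sketch on invented data, not the authors' unfolding model itself:

```python
# Sketch: rater accuracy as distance from expert benchmark ratings.
# Illustrative data, not from the study.
import numpy as np

expert   = np.array([3, 4, 2, 5, 3, 4, 1, 2])   # criterion ratings
observed = np.array([3, 3, 2, 5, 4, 4, 2, 2])   # one rater's scores

diff = observed - expert
print("signed errors:", diff)                    # direction of misfit
print("mean absolute error:", np.abs(diff).mean())

# Dichotomous accuracy: 1 if the rating matches the expert, else 0.
dichotomous = (diff == 0).astype(int)
print("proportion accurate:", dichotomous.mean())
```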
Jølle, Lennart; Skar, Gustaf B. – Scandinavian Journal of Educational Research, 2020
This paper reports findings from a project called "The National Panel of Raters" (NPR) that took place within a writing test programme in Norway (2010-2016). A recent research project found individual differences between the raters in the NPR. This paper reports results from an explorative follow-up study where 63 NPR members were…
Descriptors: Foreign Countries, Validity, Scoring, Program Descriptions
Li, Jiuliang – Language Assessment Quarterly, 2018
In language testing programs, different test forms are often used to administer the same test. Demonstrating the comparability of these forms is essential to avoid criticisms of potential test unfairness. However, studies with this objective are scarce. This study aims to investigate the extent to which the picture-prompt writing tasks of three…
Descriptors: Writing Tests, Language Tests, Check Lists, Culture Fair Tests
Jølle, Lennart – Assessment in Education: Principles, Policy & Practice, 2015
Novice members of a Norwegian national rater panel tasked with assessing Year 8 pupils' written texts were studied during three successive preparation sessions (2011-2012). The purpose was to investigate how the raters successfully make use of different decision-making strategies in an assessment situation where pre-set criteria and standards give…
Descriptors: Interrater Reliability, Writing Evaluation, Decision Making, Novices
Attali, Yigal – Language Testing, 2016
A short training program for evaluating responses to an essay writing task consisted of scoring 20 training essays with immediate feedback about the correct score. The same scoring session also served as a certification test for trainees. Participants with little or no previous rating experience completed this session and 14 trainees who passed an…
Descriptors: Writing Evaluation, Writing Tests, Standardized Tests, Evaluators
Bouwer, Renske; Béguin, Anton; Sanders, Ted; van den Bergh, Huub – Language Testing, 2015
In the present study, aspects of the measurement of writing are disentangled in order to investigate the validity of inferences made on the basis of writing performance and to describe implications for the assessment of writing. To include genre as a facet in the measurement, we obtained writing scores of 12 texts in four different genres for each…
Descriptors: Writing Tests, Generalization, Scores, Writing Instruction
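The role of genre as a measurement facet can be illustrated by simulating scores with a person-by-genre interaction and comparing how well scores generalize within versus across genres. Everything below is a toy simulation, not the study's generalizability analysis:

```python
# Sketch: how much of writing-score variance is tied to genre?
# Toy person-x-genre simulation, illustrative only.
import numpy as np

rng = np.random.default_rng(3)
n_students, n_genres, n_texts = 100, 4, 3        # 12 texts per student
person = rng.normal(0, 1, (n_students, 1, 1))
genre_effect = rng.normal(0, 0.5, (1, n_genres, 1))
pg = rng.normal(0, 0.7, (n_students, n_genres, 1))   # person-by-genre interaction
scores = person + genre_effect + pg + rng.normal(0, 0.6, (n_students, n_genres, n_texts))

# Scores correlate more strongly within a genre than across genres,
# which is why inferences from single-genre tasks can be shaky.
within = np.mean([np.corrcoef(scores[:, g, 0], scores[:, g, 1])[0, 1]
                  for g in range(n_genres)])
across = np.mean([np.corrcoef(scores[:, 0, 0], scores[:, g, 0])[0, 1]
                  for g in range(1, n_genres)])
print("within-genre r:", round(within, 2), "across-genre r:", round(across, 2))
```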
Attali, Yigal; Sinharay, Sandip – ETS Research Report Series, 2015
The "e-rater"® automated essay scoring system is used operationally in the scoring of "TOEFL iBT"® independent and integrated tasks. In this study we explored the psychometric added value of reporting four trait scores for each of these two tasks, beyond the total e-rater score.The four trait scores are word choice, grammatical…
Descriptors: Writing Tests, Scores, Language Tests, English (Second Language)
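One common way to frame "psychometric added value" is incremental prediction: fit a criterion from the total score alone, then from the total score plus the four trait scores, and compare R². The sketch below uses simulated data and a hypothetical criterion; it is not the authors' analysis:

```python
# Sketch: added value as incremental R^2 of trait scores over the total.
# All data and the criterion are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
n = 300
traits = rng.normal(0, 1, (n, 4))        # e.g., word choice, grammar, usage, mechanics
total = traits.sum(axis=1) + rng.normal(0, 0.5, n)
criterion = traits @ np.array([0.5, 0.3, 0.2, 0.4]) + rng.normal(0, 1, n)

def r2(X, y):
    """R^2 of an ordinary least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

print("total only:    ", round(r2(total[:, None], criterion), 3))
print("total + traits:", round(r2(np.column_stack([total, traits]), criterion), 3))
```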
Müller, Amanda – Higher Education Research and Development, 2015
This paper attempts to demonstrate the differences in writing between International English Language Testing System (IELTS) bands 6.0, 6.5 and 7.0. An analysis of exemplars provided by the IELTS test makers reveals that IELTS 6.0, 6.5 and 7.0 writers can make a minimum of 206, 96 and 35 errors per 1000 words, respectively. The following section…
Descriptors: English (Second Language), Second Language Learning, Language Tests, Scores
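The per-1000-word figures are a straightforward normalization of raw error counts by text length; a one-line helper makes the arithmetic explicit (the sample counts below are invented):

```python
# Sketch: normalizing error counts to errors per 1000 words.
# The counts are made up; only the formula mirrors the paper's metric.
def errors_per_1000(errors: int, words: int) -> float:
    return errors / words * 1000

print(errors_per_1000(52, 250))   # e.g., 52 errors in a 250-word essay -> 208.0
```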
Ruegg, Rachael; Sugiyama, Yuko – Language Testing in Asia, 2013
Whether foreign language writing is rated using analytic rating scales or holistically, the organization of ideas is invariably one of the aspects assessed. However, it is unclear what raters are sensitive to when rating writing for organization. Which do they value more highly, the physical aspects of organization, such as paragraphing and the…
Descriptors: Second Language Learning, Writing Evaluation, Evaluation Criteria, Writing Tests