Showing 1 to 15 of 16 results
Peer reviewed
Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023
Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In existing research on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…
Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy
Peer reviewed
Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024
In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…
Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)
Walland, Emma – Research Matters, 2022
In this article, I report on examiners' views and experiences of using Pairwise Comparative Judgement (PCJ) and Rank Ordering (RO) as alternatives to traditional analytical marking for GCSE English Language essays. Fifteen GCSE English Language examiners took part in the study. After each had judged 100 pairs of essays using PCJ and eight packs of…
Descriptors: Essays, Grading, Writing Evaluation, Evaluators
Peer reviewed
Husain Abdulhay; Moussa Ahmadian – rEFLections, 2024
This study attempted to discern the factor structure of the achievement goal orientation and goal structure constructs across the domain-specific task of essay writing in an Iranian EFL context. A convenience sample of 116 public university learners participated in a single-session, in-class study of an essay writing sampling and an immediate…
Descriptors: Foreign Countries, Factor Structure, Goal Orientation, Factor Analysis
Peer reviewed
Zhang, Xiuyuan – AERA Online Paper Repository, 2019
The main purpose of the study is to evaluate the qualities of human essay ratings for a large-scale assessment using Rasch measurement theory. Specifically, Many-Facet Rasch Measurement (MFRM) was utilized to examine the rating scale category structure and provide important information about interpretations of ratings in the large-scale…
Descriptors: Essays, Evaluators, Writing Evaluation, Reliability
Peer reviewed
Ghanbari, Nasim; Barati, Hossein – Language Testing in Asia, 2020
The present study reports the process of development and validation of a rating scale in the Iranian EFL academic writing assessment context. To achieve this goal, the study was conducted in three distinct phases. Early in the study, the researcher interviewed a number of raters in different universities. Next, a questionnaire was developed based…
Descriptors: Rating Scales, Writing Evaluation, English for Academic Purposes, Second Language Learning
Peer reviewed
Jeong, Heejeong – Language Testing in Asia, 2019
In writing assessment, finding a valid, reliable, and efficient scale is critical. Appropriate scales increase rater reliability and can also save time and money. This exploratory study compared the effects of a binary scale and an analytic scale across teacher raters and expert raters. The purpose of the study is to find out how different scale…
Descriptors: Writing Evaluation, English (Second Language), Second Language Learning, Second Language Instruction
Peer reviewed
Kim, Susie – Language Assessment Quarterly, 2021
Recent research conducted as part of the English Profile identified grammatical criterial features that are characteristic of each Common European Framework of Reference (CEFR) proficiency level. The extent to which these criterial features are attested in English learners of various first language backgrounds calls for an empirical examination. In this…
Descriptors: Guidelines, Rating Scales, Second Language Learning, Second Language Instruction
Yun, Jiyeo – ProQuest LLC, 2017
Since researchers began investigating automatic scoring systems in writing assessment, they have examined the relationship between human and machine scores and proposed evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…
Descriptors: Interrater Reliability, Essays, Scoring, Evaluators
Peer reviewed
Schaefer, Edward – Language Testing, 2008
The present study employed multi-faceted Rasch measurement (MFRM) to explore the rater bias patterns of native English-speaker (NES) raters when they rate EFL essays. Forty NES raters rated 40 essays written by female Japanese university students on a single topic adapted from the TOEFL Test of Written English (TWE). The essays were assessed using…
Descriptors: Writing Evaluation, Writing Tests, Program Effectiveness, Essays
Peer reviewed
Barkaoui, Khaled – Assessing Writing, 2007
Educators often have to choose among different types of rating scales to assess second-language (L2) writing performance. There is little research, however, on how different rating scales affect rater performance. This study employed a mixed-method approach to investigate the effects of two different rating scales on EFL essay scores, rating…
Descriptors: Writing Evaluation, Writing Tests, Rating Scales, Essays
Henning, Grant – 1992
The psychometric characteristics of the Test of Written English (TWE) rating scale were explored. Rasch model scalar analysis methodology was employed with more than 4,000 scored essays across 2 elicitation prompts to gather information about the rating scale and rating process. Results suggested that the intervals between TWE scale steps were…
Descriptors: English (Second Language), Equated Scores, Essays, Interrater Reliability
Peer reviewed
Polio, Charlene G. – Language Learning, 1997
Investigates the reliability of measures of linguistic accuracy in second language writing. The study uses a holistic scale, error-free T-units, and an error classification system on the essays of English-as-a-Second-Language students and discusses why disagreements arise within a rater and between raters. (24 references) (Author/CK)
Descriptors: College Students, English (Second Language), Error Analysis (Language), Error of Measurement
Longford, Nicholas T. – 1993
A model-based approach to rater reliability for essays read by multiple readers is presented. Variation of rater severity (between-rater variation) and rater inconsistency (within-rater variation) is considered in the presence of between-examinee variation. An additive variance component model is posited and the method of moments for its…
Descriptors: Educational Diagnosis, Error of Measurement, Essays, Estimation (Mathematics)
Biesbrock, Edieann – 1969
Written compositions of 200 second and third grade children were compared four times over a 2-year period to determine relationships of composition ability with intelligence and reading level. A global essay instrument developed at the University of Georgia was used for the comparisons to find out whether or not the instrument revealed any…
Descriptors: Age Differences, Elementary School Students, Essays, Grade 2