ERIC - Search Results

Showing 1 to 15 of 19 results

Assessing the Content Quality of Essays in Content and Language Integrated Learning: Exploring the Construct from Subject Specialists' Perspectives

Peer reviewed

Takanori Sato – Language Testing, 2024

Assessing the content of learners' compositions is a common practice in second language (L2) writing assessment. However, the construct definition of content in L2 writing assessment potentially underrepresents the target competence in content and language integrated learning (CLIL), which aims to foster not only L2 proficiency but also critical…

Descriptors: Language Tests, Content and Language Integrated Learning, Writing Evaluation, Writing Tests

Making Each Point Count: Revising a Local Adaptation of the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE Rubric

Peer reviewed

Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024

In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…

Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)

Operationalizing the Reading-into-Writing Construct in Analytic Rating Scales: Effects of Different Approaches on Rating

Peer reviewed

Lestari, Santi B.; Brunfaut, Tineke – Language Testing, 2023

Assessing integrated reading-into-writing task performances is known to be challenging, and analytic rating scales have been found to better facilitate the scoring of these performances than other common types of rating scales. However, little is known about how specific operationalizations of the reading-into-writing construct in analytic rating…

Descriptors: Reading Writing Relationship, Writing Tests, Rating Scales, Writing Processes

The Intersection of AI and Language Assessment: A Study on the Reliability of ChatGPT in Grading IELTS Writing Task 2

Peer reviewed

Osama Koraishi – Language Teaching Research Quarterly, 2024

This study conducts a comprehensive quantitative evaluation of OpenAI's language model, ChatGPT 4, for grading Task 2 writing of the IELTS exam. The objective is to assess the alignment between ChatGPT's grading and that of official human raters. The analysis encompassed a multifaceted approach, including a comparison of means and reliability…

Descriptors: Second Language Learning, English (Second Language), Language Tests, Artificial Intelligence

A Sequential Approach to Detecting Differential Rater Functioning in Sparse Rater-Mediated Assessment Networks

Peer reviewed

Wind, Stefanie A. – Language Testing, 2023

Researchers frequently evaluate rater judgments in performance assessments for evidence of differential rater functioning (DRF), which occurs when rater severity is systematically related to construct-irrelevant student characteristics after controlling for student achievement levels. However, researchers have observed that methods for detecting…

Descriptors: Evaluators, Decision Making, Student Characteristics, Performance Based Assessment

Examining Severity and Centrality Effects in TestDaF Writing and Speaking Assessments: An Extended Bayesian Many-Facet Rasch Analysis

Peer reviewed

Eckes, Thomas; Jin, Kuan-Yu – International Journal of Testing, 2021

Severity and centrality are two main kinds of rater effects posing threats to the validity and fairness of performance assessments. Adopting Jin and Wang's (2018) extended facets modeling approach, we separately estimated the magnitude of rater severity and centrality effects in the web-based TestDaF (Test of German as a Foreign Language) writing…

Descriptors: Language Tests, German, Second Languages, Writing Tests

Generalizability of Writing Scores and Language Program Placement Decisions: Score Dependability, Task Variability, and Score Profiles on an ESL Placement Test

Peer reviewed

Eskin, Daniel – Studies in Applied Linguistics & TESOL, 2022

For agencies that deliver high-stakes Second Language (L2) proficiency exams, a research agenda has been undertaken for years to examine the role of rater, task, and rubric as sources of variability into their performance assessments (Lee, 2006; Sawaki & Sinharay, 2013; Xi, 2007; Xi & Mollaun, 2006). However, these challenges are more…

Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Student Placement

Establishing Comparability across Writing Tasks with Picture Prompts of Three Alternate Tests

Peer reviewed

Li, Jiuliang – Language Assessment Quarterly, 2018

In language testing programs, different test forms are often used to administer the same test. Demonstrating the comparability of these forms is essential to avoid criticisms of potential test unfairness. However, studies with this objective are scarce. This study aims to investigate the extent to which the picture-prompt writing tasks of three…

Descriptors: Writing Tests, Language Tests, Check Lists, Culture Fair Tests

Effect of Genre on the Generalizability of Writing Scores

Peer reviewed

Bouwer, Renske; Béguin, Anton; Sanders, Ted; van den Bergh, Huub – Language Testing, 2015

In the present study, aspects of the measurement of writing are disentangled in order to investigate the validity of inferences made on the basis of writing performance and to describe implications for the assessment of writing. To include genre as a facet in the measurement, we obtained writing scores of 12 texts in four different genres for each…

Descriptors: Writing Tests, Generalization, Scores, Writing Instruction

A Comparative Analysis of Face to Face Instruction vs. Telegram Mobile Instruction in Terms of Narrative Writing

Peer reviewed

Heidari, Jamshid; Khodabandeh, Farzaneh; Soleimani, Hassan – JALT CALL Journal, 2018

The emergence of computer technology in English language teaching has paved the way for teachers' application of Mobile Assisted Language Learning (MALL) and its advantages in teaching. This study aimed to compare the effectiveness of the face to face instruction with Telegram mobile instruction. Based on a TOEFL test, 60 English foreign language…

Descriptors: Comparative Analysis, Conventional Instruction, Teaching Methods, Computer Assisted Instruction

Automated Trait Scores for "TOEFL"® Writing Tasks. Research Report. ETS RR-15-14

Peer reviewed

Attali, Yigal; Sinharay, Sandip – ETS Research Report Series, 2015

The "e-rater"® automated essay scoring system is used operationally in the scoring of "TOEFL iBT"® independent and integrated tasks. In this study we explored the psychometric added value of reporting four trait scores for each of these two tasks, beyond the total e-rater score. The four trait scores are word choice, grammatical…

Descriptors: Writing Tests, Scores, Language Tests, English (Second Language)

The Differences in Error Rate and Type between IELTS Writing Bands and Their Impact on Academic Workload

Peer reviewed

Müller, Amanda – Higher Education Research and Development, 2015

This paper attempts to demonstrate the differences in writing between International English Language Testing System (IELTS) bands 6.0, 6.5 and 7.0. An analysis of exemplars provided from the IELTS test makers reveals that IELTS 6.0, 6.5 and 7.0 writers can make a minimum of 206 errors, 96 errors and 35 errors per 1000 words. The following section…

Descriptors: English (Second Language), Second Language Learning, Language Tests, Scores

Operational Rater Types in Writing Assessment: Linking Rater Cognition to Rater Behavior

Peer reviewed

Eckes, Thomas – Language Assessment Quarterly, 2012

This research investigated the relation between rater cognition and rater behavior, focusing on differential severity/leniency regarding criteria. Participants comprised a sample of 18 raters examined in a previous rater cognition study (Eckes, 2008b). These raters, who were known to differ widely in their perceptions of criterion importance,…

Descriptors: Writing Evaluation, Writing Tests, Evaluation Criteria, Multivariate Analysis

TOEFL11: A Corpus of Non-Native English. Research Report. ETS RR-13-24

Peer reviewed

Blanchard, Daniel; Tetreault, Joel; Higgins, Derrick; Cahill, Aoife; Chodorow, Martin – ETS Research Report Series, 2013

This report presents work on the development of a new corpus of non-native English writing. It will be useful for the task of native language identification, as well as grammatical error detection and correction, and automatic essay scoring. In this report, the corpus is described in detail.

Descriptors: Language Tests, Second Language Learning, English (Second Language), Writing Tests

The Development and Maintenance of Rating Quality in Performance Writing Assessment: A Longitudinal Study of New and Experienced Raters

Peer reviewed

Lim, Gad S. – Language Testing, 2011

Raters are central to writing performance assessment, and rater development--training, experience, and expertise--involves a temporal dimension. However, few studies have examined new and experienced raters' rating performance longitudinally over multiple time points. This study uses operational data from the writing section of the MELAB (n =…

Descriptors: Expertise, Writing Evaluation, Performance Based Assessment, Writing Tests
