Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024
This article focuses on rater severity and consistency and their relation to major changes in the rating system in a high-stakes testing context. The study is based on longitudinal data collected from 2009 to 2019 from the second language (L2) Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. We investigated…
Descriptors: Foreign Countries, Interrater Reliability, Evaluators, Item Response Theory

Shangchao Min; Kyoungwon Bishop – Language Testing, 2024
This paper evaluates the multistage adaptive test (MST) design of a large-scale academic language assessment (ACCESS) for Grades 1-12, with the aim of simplifying the current MST design, using both operational and simulated test data. Study 1 explored the operational population data (1,456,287 test-takers) of the listening and reading tests of MST…
Descriptors: Adaptive Testing, Test Construction, Language Tests, English Language Learners

Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024
This article focuses on rater severity and consistency and their relation to different types of rater experience over a long period of time. The article is based on longitudinal data collected from 2009 to 2019 from the second language Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. The study investigated…
Descriptors: Foreign Countries, Interrater Reliability, Error of Measurement, Experience

Ping-Lin Chuang – Language Testing, 2025
This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…
Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources

Huiying Cai; Xun Yan – Language Testing, 2024
Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…
Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation

Junlan Pan; Emma Marsden – Language Testing, 2024
"Tests of Aptitude for Language Learning" (TALL) is an openly accessible internet-based battery to measure the multifaceted construct of foreign language aptitude, using language domain-specific instruments and L1-sensitive instructions and stimuli. This brief report introduces the components of this theory-informed battery and…
Descriptors: Language Tests, Aptitude Tests, Second Language Learning, Test Construction

Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024
In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…
Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)

Haerim Hwang; Hyunwoo Kim – Language Testing, 2024
Given the lack of computational tools available for assessing second language (L2) production in Korean, this study introduces a novel automated tool called the Korean Syntactic Complexity Analyzer (KOSCA) for measuring syntactic complexity in L2 Korean production. As an open-source graphic user interface (GUI) developed in Python, KOSCA provides…
Descriptors: Korean, Natural Language Processing, Syntax, Computer Graphics

Maria Treadaway; John Read – Language Testing, 2024
Standard-setting is an essential component of test development, supporting the meaningfulness and appropriate interpretation of test scores. However, in the high-stakes testing environment of aviation, standard-setting studies are underexplored. To address this gap, we document two stages in the standard-setting procedures for the Overseas Flight…
Descriptors: Standard Setting, Diagnostic Tests, High Stakes Tests, English for Special Purposes