ERIC - Search Results

Publication Date

In 2025	1
Since 2024	9
Since 2021 (last 5 years)	24
Since 2016 (last 10 years)	51
Since 2006 (last 20 years)	91

Descriptor

Language Tests	84
Test Reliability	69
Second Language Learning	67
English (Second Language)	49
Foreign Countries	42
Test Validity	41
Interrater Reliability	37
Language Proficiency	36
Scores	29
Evaluators	22
Correlation	20
Comparative Analysis	19
Test Construction	18
Testing	18
Item Response Theory	17
Scoring	17
Rating Scales	15
Reliability	14
Oral Language	13
Writing Evaluation	13
Writing Tests	12
Reading Comprehension	11
Statistical Analysis	11
Native Speakers	10
Second Language Instruction	10
More ▼

Source

Language Testing

115

Publication Type

Journal Articles	115
Reports - Research	80
Reports - Evaluative	23
Reports - Descriptive	8
Information Analyses	6
Tests/Questionnaires	5
Opinion Papers	3
Speeches/Meeting Papers	1

Education Level

Higher Education	21
Postsecondary Education	14
Secondary Education	10
Elementary Education	6
Elementary Secondary Education	4
High Schools	2
Junior High Schools	2
Middle Schools	2
Adult Education	1
Early Childhood Education	1
Grade 12	1
Grade 6	1
Grade 7	1
Intermediate Grades	1
Kindergarten	1
Primary Education	1
More ▼

Audience

Location

Netherlands	7
China	6
Finland	4
Germany	4
Australia	3
Japan	3
South Korea	3
Canada	2
France	2
Hong Kong	2
Taiwan	2
United Kingdom	2
Arizona	1
Bulgaria	1
China (Guangzhou)	1
Colombia	1
Denmark	1
Europe	1
Georgia	1
Hawaii	1
Illinois	1
Illinois (Urbana)	1
India	1
Indiana	1
Italy	1
More ▼

Laws, Policies, & Programs

Assessments and Surveys

Test of English as a Foreign…	9
ACTFL Oral Proficiency…	1
English Proficiency Test	1
Graduate Record Examinations	1
International English…	1
Peabody Picture Vocabulary…	1
Test of Written English	1

What Works Clearinghouse Rating

Language Testing X

Showing 1 to 15 of 115 results Save | Export

Communal Factors in Rater Severity and Consistency over Time in High-Stakes Oral Assessment

Peer reviewed

Direct link

Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024

This article focuses on rater severity and consistency and their relation to major changes in the rating system in a high-stakes testing context. The study is based on longitudinal data collected from 2009 to 2019 from the second language (L2) Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. We investigated…

Descriptors: Foreign Countries, Interrater Reliability, Evaluators, Item Response Theory

A Shortened Test Is Feasible: Evaluating a Large-Scale Multistage Adaptive English Language Assessment

Peer reviewed

Direct link

Shangchao Min; Kyoungwon Bishop – Language Testing, 2024

This paper evaluates the multistage adaptive test (MST) design of a large-scale academic language assessment (ACCESS) for Grades 1-12, with an aim to simplify the current MST design, using both operational and simulated test data. Study 1 explored the operational population data (1,456,287 test-takers) of the listening and reading tests of MST…

Descriptors: Adaptive Testing, Test Construction, Language Tests, English Language Learners

A Meta-Analysis of Self-Assessment and Language Performance in Language Testing and Assessment

Peer reviewed

Direct link

Li, Minzi; Zhang, Xian – Language Testing, 2021

This meta-analysis explores the correlation between self-assessment (SA) and language performance. Sixty-seven studies with 97 independent samples involving more than 68,500 participants were included in our analysis. It was found that the overall correlation between SA and language performance was 0.466 (p < 0.01). Moderator analysis was…

Descriptors: Meta Analysis, Self Evaluation (Individuals), Likert Scales, Research Reports

All Types of Experience Are Equal, but Some Are More Equal: The Effect of Different Types of Experience on Rater Severity and Rater Consistency

Peer reviewed

Direct link

Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024

This article focuses on rater severity and consistency and their relation to different types of rater experience over a long period of time. The article is based on longitudinal data collected from 2009 to 2019 from the second language Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. The study investigated…

Descriptors: Foreign Countries, Interrater Reliability, Error of Measurement, Experience

Do Source Use Features Impact Raters' Judgment of Argumentation? An Experimental Study

Peer reviewed

Direct link

Ping-Lin Chuang – Language Testing, 2025

This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…

Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources

Triangulating Natural Language Processing (NLP)-Based Analysis of Rater Comments and Many-Facet Rasch Measurement (MFRM): An Innovative Approach to Investigating Raters' Application of Rating Scales in Writing Assessment

Peer reviewed

Direct link

Huiying Cai; Xun Yan – Language Testing, 2024

Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…

Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation

Developing Internet-Based "Tests of Aptitude for Language Learning (TALL)": An Open Research Endeavour

Peer reviewed

Direct link

Junlan Pan; Emma Marsden – Language Testing, 2024

"Tests of Aptitude for Language Learning" (TALL) is an openly accessible internet-based battery to measure the multifaceted construct of foreign language aptitude, using language domain-specific instruments and L1-sensitive instructions and stimuli. This brief report introduces the components of this theory-informed battery and…

Descriptors: Language Tests, Aptitude Tests, Second Language Learning, Test Construction

Revisiting Rating Scale Development for Rater-Mediated Language Performance Assessments: Modelling Construct and Contextual Choices Made by Scale Developers

Peer reviewed

Direct link

Knoch, Ute; Deygers, Bart; Khamboonruang, Apichat – Language Testing, 2021

Rating scale development in the field of language assessment is often considered in dichotomous ways: It is assumed to be guided either by expert intuition or by drawing on performance data. Even though quite a few authors have argued that rating scale development is rarely so easily classifiable, this dyadic view has dominated language testing…

Descriptors: Rating Scales, Test Construction, Language Tests, Test Use

Making Each Point Count: Revising a Local Adaptation of the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE Rubric

Peer reviewed

Direct link

Yu-Tzu Chang; Ann Tai Choe; Daniel Holden; Daniel R. Isbell – Language Testing, 2024

In this Brief Report, we describe an evaluation of and revisions to a rubric adapted from the Jacobs et al.'s (1981) ESL COMPOSITION PROFILE, with four rubric categories and 20-point rating scales, in the context of an intensive English program writing placement test. Analysis of 4 years of rating data (2016-2021, including 434 essays) using…

Descriptors: Language Tests, Rating Scales, Second Language Learning, English (Second Language)

The Typology of Second Language Listening Constructs: A Systematic Review

Peer reviewed

Direct link

Aryadoust, Vahid; Luo, Lan – Language Testing, 2023

This study reviewed conceptualizations and operationalizations of second language (L2) listening constructs. A total of 157 peer-reviewed papers published in 19 journals in applied linguistics were coded for (1) publication year, author, source title, location, language, and reliability and (2) listening subskills, cognitive processes, attributes,…

Descriptors: Test Format, Listening Comprehension Tests, Second Language Learning, Second Language Instruction

Operationalizing the Reading-into-Writing Construct in Analytic Rating Scales: Effects of Different Approaches on Rating

Peer reviewed

Direct link

Lestari, Santi B.; Brunfaut, Tineke – Language Testing, 2023

Assessing integrated reading-into-writing task performances is known to be challenging, and analytic rating scales have been found to better facilitate the scoring of these performances than other common types of rating scales. However, little is known about how specific operationalizations of the reading-into-writing construct in analytic rating…

Descriptors: Reading Writing Relationship, Writing Tests, Rating Scales, Writing Processes

More Efficient Processes for Creating Automated Essay Scoring Frameworks: A Demonstration of Two Algorithms

Peer reviewed

Direct link

Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021

Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…

Descriptors: Scoring, Essays, Writing Evaluation, Computer Software

"How Do Raters Learn to Rate?" Many-Facet Rasch Modeling of Rater Performance over the Course of a Rater Certification Program

Peer reviewed

Direct link

Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023

This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…

Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification

Developing a Local Academic English Listening Test Using Authentic Unscripted Audio-Visual Texts

Peer reviewed

Direct link

Park, Yena; Lee, Senyung; Shin, Sun-Young – Language Testing, 2022

Despite consistent calls for authentic stimuli in listening tests for better construct representation, unscripted texts have been rarely adopted in high-stakes listening tests due to perceived inefficiency. This study details how a local academic listening test was developed using authentic unscripted audio-visual texts from the local target…

Descriptors: Listening Comprehension Tests, English for Academic Purposes, Test Construction, Foreign Students

A Nonparametric Procedure for Exploring Differences in Rating Quality across Test-Taker Subgroups in Rater-Mediated Writing Assessments

Peer reviewed

Direct link

Wind, Stefanie A. – Language Testing, 2019

Differences in rater judgments that are systematically related to construct-irrelevant characteristics threaten the fairness of rater-mediated writing assessments. Accordingly, it is essential that researchers and practitioners examine the degree to which the psychometric quality of rater judgments is comparable across test-taker subgroups.…

Descriptors: Nonparametric Statistics, Interrater Reliability, Differences, Writing Tests

Previous Page | Next Page »

Pages: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8

Knoch, Ute	4
Alderson, J. Charles	2
Aryadoust, Vahid	2
Attali, Yigal	2
Brown, James Dean	2
Chapelle, Carol A.	2
Deygers, Bart	2
Elder, Catherine	2
Haug, Tobias	2
Iasonas Lamprianou	2
Jarvis, Scott	2
Kunnan, Antony John	2
Lee, Yong-Won	2
Lin, Chih-Kai	2
Reeta Neittaanmäki	2
Schoonen, Rob	2
Shin, Sun-Young	2
Stansfield, Charles W.	2
Wind, Stefanie A.	2
Winke, Paula	2
Yan, Xun	2
de Jong, Nivja H.	2
Alanen, Riikka	1
Allan, Alistair	1
More ▼