ERIC - Search Results

Publication Date

In 2025	1
Since 2024	4
Since 2021 (last 5 years)	8
Since 2016 (last 10 years)	11
Since 2006 (last 20 years)	15

Source

Language Testing

Publication Type

Journal Articles	15
Reports - Research	14
Reports - Evaluative	1
Tests/Questionnaires	1

Education Level

Higher Education	15
Postsecondary Education	15
Adult Education	1
Elementary Secondary Education	1
Secondary Education	1

Location

China	1
Europe	1
Finland	1
Indiana	1
Iran	1
Japan	1
Netherlands	1
Pennsylvania (Philadelphia)	1
Turkey	1

Assessments and Surveys

Graduate Record Examinations	1
Test of English as a Foreign…	1

Showing all 15 results

All Types of Experience Are Equal, but Some Are More Equal: The Effect of Different Types of Experience on Rater Severity and Rater Consistency

Peer reviewed

Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024

This article focuses on rater severity and consistency and their relation to different types of rater experience over a long period of time. The article is based on longitudinal data collected from 2009 to 2019 from the second language Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. The study investigated…

Descriptors: Foreign Countries, Interrater Reliability, Error of Measurement, Experience

Triangulating Natural Language Processing (NLP)-Based Analysis of Rater Comments and Many-Facet Rasch Measurement (MFRM): An Innovative Approach to Investigating Raters' Application of Rating Scales in Writing Assessment

Peer reviewed

Huiying Cai; Xun Yan – Language Testing, 2024

Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…

Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation

Developing Internet-Based "Tests of Aptitude for Language Learning (TALL)": An Open Research Endeavour

Peer reviewed

Junlan Pan; Emma Marsden – Language Testing, 2024

"Tests of Aptitude for Language Learning" (TALL) is an openly accessible internet-based battery to measure the multifaceted construct of foreign language aptitude, using language domain-specific instruments and L1-sensitive instructions and stimuli. This brief report introduces the components of this theory-informed battery and…

Descriptors: Language Tests, Aptitude Tests, Second Language Learning, Test Construction

A New Scoring Method for Item Response Theory Analysis of C-Tests

Peer reviewed

Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025

This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…

Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction

Developing a Local Academic English Listening Test Using Authentic Unscripted Audio-Visual Texts

Peer reviewed

Park, Yena; Lee, Senyung; Shin, Sun-Young – Language Testing, 2022

Despite consistent calls for authentic stimuli in listening tests for better construct representation, unscripted texts have been rarely adopted in high-stakes listening tests due to perceived inefficiency. This study details how a local academic listening test was developed using authentic unscripted audio-visual texts from the local target…

Descriptors: Listening Comprehension Tests, English for Academic Purposes, Test Construction, Foreign Students

The Use of Generalizability Theory in Investigating the Score Dependability of Classroom-Based L2 Reading Assessment

Peer reviewed

Liao, Ray J. T. – Language Testing, 2023

Among the variety of selected response formats used in L2 reading assessment, multiple-choice (MC) is the most commonly adopted, primarily due to its efficiency and objectiveness. Given the impact of assessment results on teaching and learning, it is necessary to investigate the degree to which the MC format reliably measures learners' L2 reading…

Descriptors: Reading Tests, Language Tests, Second Language Learning, Second Language Instruction

The Longitudinal Stability of Rating Characteristics in an EFL Examination: Methodological and Substantive Considerations

Peer reviewed

Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021

This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…

Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation

Examining the L2 Reading Comprehension Ability of Adult ELLs: Developing a Diagnostic Test within the Cognitive Diagnostic Assessment Framework

Peer reviewed

Toprak, Tugba Elif; Cakir, Abdulvahit – Language Testing, 2021

Cognitive diagnostic assessment (CDA) has been applied to language assessment in a number of studies in which a diagnostic classification model (DCM) was retrofitted to the results of a non-diagnostic assessment. However, the need to apply CDA through utilization of an inductive rather than a retrofitted approach has been a recurrent theme in…

Descriptors: English (Second Language), Second Language Learning, Undergraduate Students, Young Adults

Scoring with the Computer: Alternative Procedures for Improving the Reliability of Holistic Essay Scoring

Peer reviewed

Attali, Yigal; Lewis, Will; Steier, Michael – Language Testing, 2013

Automated essay scoring can produce reliable scores that are highly correlated with human scores, but is limited in its evaluation of content and other higher-order aspects of writing. The increased use of automated essay scoring in high-stakes testing underscores the need for human scoring that is focused on higher-order aspects of writing. This…

Descriptors: Scoring, Essay Tests, Reliability, High Stakes Tests

Critical Language Awareness for the Heritage Context: Development and Validation of a Measurement Questionnaire

Peer reviewed

Beaudrie, Sara; Amezcua, Angelica; Loza, Sergio – Language Testing, 2019

Critical language awareness (CLA) is increasingly identified as a central component of the Spanish heritage language (SHL) classroom (Leeman, 2005; Martínez, 2003; among others). As a minority language, SHL is subject to sociopolitical, cultural, and economic forces that devalue its status. It is devalued in the eyes of the public, as a legitimate…

Descriptors: Metalinguistics, Heritage Education, Spanish, Second Language Learning

Evaluating Different Standard-Setting Methods in an ESL Placement Testing Context

Peer reviewed

Shin, Sun-Young; Lidster, Ryan – Language Testing, 2017

In language programs, it is crucial to place incoming students into appropriate levels to ensure that course curriculum and materials are well targeted to their learning needs. Deciding how and where to set cutscores on placement tests is thus of central importance to programs, but previous studies in educational measurement disagree as to which…

Descriptors: Language Tests, English (Second Language), Standard Setting (Scoring), Student Placement

A Comparison of Three Test Formats to Assess Word Difficulty

Peer reviewed

Culligan, Brent – Language Testing, 2015

This study compared three common vocabulary test formats, the Yes/No test, the Vocabulary Knowledge Scale (VKS), and the Vocabulary Levels Test (VLT), as measures of vocabulary difficulty. Vocabulary difficulty was defined as the item difficulty estimated through Item Response Theory (IRT) analysis. Three tests were given to 165 Japanese students,…

Descriptors: Language Tests, Test Format, Comparative Analysis, Vocabulary

An Examination of Rater Performance on a Local Oral English Proficiency Test: A Mixed-Methods Approach

Peer reviewed

Yan, Xun – Language Testing, 2014

This paper reports on a mixed-methods approach to evaluate rater performance on a local oral English proficiency test. Three types of reliability estimates were reported to examine rater performance from different perspectives. Quantitative results were also triangulated with qualitative rater comments to arrive at a more representative picture of…

Descriptors: Mixed Methods Research, Language Tests, Oral Language, Language Proficiency

Predicting Grades from an English Language Assessment: The Importance of Peeling the Onion

Peer reviewed

Bridgeman, Brent; Cho, Yeonsuk; DiPietro, Stephen – Language Testing, 2016

Data from 787 international undergraduate students at an urban university in the United States were used to demonstrate the importance of separating a sample into meaningful subgroups in order to demonstrate the ability of an English language assessment to predict the first-year grade point average (GPA). For example, when all students were pooled…

Descriptors: Grade Prediction, English Curriculum, Language Tests, Undergraduate Students

Native Speakers' Perceptions of Fluency and Accent in L2 Speech

Peer reviewed

Pinget, Anne-France; Bosker, Hans Rutger; Quené, Hugo; de Jong, Nivja H. – Language Testing, 2014

Oral fluency and foreign accent distinguish L2 from L1 speech production. In language testing practices, both fluency and accent are usually assessed by raters. This study investigates what exactly native raters of fluency and accent take into account when judging L2. Our aim is to explore the relationship between objectively measured temporal,…

Descriptors: Native Speakers, Language Fluency, Suprasegmentals, Second Language Learning

Descriptor

Language Tests	9
Foreign Countries	8
English (Second Language)	7
Second Language Learning	7
Test Reliability	7
Undergraduate Students	6
Interrater Reliability	5
Test Items	5
College Students	4
Correlation	4
Language Proficiency	4
Reliability	4
Test Construction	4
College Entrance Examinations	3
Comparative Analysis	3
Evaluators	3
Foreign Students	3
High Stakes Tests	3
Item Response Theory	3
Scores	3
Test Format	3
Test Validity	3
Asians	2
English	2
English for Academic Purposes	2

Author

Shin, Sun-Young	2
Amezcua, Angelica	1
Attali, Yigal	1
Beaudrie, Sara	1
Bosker, Hans Rutger	1
Bridgeman, Brent	1
Cakir, Abdulvahit	1
Cho, Yeonsuk	1
Culligan, Brent	1
DiPietro, Stephen	1
Emma Marsden	1
Esmat Babaii	1
Farshad Effatpanah	1
Huiying Cai	1
Iasonas Lamprianou	1
Junlan Pan	1
Kyriakou, Nansia	1
Lamprianou, Iasonas	1
Lee, Senyung	1
Lewis, Will	1
Liao, Ray J. T.	1
Lidster, Ryan	1
Loza, Sergio	1
Mona Tabatabaee-Yazdi	1
Park, Yena	1