Publication Date
In 2025 | 1 |
Since 2024 | 4 |
Since 2021 (last 5 years) | 8 |
Since 2016 (last 10 years) | 11 |
Since 2006 (last 20 years) | 15 |
Descriptor
Source
Language Testing | 15 |
Author
Shin, Sun-Young | 2 |
Amezcua, Angelica | 1 |
Attali, Yigal | 1 |
Beaudrie, Sara | 1 |
Bosker, Hans Rutger | 1 |
Bridgeman, Brent | 1 |
Cakir, Abdulvahit | 1 |
Cho, Yeonsuk | 1 |
Culligan, Brent | 1 |
DiPietro, Stephen | 1 |
Emma Marsden | 1 |
More ▼ |
Publication Type
Journal Articles | 15 |
Reports - Research | 14 |
Reports - Evaluative | 1 |
Tests/Questionnaires | 1 |
Education Level
Higher Education | 15 |
Postsecondary Education | 15 |
Adult Education | 1 |
Elementary Secondary Education | 1 |
Secondary Education | 1 |
Audience
Laws, Policies, & Programs
Assessments and Surveys
Graduate Record Examinations | 1 |
Test of English as a Foreign… | 1 |
What Works Clearinghouse Rating
Reeta Neittaanmäki; Iasonas Lamprianou – Language Testing, 2024
This article focuses on rater severity and consistency and their relation to different types of rater experience over a long period of time. The article is based on longitudinal data collected from 2009 to 2019 from the second language Finnish speaking subtest in the National Certificates of Language Proficiency in Finland. The study investigated…
Descriptors: Foreign Countries, Interrater Reliability, Error of Measurement, Experience
Huiying Cai; Xun Yan – Language Testing, 2024
Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…
Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation
Junlan Pan; Emma Marsden – Language Testing, 2024
"Tests of Aptitude for Language Learning" (TALL) is an openly accessible internet-based battery to measure the multifaceted construct of foreign language aptitude, using language domain-specific instruments and L1-sensitive instructions and stimuli. This brief report introduces the components of this theory-informed battery and…
Descriptors: Language Tests, Aptitude Tests, Second Language Learning, Test Construction
Farshad Effatpanah; Purya Baghaei; Mona Tabatabaee-Yazdi; Esmat Babaii – Language Testing, 2025
This study aimed to propose a new method for scoring C-Tests as measures of general language proficiency. In this approach, the unit of analysis is sentences rather than gaps or passages. That is, the gaps correctly reformulated in each sentence were aggregated as sentence score, and then each sentence was entered into the analysis as a polytomous…
Descriptors: Item Response Theory, Language Tests, Test Items, Test Construction
Park, Yena; Lee, Senyung; Shin, Sun-Young – Language Testing, 2022
Despite consistent calls for authentic stimuli in listening tests for better construct representation, unscripted texts have been rarely adopted in high-stakes listening tests due to perceived inefficiency. This study details how a local academic listening test was developed using authentic unscripted audio-visual texts from the local target…
Descriptors: Listening Comprehension Tests, English for Academic Purposes, Test Construction, Foreign Students
Liao, Ray J. T. – Language Testing, 2023
Among the variety of selected response formats used in L2 reading assessment, multiple-choice (MC) is the most commonly adopted, primarily due to its efficiency and objectiveness. Given the impact of assessment results on teaching and learning, it is necessary to investigate the degree to which the MC format reliably measures learners' L2 reading…
Descriptors: Reading Tests, Language Tests, Second Language Learning, Second Language Instruction
Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021
This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…
Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation
Toprak, Tugba Elif; Cakir, Abdulvahit – Language Testing, 2021
Cognitive diagnostic assessment (CDA) has been applied to language assessment in a number of studies in which a diagnostic classification model (DCM) was retrofitted to the results of a non-diagnostic assessment. However, the need to apply CDA through utilization of an inductive rather than a retrofitted approach has been a recurrent theme in…
Descriptors: English (Second Language), Second Language Learning, Undergraduate Students, Young Adults
Attali, Yigal; Lewis, Will; Steier, Michael – Language Testing, 2013
Automated essay scoring can produce reliable scores that are highly correlated with human scores, but is limited in its evaluation of content and other higher-order aspects of writing. The increased use of automated essay scoring in high-stakes testing underscores the need for human scoring that is focused on higher-order aspects of writing. This…
Descriptors: Scoring, Essay Tests, Reliability, High Stakes Tests
Beaudrie, Sara; Amezcua, Angelica; Loza, Sergio – Language Testing, 2019
Critical language awareness (CLA) is increasingly identified as a central component of the Spanish heritage language (SHL) classroom (Leeman, 2005; Martínez, 2003; among others). As a minority language, SHL is subject to sociopolitical, cultural, and economic forces that devalue its status. It is devalued in the eyes of the public, as a legitimate…
Descriptors: Metalinguistics, Heritage Education, Spanish, Second Language Learning
Shin, Sun-Young; Lidster, Ryan – Language Testing, 2017
In language programs, it is crucial to place incoming students into appropriate levels to ensure that course curriculum and materials are well targeted to their learning needs. Deciding how and where to set cutscores on placement tests is thus of central importance to programs, but previous studies in educational measurement disagree as to which…
Descriptors: Language Tests, English (Second Language), Standard Setting (Scoring), Student Placement
Culligan, Brent – Language Testing, 2015
This study compared three common vocabulary test formats, the Yes/No test, the Vocabulary Knowledge Scale (VKS), and the Vocabulary Levels Test (VLT), as measures of vocabulary difficulty. Vocabulary difficulty was defined as the item difficulty estimated through Item Response Theory (IRT) analysis. Three tests were given to 165 Japanese students,…
Descriptors: Language Tests, Test Format, Comparative Analysis, Vocabulary
Yan, Xun – Language Testing, 2014
This paper reports on a mixed-methods approach to evaluate rater performance on a local oral English proficiency test. Three types of reliability estimates were reported to examine rater performance from different perspectives. Quantitative results were also triangulated with qualitative rater comments to arrive at a more representative picture of…
Descriptors: Mixed Methods Research, Language Tests, Oral Language, Language Proficiency
Bridgeman, Brent; Cho, Yeonsuk; DiPietro, Stephen – Language Testing, 2016
Data from 787 international undergraduate students at an urban university in the United States were used to demonstrate the importance of separating a sample into meaningful subgroups in order to demonstrate the ability of an English language assessment to predict the first-year grade point average (GPA). For example, when all students were pooled…
Descriptors: Grade Prediction, English Curriculum, Language Tests, Undergraduate Students
Pinget, Anne-France; Bosker, Hans Rutger; Quené, Hugo; de Jong, Nivja H. – Language Testing, 2014
Oral fluency and foreign accent distinguish L2 from L1 speech production. In language testing practices, both fluency and accent are usually assessed by raters. This study investigates what exactly native raters of fluency and accent take into account when judging L2. Our aim is to explore the relationship between objectively measured temporal,…
Descriptors: Native Speakers, Language Fluency, Suprasegmentals, Second Language Learning