NotesFAQContact Us
Collection
Advanced
Search Tips
Showing all 11 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Wind, Stefanie A. – Language Testing, 2019
Differences in rater judgments that are systematically related to construct-irrelevant characteristics threaten the fairness of rater-mediated writing assessments. Accordingly, it is essential that researchers and practitioners examine the degree to which the psychometric quality of rater judgments is comparable across test-taker subgroups.…
Descriptors: Nonparametric Statistics, Interrater Reliability, Differences, Writing Tests
Peer reviewed Peer reviewed
Direct linkDirect link
van Batenburg, Eline S. L.; Oostdam, Ron J.; van Gelderen, Amos J. S.; de Jong, Nivja H. – Language Testing, 2018
This article explores ways to assess interactional performance, and reports on the use of a test format that standardizes the interlocutor's linguistic and interactional contributions to the exchange. It describes the construction and administration of six scripted speech tasks (instruction, advice, and sales tasks) with pre-vocational learners (n…
Descriptors: Second Language Learning, Speech Tests, Interaction, Test Reliability
Peer reviewed Peer reviewed
Direct linkDirect link
Wind, Stefanie A.; Peterson, Meghan E. – Language Testing, 2018
The use of assessments that require rater judgment (i.e., rater-mediated assessments) has become increasingly popular in high-stakes language assessments worldwide. Using a systematic literature review, the purpose of this study is to identify and explore the dominant methods for evaluating rating quality within the context of research on…
Descriptors: Language Tests, Evaluators, Evaluation Methods, Interrater Reliability
Peer reviewed Peer reviewed
Direct linkDirect link
Longabach, Tanya; Peyton, Vicki – Language Testing, 2018
K-12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these content domains; scores assigned to these subsections are commonly known as subscores. Testing programs face increasing customer demands for the reporting of subscores in addition to the…
Descriptors: Comparative Analysis, Test Reliability, Second Language Learning, Language Proficiency
Peer reviewed Peer reviewed
Direct linkDirect link
Davis, Larry – Language Testing, 2016
Two factors were investigated that are thought to contribute to consistency in rater scoring judgments: rater training and experience in scoring. Also considered were the relative effects of scoring rubrics and exemplars on rater performance. Experienced teachers of English (N = 20) scored recorded responses from the TOEFL iBT speaking test prior…
Descriptors: Evaluators, Oral Language, Scores, Language Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Kuiken, Folkert; Vedder, Ineke – Language Testing, 2017
The importance of functional adequacy as an essential component of L2 proficiency has been observed by several authors (Pallotti, 2009; De Jong, Steinel, Florijn, Schoonen, & Hulstijn, 2012a, b). The rationale underlying the present study is that the assessment of writing proficiency in L2 is not fully possible without taking into account the…
Descriptors: Second Language Learning, Rating Scales, Computational Linguistics, Persuasive Discourse
Peer reviewed Peer reviewed
Direct linkDirect link
Shin, Sun-Young; Lidster, Ryan – Language Testing, 2017
In language programs, it is crucial to place incoming students into appropriate levels to ensure that course curriculum and materials are well targeted to their learning needs. Deciding how and where to set cutscores on placement tests is thus of central importance to programs, but previous studies in educational measurement disagree as to which…
Descriptors: Language Tests, English (Second Language), Standard Setting (Scoring), Student Placement
Peer reviewed Peer reviewed
Direct linkDirect link
Jarvis, Scott – Language Testing, 2017
The present study discusses the relevance of measures of lexical diversity (LD) to the assessment of learner corpora. It also argues that existing measures of LD, many of which have become specialized for use with language corpora, are fundamentally measures of lexical repetition, are based on an etic perspective of language, and lack construct…
Descriptors: Computational Linguistics, English (Second Language), Second Language Learning, Native Speakers
Peer reviewed Peer reviewed
Direct linkDirect link
Granfeldt, Jonas; Ågren, Malin – Language Testing, 2014
One core area of research in Second Language Acquisition is the identification and definition of developmental stages in different L2s. For L2 French, Bartning and Schlyter (2004) presented a model of six morphosyntactic stages of development in the shape of grammatical profiles. The model formed the basis for the computer program Direkt Profil…
Descriptors: Second Language Learning, Language Tests, French, Language Teachers
Peer reviewed Peer reviewed
Direct linkDirect link
O'Toole, J. M.; King, R. A. R. – Language Testing, 2011
The "cloze" test is one possible investigative instrument for predicting text comprehensibility. Conceptual coding of student replacement of deleted words has been considered to be more valid than exact coding, partly because conceptual coding seemed fairer to poorer readers. This paper reports a quantitative study of 447 Australian…
Descriptors: Cloze Procedure, Test Results, Language Tests, Reading Comprehension
Peer reviewed Peer reviewed
Direct linkDirect link
Lee-Ellis, Sunyoung – Language Testing, 2009
Despite the importance of having a reliable and valid measure of Second Language (L2) proficiency, L2 researchers of less commonly taught languages rarely have such a tool. Existing proficiency measures (e.g., DLPT, OPI) are often costly, labor-intensive, time-consuming, or unavailable to the public. With the intent to provide a practical and…
Descriptors: Uncommonly Taught Languages, Test Validity, Korean, Test Construction