Publication Date
In 2025 | 0 |
Since 2024 | 1 |
Since 2021 (last 5 years) | 4 |
Since 2016 (last 10 years) | 9 |
Since 2006 (last 20 years) | 14 |
Descriptor
Reliability | 14 |
Language Tests | 8 |
English (Second Language) | 7 |
Scores | 7 |
Second Language Learning | 6 |
Correlation | 5 |
Evaluators | 5 |
Scoring | 5 |
Foreign Countries | 4 |
Comparative Analysis | 3 |
English | 3 |
More ▼ |
Source
Language Testing | 14 |
Author
Lin, Chih-Kai | 2 |
Amezcua, Angelica | 1 |
Attali, Yigal | 1 |
Beaudrie, Sara | 1 |
Cai, Yuyang | 1 |
Gierl, Mark J. | 1 |
Huiying Cai | 1 |
Janssen, Gerriet | 1 |
Jarvis, Scott | 1 |
Kilpatrick, Cynthia D | 1 |
Kunnan, Antony John | 1 |
More ▼ |
Publication Type
Journal Articles | 14 |
Reports - Research | 12 |
Reports - Evaluative | 2 |
Education Level
Higher Education | 5 |
Postsecondary Education | 4 |
Elementary Secondary Education | 1 |
High Schools | 1 |
Audience
Location
China | 1 |
Colombia | 1 |
Finland | 1 |
South Korea | 1 |
Laws, Policies, & Programs
Assessments and Surveys
Graduate Record Examinations | 1 |
What Works Clearinghouse Rating
Li, Minzi; Zhang, Xian – Language Testing, 2021
This meta-analysis explores the correlation between self-assessment (SA) and language performance. Sixty-seven studies with 97 independent samples involving more than 68,500 participants were included in our analysis. It was found that the overall correlation between SA and language performance was 0.466 (p < 0.01). Moderator analysis was…
Descriptors: Meta Analysis, Self Evaluation (Individuals), Likert Scales, Research Reports
Huiying Cai; Xun Yan – Language Testing, 2024
Rater comments tend to be qualitatively analyzed to indicate raters' application of rating scales. This study applied natural language processing (NLP) techniques to quantify meaningful, behavioral information from a corpus of rater comments and triangulated that information with a many-facet Rasch measurement (MFRM) analysis of rater scores. The…
Descriptors: Natural Language Processing, Item Response Theory, Rating Scales, Writing Evaluation
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Olson, Daniel J. – Language Testing, 2023
Measuring language dominance, broadly defined as the relative strength of each of a bilingual's two languages, remains a crucial methodological issue in bilingualism research. While various methods have been proposed, the Bilingual Language Profile (BLP) has been one of the most widely used tools for measuring language dominance. While previous…
Descriptors: Bilingualism, Language Dominance, Native Language, Second Language Learning
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Cai, Yuyang; Kunnan, Antony John – Language Testing, 2020
An essential hypothesis of modern language assessment theory pertains to the interaction between strategy use ability (strategic competence) and second language knowledge. However, how they interact with each other is rarely explored. Drawing on relevant research in the literature, in this paper we proposed three interaction patterns (i.e.,…
Descriptors: English (Second Language), Second Language Learning, Nursing Education, Reading Tests
Attali, Yigal; Lewis, Will; Steier, Michael – Language Testing, 2013
Automated essay scoring can produce reliable scores that are highly correlated with human scores, but is limited in its evaluation of content and other higher-order aspects of writing. The increased use of automated essay scoring in high-stakes testing underscores the need for human scoring that is focused on higher-order aspects of writing. This…
Descriptors: Scoring, Essay Tests, Reliability, High Stakes Tests
Tommerdahl, Jodi; Kilpatrick, Cynthia D – Language Testing, 2014
It is currently unclear to what extent a spontaneous language sample of a given number of utterances is representative of a child's ability in morphology and syntax. This lack of information about the regularity of children's linguistic productions and the reliability of spontaneous language samples have serious implications for language…
Descriptors: Morphology (Languages), Young Children, Morphemes, Syntax
Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017
Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…
Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators
Beaudrie, Sara; Amezcua, Angelica; Loza, Sergio – Language Testing, 2019
Critical language awareness (CLA) is increasingly identified as a central component of the Spanish heritage language (SHL) classroom (Leeman, 2005; MartÃnez, 2003; among others). As a minority language, SHL is subject to sociopolitical, cultural, and economic forces that devalue its status. It is devalued in the eyes of the public, as a legitimate…
Descriptors: Metalinguistics, Heritage Education, Spanish, Second Language Learning
Shin, Sun-Young; Lidster, Ryan – Language Testing, 2017
In language programs, it is crucial to place incoming students into appropriate levels to ensure that course curriculum and materials are well targeted to their learning needs. Deciding how and where to set cutscores on placement tests is thus of central importance to programs, but previous studies in educational measurement disagree as to which…
Descriptors: Language Tests, English (Second Language), Standard Setting (Scoring), Student Placement
Lin, Chih-Kai; Zhang, Jinming – Language Testing, 2014
Research on the relationship between English language proficiency standards and academic content standards serves to provide information about the extent to which English language learners (ELLs) are expected to encounter academic language use that facilitates their content learning, such as in mathematics and science. Standards-to-standards…
Descriptors: Language Proficiency, Academic Standards, Generalizability Theory, English Language Learners
Lee, HyeSun; Winke, Paula – Language Testing, 2013
We adapted three practice College Scholastic Ability Tests (CSAT) of English listening, each with five-option items, to create four- and three-option versions by asking 73 Korean speakers or learners of English to eliminate the least plausible options in two rounds. Two hundred and sixty-four Korean high school English-language learners formed…
Descriptors: Academic Ability, Stakeholders, Reliability, Listening Comprehension Tests
McCarthy, Philip M.; Jarvis, Scott – Language Testing, 2007
A reliable index of lexical diversity (LD) has remained stubbornly elusive for over 60 years. Meanwhile, researchers in fields as varied as "stylistics," "neuropathology," "language acquisition," and even "forensics" continue to use flawed LD indices--often ignorant that their results are questionable and in…
Descriptors: Second Language Learning, English (Second Language), Foreign Countries, Adolescents