NotesFAQContact Us
Collection
Advanced
Search Tips
Audience
Laws, Policies, & Programs
Assessments and Surveys
Test of English as a Foreign…1
What Works Clearinghouse Rating
Showing all 10 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021
This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…
Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation
Peer reviewed Peer reviewed
Direct linkDirect link
Eckes, Thomas; Jin, Kuan-Yu – International Journal of Testing, 2021
Severity and centrality are two main kinds of rater effects posing threats to the validity and fairness of performance assessments. Adopting Jin and Wang's (2018) extended facets modeling approach, we separately estimated the magnitude of rater severity and centrality effects in the web-based TestDaF (Test of German as a Foreign Language) writing…
Descriptors: Language Tests, German, Second Languages, Writing Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Jølle, Lennart; Skar, Gustaf B. – Scandinavian Journal of Educational Research, 2020
This paper reports findings from a project called "The National Panel of Raters" (NPR) that took place within a writing test programme in Norway (2010-2016). A recent research project found individual differences between the raters in the NPR. This paper reports results from an explorative follow up-study where 63 NPR members were…
Descriptors: Foreign Countries, Validity, Scoring, Program Descriptions
Peer reviewed Peer reviewed
Direct linkDirect link
Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017
Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…
Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators
Peer reviewed Peer reviewed
Direct linkDirect link
Li, Jiuliang – Language Assessment Quarterly, 2018
In language testing programs, different test forms are often used to administer the same test. Demonstrating the comparability of these forms is essential to avoid criticisms of potential test unfairness. However, studies with this objective are scarce. This study aims to investigate the extent to which the picture-prompt writing tasks of three…
Descriptors: Writing Tests, Language Tests, Check Lists, Culture Fair Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Jølle, Lennart – Assessment in Education: Principles, Policy & Practice, 2015
Novice members of a Norwegian national rater panel tasked with assessing Year 8 pupils' written texts were studied during three successive preparation sessions (2011-2012). The purpose was to investigate how the raters successfully make use of different decision-making strategies in an assessment situation where pre-set criteria and standards give…
Descriptors: Interrater Reliability, Writing Evaluation, Decision Making, Novices
Peer reviewed Peer reviewed
Direct linkDirect link
Bouwer, Renske; Béguin, Anton; Sanders, Ted; van den Bergh, Huub – Language Testing, 2015
In the present study, aspects of the measurement of writing are disentangled in order to investigate the validity of inferences made on the basis of writing performance and to describe implications for the assessment of writing. To include genre as a facet in the measurement, we obtained writing scores of 12 texts in four different genres for each…
Descriptors: Writing Tests, Generalization, Scores, Writing Instruction
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Heidari, Jamshid; Khodabandeh, Farzaneh; Soleimani, Hassan – JALT CALL Journal, 2018
The emergence of computer technology in English language teaching has paved the way for teachers' application of Mobile Assisted Language Learning (mall) and its advantages in teaching. This study aimed to compare the effectiveness of the face to face instruction with Telegram mobile instruction. Based on a toefl test, 60 English foreign language…
Descriptors: Comparative Analysis, Conventional Instruction, Teaching Methods, Computer Assisted Instruction
Peer reviewed Peer reviewed
Direct linkDirect link
He, Qingping; Anwyll, Steve; Glanville, Matthew; Deavall, Angela – Educational Research, 2013
Background: Although there has been considerable research into the reliability of marking for the Key Stage 3 (KS3) National Curriculum tests (NCTs) and public examinations such as the General Certificate of Secondary Education examinations (GCSEs) in England, little is understood about the level of reliability of marking of the Key Stage 2 (KS2)…
Descriptors: National Curriculum, Foreign Countries, Writing Skills, Writing Tests
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests