Lamprianou, Iasonas; Tsagari, Dina; Kyriakou, Nansia – Language Testing, 2021
This longitudinal study (2002-2014) investigates the stability of rating characteristics of a large group of raters over time in the context of the writing paper of a national high-stakes examination. The study uses one measure of rater severity and two measures of rater consistency. The results suggest that the rating characteristics of…
Descriptors: Longitudinal Studies, Evaluators, High Stakes Tests, Writing Evaluation
Eckes, Thomas; Jin, Kuan-Yu – International Journal of Testing, 2021
Severity and centrality are two main kinds of rater effects posing threats to the validity and fairness of performance assessments. Adopting Jin and Wang's (2018) extended facets modeling approach, we separately estimated the magnitude of rater severity and centrality effects in the web-based TestDaF (Test of German as a Foreign Language) writing…
Descriptors: Language Tests, German, Second Languages, Writing Tests
Jølle, Lennart; Skar, Gustaf B. – Scandinavian Journal of Educational Research, 2020
This paper reports findings from a project called "The National Panel of Raters" (NPR) that took place within a writing test programme in Norway (2010-2016). A recent research project found individual differences between the raters in the NPR. This paper reports results from an explorative follow-up study where 63 NPR members were…
Descriptors: Foreign Countries, Validity, Scoring, Program Descriptions
Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017
Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…
Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators
Li, Jiuliang – Language Assessment Quarterly, 2018
In language testing programs, different test forms are often used to administer the same test. Demonstrating the comparability of these forms is essential to avoid criticisms of potential test unfairness. However, studies with this objective are scarce. This study aims to investigate the extent to which the picture-prompt writing tasks of three…
Descriptors: Writing Tests, Language Tests, Check Lists, Culture Fair Tests
Jølle, Lennart – Assessment in Education: Principles, Policy & Practice, 2015
Novice members of a Norwegian national rater panel tasked with assessing Year 8 pupils' written texts were studied during three successive preparation sessions (2011-2012). The purpose was to investigate how the raters successfully make use of different decision-making strategies in an assessment situation where pre-set criteria and standards give…
Descriptors: Interrater Reliability, Writing Evaluation, Decision Making, Novices
Bouwer, Renske; Béguin, Anton; Sanders, Ted; van den Bergh, Huub – Language Testing, 2015
In the present study, aspects of the measurement of writing are disentangled in order to investigate the validity of inferences made on the basis of writing performance and to describe implications for the assessment of writing. To include genre as a facet in the measurement, we obtained writing scores of 12 texts in four different genres for each…
Descriptors: Writing Tests, Generalization, Scores, Writing Instruction
Heidari, Jamshid; Khodabandeh, Farzaneh; Soleimani, Hassan – JALT CALL Journal, 2018
The emergence of computer technology in English language teaching has paved the way for teachers' application of Mobile Assisted Language Learning (MALL) and its advantages in teaching. This study aimed to compare the effectiveness of face-to-face instruction with Telegram mobile instruction. Based on a TOEFL test, 60 English foreign language…
Descriptors: Comparative Analysis, Conventional Instruction, Teaching Methods, Computer Assisted Instruction
He, Qingping; Anwyll, Steve; Glanville, Matthew; Deavall, Angela – Educational Research, 2013
Background: Although there has been considerable research into the reliability of marking for the Key Stage 3 (KS3) National Curriculum tests (NCTs) and public examinations such as the General Certificate of Secondary Education examinations (GCSEs) in England, little is understood about the level of reliability of marking of the Key Stage 2 (KS2)…
Descriptors: National Curriculum, Foreign Countries, Writing Skills, Writing Tests
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests