Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 7 |
Since 2006 (last 20 years) | 15 |
Descriptor
Evaluators | 21 |
Scoring | 21 |
Writing Tests | 21 |
Essays | 11 |
Language Tests | 8 |
Second Language Learning | 8 |
Computer Assisted Testing | 7 |
Correlation | 6 |
English (Second Language) | 6 |
Scores | 6 |
Writing Evaluation | 6 |
More ▼ |
Source
ETS Research Report Series | 5 |
Language Testing | 3 |
Applied Measurement in… | 2 |
Contemporary Issues in… | 1 |
Educational and Psychological… | 1 |
JALT CALL Journal | 1 |
ProQuest LLC | 1 |
Scandinavian Journal of… | 1 |
Author
Attali, Yigal | 2 |
Engelhard, George, Jr. | 2 |
Wolfe, Edward W. | 2 |
Arnold, Voiza | 1 |
Baker, Eva L. | 1 |
Blanchard, Daniel | 1 |
Breyer, F. Jay | 1 |
Bridgeman, Brent | 1 |
Brown, Michelle Stallone | 1 |
Brunfaut, Tineke | 1 |
Buzick, Heather | 1 |
More ▼ |
Publication Type
Reports - Research | 18 |
Journal Articles | 14 |
Speeches/Meeting Papers | 4 |
Reports - Evaluative | 2 |
Tests/Questionnaires | 2 |
Dissertations/Theses -… | 1 |
Education Level
Higher Education | 3 |
Postsecondary Education | 3 |
Secondary Education | 2 |
Elementary Education | 1 |
Elementary Secondary Education | 1 |
Audience
Laws, Policies, & Programs
Assessments and Surveys
Test of English as a Foreign… | 5 |
Graduate Record Examinations | 1 |
National Assessment of… | 1 |
What Works Clearinghouse Rating
Wang, Jue; Engelhard, George, Jr. – Educational and Psychological Measurement, 2019
The purpose of this study is to explore the use of unfolding models for evaluating the quality of ratings obtained in rater-mediated assessments. Two different judgmental processes can be used to conceptualize ratings: impersonal judgments and personal preferences. Impersonal judgments are typically expected in rater-mediated assessments, and…
Descriptors: Evaluative Thinking, Preferences, Evaluators, Models
Lestari, Santi B.; Brunfaut, Tineke – Language Testing, 2023
Assessing integrated reading-into-writing task performances is known to be challenging, and analytic rating scales have been found to better facilitate the scoring of these performances than other common types of rating scales. However, little is known about how specific operationalizations of the reading-into-writing construct in analytic rating…
Descriptors: Reading Writing Relationship, Writing Tests, Rating Scales, Writing Processes
Jølle, Lennart; Skar, Gustaf B. – Scandinavian Journal of Educational Research, 2020
This paper reports findings from a project called "The National Panel of Raters" (NPR) that took place within a writing test programme in Norway (2010-2016). A recent research project found individual differences between the raters in the NPR. This paper reports results from an explorative follow up-study where 63 NPR members were…
Descriptors: Foreign Countries, Validity, Scoring, Program Descriptions
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
Trace, Jonathan; Janssen, Gerriet; Meier, Valerie – Language Testing, 2017
Previous research in second language writing has shown that when scoring performance assessments even trained raters can exhibit significant differences in severity. When raters disagree, using discussion to try to reach a consensus is one popular form of score resolution, particularly in contexts with limited resources, as it does not require…
Descriptors: Performance Based Assessment, Second Language Learning, Scoring, Evaluators
Buzick, Heather; Oliveri, Maria Elena; Attali, Yigal; Flor, Michael – Applied Measurement in Education, 2016
Automated essay scoring is a developing technology that can provide efficient scoring of large numbers of written responses. Its use in higher education admissions testing provides an opportunity to collect validity and fairness evidence to support current uses and inform its emergence in other areas such as K-12 large-scale assessment. In this…
Descriptors: Essays, Learning Disabilities, Attention Deficit Hyperactivity Disorder, Scoring
Heidari, Jamshid; Khodabandeh, Farzaneh; Soleimani, Hassan – JALT CALL Journal, 2018
The emergence of computer technology in English language teaching has paved the way for teachers' application of Mobile Assisted Language Learning (mall) and its advantages in teaching. This study aimed to compare the effectiveness of the face to face instruction with Telegram mobile instruction. Based on a toefl test, 60 English foreign language…
Descriptors: Comparative Analysis, Conventional Instruction, Teaching Methods, Computer Assisted Instruction
Attali, Yigal; Sinharay, Sandip – ETS Research Report Series, 2015
The "e-rater"® automated essay scoring system is used operationally in the scoring of "TOEFL iBT"® independent and integrated tasks. In this study we explored the psychometric added value of reporting four trait scores for each of these two tasks, beyond the total e-rater score.The four trait scores are word choice, grammatical…
Descriptors: Writing Tests, Scores, Language Tests, English (Second Language)
Blanchard, Daniel; Tetreault, Joel; Higgins, Derrick; Cahill, Aoife; Chodorow, Martin – ETS Research Report Series, 2013
This report presents work on the development of a new corpus of non-native English writing. It will be useful for the task of native language identification, as well as grammatical error detection and correction, and automatic essay scoring. In this report, the corpus is described in detail.
Descriptors: Language Tests, Second Language Learning, English (Second Language), Writing Tests
Ramineni, Chaitanya; Trapani, Catherine S.; Williamson, David M.; Davey, Tim; Bridgeman, Brent – ETS Research Report Series, 2012
Scoring models for the "e-rater"® system were built and evaluated for the "TOEFL"® exam's independent and integrated writing prompts. Prompt-specific and generic scoring models were built, and evaluation statistics, such as weighted kappas, Pearson correlations, standardized differences in mean scores, and correlations with…
Descriptors: Scoring, Prompting, Evaluators, Computer Software
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
Eckes, Thomas – Language Testing, 2008
Research on rater effects in language performance assessments has provided ample evidence for a considerable degree of variability among raters. Building on this research, I advance the hypothesis that experienced raters fall into types or classes that are clearly distinguishable from one another with respect to the importance they attach to…
Descriptors: Performance Based Assessment, Language Tests, Measures (Individuals), Scoring
Jeffery, Jill V. – ProQuest LLC, 2010
"Voice" is widely considered to be a feature of effective writing. It's no surprise, then, that voice criteria frequently appear on rubrics used to score student essays in large-scale writing assessments. However, composition theorists hold vastly different views regarding voice and how it should be applied in the evaluation of student writing, if…
Descriptors: Expository Writing, Evaluators, Writing Evaluation, Writing Tests
Wang, Jinhao; Brown, Michelle Stallone – Contemporary Issues in Technology and Teacher Education (CITE Journal), 2008
The purpose of the current study was to analyze the relationship between automated essay scoring (AES) and human scoring in order to determine the validity and usefulness of AES for large-scale placement tests. Specifically, a correlational research design was used to examine the correlations between AES performance and human raters' performance.…
Descriptors: Scoring, Essays, Computer Assisted Testing, Sentence Structure
Lee, Yong-Won; Gentile, Claudia; Kantor, Robert – ETS Research Report Series, 2008
The main purpose of the study was to investigate the distinctness and reliability of analytic (or multitrait) rating dimensions and their relationships to holistic scores and "e-rater"® essay feature variables in the context of the TOEFL® computer-based test (CBT) writing assessment. Data analyzed in the study were analytic and holistic…
Descriptors: English (Second Language), Language Tests, Second Language Learning, Scoring
Previous Page | Next Page »
Pages: 1 | 2