ERIC Number: ED606720
Record Type: Non-Journal
Publication Date: 2012-Oct
Pages: 46
Abstractor: ERIC
ISBN: N/A
ISSN: N/A
EISSN: N/A
Findings from the 2011 West Virginia Online Writing Scoring Comparability Study
Hixson, Nate; Rhudy, Vaughn
West Virginia Department of Education
To provide an opportunity for teachers to better understand the automated scoring process used by the state of West Virginia on its annual West Virginia Educational Standards Test 2 (WESTEST 2) Online Writing Assessment, the West Virginia Department of Education (WVDE) Office of Assessment and Accountability and the Office of Research conduct an annual comparability study. Each year, educators from throughout West Virginia receive training from the Office of Assessment and Accountability and then hand score randomly selected student compositions. The educators' hand scores are then compared with the operational computer (engine) scores, and the comparability of the two scoring methods is examined. A scoring group of 43 participants representing all eight regions scored a randomly selected set of student essays using the appropriate grade-level West Virginia (WV) Writing Rubrics. A total of 2,550 essays were each scored by two different human scorers to allow for comparison of human-to-human scores as well as human-to-engine scores. The authors first sought to determine the extent to which human scorers calibrated their scoring process to align with the automated scoring engine via a series of training papers. They found that calibration was generally quite good in Grades 5-11, but there was room for improvement in Grades 3 and 4. They also found that calibration rates were relatively static across the set of training papers. The authors next sought to determine the comparability of human-to-human and human-to-engine agreement rates, examining both exact and exact/adjacent agreement rates (i.e., scores that matched exactly or fell within 1 point of each other on a 6-point scale). Among well-calibrated human scorers, the analyses showed that, with few exceptions, both exact and exact/adjacent agreement rates were comparable for the human-to-human and human-to-engine pairs. Finally, the authors compared the average essay scores assigned by the automated scoring engine with those assigned by a sufficiently calibrated human scorer; these analyses revealed no significant differences for four of the available grade levels. The authors recommend improving the calibration process; examining new measures of calibration among scorers to assist in interpreting results; using multiple and different measures to examine agreement between scoring methods; and adding a qualitative research component to next year's online writing comparability study to examine teacher outcomes.
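The exact and exact/adjacent agreement rates described in the abstract can be illustrated with a minimal sketch. The code below is not the study's analysis code; the score lists are hypothetical, and the only assumption carried over from the record is that scores fall on a 6-point scale and that "adjacent" means within 1 point.

```python
# Illustrative sketch only: exact and exact/adjacent agreement between two score sets.
# Hypothetical data; the actual study compared human-to-human and human-to-engine pairs.

def agreement_rates(scores_a, scores_b):
    """Return (exact, exact_or_adjacent) agreement rates for two parallel score lists."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Score lists must be the same length")
    n = len(scores_a)
    exact = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    adjacent = sum(1 for a, b in zip(scores_a, scores_b) if abs(a - b) <= 1)
    return exact / n, adjacent / n

# Hypothetical scores (1-6 scale) from a human rater and the automated engine
human_scores = [4, 3, 5, 2, 6, 4, 3, 5]
engine_scores = [4, 4, 5, 2, 5, 3, 3, 6]

exact_rate, exact_adjacent_rate = agreement_rates(human_scores, engine_scores)
print(f"Exact agreement: {exact_rate:.2%}")                    # scores identical
print(f"Exact/adjacent agreement: {exact_adjacent_rate:.2%}")  # within 1 point
```

Comparing these two rates for human-to-human pairs against human-to-engine pairs is the kind of comparability check the abstract describes.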
Descriptors: Writing Tests, Computer Assisted Testing, Automation, Scoring, Interrater Reliability, Essays, Elementary Secondary Education
West Virginia Department of Education. 1900 Kanawha Boulevard East, Charleston, WV 25305. Tel: 304-558-3660; Fax: 304-558-0198; Web site: http://wvde.state.wv.us
Publication Type: Reports - Research
Education Level: Elementary Secondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: West Virginia Department of Education (WVDE), Office of Research
Identifiers - Location: West Virginia
Grant or Contract Numbers: N/A