ERIC Number: ED606720
Record Type: Non-Journal
Publication Date: 2012-Oct
Pages: 46
Abstractor: ERIC
ISBN: N/A
ISSN: N/A
EISSN: N/A
Findings from the 2011 West Virginia Online Writing Scoring Comparability Study
Hixson, Nate; Rhudy, Vaughn
West Virginia Department of Education
To provide an opportunity for teachers to better understand the automated scoring process used by the state of West Virginia on its annual West Virginia Educational Standards Test 2 (WESTEST 2) Online Writing Assessment, the West Virginia Department of Education (WVDE) Office of Assessment and Accountability and the Office of Research conduct an annual comparability study. Each year, educators from throughout West Virginia receive training from the Office of Assessment and Accountability and then hand score randomly selected student compositions. The educators' hand scores are then compared with the operational computer (engine) scores, and the comparability of the two scoring methods is examined. A scoring group of 43 participants representing all eight regions scored a randomly selected set of student essays using the appropriate grade-level West Virginia (WV) Writing Rubrics. A total of 2,550 essays were each scored by two different human scorers to allow for comparison of human-to-human scores as well as human-to-engine scores. The authors first sought to determine the extent to which human scorers calibrated their scoring process to align with the automated scoring engine via a series of training papers. They found that calibration was generally quite good in Grades 5-11, but there was room for improvement in Grades 3 and 4. They also found that calibration rates were relatively static across the set of training papers. The authors next sought to determine the comparability of human-to-human and human-to-engine agreement rates, examining both exact and exact/adjacent agreement rates (i.e., scores that matched exactly or fell within 1 point of each other on a 6-point scale). Among well-calibrated human scorers, the analyses showed that, with few exceptions, both exact and exact/adjacent agreement rates were comparable for the human-to-human and human-to-engine pairs. Finally, the authors compared the average essay scores assigned by the automated scoring engine with those assigned by a sufficiently calibrated human scorer; these analyses revealed no significant differences for four of the available grade levels. The authors recommend improving the calibration process; examining new measures of calibration among scorers to assist in interpreting results; using multiple and different measures to examine agreement between scoring methods; and adding a qualitative research component to next year's online writing comparability study to examine teacher outcomes.
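The exact and exact/adjacent agreement rates described in the abstract can be illustrated with a minimal sketch. The code below is not the study's analysis code; the score lists are hypothetical, and the only assumption carried over from the record is that scores fall on a 6-point scale and that "adjacent" means within 1 point.

```python
# Illustrative sketch only: exact and exact/adjacent agreement between two score sets.
# Hypothetical data; the actual study compared human-to-human and human-to-engine pairs.

def agreement_rates(scores_a, scores_b):
    """Return (exact, exact_or_adjacent) agreement rates for two parallel score lists."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Score lists must be the same length")
    n = len(scores_a)
    exact = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    adjacent = sum(1 for a, b in zip(scores_a, scores_b) if abs(a - b) <= 1)
    return exact / n, adjacent / n

# Hypothetical scores (1-6 scale) from a human rater and the automated engine
human_scores = [4, 3, 5, 2, 6, 4, 3, 5]
engine_scores = [4, 4, 5, 2, 5, 3, 3, 6]

exact_rate, exact_adjacent_rate = agreement_rates(human_scores, engine_scores)
print(f"Exact agreement: {exact_rate:.2%}")                    # scores identical
print(f"Exact/adjacent agreement: {exact_adjacent_rate:.2%}")  # within 1 point
```

Comparing these two rates for human-to-human pairs against human-to-engine pairs is the kind of comparability check the abstract describes.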
Descriptors: Writing Tests, Computer Assisted Testing, Automation, Scoring, Interrater Reliability, Essays, Elementary Secondary Education
West Virginia Department of Education. 1900 Kanawha Boulevard East, Charleston, WV 25305. Tel: 304-558-3660; Fax: 304-558-0198; Web site: http://wvde.state.wv.us
Publication Type: Reports - Research
Education Level: Elementary Secondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: West Virginia Department of Education (WVDE), Office of Research
Identifiers - Location: West Virginia
Grant or Contract Numbers: N/A