An Evaluation of "Intentional" Weighting of Extended-Response or Constructed-Response Items in Tests with Mixed Item Types.

Ito, Kyoko; Sykes, Robert C.

This study investigated the practice of weighting one type of test item, such as constructed response, more heavily than other types, such as selected response, when computing student scores on a test with mixed item types. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts: one in which a single extended-response writing prompt is "intentionally" or "purposefully" weighted twice in computing student scores (ER context), and one in which a set of constructed-response items, including one extended-response item, is intentionally weighted twice (CR context). The weighting option was compared with the use of no weighting on the same shorter form. In either context, the criterion for both shorter-form options was an un-weighted longer form that administers twice as many items of the kind that is deliberately weighted. The three options were compared in terms of student scores as well as raw-score-to-scale-score conversion tables. The state uses number-correct scoring as opposed to pattern scoring. On average, both the intentionally weighted and the un-weighted scores on the shorter form are very comparable to the criterion un-weighted scores from the longer form. At the level of individual student scores, however, more of the weighted shorter-form scores than of the un-weighted shorter-form scores (2-5% more in the ER context and 1-9% more in the CR context) differ from the target scores by more than 10 points. The ramifications of these small decreases in individual score accuracy associated with purposeful weighting would depend on the purposes for which the scores are used and on other factors. It is clear, however, that purposeful weighting cannot compensate for the loss of score accuracy caused by the shorter length of the test actually taken. Two appendixes contain a discussion of the item response models used in the study and a sample graph for one test item.
(Contains 13 tables, 24 figures, and 12 references.) (Author/SLD)
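
The three scoring options compared in the abstract can be illustrated with a minimal sketch of number-correct (raw) scoring. Everything in it is an assumption made for illustration: the item names, the point values, the single hypothetical student, and the helper `raw_score` do not come from the study, and the conversion from raw scores to scale scores through the item response models is not shown.

```python
def raw_score(item_scores, weights=None):
    """Sum item points, multiplying each item by its weight (default 1)."""
    weights = weights or {}
    return sum(weights.get(name, 1) * points
               for name, points in item_scores.items())

# Hypothetical student: three selected-response items (0/1 points)
# and one extended-response prompt scored 0-4.
short_form = {"SR1": 1, "SR2": 0, "SR3": 1, "ER1": 3}

# Hypothetical longer criterion form: same items plus a second ER prompt.
long_form = dict(short_form, ER2=4)

# Option 1: shorter form, no weighting.
unweighted_short = raw_score(short_form)

# Option 2: shorter form, ER prompt intentionally weighted twice.
weighted_short = raw_score(short_form, weights={"ER1": 2})

# Criterion: longer form with twice as many ER prompts, no weighting.
criterion_long = raw_score(long_form)

print(unweighted_short, weighted_short, criterion_long)
```

In this made-up case the weighted shorter-form score has the same maximum as the criterion score, but it counts one observed response twice instead of using a second, independent response, which is the sense in which weighting cannot replace the information lost with the shorter form.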