ERIC Number: ED605444
Record Type: Non-Journal
Publication Date: 2018-Nov
Pages: 27
Abstractor: ERIC
ISBN: N/A
ISSN: N/A
EISSN: N/A
Exploring the Relationship between Optimal Methods of Item Scoring and Selection and Predictive Validity. Conference Paper
Benton, Tom
Cambridge Assessment, Paper presented at the Annual Conference of the Association for Educational Assessment in Europe (19th, Arnhem-Nijmegen, The Netherlands, Nov 2018)
One of the questions with the longest history in educational assessment is whether it is possible to increase the reliability of a test simply by altering the way in which scores on individual test items are combined to make the overall test score. Most commonly, the score available on each item is communicated to the candidate within the question paper, and the candidate's score on the test as a whole is then calculated simply as the sum of these item scores. However, throughout the history of assessment, psychometricians have been tempted to try to improve on this simple and transparent process with the aim of making the resulting scores more reliable. The simplest way in which scoring might be altered is to assign a weight to each item and then create candidate scores as a weighted sum of item scores rather than a simple sum. The aim is that highly reliable (and presumably higher quality) items are given more weight than those that appear to have less relevance to the construct being measured. This goal seems reasonable: after all, it is unlikely that all items are of exactly equal relevance, and placing greater emphasis on candidate achievement on the most reliable items should improve the quality of the final scores. In other words, it should mean that candidates are more likely to be placed in the correct rank order in terms of their true abilities in some domain. One early attempt at this type of rescoring, though by no means the first, can be found in Guilford (1941). Throughout this paper, the focus of analysis will be to evaluate the performance of different approaches against real data sets. In particular, the central question will be whether test scores derived in different ways are better at predicting achievement more widely. Although this is not a pure measure of validity, which would require a bespoke and specific definition of what each test was trying to achieve along with an independent measure of whether this had been done, the focus upon predictive value does at least allow us to understand whether increases in (estimated) reliability actually translate into any meaningful gains when the test scores are used to predict something about students more widely. As such, it helps to avoid the pitfall noted by Kane and Case (2004) of aiming purely to maximise reliability.
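The weighted-scoring idea summarised above can be illustrated with a short sketch. The paper does not supply code, so the following is a minimal, hypothetical Python illustration: each item's weight is taken to be its correlation with the rest-of-test score (one possible proxy for item quality, not necessarily the weighting scheme the paper evaluates), and candidates are scored both by a simple sum and by the weighted sum of item scores.

import numpy as np

# Illustrative sketch only (not from the paper): score a small response
# matrix (rows = candidates, columns = items) by a simple sum and by a
# weighted sum, where each hypothetical weight is the item's correlation
# with the rest-of-test score.
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(200, 10)).astype(float)  # dichotomous items

# Simple (unweighted) total score: the usual sum of item scores.
simple_scores = responses.sum(axis=1)

# Item-rest correlations used as illustrative weights.
weights = np.empty(responses.shape[1])
for j in range(responses.shape[1]):
    rest = np.delete(responses, j, axis=1).sum(axis=1)
    weights[j] = np.corrcoef(responses[:, j], rest)[0, 1]
weights = np.clip(weights, 0, None)  # keep weights non-negative

# Weighted total score: items with higher item-rest correlation count more.
weighted_scores = responses @ weights

print(simple_scores[:5])
print(weighted_scores[:5])

Whether such reweighting produces any practical benefit is precisely what the paper examines, by testing whether scores derived in different ways better predict wider achievement on real data sets.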
Descriptors: Test Items, Scoring, Predictive Validity, Test Reliability, Item Response Theory, Test Construction
University of Cambridge Local Examinations Syndicate (Cambridge Assessment). The Triangle Building, Shaftesbury Road, Cambridge, CB2 8EA, UK. Tel: +44-1223-55331; Fax: +44-1223-460278; e-mail: info@cambridgeassessment.org.uk; Web site: https://www.cambridgeassessment.org.uk/
Publication Type: Speeches/Meeting Papers; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Cambridge Assessment (United Kingdom)
Grant or Contract Numbers: N/A