ERIC Number: ED576706
Record Type: Non-Journal
Publication Date: 2012-Apr
Pages: 45
Abstractor: ERIC
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: N/A
The Impact of Item Position Change on Item Parameters and Common Equating Results under the 3PL Model
Meyers, Jason L.; Murphy, Stephen; Goodman, Joshua; Turhan, Ahmet
Pearson, Paper presented at the Annual Meeting of the National Council on Measurement in Education (Vancouver, British Columbia, Canada)
Operational testing programs employing item response theory (IRT) applications benefit from the property of item parameter invariance, whereby item parameter estimates obtained from one sample can be applied to other samples (when the underlying assumptions are satisfied). In theory, this feature allows for applications such as computer-adaptive testing (CAT) and test pre-equating. In practice, item parameter invariance can be threatened by a number of factors including context effects, item position effects, instructional effects, variable sample sizes, and other sources of item parameter drift that are not formally modeled in IRT applications. In their 2009 paper, Meyers et al. modeled the impact of item position change (IPC) on changes in Rasch item difficulty using the scaling and equating procedures employed by one specific K-12 testing program. This study helped assess the practical impact of IPC between test administrations. However, as the authors acknowledged, the generalizability of their findings is unclear because the study evaluated "only one large K-12 testing program that utilize[d] a particular measurement model, equating procedures, test administration procedures (i.e., an untimed test), and test construction procedures." The present study extends the Meyers et al. (2009) study by investigating the impact of IPC in the context of operational testing programs that employ the 3PL model, alternative equating procedures, and different item re-use policies. The purpose is to expand upon the previous research by investigating the impact of IPC, sample size, subject area, grade, elapsed time between item uses, and number of previous uses on change in the IRT a-, b-, and c-parameters, as well as the resulting D² statistic (Murphy et al., 2010), defined as the weighted sum of the squared deviations between the item characteristic curves. The variables included in this investigation were selected because they are often influenced by policies that may constrain test construction and administration in large-scale assessment programs. This study confirmed that changing item positions between administrations has a negative impact on the measurement properties of a test. When items shift substantially, some percentage of students can be classified into the wrong achievement level. In high-stakes testing in particular, such as the assessment programs modeled in this study, this poses a substantial risk.
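[Editor's sketch of the quantities named in the abstract, stated in standard IRT notation; the specific quadrature points, weights, and administration labels below are illustrative assumptions, since the abstract does not reproduce the exact definitions used by Murphy et al. (2010).] Under the 3PL model, the item characteristic curve for item j is

\[ P_j(\theta) = c_j + (1 - c_j)\,\frac{1}{1 + \exp\bigl(-D a_j (\theta - b_j)\bigr)}, \]

where a_j, b_j, and c_j are the discrimination, difficulty, and pseudo-guessing parameters and D is the usual scaling constant (1.7). A D² statistic of the kind described, comparing the curves implied by the parameter estimates from two administrations (superscripts (1) and (2)), can be written as

\[ D_j^2 = \sum_{k} w_k \left[ P_j^{(1)}(\theta_k) - P_j^{(2)}(\theta_k) \right]^2, \]

with the sum taken over a grid of ability points \(\theta_k\) and weights \(w_k\) (e.g., proportional to the examinee ability distribution); larger values indicate greater drift in an item's functioning between uses.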
Descriptors: Equated Scores, Test Items, Test Format, Item Response Theory, Difficulty Level, Testing Programs, Sample Size, Change, Time, Regression (Statistics), Standardized Tests, Criterion Referenced Tests, Elementary Secondary Education, Instructional Program Divisions, Academic Achievement, Intellectual Disciplines, Reading Tests, Mathematics Tests, Writing Tests, Science Tests, Social Studies
Pearson. One Lake Street, Upper Saddle River, New Jersey 07458. Tel: 800-848-9500; Web site: http://www.pearsoned.com/
Publication Type: Speeches/Meeting Papers; Reports - Research; Numerical/Quantitative Data
Education Level: Elementary Secondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Pearson
Grant or Contract Numbers: N/A
Author Affiliations: N/A