Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 4
Since 2006 (last 20 years): 19
Descriptor
Item Response Theory: 19
Test Items: 12
Models: 9
Computation: 6
Guessing (Tests): 6
Scores: 6
Simulation: 6
Difficulty Level: 5
Error of Measurement: 5
Accuracy: 4
Comparative Analysis: 4
Source
Educational and Psychological…: 7
Applied Psychological…: 4
International Journal of…: 3
Journal of Educational…: 2
Applied Measurement in…: 1
Educational Assessment: 1
Structural Equation Modeling:…: 1
Author
DeMars, Christine E.: 19
Jurich, Daniel P.: 3
Socha, Alan: 3
Wise, Steven L.: 2
Goodman, Joshua T.: 1
Lau, Abigail: 1
Phan, Ha: 1
Waterbury, Glenn Thomas: 1
Zilberberg, Anna: 1
Publication Type
Journal Articles: 19
Reports - Research: 15
Reports - Descriptive: 3
Reports - Evaluative: 1
Education Level
Elementary Education: 1
Grade 4: 1
Higher Education: 1
Intermediate Grades: 1
Postsecondary Education: 1
Assessments and Surveys
Trends in International…: 1
DeMars, Christine E. – Applied Measurement in Education, 2021
Estimation of parameters for the many-facets Rasch model requires that, conditional on the values of the facets, such as person ability, item difficulty, and rater severity, the observed responses within each facet are independent. This requirement has often been discussed for the Rasch, 2PL, and 3PL models, but it becomes more complex…
Descriptors: Item Response Theory, Test Items, Ability, Scores
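For orientation, the many-facets Rasch model referenced in this abstract is commonly written in the following form for a rating in category k (a standard formulation; the notation is illustrative and not taken from the article):

\log \frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \lambda_j - \tau_k

where \theta_n is the ability of person n, \delta_i the difficulty of item i, \lambda_j the severity of rater j, and \tau_k the threshold for moving from category k-1 to category k. The local independence requirement discussed in the abstract means that, once these facet parameters are conditioned on, the remaining variation in the responses is independent across persons, items, and raters.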
Waterbury, Glenn Thomas; DeMars, Christine E. – Educational Assessment, 2021
Vertical scaling is used to put tests of different difficulty onto a common metric. The Rasch model is often used to perform vertical scaling, despite its strict functional form. Few, if any, studies have examined anchor item choice when using the Rasch model to vertically scale data that do not fit the model. The purpose of this study was to…
Descriptors: Test Items, Equated Scores, Item Response Theory, Scaling
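Because the Rasch model has a single item parameter, common-item (anchor) linking between two vertically scaled forms often reduces to a constant shift. A minimal sketch, assuming mean-difficulty shift linking; all numbers are invented and none come from the study:

import numpy as np

# Hypothetical anchor-item difficulty estimates from separate Rasch calibrations
# of a lower-grade form and an upper-grade form (invented values).
b_anchor_lower = np.array([-0.8, -0.2, 0.4, 1.1])
b_anchor_upper = np.array([-1.5, -0.9, -0.3, 0.5])

# Placing the upper-grade calibration on the lower-grade metric is a constant
# shift equal to the mean difference in the anchor items' difficulty estimates.
shift = b_anchor_lower.mean() - b_anchor_upper.mean()

# Apply the shift to all upper-grade item difficulties and person measures.
b_upper_linked = b_anchor_upper + shift
theta_upper_linked = np.array([0.0, 1.2, -0.4]) + shift
print(round(shift, 3), b_upper_linked)

Anchor item choice, the focus of the study, matters here because every anchor item contributes equally to the mean shift, so misfitting anchors move the whole linked scale.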
DeMars, Christine E. – Educational and Psychological Measurement, 2019
Previous work showing that revised parallel analysis can be effective with dichotomous items has used a two-parameter model and normally distributed abilities. In this study, both two- and three-parameter models were used with normally distributed and skewed ability distributions. Relatively minor skew and kurtosis in the underlying ability…
Descriptors: Item Analysis, Models, Error of Measurement, Item Response Theory
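Parallel analysis decides how many factors to retain by comparing the eigenvalues of the observed item correlations with eigenvalues from comparison data; the revised variant referenced in the abstract changes how those comparison data are generated. A minimal sketch of the classical version only, with placeholder binary data, not the article's procedure:

import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items, n_reps = 1000, 20, 100

# Placeholder dichotomous responses; in practice these are the observed item scores.
observed = (rng.random((n_persons, n_items)) < 0.6).astype(int)
obs_eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(observed, rowvar=False)))[::-1]

# Reference eigenvalues averaged over random datasets of the same size.
ref_eigs = np.zeros(n_items)
for _ in range(n_reps):
    comparison = (rng.random((n_persons, n_items)) < 0.6).astype(int)
    ref_eigs += np.sort(np.linalg.eigvalsh(np.corrcoef(comparison, rowvar=False)))[::-1]
ref_eigs /= n_reps

# Retain factors whose observed eigenvalue exceeds the corresponding reference eigenvalue.
print(int(np.sum(obs_eigs > ref_eigs)))

With dichotomous items, tetrachoric rather than Pearson correlations are often used; the sketch omits that step for brevity.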
DeMars, Christine E. – Educational and Psychological Measurement, 2016
Partially compensatory models may capture the cognitive skills needed to answer test items more realistically than compensatory models, but estimating the model parameters may be a challenge. Data were simulated to follow two different partially compensatory models, a model with an interaction term and a product model. The model parameters were…
Descriptors: Item Response Theory, Models, Thinking Skills, Test Items
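For context, the two partially compensatory forms named in the abstract are often written as follows, shown with a standard compensatory model for contrast; the exact parameterization used in the article may differ:

Compensatory: P(X = 1 \mid \theta_1, \theta_2) = c + (1 - c)\,[1 + e^{-(a_1\theta_1 + a_2\theta_2 + d)}]^{-1}

Interaction-term form: P(X = 1 \mid \theta_1, \theta_2) = c + (1 - c)\,[1 + e^{-(a_1\theta_1 + a_2\theta_2 + a_3\theta_1\theta_2 + d)}]^{-1}

Product form: P(X = 1 \mid \theta_1, \theta_2) = c + (1 - c)\prod_{m=1}^{2} [1 + e^{-a_m(\theta_m - b_m)}]^{-1}

In the compensatory model a high value on one trait can offset a low value on the other; the product form requires adequate standing on both traits, which is why such models are described as representing the required cognitive skills more realistically.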
DeMars, Christine E.; Jurich, Daniel P. – Educational and Psychological Measurement, 2015
In educational testing, differential item functioning (DIF) statistics must be accurately estimated to ensure the appropriate items are flagged for inspection or removal. This study showed how using the Rasch model to estimate DIF may introduce considerable bias in the results when there are large group differences in ability (impact) and the data…
Descriptors: Test Bias, Guessing (Tests), Ability, Differences
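For reference, the models at issue are typically written as

Rasch: P(X = 1 \mid \theta) = [1 + e^{-(\theta - b)}]^{-1}

3PL: P(X = 1 \mid \theta) = c + (1 - c)\,[1 + e^{-a(\theta - b)}]^{-1}

where c is the lower asymptote associated with guessing and a the discrimination, both of which the Rasch model constrains (c = 0, a equal across items). The abstract's finding is that when there is large impact and the responses involve guessing that the Rasch model cannot absorb, the resulting DIF statistics can be considerably biased.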
Socha, Alan; DeMars, Christine E. – Applied Psychological Measurement, 2013
The software program DIMTEST can be used to assess the unidimensionality of item scores. The software allows the user to specify a guessing parameter. Using simulated data, the effects of guessing parameter specification for use with the ATFIND procedure for empirically deriving the Assessment Subtest (AT; that is, a subtest composed of items that…
Descriptors: Item Response Theory, Computer Software, Guessing (Tests), Simulation
Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha – International Journal of Testing, 2015
The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…
Descriptors: Test Items, Educational Testing, Evaluation Methods, Ability Grouping
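For reference, the Mantel-Haenszel DIF statistic described here combines, across matching-score levels k, the 2 x 2 tables of group (reference/focal) by response (correct/incorrect):

\hat{\alpha}_{MH} = \frac{\sum_k A_k D_k / N_k}{\sum_k B_k C_k / N_k}, \qquad \Delta_{MH} = -2.35 \ln \hat{\alpha}_{MH}

where A_k and B_k are the numbers of correct and incorrect responses in the reference group at level k, C_k and D_k the corresponding counts in the focal group, and N_k the total count at that level. Thin matching, as discussed in the abstract, treats each total score as its own level k; thicker matching pools adjacent scores into wider intervals.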
DeMars, Christine E.; Jurich, Daniel P. – Applied Psychological Measurement, 2012
The nonequivalent groups anchor test (NEAT) design is often used to scale item parameters from two different test forms. A subset of items, called the anchor items or common items, is administered as part of both test forms. These items are used to adjust the item calibrations for any differences in the ability distributions of the groups taking…
Descriptors: Computer Software, Item Response Theory, Scaling, Equated Scores
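One common way to carry out the adjustment described here is mean/sigma linking on the anchor items' difficulty estimates; this is a standard method, not necessarily the one examined in the article. With \bar{b} and \sigma(b) denoting the mean and standard deviation of the anchor difficulties from the base-form calibration (Y) and the new-form calibration (X):

A = \frac{\sigma(b_Y)}{\sigma(b_X)}, \qquad B = \bar{b}_Y - A\,\bar{b}_X

and the new-form parameters are placed on the base scale by b^* = A b + B, a^* = a / A, and \theta^* = A\theta + B.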
DeMars, Christine E. – International Journal of Testing, 2013
This tutorial addresses possible sources of confusion in interpreting trait scores from the bifactor model. The bifactor model may be used when subscores are desired, either for formative feedback on an achievement test or for theoretically different constructs on a psychological test. The bifactor model is often chosen because it requires fewer…
Descriptors: Test Interpretation, Scores, Models, Correlation
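For orientation, a two-parameter bifactor item response model can be written as

P_i(\theta) = [1 + e^{-(a_{iG}\theta_G + a_{is}\theta_{s(i)} + d_i)}]^{-1}

where every item loads on a general trait \theta_G and on exactly one specific (group) trait \theta_{s(i)}, with the traits usually specified as orthogonal. A common source of the confusion the abstract mentions is that a specific-trait score reflects only what an item cluster measures over and above the general trait, not the subscore construct as a whole. The notation here is generic rather than taken from the tutorial.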
DeMars, Christine E. – Applied Psychological Measurement, 2012
A testlet is a cluster of items that share a common passage, scenario, or other context. These items might measure something in common beyond the trait measured by the test as a whole; if so, the model for the item responses should allow for this testlet trait. But modeling testlet effects that are negligible makes the model unnecessarily…
Descriptors: Test Items, Item Response Theory, Comparative Analysis, Models
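For reference, a common way to model the shared context is the two-parameter testlet model,

P_{ij}(\theta_i) = [1 + e^{-a_j(\theta_i - b_j - \gamma_{i\,d(j)})}]^{-1}

where \gamma_{i d(j)} is a person-specific effect for the testlet d(j) containing item j. The variance of \gamma indexes the size of the testlet effect, and fixing it at zero returns the standard two-parameter model, which is the trade-off the abstract describes between modeling negligible testlet effects and keeping the model parsimonious.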
Socha, Alan; DeMars, Christine E. – Educational and Psychological Measurement, 2013
Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…
Descriptors: Sample Size, Test Length, Correlation, Test Format
Jurich, Daniel P.; DeMars, Christine E.; Goodman, Joshua T. – Applied Psychological Measurement, 2012
The prevalence of high-stakes test scores as a basis for significant decisions necessitates the dissemination of accurate and fair scores. However, the magnitude of these decisions has created an environment in which examinees may be prone to resort to cheating. To reduce the risk of cheating, multiple test forms are commonly administered. When…
Descriptors: High Stakes Tests, Scores, Prevention, Cheating
DeMars, Christine E.; Lau, Abigail – Educational and Psychological Measurement, 2011
There is a long history of differential item functioning (DIF) detection methods for known, manifest grouping variables, such as sex or ethnicity. But if the experiences or cognitive processes leading to DIF are not perfectly correlated with the manifest groups, it would be more informative to uncover the latent groups underlying DIF. The use of…
Descriptors: Test Bias, Accuracy, Item Response Theory, Models
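One common way to formalize latent DIF groups, though not necessarily the approach taken in this article, is a mixture Rasch model in which item difficulties vary by latent class g:

P(X_{ij} = 1 \mid \theta_i, g) = [1 + e^{-(\theta_i - b_{jg})}]^{-1}

DIF then appears as between-class differences in the b_{jg}, and class membership is estimated from the response patterns rather than taken from a manifest variable such as sex or ethnicity.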
DeMars, Christine E. – Structural Equation Modeling: A Multidisciplinary Journal, 2012
In structural equation modeling software, either limited-information (bivariate proportions) or full-information item parameter estimation routines could be used for the 2-parameter item response theory (IRT) model. Limited-information methods assume the continuous variable underlying an item response is normally distributed. For skewed and…
Descriptors: Item Response Theory, Structural Equation Models, Computation, Computer Software
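The limited-information route treats each item as a dichotomized normal latent response y_i^* = \lambda_i \xi + \varepsilon_i cut at a threshold \tau_i. Under that parameterization (standardized y^*, normal-ogive metric), the factor-analytic estimates map onto two-parameter IRT values approximately as

a_i = \frac{\lambda_i}{\sqrt{1 - \lambda_i^2}}, \qquad b_i = \frac{\tau_i}{\lambda_i}

with a_i rescaled by roughly 1.7 if a logistic metric is wanted. This is the standard correspondence behind comparing limited- and full-information estimates; the specific software routines examined in the article are not reproduced here.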
DeMars, Christine E. – Educational and Psychological Measurement, 2008
The graded response (GR) and generalized partial credit (GPC) models do not imply that examinees ordered by raw observed score will necessarily be ordered on the expected value of the latent trait (OEL). Factors were manipulated to assess whether increased violations of OEL also produced increased Type I error rates in differential item…
Descriptors: Test Items, Raw Scores, Test Theory, Error of Measurement
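For reference, the two models are usually written as follows for an item with categories k = 0, ..., m. Graded response:

P(X \ge k \mid \theta) = [1 + e^{-a(\theta - b_k)}]^{-1}, \qquad P(X = k \mid \theta) = P(X \ge k \mid \theta) - P(X \ge k + 1 \mid \theta)

with P(X \ge 0 \mid \theta) = 1 and P(X \ge m + 1 \mid \theta) = 0. Generalized partial credit:

P(X = k \mid \theta) = \frac{\exp\left(\sum_{v=1}^{k} a(\theta - b_v)\right)}{\sum_{c=0}^{m} \exp\left(\sum_{v=1}^{c} a(\theta - b_v)\right)}

where the empty sum for c = 0 is defined as zero. As the abstract notes, neither model guarantees that examinees ordered by raw observed score are ordered on the expected value of the latent trait, and the study asks whether violations of that property inflate DIF Type I error rates.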
Pages: 1 | 2