Publication Date
In 2025: 0
Since 2024: 0
Since 2021 (last 5 years): 2
Since 2016 (last 10 years): 4
Since 2006 (last 20 years): 19
Descriptor
Item Response Theory: 19
Test Items: 12
Models: 9
Computation: 6
Guessing (Tests): 6
Scores: 6
Simulation: 6
Difficulty Level: 5
Error of Measurement: 5
Accuracy: 4
Comparative Analysis: 4
Source
Educational and Psychological…: 7
Applied Psychological…: 4
International Journal of…: 3
Journal of Educational…: 2
Applied Measurement in…: 1
Educational Assessment: 1
Structural Equation Modeling:…: 1
Author
DeMars, Christine E.: 19
Jurich, Daniel P.: 3
Socha, Alan: 3
Wise, Steven L.: 2
Goodman, Joshua T.: 1
Lau, Abigail: 1
Phan, Ha: 1
Waterbury, Glenn Thomas: 1
Zilberberg, Anna: 1
Publication Type
Journal Articles: 19
Reports - Research: 15
Reports - Descriptive: 3
Reports - Evaluative: 1
Education Level
Elementary Education: 1
Grade 4: 1
Higher Education: 1
Intermediate Grades: 1
Postsecondary Education: 1
Assessments and Surveys
Trends in International…: 1
DeMars, Christine E. – Applied Measurement in Education, 2021
Estimation of parameters for the many-facets Rasch model requires that, conditional on the values of the facets, such as person ability, item difficulty, and rater severity, the observed responses within each facet are independent. This requirement has often been discussed for the Rasch, 2PL, and 3PL models, but it becomes more complex…
Descriptors: Item Response Theory, Test Items, Ability, Scores
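For orientation, the many-facets Rasch model referenced in this abstract is commonly written in the following form for a rating in category k (a standard formulation; the notation is illustrative and not taken from the article):

\log \frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \lambda_j - \tau_k

where \theta_n is the ability of person n, \delta_i the difficulty of item i, \lambda_j the severity of rater j, and \tau_k the threshold for moving from category k-1 to category k. The local independence requirement discussed in the abstract means that, once these facet parameters are conditioned on, the remaining variation in the responses is independent across persons, items, and raters.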
Waterbury, Glenn Thomas; DeMars, Christine E. – Educational Assessment, 2021
Vertical scaling is used to put tests of different difficulty onto a common metric. The Rasch model is often used to perform vertical scaling, despite its strict functional form. Few, if any, studies have examined anchor item choice when using the Rasch model to vertically scale data that do not fit the model. The purpose of this study was to…
Descriptors: Test Items, Equated Scores, Item Response Theory, Scaling
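Because the Rasch model has a single item parameter, common-item (anchor) linking between two vertically scaled forms often reduces to a constant shift. A minimal sketch, assuming mean-difficulty shift linking; all numbers are invented and none come from the study:

import numpy as np

# Hypothetical anchor-item difficulty estimates from separate Rasch calibrations
# of a lower-grade form and an upper-grade form (invented values).
b_anchor_lower = np.array([-0.8, -0.2, 0.4, 1.1])
b_anchor_upper = np.array([-1.5, -0.9, -0.3, 0.5])

# Placing the upper-grade calibration on the lower-grade metric is a constant
# shift equal to the mean difference in the anchor items' difficulty estimates.
shift = b_anchor_lower.mean() - b_anchor_upper.mean()

# Apply the shift to all upper-grade item difficulties and person measures.
b_upper_linked = b_anchor_upper + shift
theta_upper_linked = np.array([0.0, 1.2, -0.4]) + shift
print(round(shift, 3), b_upper_linked)

Anchor item choice, the focus of the study, matters here because every anchor item contributes equally to the mean shift, so misfitting anchors move the whole linked scale.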
DeMars, Christine E. – Educational and Psychological Measurement, 2019
Previous work showing that revised parallel analysis can be effective with dichotomous items has used a two-parameter model and normally distributed abilities. In this study, both two- and three-parameter models were used with normally distributed and skewed ability distributions. Relatively minor skew and kurtosis in the underlying ability…
Descriptors: Item Analysis, Models, Error of Measurement, Item Response Theory
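Parallel analysis decides how many factors to retain by comparing the eigenvalues of the observed item correlations with eigenvalues from comparison data; the revised variant referenced in the abstract changes how those comparison data are generated. A minimal sketch of the classical version only, with placeholder binary data, not the article's procedure:

import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items, n_reps = 1000, 20, 100

# Placeholder dichotomous responses; in practice these are the observed item scores.
observed = (rng.random((n_persons, n_items)) < 0.6).astype(int)
obs_eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(observed, rowvar=False)))[::-1]

# Reference eigenvalues averaged over random datasets of the same size.
ref_eigs = np.zeros(n_items)
for _ in range(n_reps):
    comparison = (rng.random((n_persons, n_items)) < 0.6).astype(int)
    ref_eigs += np.sort(np.linalg.eigvalsh(np.corrcoef(comparison, rowvar=False)))[::-1]
ref_eigs /= n_reps

# Retain factors whose observed eigenvalue exceeds the corresponding reference eigenvalue.
print(int(np.sum(obs_eigs > ref_eigs)))

With dichotomous items, tetrachoric rather than Pearson correlations are often used; the sketch omits that step for brevity.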
DeMars, Christine E. – Educational and Psychological Measurement, 2016
Partially compensatory models may capture the cognitive skills needed to answer test items more realistically than compensatory models, but estimating the model parameters may be a challenge. Data were simulated to follow two different partially compensatory models, a model with an interaction term and a product model. The model parameters were…
Descriptors: Item Response Theory, Models, Thinking Skills, Test Items
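For context, the two partially compensatory forms named in the abstract are often written as follows, shown with a standard compensatory model for contrast; the exact parameterization used in the article may differ:

Compensatory: P(X = 1 \mid \theta_1, \theta_2) = c + (1 - c)\,[1 + e^{-(a_1\theta_1 + a_2\theta_2 + d)}]^{-1}

Interaction-term form: P(X = 1 \mid \theta_1, \theta_2) = c + (1 - c)\,[1 + e^{-(a_1\theta_1 + a_2\theta_2 + a_3\theta_1\theta_2 + d)}]^{-1}

Product form: P(X = 1 \mid \theta_1, \theta_2) = c + (1 - c)\prod_{m=1}^{2} [1 + e^{-a_m(\theta_m - b_m)}]^{-1}

In the compensatory model a high value on one trait can offset a low value on the other; the product form requires adequate standing on both traits, which is why such models are described as representing the required cognitive skills more realistically.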
DeMars, Christine E.; Jurich, Daniel P. – Educational and Psychological Measurement, 2015
In educational testing, differential item functioning (DIF) statistics must be accurately estimated to ensure the appropriate items are flagged for inspection or removal. This study showed how using the Rasch model to estimate DIF may introduce considerable bias in the results when there are large group differences in ability (impact) and the data…
Descriptors: Test Bias, Guessing (Tests), Ability, Differences
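For reference, the models at issue are typically written as

Rasch: P(X = 1 \mid \theta) = [1 + e^{-(\theta - b)}]^{-1}

3PL: P(X = 1 \mid \theta) = c + (1 - c)\,[1 + e^{-a(\theta - b)}]^{-1}

where c is the lower asymptote associated with guessing and a the discrimination, both of which the Rasch model constrains (c = 0, a equal across items). The abstract's finding is that when there is large impact and the responses involve guessing that the Rasch model cannot absorb, the resulting DIF statistics can be considerably biased.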
Socha, Alan; DeMars, Christine E. – Applied Psychological Measurement, 2013
The software program DIMTEST can be used to assess the unidimensionality of item scores. The software allows the user to specify a guessing parameter. Using simulated data, the effects of guessing parameter specification for use with the ATFIND procedure for empirically deriving the Assessment Subtest (AT; that is, a subtest composed of items that…
Descriptors: Item Response Theory, Computer Software, Guessing (Tests), Simulation
Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha – International Journal of Testing, 2015
The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…
Descriptors: Test Items, Educational Testing, Evaluation Methods, Ability Grouping
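For reference, the Mantel-Haenszel DIF statistic described here combines, across matching-score levels k, the 2 x 2 tables of group (reference/focal) by response (correct/incorrect):

\hat{\alpha}_{MH} = \frac{\sum_k A_k D_k / N_k}{\sum_k B_k C_k / N_k}, \qquad \Delta_{MH} = -2.35 \ln \hat{\alpha}_{MH}

where A_k and B_k are the numbers of correct and incorrect responses in the reference group at level k, C_k and D_k the corresponding counts in the focal group, and N_k the total count at that level. Thin matching, as discussed in the abstract, treats each total score as its own level k; thicker matching pools adjacent scores into wider intervals.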
DeMars, Christine E.; Jurich, Daniel P. – Applied Psychological Measurement, 2012
The nonequivalent groups anchor test (NEAT) design is often used to scale item parameters from two different test forms. A subset of items, called the anchor items or common items, is administered as part of both test forms. These items are used to adjust the item calibrations for any differences in the ability distributions of the groups taking…
Descriptors: Computer Software, Item Response Theory, Scaling, Equated Scores
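One common way to carry out the adjustment described here is mean/sigma linking on the anchor items' difficulty estimates; this is a standard method, not necessarily the one examined in the article. With \bar{b} and \sigma(b) denoting the mean and standard deviation of the anchor difficulties from the base-form calibration (Y) and the new-form calibration (X):

A = \frac{\sigma(b_Y)}{\sigma(b_X)}, \qquad B = \bar{b}_Y - A\,\bar{b}_X

and the new-form parameters are placed on the base scale by b^* = A b + B, a^* = a / A, and \theta^* = A\theta + B.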
DeMars, Christine E. – International Journal of Testing, 2013
This tutorial addresses possible sources of confusion in interpreting trait scores from the bifactor model. The bifactor model may be used when subscores are desired, either for formative feedback on an achievement test or for theoretically different constructs on a psychological test. The bifactor model is often chosen because it requires fewer…
Descriptors: Test Interpretation, Scores, Models, Correlation
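For orientation, a two-parameter bifactor item response model can be written as

P_i(\theta) = [1 + e^{-(a_{iG}\theta_G + a_{is}\theta_{s(i)} + d_i)}]^{-1}

where every item loads on a general trait \theta_G and on exactly one specific (group) trait \theta_{s(i)}, with the traits usually specified as orthogonal. A common source of the confusion the abstract mentions is that a specific-trait score reflects only what an item cluster measures over and above the general trait, not the subscore construct as a whole. The notation here is generic rather than taken from the tutorial.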
DeMars, Christine E. – Applied Psychological Measurement, 2012
A testlet is a cluster of items that share a common passage, scenario, or other context. These items might measure something in common beyond the trait measured by the test as a whole; if so, the model for the item responses should allow for this testlet trait. But modeling testlet effects that are negligible makes the model unnecessarily…
Descriptors: Test Items, Item Response Theory, Comparative Analysis, Models
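For reference, a common way to model the shared context is the two-parameter testlet model,

P_{ij}(\theta_i) = [1 + e^{-a_j(\theta_i - b_j - \gamma_{i\,d(j)})}]^{-1}

where \gamma_{i d(j)} is a person-specific effect for the testlet d(j) containing item j. The variance of \gamma indexes the size of the testlet effect, and fixing it at zero returns the standard two-parameter model, which is the trade-off the abstract describes between modeling negligible testlet effects and keeping the model parsimonious.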
Socha, Alan; DeMars, Christine E. – Educational and Psychological Measurement, 2013
Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…
Descriptors: Sample Size, Test Length, Correlation, Test Format
Jurich, Daniel P.; DeMars, Christine E.; Goodman, Joshua T. – Applied Psychological Measurement, 2012
The prevalence of high-stakes test scores as a basis for significant decisions necessitates the dissemination of accurate and fair scores. However, the magnitude of these decisions has created an environment in which examinees may be prone to resort to cheating. To reduce the risk of cheating, multiple test forms are commonly administered. When…
Descriptors: High Stakes Tests, Scores, Prevention, Cheating
DeMars, Christine E.; Lau, Abigail – Educational and Psychological Measurement, 2011
There is a long history of differential item functioning (DIF) detection methods for known, manifest grouping variables, such as sex or ethnicity. But if the experiences or cognitive processes leading to DIF are not perfectly correlated with the manifest groups, it would be more informative to uncover the latent groups underlying DIF. The use of…
Descriptors: Test Bias, Accuracy, Item Response Theory, Models
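One common way to formalize latent DIF groups, though not necessarily the approach taken in this article, is a mixture Rasch model in which item difficulties vary by latent class g:

P(X_{ij} = 1 \mid \theta_i, g) = [1 + e^{-(\theta_i - b_{jg})}]^{-1}

DIF then appears as between-class differences in the b_{jg}, and class membership is estimated from the response patterns rather than taken from a manifest variable such as sex or ethnicity.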
DeMars, Christine E. – Structural Equation Modeling: A Multidisciplinary Journal, 2012
In structural equation modeling software, either limited-information (bivariate proportions) or full-information item parameter estimation routines could be used for the 2-parameter item response theory (IRT) model. Limited-information methods assume the continuous variable underlying an item response is normally distributed. For skewed and…
Descriptors: Item Response Theory, Structural Equation Models, Computation, Computer Software
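The limited-information route treats each item as a dichotomized normal latent response y_i^* = \lambda_i \xi + \varepsilon_i cut at a threshold \tau_i. Under that parameterization (standardized y^*, normal-ogive metric), the factor-analytic estimates map onto two-parameter IRT values approximately as

a_i = \frac{\lambda_i}{\sqrt{1 - \lambda_i^2}}, \qquad b_i = \frac{\tau_i}{\lambda_i}

with a_i rescaled by roughly 1.7 if a logistic metric is wanted. This is the standard correspondence behind comparing limited- and full-information estimates; the specific software routines examined in the article are not reproduced here.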
DeMars, Christine E. – Educational and Psychological Measurement, 2008
The graded response (GR) and generalized partial credit (GPC) models do not imply that examinees ordered by raw observed score will necessarily be ordered on the expected value of the latent trait (OEL). Factors were manipulated to assess whether increased violations of OEL also produced increased Type I error rates in differential item…
Descriptors: Test Items, Raw Scores, Test Theory, Error of Measurement
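For reference, the two models are usually written as follows for an item with categories k = 0, ..., m. Graded response:

P(X \ge k \mid \theta) = [1 + e^{-a(\theta - b_k)}]^{-1}, \qquad P(X = k \mid \theta) = P(X \ge k \mid \theta) - P(X \ge k + 1 \mid \theta)

with P(X \ge 0 \mid \theta) = 1 and P(X \ge m + 1 \mid \theta) = 0. Generalized partial credit:

P(X = k \mid \theta) = \frac{\exp\left(\sum_{v=1}^{k} a(\theta - b_v)\right)}{\sum_{c=0}^{m} \exp\left(\sum_{v=1}^{c} a(\theta - b_v)\right)}

where the empty sum for c = 0 is defined as zero. As the abstract notes, neither model guarantees that examinees ordered by raw observed score are ordered on the expected value of the latent trait, and the study asks whether violations of that property inflate DIF Type I error rates.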
Pages: 1 | 2