ERIC - Search Results

Publication Date

In 2025	0
Since 2024	4
Since 2021 (last 5 years)	38

Descriptor

Statistical Analysis	38
Test Items	38
Item Response Theory	19
Foreign Countries	9
Test Bias	9
Equated Scores	8
Error of Measurement	8
Models	8
Goodness of Fit	6
Scores	6
Comparative Analysis	5
Computation	5
Sample Size	5
Achievement Tests	4
Classification	4
Difficulty Level	4
International Assessment	4
Reaction Time	4
Testing	4
Cheating	3
Effect Size	3
High Stakes Tests	3
Identification	3
Licensing Examinations…	3
Measurement	3

Source

Journal of Educational and…	7
Educational and Psychological…	6
International Journal of…	5
ProQuest LLC	4
Applied Measurement in…	3
Journal of Educational…	2
Measurement:…	2
Cognitive Research:…	1
Educational Measurement:…	1
Eurasian Journal of…	1
Grantee Submission	1
International Journal of…	1
Journal of Applied Research…	1
Large-scale Assessments in…	1
Practical Assessment,…	1
Research Papers in Education	1

Publication Type

Journal Articles	33
Reports - Research	27
Reports - Descriptive	5
Dissertations/Theses -…	4
Reports - Evaluative	2

Education Level

Secondary Education	6
Higher Education	4
Postsecondary Education	4
Junior High Schools	3
Middle Schools	3
Elementary Education	2
Elementary Secondary Education	1
Grade 8	1

Location

Turkey	2
Ecuador	1
Florida	1
Hungary	1
Kazakhstan	1
Mexico	1
Netherlands (Amsterdam)	1
Peru	1
South Africa	1
United States	1

Assessments and Surveys

Program for International…	3
Program for the International…	1
Trends in International…	1

Showing 1 to 15 of 38 results

Investigating Bifactor Models and Fit Indices for Unidimensionality: An Illustration with Method Effects Due to Item Wording

Leighton, Elizabeth A. – ProQuest LLC, 2022

The use of unidimensional scales that contain both positively and negatively worded items is common in both the educational and psychological fields. However, dimensionality investigations of these instruments often lead to a rejection of the theorized unidimensional model in favor of multidimensional structures, leaving researchers at odds for…

Descriptors: Test Items, Language Usage, Models, Statistical Analysis

Reevaluating the SIBTEST Classification Heuristics for Dichotomous Differential Item Functioning

Peer reviewed

Weese, James D.; Turner, Ronna C.; Ames, Allison; Crawford, Brandon; Liang, Xinya – Educational and Psychological Measurement, 2022

A simulation study was conducted to investigate the heuristics of the SIBTEST procedure and how it compares with ETS classification guidelines used with the Mantel-Haenszel procedure. Prior heuristics have been used for nearly 25 years, but they are based on a simulation study that was restricted due to computer limitations and that modeled item…

Descriptors: Test Bias, Heuristics, Classification, Statistical Analysis

Optimizing Diagnostic Classification Models Application Considering Real-Life Constraints

Peer reviewed

Su, Kun; Henson, Robert A. – Journal of Educational and Behavioral Statistics, 2023

This article provides a process for carefully evaluating whether a content domain is suitable for diagnostic classification models (DCMs), optimized steps for constructing a test blueprint for applying DCMs, and a real-life example illustrating this process. The content domains were carefully evaluated using a set of…

Descriptors: Classification, Models, Science Tests, Physics

Evaluating the Effects of Missing Data Handling Methods on Scale Linking Accuracy

Peer reviewed

Wu, Tong; Kim, Stella Y.; Westine, Carl – Educational and Psychological Measurement, 2023

For large-scale assessments, data are often collected with missing responses. Despite the wide use of item response theory (IRT) in many testing programs, however, the existing literature offers little insight into the effectiveness of various approaches to handling missing responses in the context of scale linking. Scale linking is commonly used…

Descriptors: Data Analysis, Responses, Statistical Analysis, Measurement

Reconceptualization of Coefficient Alpha Reliability for Test Summed and Scaled Scores

Peer reviewed

Almehrizi, Rashid S. – Educational Measurement: Issues and Practice, 2022

Coefficient alpha reliability persists as the most common reliability coefficient reported in research. The assumptions for its use are, however, not well-understood. The current paper challenges the commonly used expressions of coefficient alpha and argues that while these expressions are correct when estimating reliability for summed scores,…

Descriptors: Reliability, Scores, Scaling, Statistical Analysis
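
As a point of reference for the entry above, the commonly used summed-score expression of coefficient alpha can be sketched as follows. This is a minimal illustration, not the article's proposed reformulation for scaled scores; the function name `coefficient_alpha` and the toy response matrix are assumptions made for the example.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient alpha for summed scores.

    scores: list of examinee rows, each a list of item scores
    (hypothetical input layout chosen for this sketch).
    """
    k = len(scores[0])                       # number of items
    item_columns = list(zip(*scores))        # transpose to per-item columns
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(row) for row in scores])
    # alpha = k/(k-1) * (1 - sum of item variances / variance of summed score)
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: 4 examinees, 3 dichotomous items (illustrative values only)
toy = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [1, 0, 0]]
alpha = coefficient_alpha(toy)
```

The sketch uses sample variances (n - 1 denominator) for both the items and the summed score; using population variances throughout gives the same ratio and therefore the same alpha.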

Use of the Lagrange Multiplier Test for Assessing Measurement Invariance under Model Misspecification

Peer reviewed

Guastadisegni, Lucia; Cagnone, Silvia; Moustaki, Irini; Vasdekis, Vassilis – Educational and Psychological Measurement, 2022

This article studies the Type I error, false positive rates, and power of four versions of the Lagrange multiplier test to detect measurement noninvariance in item response theory (IRT) models for binary data under model misspecification. The tests considered are the Lagrange multiplier test computed with the Hessian and cross-product approach,…

Descriptors: Measurement, Statistical Analysis, Item Response Theory, Test Items

On the Generalized S-X²-Test of Item Fit: Some Variants, Residuals, and a Graphical Visualization

Peer reviewed

Ranger, Jochen; Brauer, Kay – Journal of Educational and Behavioral Statistics, 2022

The generalized S-X²-test is a test of item fit for items with a polytomous response format. The test is based on a comparison of the observed and expected number of responses in strata defined by the test score. In this article, we make four contributions. We demonstrate that the performance of the generalized S-X²-test…

Descriptors: Goodness of Fit, Test Items, Statistical Analysis, Item Response Theory

Testing Differential Item Functioning without Predefined Anchor Items Using Robust Regression

Peer reviewed

Wang, Weimeng; Liu, Yang; Liu, Hongyun – Journal of Educational and Behavioral Statistics, 2022

Differential item functioning (DIF) occurs when the probability of endorsing an item differs across groups for individuals with the same latent trait level. The presence of DIF items may jeopardize the validity of an instrument; therefore, it is crucial to identify DIF items in routine operations of educational assessment. While DIF detection…

Descriptors: Test Bias, Test Items, Equated Scores, Regression (Statistics)
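
The definition of DIF in the entry above can be illustrated with a small numerical sketch. It assumes a two-parameter logistic (2PL) item response function and made-up item parameters; neither the model choice nor the values come from the article, and the helper names `p_endorse` and `dif_gap` are hypothetical.

```python
import math


def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at latent trait level theta.

    a: discrimination, b: difficulty (illustrative parameters).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def dif_gap(theta, ref_params, focal_params):
    """Difference in endorsement probability between two groups
    for examinees with the SAME latent trait level theta."""
    return p_endorse(theta, *ref_params) - p_endorse(theta, *focal_params)


# Same trait level, but the item is harder for the focal group (assumed values)
gap_with_dif = dif_gap(0.0, ref_params=(1.0, 0.0), focal_params=(1.0, 0.5))

# Identical item parameters in both groups: no DIF, so the gap is zero
gap_without_dif = dif_gap(0.0, ref_params=(1.0, 0.0), focal_params=(1.0, 0.0))
```

A nonzero gap at a fixed trait level is what the definition describes; a difference in group means alone, with identical item parameters, would not count as DIF.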

Assessing Differential Bundle Functioning Using Meta-Analysis

Peer reviewed

Li, Lanrong; Becker, Betsy Jane – Journal of Educational Measurement, 2021

Differential bundle functioning (DBF) has been proposed to quantify the accumulated amount of differential item functioning (DIF) in an item cluster/bundle (Douglas, Roussos, and Stout). The simultaneous item bias test (SIBTEST, Shealy and Stout) has been used to test for DBF (e.g., Walker, Zhang, and Surber). Research on DBF may have the…

Descriptors: Test Bias, Test Items, Meta Analysis, Effect Size

Detecting Item Preknowledge Using Revisits with Speed and Accuracy

Peer reviewed

Demirkaya, Onur; Bezirhan, Ummugul; Zhang, Jinming – Journal of Educational and Behavioral Statistics, 2023

Examinees with item preknowledge tend to obtain inflated test scores that undermine test score validity. With the availability of process data collected in computer-based assessments, the research on detecting item preknowledge has progressed on using both item scores and response times. Item revisit patterns of examinees can also be utilized as…

Descriptors: Test Items, Prior Learning, Knowledge Level, Reaction Time

Finding the Right Grain-Size for Measurement in the Classroom

Peer reviewed

Mark Wilson – Journal of Educational and Behavioral Statistics, 2024

This article introduces a new framework for articulating how educational assessments can be related to teacher uses in the classroom. It articulates three levels of assessment: macro (use of standardized tests), meso (externally developed items), and micro (on-the-fly in the classroom). The first level is the usual context for educational…

Descriptors: Educational Assessment, Measurement, Standardized Tests, Test Items

A Comparison of IRT Linking Approaches under the Nonequivalent Groups Anchor Test Design

Jiajing Huang – ProQuest LLC, 2022

The nonequivalent-groups anchor-test (NEAT) data-collection design is commonly used in large-scale assessments. Under this design, different test groups take different test forms. Each test form has its own unique items and all test forms share a set of common items. If item response theory (IRT) models are applied to analyze the test data, the…

Descriptors: Item Response Theory, Test Format, Test Items, Test Construction

Interval Estimation of Item Response Probabilities along Studied Latent Dimensions

Peer reviewed

Raykov, Tenko; Marcoulides, George A.; Pusic, Martin – Measurement: Interdisciplinary Research and Perspectives, 2021

An interval estimation procedure is discussed that can be used to evaluate the probability of a particular response for a binary or binary scored item at a pre-specified point along an underlying latent continuum. The item is assumed to: (a) be part of a unidimensional multi-component measuring instrument that may contain also polytomous items,…

Descriptors: Item Response Theory, Computation, Probability, Test Items

Designing and Evaluating Tasks to Measure Individual Differences in Experimental Psychology: A Tutorial

Peer reviewed

Marc Brysbaert – Cognitive Research: Principles and Implications, 2024

Experimental psychology is witnessing an increase in research on individual differences, which requires the development of new tasks that can reliably assess variations among participants. To do this, cognitive researchers need statistical methods that many researchers have not learned during their training. The lack of expertise can pose…

Descriptors: Experimental Psychology, Individual Differences, Statistical Analysis, Task Analysis

Impacts of Differences in Group Abilities and Anchor Test Features on Three Non-IRT Test Equating Methods

Peer reviewed

Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024

The overall aim was to examine effects of differences in group ability and features of the anchor test form on equating bias and the standard error of equating (SEE) using both real and simulated data. Chained kernel equating, poststratification kernel equating, and circle-arc equating were studied. A college admissions test with four different…

Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests


Author

Robitzsch, Alexander	2
Akin-Arikan, Çigdem	1
Alahmadi, Sarah	1
Almehrizi, Rashid S.	1
Altintas, Ozge	1
Ames, Allison	1
Barry, Carol L.	1
Becker, Betsy Jane	1
Bezirhan, Ummugul	1
Black, Beth	1
Brauer, Kay	1
Brooks, Gordon	1
Cagnone, Silvia	1
Clarisse van Rensburg	1
Crawford, Brandon	1
Demirkaya, Onur	1
Diaz, Emily	1
Fatih Orcan	1
Gelbal, Selahattin	1
Guastadisegni, Lucia	1
Haimiao Yuan	1
Harring, Jeffrey R.	1
He, Qingping	1
Heine, Jörg-Henrik	1
Henson, Robert A.	1