ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	0
Since 2016 (last 10 years)	0
Since 2006 (last 20 years)	20

Descriptor

Test Length	31
Item Response Theory	29
Computation	13
Simulation	13
Sample Size	10
Monte Carlo Methods	9
Maximum Likelihood Statistics	8
Models	7
Bayesian Statistics	6
Comparative Analysis	6
Error of Measurement	6
Test Items	6
Adaptive Testing	5
Classification	5
Computer Assisted Testing	5
Scores	4
Ability	3
Accuracy	3
Algorithms	3
Computer Software	3
Correlation	3
Evaluation Methods	3
Foreign Countries	3
Markov Processes	3
Mathematical Models	3

Source

Applied Psychological…

Publication Type

Journal Articles	31
Reports - Evaluative	15
Reports - Research	15
Reports - Descriptive	1

Education Level

High Schools	1
Secondary Education	1

Location

Australia	1
Michigan	1
Netherlands	1
Taiwan	1

Assessments and Surveys

Armed Forces Qualification…	1
Center for Epidemiologic…	1

Showing 1 to 15 of 31 results

Two Approaches to Estimation of Classification Accuracy Rate under Item Response Theory

Peer reviewed

Lathrop, Quinn N.; Cheng, Ying – Applied Psychological Measurement, 2013

Within the framework of item response theory (IRT), there are two recent lines of work on the estimation of classification accuracy (CA) rate. One approach estimates CA when decisions are made based on total sum scores, the other based on latent trait estimates. The former is referred to as the Lee approach, and the latter, the Rudner approach,…

Descriptors: Item Response Theory, Accuracy, Classification, Computation
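For orientation only, here is a minimal Python sketch of a classification accuracy estimate based on latent trait estimates, in the spirit of the Rudner approach named above. It assumes that each examinee's true θ is approximately normal around the estimate θ̂ with the reported standard error, and it averages the probability that θ falls in the same category as θ̂. The function names, cut scores, and inputs are hypothetical. This is not code from Lathrop and Cheng.

```python
from bisect import bisect_right
from math import erf, inf, sqrt


def normal_cdf(z):
    """Standard normal CDF computed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def rudner_style_accuracy(theta_hats, ses, cuts):
    """Average probability that the true theta lies in the same category as theta_hat.

    Assumption (for illustration): theta ~ N(theta_hat, se**2) for each examinee.
    `cuts` must be sorted ascending; an estimate equal to a cut goes to the upper category.
    """
    bounds = [-inf] + list(cuts) + [inf]
    total = 0.0
    for th, se in zip(theta_hats, ses):
        k = bisect_right(cuts, th)            # category index of the estimate
        lo, hi = bounds[k], bounds[k + 1]     # interval of that category
        p_hi = 1.0 if hi == inf else normal_cdf((hi - th) / se)
        p_lo = 0.0 if lo == -inf else normal_cdf((lo - th) / se)
        total += p_hi - p_lo                  # P(true theta in the same interval)
    return total / len(theta_hats)
```

For example, with one cut at 0, two estimates at ±1, and standard errors of 1, `rudner_style_accuracy([1.0, -1.0], [1.0, 1.0], [0.0])` returns about 0.8413, which is Φ(1).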

Using Logistic Approximations of Marginal Trace Lines to Develop Short Assessments

Peer reviewed

Stucky, Brian D.; Thissen, David; Edelen, Maria Orlando – Applied Psychological Measurement, 2013

Test developers often need to create unidimensional scales from multidimensional data. For item analysis, "marginal trace lines" capture the relation with the general dimension while accounting for nuisance dimensions and may prove to be a useful technique for creating short-form tests. This article describes the computations needed to obtain…

Descriptors: Test Construction, Test Length, Item Analysis, Item Response Theory

Effects of Vertical Scaling Methods on Linear Growth Estimation

Peer reviewed

Lei, Pui-Wa; Zhao, Yu – Applied Psychological Measurement, 2012

Vertical scaling is necessary to facilitate comparison of scores from test forms of different difficulty levels. It is widely used to enable the tracking of student growth in academic performance over time. Most previous studies on vertical scaling methods assume relatively long tests and large samples. Little is known about their performance when…

Descriptors: Scaling, Item Response Theory, Test Length, Sample Size

Deriving Stopping Rules for Multidimensional Computerized Adaptive Testing

Peer reviewed

Wang, Chun; Chang, Hua-Hua; Boughton, Keith A. – Applied Psychological Measurement, 2013

Multidimensional computerized adaptive testing (MCAT) is able to provide a vector of ability estimates for each examinee, which could be used to provide a more informative profile of an examinee's performance. The current literature on MCAT focuses on the fixed-length tests, which can generate less accurate results for those examinees whose…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Length, Item Banks

The Probability of Exceedance as a Nonparametric Person-Fit Statistic for Tests of Moderate Length

Peer reviewed

Tendeiro, Jorge N.; Meijer, Rob R. – Applied Psychological Measurement, 2013

To classify an item score pattern as not fitting a nonparametric item response theory (NIRT) model, the probability of exceedance (PE) of an observed response vector x can be determined as the sum of the probabilities of all response vectors that are, at most, as likely as x, conditional on the test's total score. Vector x is to be considered…

Descriptors: Probability, Nonparametric Statistics, Goodness of Fit, Test Length
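Below is a brute-force Python sketch of the probability of exceedance as defined in the abstract above. It enumerates every response vector with the same total score as x and sums the conditional probabilities of those that are at most as likely as x. Small values indicate a pattern that is unusually unlikely for its total score. To make vector probabilities concrete, the sketch assumes (hypothetically) independent item success probabilities `p`. The paper works with nonparametric IRT models, so the way those probabilities are obtained there may differ. Enumerating all C(n, r) vectors is only practical for short to moderate test lengths.

```python
from itertools import combinations
from math import prod


def vector_prob(x, p):
    """Probability of binary vector x, assuming independent item probabilities p (illustrative)."""
    return prod(pi if xi else 1.0 - pi for xi, pi in zip(x, p))


def probability_of_exceedance(x, p, tol=1e-12):
    """Sum of P(v | total score) over all vectors v with P(v) <= P(x) and the same total score."""
    n, r = len(x), sum(x)
    probs = []
    # Enumerate every 0/1 vector of length n with exactly r ones.
    for ones in combinations(range(n), r):
        v = [0] * n
        for i in ones:
            v[i] = 1
        probs.append(vector_prob(v, p))
    total = sum(probs)          # P(total score = r): the conditioning constant
    px = vector_prob(x, p)
    # Include ties with x (tol guards against floating-point noise).
    return sum(q for q in probs if q <= px + tol) / total
```

With the hypothetical probabilities `p = [0.9, 0.5, 0.1]`, the Guttman-violating pattern `[0, 0, 1]` is the least likely score-1 vector, so its PE is 0.005 / 0.455, or about 0.011. The most likely pattern `[1, 0, 0]` gets PE = 1.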

The Random-Threshold Generalized Unfolding Model and Its Application of Computerized Adaptive Testing

Peer reviewed

Wang, Wen-Chung; Liu, Chen-Wei; Wu, Shiu-Lien – Applied Psychological Measurement, 2013

The random-threshold generalized unfolding model (RTGUM) was developed by treating the thresholds in the generalized unfolding model as random effects rather than fixed effects to account for the subjective nature of the selection of categories in Likert items. The parameters of the new model can be estimated with the JAGS (Just Another Gibbs…

Descriptors: Computer Assisted Testing, Adaptive Testing, Models, Bayesian Statistics

Comparing the Performance of Five Multidimensional CAT Selection Procedures with Different Stopping Rules

Peer reviewed

Yao, Lihua – Applied Psychological Measurement, 2013

Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection

A Comparison of Four Methods of IRT Subscoring

Peer reviewed

de la Torre, Jimmy; Song, Hao; Hong, Yuan – Applied Psychological Measurement, 2011

Lack of sufficient reliability is the primary impediment for generating and reporting subtest scores. Several current methods of subscore estimation do so either by incorporating the correlational structure among the subtest abilities or by using the examinee's performance on the overall test. This article conducted a systematic comparison of four…

Descriptors: Item Response Theory, Scoring, Methods, Comparative Analysis

A Test-Length Correction to the Estimation of Extreme Proficiency Levels

Peer reviewed

Magis, David; Beland, Sebastien; Raiche, Gilles – Applied Psychological Measurement, 2011

In this study, the estimation of extremely large or extremely small proficiency levels, given the item parameters of a logistic item response model, is investigated. On one hand, the estimation of proficiency levels by maximum likelihood (ML), despite being asymptotically unbiased, may yield infinite estimates. On the other hand, with an…

Descriptors: Test Length, Computation, Item Response Theory, Maximum Likelihood Statistics
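The infinite-estimate problem mentioned above is easy to reproduce. The Python sketch below assumes a two-parameter logistic (2PL) model with hypothetical item parameters and runs Newton-Raphson on the log-likelihood. For a perfect score, the first derivative of the log-likelihood is positive for every θ (and negative for a zero score), so it never reaches zero and the iterates drift without bound. A mixed pattern converges to a finite estimate. The sketch illustrates only the problem, not the correction proposed by Magis, Beland, and Raiche.

```python
from math import exp


def p2pl(theta, a, b):
    """2PL probability of a correct response (illustrative model choice)."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def score(theta, x, a, b):
    """First derivative of the 2PL log-likelihood with respect to theta."""
    return sum(ai * (xi - p2pl(theta, ai, bi)) for xi, ai, bi in zip(x, a, b))


def info(theta, a, b):
    """Test information: the negative second derivative of the 2PL log-likelihood."""
    return sum(ai ** 2 * p2pl(theta, ai, bi) * (1.0 - p2pl(theta, ai, bi))
               for ai, bi in zip(a, b))


def ml_theta(x, a, b, theta=0.0, iters=50, bound=10.0):
    """Newton-Raphson ML estimate; returns None if the iterate leaves [-bound, bound]."""
    for _ in range(iters):
        theta += score(theta, x, a, b) / info(theta, a, b)
        if abs(theta) > bound:
            return None  # no finite maximizer found: estimate is heading to +/- infinity
    return theta
```

With the hypothetical parameters `a = [1.0, 1.0, 1.0]` and `b = [-1.0, 0.0, 1.0]`, the pattern `[1, 1, 0]` converges to a finite positive θ. Both `[1, 1, 1]` and `[0, 0, 0]` return `None`.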

Curtailment and Stochastic Curtailment to Shorten the CES-D

Peer reviewed

Finkelman, Matthew D.; Smits, Niels; Kim, Wonsuk; Riley, Barth – Applied Psychological Measurement, 2012

The Center for Epidemiologic Studies-Depression (CES-D) scale is a well-known self-report instrument that is used to measure depressive symptomatology. Respondents who take the full-length version of the CES-D are administered a total of 20 items. This article investigates the use of curtailment and stochastic curtailment (SC), two sequential…

Descriptors: Measures (Individuals), Depression (Psychology), Test Length, Computer Assisted Testing

An Evaluation of Item Response Theory Classification Accuracy and Consistency Indices

Peer reviewed

Wyse, Adam E.; Hao, Shiqi – Applied Psychological Measurement, 2012

This article introduces two new classification consistency indices that can be used when item response theory (IRT) models have been applied. The new indices are shown to be related to Rudner's classification accuracy index and Guo's classification accuracy index. The Rudner- and Guo-based classification accuracy and consistency indices are…

Descriptors: Item Response Theory, Classification, Accuracy, Reliability

Recovery of Graded Response Model Parameters: A Comparison of Marginal Maximum Likelihood and Markov Chain Monte Carlo Estimation

Peer reviewed

Kieftenbeld, Vincent; Natesan, Prathiba – Applied Psychological Measurement, 2012

Markov chain Monte Carlo (MCMC) methods enable a fully Bayesian approach to parameter estimation of item response models. In this simulation study, the authors compared the recovery of graded response model parameters using marginal maximum likelihood (MML) and Gibbs sampling (MCMC) under various latent trait distributions, test lengths, and…

Descriptors: Test Length, Markov Processes, Item Response Theory, Monte Carlo Methods

Three Approaches to Using Lengthy Ordinal Scales in Structural Equation Models: Parceling, Latent Scoring, and Shortening Scales

Peer reviewed

Yang, Chongming; Nay, Sandra; Hoyle, Rick H. – Applied Psychological Measurement, 2010

Lengthy scales or testlets pose certain challenges for structural equation modeling (SEM) if all the items are included as indicators of a latent construct. Three general approaches to modeling lengthy scales in SEM (parceling, latent scoring, and shortening) have been reviewed and evaluated. A hypothetical population model is simulated containing…

Descriptors: Structural Equation Models, Measures (Individuals), Sample Size, Item Response Theory

Variations on Stochastic Curtailment in Sequential Mastery Testing

Peer reviewed

Finkelman, Matthew David – Applied Psychological Measurement, 2010

In sequential mastery testing (SMT), assessment via computer is used to classify examinees into one of two mutually exclusive categories. Unlike paper-and-pencil tests, SMT has the capability to use variable-length stopping rules. One approach to shortening variable-length tests is stochastic curtailment, which halts examination if the probability…

Descriptors: Mastery Tests, Computer Assisted Testing, Adaptive Testing, Test Length
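Here is a minimal Python sketch of curtailment for a sequential mastery test scored by number correct. It assumes 0/1 items and a mastery cutoff on the full-length total score. Deterministic curtailment stops once the remaining items can no longer change the decision. For the stochastic variant, the sketch assumes one common formulation: stop when the conditional probability of one outcome, computed from hypothetical independent per-item probabilities `p`, is at least 1 - gamma. The threshold `gamma`, the probabilities, and the function names are illustrative assumptions, not the procedures compared in the article.

```python
def remaining_score_pmf(p):
    """Distribution of the number correct on the remaining items (Poisson-binomial),
    assuming independent success probabilities p (illustrative)."""
    pmf = [1.0]
    for pj in p:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1.0 - pj)   # item answered incorrectly
            nxt[k + 1] += q * pj       # item answered correctly
        pmf = nxt
    return pmf


def curtail(responses, p, cutoff, gamma=0.0):
    """Administer items in order; return (decision, items_used).
    gamma == 0 gives deterministic curtailment; gamma > 0 adds a stochastic rule."""
    n, s = len(responses), 0
    for k, x in enumerate(responses, start=1):
        s += x
        remaining = n - k
        # Deterministic curtailment: the full-length decision is already fixed.
        if s >= cutoff:
            return "master", k
        if s + remaining < cutoff:
            return "nonmaster", k
        if gamma > 0:
            pmf = remaining_score_pmf(p[k:])
            need = cutoff - s                 # correct answers still required (1..remaining)
            p_master = sum(pmf[need:])        # P(final score >= cutoff | responses so far)
            if p_master >= 1.0 - gamma:
                return "master", k
            if p_master <= gamma:
                return "nonmaster", k
    return ("master" if s >= cutoff else "nonmaster"), n
```

For example, with 10 items, `p = [0.1] * 10`, a cutoff of 5, and all-incorrect responses, deterministic curtailment stops after 6 items. With `gamma=0.05`, the stochastic rule stops after the first item.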

Simultaneous Estimation of Overall and Domain Abilities: A Higher-Order IRT Model Approach

Peer reviewed

de la Torre, Jimmy; Song, Hao – Applied Psychological Measurement, 2009

Assessments consisting of different domains (e.g., content areas, objectives) are typically multidimensional in nature but are commonly assumed to be unidimensional for estimation purposes. The different domains of these assessments are further treated as multi-unidimensional tests for the purpose of obtaining diagnostic information. However, when…

Descriptors: Ability, Tests, Item Response Theory, Data Analysis


Author

Finch, Holmes	3
Meijer, Rob R.	3
de la Torre, Jimmy	3
Song, Hao	2
Stark, Stephen	2
Wang, Wen-Chung	2
Woods, Carol M.	2
Beland, Sebastien	1
Bell, Richard	1
Boughton, Keith A.	1
Chang, Hua-Hua	1
Cheng, Ying	1
Chernyshenko, Oleksandr S.	1
De Ayala, R. J.	1
Drasgow, Fritz	1
Due, Allan M.	1
Edelen, Maria Orlando	1
Finkelman, Matthew D.	1
Finkelman, Matthew David	1
Glas, Cees A. W.	1
Hambleton, Ronald K.	1
Hao, Shiqi	1
Hendrawan, Irene	1
Hong, Yuan	1