Showing 1 to 15 of 57 results
Peer reviewed
PDF on ERIC
Vaheoja, Monika; Verhelst, N. D.; Eggen, T.J.H.M. – European Journal of Science and Mathematics Education, 2019
In this article, the authors applied profile analysis to Maths exam data to demonstrate how different exam forms, differing in difficulty and length, can be reported and easily interpreted. The results were presented for different groups of participants and for different institutions in different Maths domains by evaluating the balance. Some…
Descriptors: Feedback (Response), Foreign Countries, Statistical Analysis, Scores
James, Syretta R.; Liu, Shihching Jessica; Maina, Nyambura; Wade, Julie; Wang, Helen; Wilson, Heather; Wolanin, Natalie – Montgomery County Public Schools, 2021
The impact of the COVID-19 pandemic continues to overwhelm the functioning and outcomes of educational systems throughout the nation. The public education system is under particular scrutiny given that students, families, and educators are under considerable stress to maintain academic progress. Since the beginning of the crisis, school systems…
Descriptors: Achievement Tests, COVID-19, Pandemics, Public Schools
Peer reviewed
Lathrop, Quinn N.; Cheng, Ying – Journal of Educational Measurement, 2014
When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…
Descriptors: Cutting Scores, Classification, Computation, Nonparametric Statistics
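The two quantities Lathrop and Cheng estimate can be illustrated with a small Monte Carlo sketch. This is not their nonparametric estimator — just a toy in Python where the true scores, the normal measurement-error model, and the single cut score are all illustrative assumptions: accuracy compares one observed classification with the true one, consistency compares two parallel administrations.

```python
import numpy as np

def ca_cc(true_scores, noise_sd, cut, rng=None):
    """Monte Carlo sketch of classification accuracy (CA: observed vs. true
    category) and classification consistency (CC: agreement between two
    parallel forms), with a cut score on the total-score scale."""
    rng = rng or np.random.default_rng(0)
    true_cls = true_scores >= cut
    # two independent "administrations" with additive measurement error
    obs1 = (true_scores + rng.normal(0.0, noise_sd, true_scores.shape)) >= cut
    obs2 = (true_scores + rng.normal(0.0, noise_sd, true_scores.shape)) >= cut
    ca = np.mean(obs1 == true_cls)
    cc = np.mean(obs1 == obs2)
    return ca, cc
```

With zero measurement error both indices are 1; as error grows, examinees near the cut flip categories and both indices drop.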
Peer reviewed
Stucky, Brian D.; Thissen, David; Edelen, Maria Orlando – Applied Psychological Measurement, 2013
Test developers often need to create unidimensional scales from multidimensional data. For item analysis, "marginal trace lines" capture the relation with the general dimension while accounting for nuisance dimensions and may prove to be a useful technique for creating short-form tests. This article describes the computations needed to obtain…
Descriptors: Test Construction, Test Length, Item Analysis, Item Response Theory
Peer reviewed
Yao, Lihua – Applied Psychological Measurement, 2013
Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
Peer reviewed
de la Torre, Jimmy; Song, Hao; Hong, Yuan – Applied Psychological Measurement, 2011
Lack of sufficient reliability is the primary impediment for generating and reporting subtest scores. Several current methods of subscore estimation do so either by incorporating the correlational structure among the subtest abilities or by using the examinee's performance on the overall test. This article conducted a systematic comparison of four…
Descriptors: Item Response Theory, Scoring, Methods, Comparative Analysis
Peer reviewed
Kahraman, Nilufer; Thompson, Tony – Journal of Educational Measurement, 2011
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving the subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article…
Descriptors: Test Length, Test Items, Alignment (Education), Models
Peer reviewed
Sijtsma, Klaas – International Journal of Testing, 2009
This article reviews three topics from test theory that continue to raise discussion and controversy and capture test theorists' and constructors' interest. The first topic concerns the discussion of the methodology of investigating and establishing construct validity; the second topic concerns reliability and its misuse, alternative definitions…
Descriptors: Construct Validity, Reliability, Classification, Test Theory
Peer reviewed
Chou, Yeh-Tai; Wang, Wen-Chung – Educational and Psychological Measurement, 2010
Dimensionality is an important assumption in item response theory (IRT). Principal component analysis on standardized residuals has been used to check dimensionality, especially under the family of Rasch models. It has been suggested that an eigenvalue greater than 1.5 for the first eigenvalue signifies a violation of unidimensionality when there…
Descriptors: Test Length, Sample Size, Correlation, Item Response Theory
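The dimensionality check Chou and Wang study can be sketched in a few lines. This is a hedged illustration, not their simulation design: it assumes you already have a matrix of 0/1 responses and the model-implied probabilities from a fitted Rasch-family model, standardizes the residuals, and returns the first eigenvalue of their correlation matrix — the statistic compared against the conventional 1.5 cutoff.

```python
import numpy as np

def first_residual_eigenvalue(responses, probs):
    """PCA-on-standardized-residuals check: standardize residuals under the
    fitted item response model, then return the largest eigenvalue of the
    residual correlation matrix (columns = items)."""
    resid = (responses - probs) / np.sqrt(probs * (1.0 - probs))
    corr = np.corrcoef(resid, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)  # ascending order
    return eigvals[-1]
```

For data that fit a unidimensional model, the residuals are close to independent noise and the first eigenvalue hovers near 1; values well above 1.5 have been read as evidence of a violated unidimensionality assumption, which is exactly the rule whose sample-size and test-length sensitivity the article examines.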
Peer reviewed
Eggen, Theo J. H. M. – Educational Research and Evaluation, 2011
If classification in a limited number of categories is the purpose of testing, computerized adaptive tests (CATs) with algorithms based on sequential statistical testing perform better than estimation-based CATs (e.g., Eggen & Straetmans, 2000). In these computerized classification tests (CCTs), the Sequential Probability Ratio Test (SPRT) (Wald,…
Descriptors: Test Length, Adaptive Testing, Classification, Item Analysis
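The Sequential Probability Ratio Test at the heart of these computerized classification tests reduces to a running log-likelihood ratio compared against two fixed thresholds. The sketch below assumes a 2PL item model and illustrative item parameters; it is Wald's generic decision rule, not Eggen's specific item-selection algorithm.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def sprt_decision(responses, items, theta0, theta1, alpha=0.05, beta=0.05):
    """Wald's SPRT for classifying an examinee as below theta0 or above
    theta1: accumulate the log-likelihood ratio over administered items
    and compare it with thresholds set by the nominal error rates."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    llr = 0.0
    for x, (a, b) in zip(responses, items):
        p1, p0 = p_2pl(theta1, a, b), p_2pl(theta0, a, b)
        llr += math.log(p1 / p0) if x == 1 else math.log((1.0 - p1) / (1.0 - p0))
    if llr >= upper:
        return "master"
    if llr <= lower:
        return "non-master"
    return "continue"
```

Because the thresholds are fixed while the ratio accumulates item by item, test length is variable — clear cases stop early, which is why SPRT-based classification tests tend to beat estimation-based CATs when only a category decision is needed.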
Peer reviewed
Kieftenbeld, Vincent; Natesan, Prathiba – Applied Psychological Measurement, 2012
Markov chain Monte Carlo (MCMC) methods enable a fully Bayesian approach to parameter estimation of item response models. In this simulation study, the authors compared the recovery of graded response model parameters using marginal maximum likelihood (MML) and Gibbs sampling (MCMC) under various latent trait distributions, test lengths, and…
Descriptors: Test Length, Markov Processes, Item Response Theory, Monte Carlo Methods
Peer reviewed
Wang, Wen-Chung; Liu, Chen-Wei – Educational and Psychological Measurement, 2011
The generalized graded unfolding model (GGUM) has been recently developed to describe item responses to Likert items (agree-disagree) in attitude measurement. In this study, the authors (a) developed two item selection methods in computerized classification testing under the GGUM, the current estimate/ability confidence interval method and the cut…
Descriptors: Computer Assisted Testing, Adaptive Testing, Classification, Item Response Theory
Peer reviewed
Lee, Young-Sun; Wollack, James A.; Douglas, Jeffrey – Educational and Psychological Measurement, 2009
The purpose of this study was to assess the model fit of a 2PL through comparison with nonparametric item characteristic curve (ICC) estimation procedures. Results indicate that the three nonparametric procedures implemented produced ICCs similar to that of the 2PL for items simulated to fit the 2PL. However, for misfitting items,…
Descriptors: Nonparametric Statistics, Item Response Theory, Test Items, Simulation
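A common way to estimate an ICC without a parametric model — in the spirit of the procedures this study compares, though not necessarily any of the three it implements — is kernel smoothing: regress an item's 0/1 responses on an ability proxy such as the rest score. A minimal Nadaraya-Watson sketch, with the Gaussian kernel and bandwidth as illustrative choices:

```python
import numpy as np

def kernel_icc(item_responses, ability_proxy, grid, bandwidth=0.5):
    """Nadaraya-Watson kernel regression of 0/1 item responses on an
    ability proxy (e.g., rest score): a nonparametric ICC estimate
    evaluated at each point of `grid`."""
    icc = np.empty(len(grid), dtype=float)
    for i, t in enumerate(grid):
        w = np.exp(-0.5 * ((ability_proxy - t) / bandwidth) ** 2)
        icc[i] = np.sum(w * item_responses) / np.sum(w)
    return icc
```

The smoothed curve can then be overlaid on the fitted 2PL trace line; close agreement supports the parametric model, while systematic departures (non-monotonicity, plateaus) flag misfit that the 2PL's logistic shape cannot express.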
Peer reviewed
Finkelman, Matthew David – Applied Psychological Measurement, 2010
In sequential mastery testing (SMT), assessment via computer is used to classify examinees into one of two mutually exclusive categories. Unlike paper-and-pencil tests, SMT has the capability to use variable-length stopping rules. One approach to shortening variable-length tests is stochastic curtailment, which halts examination if the probability…
Descriptors: Mastery Tests, Computer Assisted Testing, Adaptive Testing, Test Length
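The curtailment idea is easy to see with number-correct scoring and a fixed pass score. The sketch below is a hedged illustration, not Finkelman's procedure: deterministic curtailment stops when the remaining items can no longer change the outcome, and stochastic curtailment stops earlier, when a simple Binomial(remaining, p_hat) model makes an outcome change very unlikely (p_hat and gamma are assumed tuning values).

```python
from math import comb

def curtail(correct, answered, total, pass_score, p_hat=0.5, gamma=0.05):
    """Curtailed sequential mastery decision under number-correct scoring.
    Returns 'pass', 'fail', or 'continue'."""
    remaining = total - answered
    needed = pass_score - correct
    if needed <= 0:
        return "pass"   # pass score already reached (deterministic)
    if needed > remaining:
        return "fail"   # pass score no longer reachable (deterministic)
    # probability of still reaching the pass score, X ~ Binomial(remaining, p_hat)
    p_reach = sum(comb(remaining, k) * p_hat**k * (1.0 - p_hat) ** (remaining - k)
                  for k in range(needed, remaining + 1))
    if p_reach < gamma:
        return "fail"   # stochastic curtailment: passing is very unlikely
    if 1.0 - p_reach < gamma:
        return "pass"   # stochastic curtailment: failing is very unlikely
    return "continue"
```

The trade-off the article studies follows directly: lowering gamma shortens tests but admits a small probability that the early classification differs from the full-length one.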
Peer reviewed
Cheng, Ying-Yao; Wang, Wen-Chung; Ho, Yi-Hui – Educational and Psychological Measurement, 2009
Educational and psychological tests are often composed of multiple short subtests, each measuring a distinct latent trait. Unfortunately, short subtests suffer from low measurement precision, which makes the bandwidth-fidelity dilemma inevitable. In this study, the authors demonstrate how a multidimensional Rasch analysis can be employed to take…
Descriptors: Item Response Theory, Measurement, Correlation, Measures (Individuals)