van der Ark, L. Andries; van der Palm, Daniel W.; Sijtsma, Klaas – Applied Psychological Measurement, 2011
This study presents a general framework for single-administration reliability methods, such as Cronbach's alpha, Guttman's lambda-2, and method MS. This general framework was used to derive a new approach to estimating test-score reliability by means of the unrestricted latent class model. This new approach is the latent class reliability…
Descriptors: Simulation, Reliability, Measurement, Psychology
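
As an illustration of two of the single-administration coefficients named in the abstract above, the following minimal Python sketch computes Cronbach's alpha and Guttman's lambda-2 from a persons-by-items score matrix using their standard textbook formulas. It is not the authors' framework, and it does not cover method MS or the latent class reliability coefficient; the function names and the example data are hypothetical.

```python
from math import sqrt


def covariance_matrix(scores):
    """Sample (n - 1) covariance matrix of the item columns."""
    n = len(scores)
    k = len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(k)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in scores) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]


def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / total-score variance)."""
    cov = covariance_matrix(scores)
    k = len(cov)
    item_var = sum(cov[i][i] for i in range(k))
    total_var = sum(sum(row) for row in cov)  # variance of the sum score
    return k / (k - 1) * (1 - item_var / total_var)


def guttman_lambda2(scores):
    """lambda-2 = 1 - sum(item var)/total var
                  + sqrt(k/(k-1) * sum of squared off-diagonal covariances)/total var."""
    cov = covariance_matrix(scores)
    k = len(cov)
    item_var = sum(cov[i][i] for i in range(k))
    total_var = sum(sum(row) for row in cov)
    off_diag_sq = sum(cov[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
    return 1 - item_var / total_var + sqrt(k / (k - 1) * off_diag_sq) / total_var


# Hypothetical data: 5 persons (rows) by 3 items (columns).
example_scores = [
    [1, 2, 3],
    [2, 2, 4],
    [3, 4, 4],
    [4, 5, 6],
    [5, 5, 7],
]
alpha = cronbach_alpha(example_scores)
lambda2 = guttman_lambda2(example_scores)
```

Lambda-2 is never smaller than alpha for the same data, which is one reason it is often reported alongside alpha as a lower bound to reliability.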
Harik, Polina; Baldwin, Peter; Clauser, Brian – Applied Psychological Measurement, 2013
Growing reliance on complex constructed response items has generated considerable interest in automated scoring solutions. Many of these solutions are described in the literature; however, relatively few studies have been published that "compare" automated scoring strategies. Here, comparisons are made among five strategies for…
Descriptors: Computer Assisted Testing, Automation, Scoring, Comparative Analysis
Classification Consistency and Accuracy for Complex Assessments under the Compound Multinomial Model
Lee, Won-Chan; Brennan, Robert L.; Wan, Lei – Applied Psychological Measurement, 2009
For a test that consists of dichotomously scored items, several approaches have been reported in the literature for estimating classification consistency and accuracy indices based on a single administration of a test. Classification consistency and accuracy have not been studied much, however, for "complex" assessments--for example,…
Descriptors: Classification, Reliability, Test Items, Scoring
Lee, Won-Chan – Applied Psychological Measurement, 2007
This article introduces a multinomial error model, which models an examinee's test scores obtained over repeated measurements of an assessment that consists of polytomously scored items. A compound multinomial error model is also introduced for situations in which items are stratified according to content categories and/or prespecified numbers of…
Descriptors: Simulation, Error of Measurement, Scoring, Test Items

Moreland, John R.; Liss-Levinson, Nechama – Applied Psychological Measurement, 1977
One reason the fear of success (FOS) literature is confusing and contradictory is that the FOS construct is not being reliably measured across studies. Current scoring guidelines are insufficient for ensuring the reliable measurement of this construct. (RC)
Descriptors: Fear, Fear of Success, Females, Imagery

Blackman, Nicole J-M.; Koval, John J. – Applied Psychological Measurement, 1993
Four indexes of agreement between ratings of a person that correct for chance and are interpretable as intraclass correlation coefficients for different analysis of variance models are investigated. Relationships among the estimators are established for finite samples, and the equivalence of these estimators in large samples is demonstrated. (SLD)
Descriptors: Analysis of Variance, Equations (Mathematics), Estimation (Mathematics), Interrater Reliability

McGarvey, Bill; And Others – Applied Psychological Measurement, 1977
The most consistently used scoring system for the rod-and-frame task has been the total number of degrees in error from the true vertical. Since a logical case can be made for at least four alternative scoring systems, a thorough comparison of all five systems was performed. (Author/CTM)
Descriptors: Analysis of Variance, Cognitive Style, Cognitive Tests, Elementary Education

Schulz, E. Matthew; Kolen, Michael J.; Nicewander, W. Alan – Applied Psychological Measurement, 1999
Developed a procedure for defining achievement levels on continuous scales using aspects of Guttman scaling (L. Guttman, 1950) and Item Response Theory. Using data from high school mathematics tests for about 6,000 students, found the new procedure to have higher reliability, higher classification consistency, and lower classification error than…
Descriptors: Academic Achievement, Classification, Estimation (Mathematics), High School Students

Davison, Mark L.; Robbins, Stephen – Applied Psychological Measurement, 1978
Empirically weighted scores for Rest's Defining Issues Test were found to be more reliable than the simple sum of scores, the theoretically weighted sum, or Rest's p scores. They also had slightly higher correlations with Kohlberg's interview scores. Empirically weighted scores also showed more significant change in two longitudinal studies. (CTM)
Descriptors: Higher Education, Longitudinal Studies, Moral Development, Moral Values

Poizner, Sharon B.; And Others – Applied Psychological Measurement, 1978
Binary, probability, and ordinal scoring procedures for multiple-choice items were examined. In two situations, it was found that both the probability and ordinal scoring systems were more reliable than the binary scoring method. (Author/CTM)
Descriptors: Confidence Testing, Guessing (Tests), Higher Education, Multiple Choice Tests

Budescu, David V. – Applied Psychological Measurement, 1988
A multiple matching test--a 24-item Hebrew vocabulary test--was examined, in which distractors from several items are pooled into one list at the test's end. Construction of such tests was feasible. Reliability, validity, and reduction of random guessing were satisfactory when applied to data from 717 applicants to Israeli universities. (SLD)
Descriptors: College Applicants, Feasibility Studies, Foreign Countries, Guessing (Tests)

Downey, Ronald G. – Applied Psychological Measurement, 1979
This research attempted to interrelate several methods of producing option weights (i.e., Guttman internal and external weights and judges' weights) and examined their effects on reliability and on concurrent, predictive, and face validity. It was concluded that option weighting offered limited, if any, improvement over unit weighting. (Author/CTM)
Descriptors: Achievement Tests, Answer Keys, Comparative Testing, High Schools