Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 0 |
Since 2016 (last 10 years) | 0 |
Since 2006 (last 20 years) | 10 |
Descriptor
Test Reliability | 23 |
Reliability | 16 |
Higher Education | 9 |
Test Items | 8 |
Models | 7 |
Test Validity | 7 |
Correlation | 6 |
Foreign Countries | 6 |
Test Construction | 6 |
Classification | 5 |
Comparative Analysis | 5 |
Source
Applied Psychological Measurement | 38
Publication Type
Journal Articles | 38 |
Reports - Research | 38 |
Information Analyses | 1 |
Opinion Papers | 1 |
Reports - Evaluative | 1 |
Tests/Questionnaires | 1 |
Education Level
Higher Education | 4 |
Postsecondary Education | 2 |
Early Childhood Education | 1 |
Elementary Education | 1 |
Grade 2 | 1 |
High Schools | 1 |
Primary Education | 1 |
Secondary Education | 1 |
Harik, Polina; Baldwin, Peter; Clauser, Brian – Applied Psychological Measurement, 2013
Growing reliance on complex constructed response items has generated considerable interest in automated scoring solutions. Many of these solutions are described in the literature; however, relatively few studies have been published that "compare" automated scoring strategies. Here, comparisons are made among five strategies for…
Descriptors: Computer Assisted Testing, Automation, Scoring, Comparative Analysis
Wyse, Adam E.; Hao, Shiqi – Applied Psychological Measurement, 2012
This article introduces two new classification consistency indices that can be used when item response theory (IRT) models have been applied. The new indices are shown to be related to Rudner's classification accuracy index and Guo's classification accuracy index. The Rudner- and Guo-based classification accuracy and consistency indices are…
Descriptors: Item Response Theory, Classification, Accuracy, Reliability
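A minimal sketch, assuming normally distributed estimation error, of the Rudner-style classification accuracy idea this entry builds on; the ability estimates, standard errors, and cut points below are simulated for illustration, not taken from the article.

import numpy as np
from scipy.stats import norm

def rudner_accuracy(theta_hat, se, cuts):
    """Expected classification accuracy under a normal approximation."""
    bounds = np.concatenate(([-np.inf], np.sort(cuts), [np.inf]))
    acc = []
    for t, s in zip(theta_hat, se):
        k = np.searchsorted(bounds, t) - 1          # category assigned by theta_hat
        lo, hi = bounds[k], bounds[k + 1]
        # probability that the true ability falls in the same category
        acc.append(norm.cdf(hi, loc=t, scale=s) - norm.cdf(lo, loc=t, scale=s))
    return float(np.mean(acc))

rng = np.random.default_rng(0)
theta_hat = rng.normal(size=500)                    # estimated abilities (simulated)
se = np.full(500, 0.3)                              # conditional standard errors (assumed)
print(rudner_accuracy(theta_hat, se, cuts=[-0.5, 0.5]))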
Weijters, Bert; Geuens, Maggie; Schillewaert, Niels – Applied Psychological Measurement, 2010
The severity of bias in respondents' self-reports due to acquiescence response style (ARS) and extreme response style (ERS) depends strongly on how consistent these response styles are over the course of a questionnaire. In the literature, different alternative hypotheses on response style (in)consistency circulate. Therefore, nine alternative…
Descriptors: Models, Response Style (Tests), Questionnaires, Measurement Techniques
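For orientation only: crude acquiescence (ARS) and extreme (ERS) response-style counts from 5-point Likert data. The article compares model-based representations of response-style (in)consistency, which this sketch does not implement; the response matrix is simulated.

import numpy as np

def response_style_indices(responses):
    """responses: (n_respondents, n_items) array of values 1..5."""
    ars = np.mean(responses >= 4, axis=1)               # share of agree / strongly agree
    ers = np.mean(np.isin(responses, [1, 5]), axis=1)   # share of endpoint responses
    return ars, ers

rng = np.random.default_rng(1)
resp = rng.integers(1, 6, size=(200, 20))   # 200 respondents, 20 items (simulated)
ars, ers = response_style_indices(resp)
print(ars.mean(), ers.mean())

# Consistency over the questionnaire could be probed by computing the same
# indices per item block and correlating them across blocks.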
Semmes, Robert; Davison, Mark L.; Close, Catherine – Applied Psychological Measurement, 2011
If numerical reasoning items are administered under time limits, will two dimensions be required to account for the responses, a numerical ability dimension and a speed dimension? A total of 182 college students answered 74 numerical reasoning items. Every item was taken with and without time limits by half the students. Three psychometric models…
Descriptors: Individual Differences, Logical Thinking, Timed Tests, College Students
Raymond, Mark R.; Harik, Polina; Clauser, Brian E. – Applied Psychological Measurement, 2011
Prior research indicates that the overall reliability of performance ratings can be improved by using ordinary least squares (OLS) regression to adjust for rater effects. The present investigation extends previous work by evaluating the impact of OLS adjustment on standard errors of measurement ("SEM") at specific score levels. In…
Descriptors: Performance Based Assessment, Licensing Examinations (Professions), Least Squares Statistics, Item Response Theory
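A rough sketch of OLS adjustment for rater leniency/severity under an assumed sparse rating design; the dummy coding and simulated data are illustrative, not the study's procedure.

import numpy as np

rng = np.random.default_rng(2)
n_examinees, n_raters = 100, 8
true_score = rng.normal(size=n_examinees)
rater_effect = rng.normal(scale=0.5, size=n_raters)

# Each examinee is rated by two randomly chosen raters (assumed design).
rows = []
for e in range(n_examinees):
    for r in rng.choice(n_raters, size=2, replace=False):
        rows.append((e, r, true_score[e] + rater_effect[r] + rng.normal(scale=0.3)))
examinee, rater, rating = map(np.array, zip(*rows))

# Design matrix: one dummy per examinee plus one dummy per rater,
# with the last rater as the reference category.
n_obs = len(rating)
X = np.zeros((n_obs, n_examinees + n_raters - 1))
X[np.arange(n_obs), examinee] = 1.0
keep = rater < n_raters - 1
X[np.where(keep)[0], n_examinees + rater[keep]] = 1.0

beta, *_ = np.linalg.lstsq(X, rating, rcond=None)
rater_hat = np.append(beta[n_examinees:], 0.0)   # reference rater's effect fixed at 0
adjusted = rating - rater_hat[rater]             # ratings with estimated rater effects removed
print(np.round(rater_hat, 2))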
Almehrizi, Rashid S. – Applied Psychological Measurement, 2013
The majority of large-scale assessments develop various score scales that are either linear or nonlinear transformations of raw scores for better interpretations and uses of assessment results. The current formula for coefficient alpha (α; the commonly used reliability coefficient) only provides internal consistency reliability estimates of raw…
Descriptors: Raw Scores, Scaling, Reliability, Computation
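For reference, the standard raw-score coefficient alpha this entry starts from, as a quick sketch; the article's extension to nonlinearly transformed scale scores is not reproduced here, and the dichotomous item data are simulated.

import numpy as np

def coefficient_alpha(item_scores):
    """item_scores: (n_examinees, n_items) matrix of item scores."""
    k = item_scores.shape[1]
    item_vars = item_scores.var(axis=0, ddof=1).sum()
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars / total_var)

rng = np.random.default_rng(3)
ability = rng.normal(size=(300, 1))
items = (ability + rng.normal(size=(300, 12)) > 0).astype(float)   # simulated 0/1 items
print(coefficient_alpha(items))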
Bechger, Timo M.; Maris, Gunter; Hsiao, Ya Ping – Applied Psychological Measurement, 2010
The main purpose of this article is to demonstrate how halo effects may be detected and quantified using two independent ratings of the same person. A practical illustration is given to show how halo effects can be avoided. (Contains 2 tables, 7 figures, and 2 notes.)
Descriptors: Performance Based Assessment, Test Reliability, Test Length, Language Tests
Laenen, Annouschka; Alonso, Ariel; Molenberghs, Geert; Vangeneugden, Tony; Mallinckrodt, Craig H. – Applied Psychological Measurement, 2010
Longitudinal studies are permeating clinical trials in psychiatry. Therefore, it is of utmost importance to study the psychometric properties of rating scales, frequently used in these trials, within a longitudinal framework. However, intrasubject serial correlation and memory effects are problematic issues often encountered in longitudinal data.…
Descriptors: Psychiatry, Rating Scales, Memory, Psychometrics
Liu, Mei; Holland, Paul W. – Applied Psychological Measurement, 2008
The simplified version of the Dorans and Holland (2000) measure of population invariance, the root mean square difference (RMSD), is used to explore the degree of dependence of linking functions on the Law School Admission Test (LSAT) subpopulations defined by examinees' gender, ethnic background, geographic region, law school application status,…
Descriptors: Law Schools, Equated Scores, Geographic Regions, Geometric Concepts
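A hedged sketch of the root mean square difference idea: compare each subgroup's linking function with the total-group function at every score point, weight by subgroup size, and express the result in reference-form standard deviation units. The linking functions, weights, and SD below are invented inputs, not LSAT values.

import numpy as np

def rmsd(total_link, subgroup_links, weights, sd_reference):
    """total_link: (n_scores,); subgroup_links: (n_groups, n_scores)."""
    diffs = subgroup_links - total_link                  # difference at each score point
    msd = np.average(diffs ** 2, axis=0, weights=weights)
    return np.sqrt(msd) / sd_reference                   # expressed in SD units

scores = np.arange(0, 101)
total = 0.98 * scores + 1.0                              # total-group linear link (assumed)
groups = np.vstack([0.97 * scores + 1.5, 0.99 * scores + 0.6])
print(rmsd(total, groups, weights=[0.4, 0.6], sd_reference=15.0).max())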
Henson, Robert; Roussos, Louis; Douglas, Jeff; He, Xuming – Applied Psychological Measurement, 2008
Cognitive diagnostic models (CDMs) model the probability of correctly answering an item as a function of an examinee's attribute mastery pattern. Because estimation of the mastery pattern involves more than a continuous measure of ability, reliability concepts introduced by classical test theory and item response theory do not apply. The cognitive…
Descriptors: Diagnostic Tests, Classification, Probability, Item Response Theory
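For concreteness, a sketch of one widely used CDM, the DINA model (chosen here only as an example; the article treats CDMs and their reliability more generally): the correct-response probability depends on whether the examinee has mastered every attribute the item requires.

import numpy as np

def dina_prob(alpha, q, slip, guess):
    """alpha: (n_attributes,) 0/1 mastery pattern; q: (n_items, n_attributes) Q-matrix."""
    eta = np.all(alpha >= q, axis=1)          # masters every required attribute?
    return np.where(eta, 1.0 - slip, guess)

q = np.array([[1, 0], [0, 1], [1, 1]])        # 3 items, 2 attributes (assumed Q-matrix)
print(dina_prob(np.array([1, 0]), q, slip=0.1, guess=0.2))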

van der Linden, Wim J.; Boekkooi-Timminga, Ellen – Applied Psychological Measurement, 1988
Gulliksen's matched random subtests method is a graphical method to split a test into parallel test halves, allowing maximization of coefficient alpha as a lower bound to the classical test reliability coefficient. This problem is formulated as a zero-one programming problem solvable by algorithms that already exist. (TJH)
Descriptors: Algorithms, Equations (Mathematics), Programing, Test Reliability
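A toy version of the matched-halves idea, using exhaustive search over 0/1 assignments in place of the zero-one programming algorithms the article refers to; the item difficulties and discriminations are simulated.

import itertools
import numpy as np

rng = np.random.default_rng(4)
n_items = 10
difficulty = rng.uniform(0.3, 0.9, size=n_items)       # item p-values (simulated)
discrimination = rng.uniform(0.2, 0.6, size=n_items)   # item-total correlations (simulated)

best, best_gap = None, np.inf
for half in itertools.combinations(range(n_items), n_items // 2):
    a = np.array(half)
    b = np.setdiff1d(np.arange(n_items), a)
    # mismatch between the two halves on summed item statistics
    gap = (abs(difficulty[a].sum() - difficulty[b].sum())
           + abs(discrimination[a].sum() - discrimination[b].sum()))
    if gap < best_gap:
        best, best_gap = (a, b), gap

print("half A:", best[0], "half B:", best[1], "mismatch:", round(best_gap, 4))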

Magnusson, D.; Backteman, G. – Applied Psychological Measurement, 1979
A longitudinal study of approximately 1,000 students aged 10-16 showed high stability of intelligence and creativity. Stability coefficients for intelligence were higher than those for creativity. Results supported the construct validity of creativity. (MH)
Descriptors: Creativity, Creativity Tests, Elementary Secondary Education, Foreign Countries

Divgi, D. R. – Applied Psychological Measurement, 1980
The dependence of reliability indices for mastery tests on mean and cutoff scores was examined in the case of three decision-theoretic indices. Dependence of kappa on mean and cutoff scores was opposite to that of the proportion of correct decisions, which was linearly related to average threshold loss. (Author/BW)
Descriptors: Classification, Cutting Scores, Mastery Tests, Test Reliability
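A small illustrative sketch (simulated scores, not the article's data) of two of the quantities involved: the proportion of consistent mastery decisions across parallel forms and the corresponding kappa, both of which shift as the cutoff moves relative to the score mean.

import numpy as np

def decision_consistency(form1, form2, cutoff):
    pass1, pass2 = form1 >= cutoff, form2 >= cutoff
    p0 = np.mean(pass1 == pass2)                 # proportion of consistent decisions
    # chance agreement from the marginal pass rates
    pc = pass1.mean() * pass2.mean() + (1 - pass1.mean()) * (1 - pass2.mean())
    kappa = (p0 - pc) / (1 - pc)
    return p0, kappa

rng = np.random.default_rng(5)
true = rng.normal(70, 10, size=1000)
form1 = true + rng.normal(0, 5, size=1000)
form2 = true + rng.normal(0, 5, size=1000)
print(decision_consistency(form1, form2, cutoff=75.0))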

Whitely, Susan E. – Applied Psychological Measurement, 1979
Two sources of inconsistency were separated by reanalyzing data from a major study on short-term consistency. Little evidence was found for generalizability or behavioral predictability. Results supported the assumption that measurement error from short-term fluctuations is not due to systematic individual differences in response consistency.…
Descriptors: Behavior Change, Cognitive Processes, College Freshmen, Error of Measurement

Brennan, Robert L.; Lockwood, Robert E. – Applied Psychological Measurement, 1980
Generalizability theory is used to characterize and quantify expected variance in cutting scores and to compare the Nedelsky and Angoff procedures for establishing a cutting score. Results suggest that the restricted nature of the Nedelsky (inferred) probability scale may limit its applicability in certain contexts. (Author/BW)
Descriptors: Cutting Scores, Generalization, Statistical Analysis, Test Reliability
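An illustrative sketch of the Angoff side of the comparison: the cut score as the mean over judges of summed item probability judgments, with the spread across judges giving a simple estimate of cut-score variability. The judgments are simulated; the Nedelsky procedure and the full generalizability analysis are not reproduced.

import numpy as np

rng = np.random.default_rng(6)
n_judges, n_items = 12, 40
# each judge's probability that a minimally competent examinee answers each item
judgments = np.clip(rng.normal(0.65, 0.12, size=(n_judges, n_items)), 0.0, 1.0)

judge_cuts = judgments.sum(axis=1)                     # each judge's implied cut score
cut_score = judge_cuts.mean()
se_cut = judge_cuts.std(ddof=1) / np.sqrt(n_judges)    # variability across judges

print(f"cut = {cut_score:.1f} items, SE over judges = {se_cut:.2f}")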