Publication Date
In 2025 | 0 |
Since 2024 | 0 |
Since 2021 (last 5 years) | 1 |
Since 2016 (last 10 years) | 1 |
Since 2006 (last 20 years) | 6 |
Author
Brennan, Robert L. | 32 |
Kane, Michael T. | 4 |
Lee, Won-Chan | 4 |
Hanson, Bradley A. | 2 |
Yin, Ping | 2 |
Gao, Xiaohong | 1 |
Johnson, Eugene G. | 1 |
Kane, Michael F. | 1 |
Kim, Seonghoon | 1 |
Kim, Stella Y. | 1 |
Kreiter, Clarence D. | 1 |
Publication Type
Journal Articles | 18 |
Reports - Research | 13 |
Reports - Evaluative | 9 |
Speeches/Meeting Papers | 4 |
Reports - Descriptive | 3 |
Information Analyses | 2 |
Numerical/Quantitative Data | 2 |
Opinion Papers | 1 |
Education Level
Higher Education | 1 |
Postsecondary Education | 1 |
Assessments and Surveys
Iowa Tests of Basic Skills | 1 |
Work Keys (ACT) | 1 |
Brennan, Robert L.; Kim, Stella Y.; Lee, Won-Chan – Educational and Psychological Measurement, 2022
This article extends multivariate generalizability theory (MGT) to tests with different random-effects designs for each level of a fixed facet. There are numerous situations in which the design of a test and the resulting data structure are not definable by a single design. One example is mixed-format tests that are composed of multiple-choice and…
Descriptors: Multivariate Analysis, Generalizability Theory, Multiple Choice Tests, Test Construction
Brennan, Robert L. – Measurement: Interdisciplinary Research and Perspectives, 2015
Koretz, in his article published in this issue, provides compelling arguments that the high stakes currently associated with accountability testing lead to behavioral changes in students, teachers, and other stakeholders that often have negative consequences, such as inflated scores. Koretz goes on to argue that these negative consequences require…
Descriptors: Accountability, High Stakes Tests, Behavior Change, Student Behavior
Lee, Eunjung; Lee, Won-Chan; Brennan, Robert L. – College Board, 2012
In almost all high-stakes testing programs, test equating is necessary to ensure that test scores across multiple test administrations are equivalent and can be used interchangeably. Test equating becomes even more challenging in mixed-format tests, such as Advanced Placement Program® (AP®) Exams, that contain both multiple-choice and constructed…
Descriptors: Test Construction, Test Interpretation, Test Norms, Test Reliability
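As a minimal illustration of what "used interchangeably" requires, here is a sketch of linear equating, the simplest classical equating function; it is not the method studied for the mixed-format AP case, and the moments below are invented:

```python
import numpy as np

def linear_equate(x, mu_x, sd_x, mu_y, sd_y):
    """Map a score x on form X to the form-Y scale by matching the
    first two moments (the simplest classical equating function)."""
    return mu_y + sd_y * (x - mu_x) / sd_x

# Example: form X is harder (lower mean), so an X score of 30
# converts to a higher equivalent score on form Y.
print(linear_equate(30, mu_x=28.0, sd_x=6.0, mu_y=31.0, sd_y=6.5))
```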
Brennan, Robert L. – Applied Measurement in Education, 2011
Broadly conceived, reliability involves quantifying the consistencies and inconsistencies in observed scores. Generalizability theory, or G theory, is particularly well suited to addressing such matters in that it enables an investigator to quantify and distinguish the sources of inconsistencies in observed scores that arise, or could arise, over…
Descriptors: Generalizability Theory, Test Theory, Test Reliability, Item Response Theory
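To make the "sources of inconsistency" idea concrete, here is a minimal sketch, not drawn from the article, of a single-facet persons × items G study: variance components are estimated from the ANOVA expected mean squares, and two common reliability-like coefficients follow from them. The function name and data are illustrative assumptions.

```python
import numpy as np

def g_study_p_by_i(X, n_items_d=None):
    """Variance components for a fully crossed persons x items design,
    estimated from the ANOVA expected mean squares.
    X: 2-D array, rows = persons, columns = items."""
    n_p, n_i = X.shape
    grand = X.mean()
    ss_p = n_i * ((X.mean(axis=1) - grand) ** 2).sum()
    ss_i = n_p * ((X.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((X - grand) ** 2).sum() - ss_p - ss_i
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))
    var_pi = ms_res                           # interaction + residual error
    var_p = max((ms_p - ms_res) / n_i, 0.0)   # universe-score (person) variance
    var_i = max((ms_i - ms_res) / n_p, 0.0)   # item difficulty variance
    n_d = n_items_d or n_i                    # D-study test length
    e_rho2 = var_p / (var_p + var_pi / n_d)           # relative decisions
    phi = var_p / (var_p + (var_i + var_pi) / n_d)    # absolute decisions
    return {"p": var_p, "i": var_i, "pi,e": var_pi,
            "Erho2": e_rho2, "Phi": phi}

# Toy data: 50 persons x 10 items, a true person effect plus noise
rng = np.random.default_rng(1)
scores = rng.normal(0, 1, (50, 1)) + rng.normal(0, 0.7, (50, 10))
print(g_study_p_by_i(scores))
```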
Classification Consistency and Accuracy for Complex Assessments under the Compound Multinomial Model
Lee, Won-Chan; Brennan, Robert L.; Wan, Lei – Applied Psychological Measurement, 2009
For a test that consists of dichotomously scored items, several approaches have been reported in the literature for estimating classification consistency and accuracy indices based on a single administration of a test. Classification consistency and accuracy have not been studied much, however, for "complex" assessments--for example,…
Descriptors: Classification, Reliability, Test Items, Scoring
Yi, Hyun Sook; Kim, Seonghoon; Brennan, Robert L. – Applied Psychological Measurement, 2007
Large-scale testing programs involving classification decisions typically have multiple forms available and conduct equating to ensure cut-score comparability across forms. A test developer might be interested in the extent to which an examinee who happens to take a particular form would have a consistent classification decision if he or she had…
Descriptors: Classification, Reliability, Indexes, Computation

Brennan, Robert L. – Journal of Educational Measurement, 2001
Reviews important milestones in the history of reliability, current issues related to reliability, and likely prospects for reliability from the perspective of what constitutes a replication of a measurement procedure. Pays special attention to the fixed/random aspects of facets that characterize replications. (SLD)
Descriptors: Educational Testing, Measurement Techniques, Reliability
Lee, Won-Chan; Hanson, Bradley A.; Brennan, Robert L. – 2000
This paper describes procedures for estimating various indices of classification consistency and accuracy for multiple category classifications using data from a single test administration. The estimates of the classification consistency and accuracy indices are compared under three different psychometric models: the two-parameter beta binomial,…
Descriptors: Classification, Estimation (Mathematics), Item Response Theory, Reliability
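The single-administration idea can be sketched for the simplest of the three models compared, the two-parameter beta binomial: a Beta true-score distribution with binomial error, integrated numerically over the true-score scale. The cut scores and parameters below are invented; this illustrates the general approach, not the paper's own procedures.

```python
import numpy as np
from scipy import stats

def consistency_accuracy(n_items, cuts, a, b, grid=2001):
    """Classification consistency and accuracy from one administration
    under a two-parameter beta-binomial model: true proportion-correct
    theta ~ Beta(a, b), observed score X | theta ~ Binomial(n_items, theta).
    cuts: ascending raw-score cut points defining the categories."""
    theta = np.linspace(1e-6, 1 - 1e-6, grid)
    f = stats.beta.pdf(theta, a, b)
    edges = [0] + list(cuts) + [n_items + 1]
    cat_probs = []                      # P(X lands in category h | theta)
    for lo, hi in zip(edges[:-1], edges[1:]):
        xs = np.arange(lo, min(hi, n_items + 1))
        cat_probs.append(
            stats.binom.pmf(xs[:, None], n_items, theta[None, :]).sum(axis=0))
    cat_probs = np.array(cat_probs)     # shape: (n_categories, grid)
    # Consistency: two parallel administrations agree on the category
    phi = np.trapz((cat_probs ** 2).sum(axis=0) * f, theta)
    # Accuracy: observed category matches the true-score category
    true_cat = np.digitize(theta * n_items, cuts)
    gamma = np.trapz(cat_probs[true_cat, np.arange(grid)] * f, theta)
    return phi, gamma

# Example: 40 items, cuts at raw scores 24 and 32, Beta(8, 4) true scores
print(consistency_accuracy(40, [24, 32], a=8, b=4))
```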
Kane, Michael T.; Brennan, Robert L. – 1977
A large number of seemingly diverse coefficients have been proposed as indices of dependability, or reliability, for domain-referenced and/or mastery tests. In this paper, it is shown that most of these indices are special cases of two generalized indices of agreement: one that is corrected for chance, and one that is not. The special cases of…
Descriptors: Bayesian Statistics, Correlation, Criterion Referenced Tests, Cutting Scores
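In generic notation (a reconstruction, not necessarily the paper's own symbols), the two families have the familiar agreement-coefficient form, where p_o is the observed proportion of consistent classifications and p_c its expected value under chance:

```latex
% Uncorrected index: observed proportion of agreement.
% Chance-corrected index: agreement beyond chance, rescaled to [.,1].
\[
  \text{uncorrected: } \; p_o,
  \qquad
  \text{chance-corrected: } \; \frac{p_o - p_c}{1 - p_c}.
\]
```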

Brennan, Robert L.; Kane, Michael T. – Journal of Educational Measurement, 1977
An index for the dependability of mastery tests is described. Assumptions necessary for the index and the mathematical development of the index are provided. (Author/JKS)
Descriptors: Criterion Referenced Tests, Mastery Tests, Mathematical Models, Test Reliability
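The index in question is usually written as below (standard G-theory notation; a reconstruction rather than a quotation from the article), where λ is the cut score, μ and σ²(p) are the domain-score mean and person variance, and σ²(Δ) is absolute error variance:

```latex
% Dependability of mastery decisions at cut score \lambda
\[
  \Phi(\lambda)
  = \frac{\sigma^2(p) + (\mu - \lambda)^2}
         {\sigma^2(p) + (\mu - \lambda)^2 + \sigma^2(\Delta)}
\]
```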

Brennan, Robert L.; Prediger, Dale J. – Educational and Psychological Measurement, 1981
This paper considers some appropriate and inappropriate uses of coefficient kappa and alternative kappa-like statistics. Discussion is restricted to the descriptive characteristics of these statistics for measuring agreement with categorical data in studies of reliability and validity. (Author)
Descriptors: Classification, Error of Measurement, Mathematical Models, Test Reliability
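A sketch of the two descriptive statistics most often contrasted in this literature: Cohen's kappa, whose chance term comes from the observed margins, and the kappa-like variant with chance fixed at 1/k for k categories, often associated with this paper. The function name and data are illustrative:

```python
import numpy as np

def kappas(table):
    """Cohen's kappa and the 1/k-chance variant for a k x k agreement
    table (rows: rater A's categories, columns: rater B's)."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    p_o = np.trace(t) / n                              # observed agreement
    p_c = (t.sum(axis=1) / n) @ (t.sum(axis=0) / n)    # margin-based chance
    k = t.shape[0]
    cohen = (p_o - p_c) / (1 - p_c)
    kappa_n = (p_o - 1 / k) / (1 - 1 / k)              # chance fixed at 1/k
    return cohen, kappa_n

# Example: two raters classifying 100 examinees into 3 categories
print(kappas([[30, 5, 2], [4, 25, 6], [1, 7, 20]]))
```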

Brennan, Robert L.; Lockwood, Robert E. – Applied Psychological Measurement, 1980
Generalizability theory is used to characterize and quantify expected variance in cutting scores and to compare the Nedelsky and Angoff procedures for establishing a cutting score. Results suggest that the restricted nature of the Nedelsky (inferred) probability scale may limit its applicability in certain contexts. (Author/BW)
Descriptors: Cutting Scores, Generalization, Statistical Analysis, Test Reliability
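To make the contrast concrete: in an Angoff study each judge states, item by item, the probability that a minimally competent examinee answers correctly, while a Nedelsky judge instead crosses out distractors, so the implied probability can only take the values 1/m for m options remaining. A small sketch with invented ratings (not the study's data):

```python
import numpy as np

# Angoff: each judge directly rates P(correct) per item on [0, 1].
angoff = np.array([[0.6, 0.7, 0.5, 0.8],     # judge 1, items 1-4
                   [0.5, 0.8, 0.6, 0.7]])    # judge 2

# Nedelsky: each judge eliminates distractors on a 4-option item; the
# implied probability is 1 / (options left), restricted to 1/4, 1/3,
# 1/2, or 1 -- a much coarser scale than Angoff's.
options_left = np.array([[2, 2, 3, 1],
                         [2, 1, 2, 2]])
nedelsky = 1.0 / options_left

# Cutting score = expected raw score of the borderline examinee:
# item probabilities summed over items, then averaged over judges.
print("Angoff cut:  ", angoff.sum(axis=1).mean())
print("Nedelsky cut:", nedelsky.sum(axis=1).mean())
```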

Brennan, Robert L. – Applied Psychological Measurement, 2000
Reviews relevant aspects of generalizability theory related to performance assessments and discusses the role of various facets in assessing the generalizability of performance assessments. Also considers some popular estimates of reliability for performance assessments from the perspective of generalizability theory. (SLD)
Descriptors: Estimation (Mathematics), Evaluation Methods, Generalizability Theory, Performance Based Assessment

Brennan, Robert L. – Educational and Psychological Measurement, 1975
Variance components from a split-plot factorial (SPF) design were used to estimate reliability for schools and for persons within schools. Reliability estimates for persons within schools were compared under the SPF and randomized block (RB) designs, as were reliability estimates for schools under the two designs. (Author/BJG)
Descriptors: Analysis of Variance, Evaluation Methods, Schools, Statistical Analysis
Brennan, Robert L.; Light, Richard J. – 1973
Basic to many psychological investigations is the question of agreement between observers who independently categorize people. Several recent studies have proposed measures of agreement when a set of nominal scale categories has been pre-defined and imposed on both observers. This study, in contrast, develops a measure of agreement for settings…
Descriptors: Analysis of Variance, Correlation, Hypothesis Testing, Rating Scales