Abedi, Jamal – Multivariate Behavioral Research, 1996
The Interrater/Test Reliability System (ITRS), a comprehensive computer tool for addressing questions of interrater reliability, is described. The ITRS computes several different indices of interrater reliability as well as the generalizability coefficient over raters and topics. The system is available in IBM-compatible or Macintosh format. (SLD)
Descriptors: Computer Software, Computer Software Evaluation, Evaluation Methods, Evaluators

Kane, Michael – Applied Measurement in Education, 1996
This overview of the role of error and tolerance for error in measurement asserts that the generic precision associated with a measurement procedure is defined as the root mean square error, or standard error, in some relevant population. This view of precision is explored in several applications of measurement. (SLD)
Descriptors: Error of Measurement, Error Patterns, Generalizability Theory, Measurement Techniques
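
A worked form of that definition, with notation assumed here rather than taken from the article: for an observed score X and the corresponding true score T in the relevant population,

    \sigma_E = \sqrt{E\left[(X - T)^{2}\right]}

that is, the standard error is the root mean square of the observed-minus-true discrepancies, and smaller \sigma_E means greater generic precision of the measurement procedure.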

Norcini, John J.; And Others – Evaluation and the Health Professions, 1990
Aggregate scoring was applied to a recertifying examination for medical professionals to generate an answer key and allow comparison of peer examinees. Results for 1,927 candidates for recertification indicate considerable agreement between the traditional answer key and the aggregate answer key. (TJH)
Descriptors: Answer Keys, Criterion Referenced Tests, Error of Measurement, Generalizability Theory
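
A minimal sketch of one common aggregate-scoring rule, keying each item to the modal response of the peer group; the rule, names, and toy data below are illustrative assumptions, not details reported in the article.

    from collections import Counter

    def aggregate_key(responses):
        # For each item, take the answer chosen most often by the peer examinees.
        n_items = len(responses[0])
        return [Counter(r[i] for r in responses).most_common(1)[0][0]
                for i in range(n_items)]

    # Toy data: four peer examinees answering three items
    peer_responses = [["A", "C", "B"],
                      ["A", "C", "D"],
                      ["B", "C", "B"],
                      ["A", "D", "B"]]
    traditional_key = ["A", "C", "B"]

    agg = aggregate_key(peer_responses)
    agreement = sum(a == t for a, t in zip(agg, traditional_key)) / len(agg)
    print(agg, agreement)  # ['A', 'C', 'B'] 1.0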

Marcoulides, George A. – Educational and Psychological Measurement, 1994
Effects of different weighting schemes on selecting the optimal number of observations in multivariate-multifacet generalizability designs are studied when cost constraints are imposed. Comparison of four schemes through simulation indicates that all four produce similar optimal values and yield similar reliability. (SLD)
Descriptors: Budgeting, Comparative Analysis, Costs, Factor Analysis

Goldstein, Zvi; Marcoulides, George A. – Educational and Psychological Measurement, 1991
An efficient search procedure is presented for determining the optimal number of observations for each facet in a design that maximizes generalizability when resource constraints are imposed. The procedure is illustrated for three-facet and four-facet designs, with extensions to other configurations. (Author/SLD)
Descriptors: Cost Effectiveness, Decision Making, Equations (Mathematics), Generalizability Theory
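
As a simplified illustration of the underlying optimization problem (a brute-force stand-in, not the article's efficient search procedure), the sketch below picks the numbers of tasks and raters in a two-facet persons x tasks x raters design that maximize the generalizability coefficient for relative decisions under a budget; the variance components and unit costs are invented for the example.

    # Hypothetical variance components from a persons x tasks x raters G study
    var_p, var_pt, var_pr, var_ptr_e = 0.50, 0.20, 0.10, 0.30
    cost_per_task, cost_per_rater = 5.0, 2.0  # assumed unit costs
    budget = 40.0

    def g_coefficient(n_t, n_r):
        # Relative-error variance for a D study with n_t tasks and n_r raters
        rel_error = var_pt / n_t + var_pr / n_r + var_ptr_e / (n_t * n_r)
        return var_p / (var_p + rel_error)

    feasible = [(n_t, n_r) for n_t in range(1, 9) for n_r in range(1, 9)
                if n_t * cost_per_task + n_r * cost_per_rater <= budget]
    n_t, n_r = max(feasible, key=lambda d: g_coefficient(*d))
    print(n_t, n_r, round(g_coefficient(n_t, n_r), 3))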

Smith, Philip L.; Luecht, Richard M. – Applied Psychological Measurement, 1992
The implications of serially correlated effects on the results of generalizability analyses are discussed. Simulated data are provided that demonstrate the biases that serially correlated effects introduce into the results. Serial correlation in measurement effects can have a marked influence on the impression of the dependability of measurement…
Descriptors: Computer Simulation, Correlation, Equations (Mathematics), Estimation (Mathematics)

Goodwin, Laura D.; And Others – Journal of Special Education, 1991
Using data from an individually administered interview schedule (the Consumer Satisfaction Inventory), reliability among nine interviewers was estimated with several statistical methods, including simple percentages of agreement, kappa and weighted kappa, Pearson correlations, t tests on interviewers' means, and generalizability theory techniques.…
Descriptors: Disabilities, Educational Research, Elementary Secondary Education, Estimation (Mathematics)
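
As a sketch of two of the simpler indices the study compares, the code below computes simple percent agreement and Cohen's kappa for one pair of raters; the ratings are made up, and the remaining methods (weighted kappa, Pearson correlations, t tests, generalizability analysis) are not shown.

    def percent_agreement(r1, r2):
        return sum(a == b for a, b in zip(r1, r2)) / len(r1)

    def cohens_kappa(r1, r2):
        categories = set(r1) | set(r2)
        n = len(r1)
        p_o = percent_agreement(r1, r2)
        # Chance agreement from each rater's marginal category proportions
        p_e = sum((r1.count(c) / n) * (r2.count(c) / n) for c in categories)
        return (p_o - p_e) / (1 - p_e)

    rater_1 = [1, 1, 2, 3, 2, 1, 3, 3]
    rater_2 = [1, 2, 2, 3, 2, 1, 3, 1]
    print(percent_agreement(rater_1, rater_2), round(cohens_kappa(rater_1, rater_2), 3))
    # 0.75 0.628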

Swartz, Carl W.; Hooper, Stephen R.; Montgomery, James W.; Wakely, Melissa B.; De Kruif, Renee E. L.; Reed, Martha; Brown, Timothy T.; Levine, Melvin D.; White, Kinnard P. – Educational and Psychological Measurement, 1999
Used generalizability theory to investigate the impact of the number of raters and the type of decision (relative versus absolute) on the reliability of writing scores. Results from 251 middle school students and 20 intermediate grade students show that reliability coefficients decline as the number of raters declines and when absolute decisions…
Descriptors: Estimation (Mathematics), Generalizability Theory, Holistic Evaluation, Intermediate Grades
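
The pattern described here follows from the standard one-facet persons x raters D-study coefficients of generalizability theory; the formulas below are textbook results rather than the article's specific design. With person variance \sigma^2_p, rater variance \sigma^2_r, residual \sigma^2_{pr,e}, and n_r raters,

    E\rho^{2} = \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + \sigma^{2}_{pr,e}/n_{r}}
    \qquad
    \Phi = \frac{\sigma^{2}_{p}}{\sigma^{2}_{p} + (\sigma^{2}_{r} + \sigma^{2}_{pr,e})/n_{r}}

Both coefficients fall as n_r decreases, and \Phi (absolute decisions) is never larger than E\rho^{2} (relative decisions) because its error term also includes the rater main effect.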

Hoyt, William T.; Melby, Janet N. – Counseling Psychologist, 1999
Addresses generalizability theory (GT), which offers a flexible framework for assessing dependability of measurement. GT allows for consideration of multiple sources of error, allowing investigators to assess the overall impact of measurement error. Illustrative analyses demonstrate the special advantages of GT for planning studies in which…
Descriptors: Counseling Psychology, Generalizability Theory, Measurement, Research Design

Norcini, John; Grosso, Lou – Applied Measurement in Education, 1998
Ratings of test item relevance were collected from 57 practitioners from a pretest of a medical certifying examination. Ratings were correlated with item difficulty, but the relationship between ratings and item discrimination was less clear. Application of generalizability theory shows that reasonable estimates of item, stem, and total test…
Descriptors: Certification, Difficulty Level, Estimation (Mathematics), Generalizability Theory

Heck, Ronald H.; Johnsrud, Linda K.; Rosser, Vicki J. – Research in Higher Education, 2000
Little research exists on the assessment of administrators' performance in higher education. The authors offer an evaluation model for assessing and monitoring the effectiveness of academic deans and directors, using generalizability theory as a basis for developing more accurate assessment procedures. (JM)
Descriptors: Academic Deans, Administrator Effectiveness, Administrator Evaluation, College Administration

Yin, Ping – Educational and Psychological Measurement, 2005
The main purpose of this study is to examine the content structure of the Multistate Bar Examination (MBE) using the "table of specifications" model from the perspective of multivariate generalizability theory. Specifically, using MBE data collected over different years (six administrations: three from the February test and three from the July test),…
Descriptors: Correlation, Generalizability Theory, Statistical Analysis, Multivariate Analysis

Johnson, Robert L.; Penny, James; Gordon, Belita; Shumate, Steven R.; Fisher, Steven P. – Language Assessment Quarterly, 2005
Many studies have indicated that at least 2 raters should score writing assessments to improve interrater reliability. However, even for assessments that characteristically demonstrate high levels of rater agreement, 2 raters of the same essay can occasionally report different, or discrepant, scores. If a single score, typically referred to as an…
Descriptors: Interrater Reliability, Scores, Evaluation, Reliability

Stern, Hal S. – Psychological Methods, 2005
I. Klugkist, O. Laudy, and H. Hoijtink (2005) presented a Bayesian approach to analysis of variance models with inequality constraints. Constraints may play 2 distinct roles in data analysis. They may represent prior information that allows more precise inferences regarding parameter values, or they may describe a theory to be judged against the…
Descriptors: Probability, Inferences, Bayesian Statistics, Data Analysis

Cheong, Yuk Fai – International Journal of Testing, 2006
This article considers and illustrates a strategy to study effects of school context on differential item functioning (DIF) in large-scale assessment. The approach employs a hierarchical generalized linear modeling framework to (a) detect DIF, and (b) identify school-level correlates of the between-group differences in item performance. To…
Descriptors: Context Effect, Test Bias, Causal Models, Educational Assessment