Showing 1 to 15 of 23 results
Walter P. Vispoel; Hyeri Hong; Hyeryung Lee; Terrence D. Jorgensen – Applied Measurement in Education, 2023
We illustrate how to analyze complete generalizability theory (GT) designs using structural equation modeling software ("lavaan" in R), compare results to those obtained from numerous ANOVA-based packages, and apply those results in practical ways using data obtained from a large sample of respondents, who completed the Self-Perception…
Descriptors: Generalizability Theory, Design, Structural Equation Models, Error of Measurement
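The ANOVA-based estimates that the entry above compares lavaan against come from a standard variance-component decomposition. As general background only, a minimal sketch in Python for a plain persons x items (p x i) design with simulated data; this is not the SEM approach, the software, or the data used in the article:

import numpy as np

# Hypothetical fully crossed persons x items G-study data (simulated).
rng = np.random.default_rng(0)
n_p, n_i = 200, 10
person = rng.normal(0, 1.0, size=(n_p, 1))      # person effects
item = rng.normal(0, 0.5, size=(1, n_i))        # item difficulty effects
X = 5 + person + item + rng.normal(0, 0.8, size=(n_p, n_i))  # plus residual (pi,e)

grand = X.mean()
mean_p = X.mean(axis=1)   # person means
mean_i = X.mean(axis=0)   # item means

# ANOVA mean squares for the crossed random-effects design
ss_p = n_i * np.sum((mean_p - grand) ** 2)
ss_i = n_p * np.sum((mean_i - grand) ** 2)
ss_res = np.sum((X - grand) ** 2) - ss_p - ss_i
ms_p = ss_p / (n_p - 1)
ms_i = ss_i / (n_i - 1)
ms_res = ss_res / ((n_p - 1) * (n_i - 1))

# Variance components via expected mean squares
var_pie = ms_res                   # sigma^2(pi,e)
var_p = (ms_p - ms_res) / n_i      # sigma^2(p)
var_i = (ms_i - ms_res) / n_p      # sigma^2(i)

# D-study indices for n_i items: relative (E rho^2) and absolute (Phi) decisions
g_coef = var_p / (var_p + var_pie / n_i)
phi = var_p / (var_p + (var_i + var_pie) / n_i)
print(round(g_coef, 3), round(phi, 3))

The same arithmetic extends to additional facets (raters, occasions) by adding the corresponding mean squares, which is what the dedicated G-theory packages automate.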
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
DeMars, Christine – Applied Measurement in Education, 2015
In generalizability theory studies in large-scale testing contexts, sometimes a facet is very sparsely crossed with the object of measurement. For example, when assessments are scored by human raters, it may not be practical to have every rater score all students. Sometimes the scoring is systematically designed such that the raters are…
Descriptors: Educational Assessment, Measurement, Data, Generalizability Theory
Schmidgall, Jonathan – Applied Measurement in Education, 2017
This study utilizes an argument-based approach to validation to examine the implications of reliability in order to further differentiate the concepts of score and decision consistency. In a methodological example, the framework of generalizability theory was used to estimate appropriate indices of score consistency and evaluations of the…
Descriptors: Scores, Reliability, Validity, Generalizability Theory
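As general G-theory background on the distinction the entry above draws (not taken from the article): score consistency is indexed by coefficients such as E\rho^2 that ignore where a cut score sits, whereas the dependability of decisions about a cut score \lambda can be indexed by Brennan's

\Phi_\lambda = \frac{\sigma^2_p + (\mu - \lambda)^2}{\sigma^2_p + (\mu - \lambda)^2 + \sigma^2_\Delta}

which increases as the cut score moves away from the mean, so the same set of scores can support decisions of quite different consistency.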
Rupp, André A. – Applied Measurement in Education, 2018
This article discusses critical methodological design decisions for collecting, interpreting, and synthesizing empirical evidence during the design, deployment, and operational quality-control phases for automated scoring systems. The discussion is inspired by work on operational large-scale systems for automated essay scoring but many of the…
Descriptors: Design, Automation, Scoring, Test Scoring Machines
Schweig, Jonathan David – Applied Measurement in Education, 2014
Developing indicators that reflect important aspects of school and classroom environments has become central in a nationwide effort to develop comprehensive programs that measure teacher quality and effectiveness. Formulating teacher evaluation policy necessitates accurate and reliable methods for measuring these environmental variables. This…
Descriptors: Error of Measurement, Educational Environment, Classroom Environment, Surveys
Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015
The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…
Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items
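As background on why the size of the item subset matters (general G-theory reasoning, not the authors' exact model): if Angoff judgments are collected in a crossed judges x items (j x i) design and the recommended cut score is the grand mean of the ratings, then under a fully random model its error variance is

\sigma^2(\hat{\lambda}) = \frac{\sigma^2_j}{n'_j} + \frac{\sigma^2_i}{n'_i} + \frac{\sigma^2_{ji,e}}{n'_j n'_i}

so shrinking the item sample inflates only the terms with n'_i in the denominator, and the estimated variance components indicate whether that inflation is small enough for a subset of items to yield generalizable cut-score recommendations.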
Taylor, Melinda Ann; Pastor, Dena A. – Applied Measurement in Education, 2013
Although federal regulations require testing students with severe cognitive disabilities, there is little guidance regarding how technical quality should be established. It is known that challenges exist with documentation of the reliability of scores for alternate assessments. Typical measures of reliability do little in modeling multiple sources…
Descriptors: Generalizability Theory, Alternative Assessment, Test Reliability, Scores
Brennan, Robert L. – Applied Measurement in Education, 2011
Broadly conceived, reliability involves quantifying the consistencies and inconsistencies in observed scores. Generalizability theory, or G theory, is particularly well suited to addressing such matters in that it enables an investigator to quantify and distinguish the sources of inconsistencies in observed scores that arise, or could arise, over…
Descriptors: Generalizability Theory, Test Theory, Test Reliability, Item Response Theory
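The sources of inconsistency Brennan refers to can be made concrete with the standard single-facet (persons x items) decomposition, included here as background rather than material from the article:

X_{pi} = \mu + \nu_p + \nu_i + \nu_{pi,e}, \qquad \sigma^2(X_{pi}) = \sigma^2_p + \sigma^2_i + \sigma^2_{pi,e}

\sigma^2_\delta = \frac{\sigma^2_{pi,e}}{n'_i} \quad (\text{relative error}), \qquad \sigma^2_\Delta = \frac{\sigma^2_i + \sigma^2_{pi,e}}{n'_i} \quad (\text{absolute error})

Which components count as error depends on the intended generalization, which is the sense in which G theory separates inconsistencies that do arise from those that could arise under other measurement procedures.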
Kachchaf, Rachel; Solano-Flores, Guillermo – Applied Measurement in Education, 2012
We examined how rater language background affects the scoring of short-answer, open-ended test items in the assessment of English language learners (ELLs). Four native English and four native Spanish-speaking certified bilingual teachers scored 107 responses of fourth- and fifth-grade Spanish-speaking ELLs to mathematics items administered in…
Descriptors: Error of Measurement, English Language Learners, Scoring, Bilingual Teachers
Yin, Yue; Shavelson, Richard J. – Applied Measurement in Education, 2008
In the first part of this article, the use of Generalizability (G) theory in examining the dependability of concept map assessment scores and designing a concept map assessment for a particular practical application is discussed. In the second part, the application of G theory is demonstrated by comparing the technical qualities of two frequently…
Descriptors: Generalizability Theory, Concept Mapping, Validity, Reliability
Clauser, Brian E.; Harik, Polina; Margolis, Melissa J.; McManus, I. C.; Mollon, Jennifer; Chis, Liliana; Williams, Simon – Applied Measurement in Education, 2009
Numerous studies have compared the Angoff standard-setting procedure to other standard-setting methods, but relatively few studies have evaluated the procedure based on internal criteria. This study uses a generalizability theory framework to evaluate the stability of the estimated cut score. To provide a measure of internal consistency, this…
Descriptors: Generalizability Theory, Group Discussion, Standard Setting (Scoring), Scoring
Shumate, Steven R.; Surles, James; Johnson, Robert L.; Penny, Jim – Applied Measurement in Education, 2007
Increasingly, assessment practitioners use generalizability coefficients to estimate the reliability of scores from performance tasks. Little research, however, examines the relation between the estimation of generalizability coefficients and the number of rubric scale points and score distributions. The purpose of the present research is to…
Descriptors: Generalizability Theory, Monte Carlo Methods, Measures (Individuals), Program Effectiveness
Lee, Guemin; Frisbie, David A. – Applied Measurement in Education, 1999
Studied the appropriateness and implications of using a generalizability theory approach to estimating the reliability of scores from tests composed of testlets. Analyses of data from two national standardization samples suggest that manipulating the number of passages is a more productive way to obtain efficient measurement than manipulating the…
Descriptors: Generalizability Theory, Models, National Surveys, Reliability
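The passages-versus-items trade-off described above follows directly from the D-study formula for a persons x (items nested in passages) design, p x (i:h). This is general G-theory background, with h denoting passages (testlets) and i items within a passage, not a result reproduced from the article:

E\rho^2 = \frac{\sigma^2_p}{\sigma^2_p + \dfrac{\sigma^2_{ph}}{n'_h} + \dfrac{\sigma^2_{pi:h,e}}{n'_h \, n'_i}}

Increasing the number of passages n'_h shrinks both error terms, whereas adding items within a passage shrinks only the second, which is why manipulating the number of passages is the more efficient route.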
Chang, Lei; Hocevar, Dennis – Applied Measurement in Education, 2000
Demonstrated the use of generalizability theory in analyzing existing faculty evaluation data. Three measurement conceptualizations representing different purposes of faculty evaluation were developed and variance components associated with these conceptualizations were estimated from an existing faculty evaluation using 30 teachers for each…
Descriptors: College Faculty, Generalizability Theory, Higher Education, Models