Showing 1 to 15 of 20 results
Peer reviewed
Wise, Steven L. – Applied Measurement in Education, 2019
The identification of rapid guessing is important to promote the validity of achievement test scores, particularly with low-stakes tests. Effective methods for identifying rapid guesses require reliable threshold methods that are also aligned with test taker behavior. Although several common threshold methods are based on rapid guessing response…
Descriptors: Guessing (Tests), Identification, Reaction Time, Reliability
Peer reviewed
Myers, Aaron J.; Ames, Allison J.; Leventhal, Brian C.; Holzman, Madison A. – Applied Measurement in Education, 2020
When rating performance assessments, raters may assign different scores to the same performance when rubric application does not align with the intended application of the scoring criteria. Given that performance assessment score interpretation assumes raters apply rubrics as rubric developers intended, misalignment between raters' scoring processes…
Descriptors: Scoring Rubrics, Validity, Item Response Theory, Interrater Reliability
Peer reviewed
McGrane, Joshua Aaron; Humphry, Stephen Mark; Heldsinger, Sandra – Applied Measurement in Education, 2018
National standardized assessment programs have increasingly included extended written performances, amplifying the need for reliable, valid, and efficient methods of assessment. This article examines a two-stage method using comparative judgments and calibrated exemplars as a complement and alternative to existing methods of assessing writing.…
Descriptors: Standardized Tests, Foreign Countries, Writing Tests, Writing Evaluation
Peer reviewed
Schmidgall, Jonathan – Applied Measurement in Education, 2017
This study utilizes an argument-based approach to validation to examine the implications of reliability in order to further differentiate the concepts of score and decision consistency. In a methodological example, the framework of generalizability theory was used to estimate appropriate indices of score consistency and evaluations of the…
Descriptors: Scores, Reliability, Validity, Generalizability Theory
Peer reviewed
Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…
Descriptors: Essays, Scoring, Comparative Analysis, Evaluators
Peer reviewed
Clauser, Jerome C.; Clauser, Brian E.; Hambleton, Ronald K. – Applied Measurement in Education, 2014
The purpose of the present study was to extend past work with the Angoff method for setting standards by examining judgments at the judge level rather than the panel level. The focus was on investigating the relationship between observed Angoff standard setting judgments and empirical conditional probabilities. This relationship has been used as a…
Descriptors: Standard Setting (Scoring), Validity, Reliability, Correlation
Peer reviewed
Deunk, Marjolein I.; van Kuijk, Mechteld F.; Bosker, Roel J. – Applied Measurement in Education, 2014
Standard setting methods, like the Bookmark procedure, are used to assist education experts in formulating performance standards. Small group discussion is meant to help these experts in setting more reliable and valid cutoff scores. This study is an analysis of 15 small group discussions during two standards setting trajectories and their effect…
Descriptors: Cutting Scores, Standard Setting, Group Discussion, Reading Tests
Peer reviewed
Leighton, Jacqueline P. – Applied Measurement in Education, 2013
The Standards for Educational and Psychological Testing indicate that multiple sources of validity evidence should be used to support the interpretation of test scores. In the past decade, examinee response processes, as a source of validity evidence, have received increased attention. However, there have been relatively few methodological studies…
Descriptors: Psychological Testing, Standards, Interviews, Protocol Analysis
Peer reviewed
Yin, Yue; Shavelson, Richard J. – Applied Measurement in Education, 2008
In the first part of this article, the use of Generalizability (G) theory in examining the dependability of concept map assessment scores and designing a concept map assessment for a particular practical application is discussed. In the second part, the application of G theory is demonstrated by comparing the technical qualities of two frequently…
Descriptors: Generalizability Theory, Concept Mapping, Validity, Reliability
Peer reviewed
Wise, Lauress L. – Applied Measurement in Education, 2010
The articles in this special issue make two important contributions to our understanding of the impact of accommodations on test score validity. First, they illustrate a variety of methods for collection and rigorous analyses of empirical data that can supplant expert judgment of the impact of accommodations. These methods range from internal…
Descriptors: Reading Achievement, Educational Assessment, Test Reliability, Learning Disabilities
Peer reviewed
Stone, Clement A.; Ye, Feifei; Zhu, Xiaowen; Lane, Suzanne – Applied Measurement in Education, 2010
Although reliability of subscale scores may be suspect, subscale scores are the most common type of diagnostic information included in student score reports. This research compared methods for augmenting the reliability of subscale scores for an 8th-grade mathematics assessment. Yen's Objective Performance Index, Wainer et al.'s augmented scores,…
Descriptors: Item Response Theory, Case Studies, Reliability, Scores
Peer reviewed
Osborn Popp, Sharon E.; Ryan, Joseph M.; Thompson, Marilyn S. – Applied Measurement in Education, 2009
Scoring rubrics are routinely used to evaluate the quality of writing samples produced for writing performance assessments, with anchor papers chosen to represent score points defined in the rubric. Although the careful selection of anchor papers is associated with best practices for scoring, little research has been conducted on the role of…
Descriptors: Writing Evaluation, Scoring Rubrics, Selection, Scoring
Peer reviewed
Kane, Michael; Case, Susan M. – Applied Measurement in Education, 2004
The scores on 2 distinct tests (e.g., essay and objective) are often combined to create a composite score, which is used to make decisions. The validity of the observed composite can sometimes be evaluated relative to an external criterion. However, in cases where no criterion is available, the observed composite has generally been evaluated in…
Descriptors: Validity, Weighted Scores, Reliability, Student Evaluation
Peer reviewed
Feldt, Leonard S. – Applied Measurement in Education, 1997
It has often been asserted that the reliability of a measure places an upper limit on its validity. This article demonstrates in theory that validity can rise when reliability declines, even when validity evidence is a correlation with an acceptable criterion. Whether empirical examples can actually be found is an open question. (SLD)
Descriptors: Correlation, Criteria, Reliability, Test Construction
Peer reviewed
Frary, Robert B. – Applied Measurement in Education, 2000
Characterizes the circumstances under which validity changes may occur as a result of the deletion of a predictor test segment. Equations show that, for a positive outcome, one should seek a relatively large correlation between the scores from the deleted segment and the remaining items, with a relatively low correlation between scores from the…
Descriptors: Equations (Mathematics), Prediction, Reliability, Scores