ERIC - Search Results

Publication Date

In 2025	0
Since 2024	2
Since 2021 (last 5 years)	7
Since 2016 (last 10 years)	23
Since 2006 (last 20 years)	53

Source

Applied Measurement in…

111

Publication Type

Journal Articles	111
Reports - Research	68
Reports - Evaluative	39
Reports - Descriptive	5
Speeches/Meeting Papers	5
Information Analyses	1

Education Level

Higher Education	7
Grade 8	6
Elementary Education	5
Elementary Secondary Education	5
Grade 5	5
Grade 4	4
High Schools	4
Middle Schools	4
Postsecondary Education	4
Secondary Education	4
Grade 3	3
Early Childhood Education	2
Grade 10	2
Grade 7	2
Intermediate Grades	2
Junior High Schools	2
Primary Education	2
Grade 1	1
Grade 12	1
Grade 2	1
Grade 6	1
More ▼

Audience

Location

California	3
Canada	2
Arizona	1
Australia	1
California (Los Angeles)	1
Germany	1
Hawaii	1
Idaho	1
Indiana	1
Israel	1
Louisiana	1
Netherlands	1
North Carolina	1
Oman	1
Tennessee	1
United Kingdom	1
United States	1
Vermont	1
More ▼

Laws, Policies, & Programs

No Child Left Behind Act 2001

Assessments and Surveys

National Assessment of…	3
SAT (College Admission Test)	2
Advanced Placement…	1
Iowa Tests of Basic Skills	1
National Teacher Examinations	1
Program for International…	1
Stanford Achievement Tests	1
Texas Assessment of Academic…	1
United States Medical…	1

What Works Clearinghouse Rating

Applied Measurement in Education X

Showing 16 to 30 of 111 results Save | Export

Of Small Beauties and Large Beasts: The Quality of Distractors on Multiple-Choice Tests Is More Important than Their Quantity

Peer reviewed

Direct link

Papenberg, Martin; Musch, Jochen – Applied Measurement in Education, 2017

In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…

Descriptors: Multiple Choice Tests, Test Items, Test Validity, Test Reliability

Appraising the Scoring Performance of Automated Essay Scoring Systems--Some Additional Considerations: Which Essays? Which Human Raters? Which Scores?

Peer reviewed

Direct link

Raczynski, Kevin; Cohen, Allan – Applied Measurement in Education, 2018

The literature on Automated Essay Scoring (AES) systems has provided useful validation frameworks for any assessment that includes AES scoring. Furthermore, evidence for the scoring fidelity of AES systems is accumulating. Yet questions remain when appraising the scoring performance of AES systems. These questions include: (a) which essays are…

Descriptors: Essay Tests, Test Scoring Machines, Test Validity, Evaluators

Validating Human and Automated Scoring of Essays against "True" Scores

Peer reviewed

Direct link

Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018

In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…

Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing

Regression Effects in Angoff Ratings: Examples from Credentialing Exams

Peer reviewed

Direct link

Wyse, Adam E. – Applied Measurement in Education, 2018

This article discusses regression effects that are commonly observed in Angoff ratings where panelists tend to think that hard items are easier than they are and easy items are more difficult than they are in comparison to estimated item difficulties. Analyses of data from two credentialing exams illustrate these regression effects and the…

Descriptors: Regression (Statistics), Test Items, Difficulty Level, Licensing Examinations (Professions)

Using Multigroup Confirmatory Factor Analysis to Test Measurement Invariance in Raters: A Clinical Skills Examination Application

Peer reviewed

Direct link

Kahraman, Nilufer; Brown, Crystal B. – Applied Measurement in Education, 2015

Psychometric models based on structural equation modeling framework are commonly used in many multiple-choice test settings to assess measurement invariance of test items across examinee subpopulations. The premise of the current article is that they may also be useful in the context of performance assessment tests to test measurement invariance…

Descriptors: Factor Analysis, Structural Equation Models, Medical Students, Performance Based Assessment

Estimating Variance Components from Sparse Data Matrices in Large-Scale Educational Assessments

Peer reviewed

Direct link

DeMars, Christine – Applied Measurement in Education, 2015

In generalizability theory studies in large-scale testing contexts, sometimes a facet is very sparsely crossed with the object of measurement. For example, when assessments are scored by human raters, it may not be practical to have every rater score all students. Sometimes the scoring is systematically designed such that the raters are…

Descriptors: Educational Assessment, Measurement, Data, Generalizability Theory

Applying a Thurstonian, Two-Stage Method in the Standardized Assessment of Writing

Peer reviewed

Direct link

McGrane, Joshua Aaron; Humphry, Stephen Mark; Heldsinger, Sandra – Applied Measurement in Education, 2018

National standardized assessment programs have increasingly included extended written performances, amplifying the need for reliable, valid, and efficient methods of assessment. This article examines a two-stage method using comparative judgments and calibrated exemplars as a complement and alternative to existing methods of assessing writing.…

Descriptors: Standardized Tests, Foreign Countries, Writing Tests, Writing Evaluation

Evaluating Score and Decision Consistency across Claims in a Validation Argument

Peer reviewed

Direct link

Schmidgall, Jonathan – Applied Measurement in Education, 2017

This study utilizes an argument-based approach to validation to examine the implications of reliability in order to further differentiate the concepts of score and decision consistency. In a methodological example, the framework of generalizability theory was used to estimate appropriate indices of score consistency and evaluations of the…

Descriptors: Scores, Reliability, Validity, Generalizability Theory

Evaluating Comparative Judgment as an Approach to Essay Scoring

Peer reviewed

Direct link

Steedle, Jeffrey T.; Ferrara, Steve – Applied Measurement in Education, 2016

As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation.…

Descriptors: Essays, Scoring, Comparative Analysis, Evaluators

Stability of Teacher Value-Added Rankings across Measurement Model and Scaling Conditions

Peer reviewed

Direct link

Hawley, Leslie R.; Bovaird, James A.; Wu, ChaoRong – Applied Measurement in Education, 2017

Value-added assessment methods have been criticized by researchers and policy makers for a number of reasons. One issue includes the sensitivity of model results across different outcome measures. This study examined the utility of incorporating multivariate latent variable approaches within a traditional value-added framework. We evaluated the…

Descriptors: Value Added Models, Reliability, Multivariate Analysis, Scaling

Increasing the Validity of Angoff Standards through Analysis of Judge-Level Internal Consistency

Peer reviewed

Direct link

Clauser, Jerome C.; Clauser, Brian E.; Hambleton, Ronald K. – Applied Measurement in Education, 2014

The purpose of the present study was to extend past work with the Angoff method for setting standards by examining judgments at the judge level rather than the panel level. The focus was on investigating the relationship between observed Angoff standard setting judgments and empirical conditional probabilities. This relationship has been used as a…

Descriptors: Standard Setting (Scoring), Validity, Reliability, Correlation

Evaluating the Consistency of Angoff-Based Cut Scores Using Subsets of Items within a Generalizability Theory Framework

Peer reviewed

Direct link

Kannan, Priya; Sgammato, Adrienne; Tannenbaum, Richard J.; Katz, Irvin R. – Applied Measurement in Education, 2015

The Angoff method requires experts to view every item on the test and make a probability judgment. This can be time consuming when there are large numbers of items on the test. In this study, a G-theory framework was used to determine if a subset of items can be used to make generalizable cut-score recommendations. Angoff ratings (i.e.,…

Descriptors: Reliability, Standard Setting (Scoring), Cutting Scores, Test Items

Quantifying Error in Survey Measures of School and Classroom Environments

Peer reviewed

Direct link

Schweig, Jonathan David – Applied Measurement in Education, 2014

Developing indicators that reflect important aspects of school and classroom environments has become central in a nationwide effort to develop comprehensive programs that measure teacher quality and effectiveness. Formulating teacher evaluation policy necessitates accurate and reliable methods for measuring these environmental variables. This…

Descriptors: Error of Measurement, Educational Environment, Classroom Environment, Surveys

Using Testlet Response Theory to Examine Local Dependence in C-Tests

Peer reviewed

Direct link

Eckes, Thomas; Baghaei, Purya – Applied Measurement in Education, 2015

C-tests are gap-filling tests widely used to assess general language proficiency for purposes of placement, screening, or provision of feedback to language learners. C-tests consist of several short texts in which parts of words are missing. We addressed the issue of local dependence in C-tests using an explicit modeling approach based on testlet…

Descriptors: Language Proficiency, Language Tests, Item Response Theory, Test Reliability

Validating Automated Essay Scoring: A (Modest) Refinement of the "Gold Standard"

Peer reviewed

Direct link

Powers, Donald E.; Escoffery, David S.; Duchnowski, Matthew P. – Applied Measurement in Education, 2015

By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the "gold standard." Our objective was to refine this model and apply it to…

Descriptors: Essays, Test Scoring Machines, Program Validation, Criterion Referenced Tests

« Previous Page | Next Page »

Pages: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8

Feldt, Leonard S.	7
Hambleton, Ronald K.	3
Johnson, Robert L.	3
Qualls, Audrey L.	3
Wise, Steven L.	3
Yen, Wendy M.	3
Bandalos, Deborah L.	2
Brennan, Robert L.	2
Enders, Craig K.	2
Ercikan, Kadriye	2
Fitzpatrick, Anne R.	2
Frisbie, David A.	2
Gao, Xiaohong	2
Kane, Michael	2
Klein, Stephen P.	2
Lane, Suzanne	2
Linn, Robert L.	2
Mehrens, William A.	2
Penny, Jim	2
Shavelson, Richard J.	2
Stone, Clement A.	2
Thissen, David	2
Wainer, Howard	2
Webb, Noreen M.	2
More ▼

Reliability	50
Test Reliability	38
Interrater Reliability	28
Scores	28
Test Items	25
Scoring	21
Test Construction	21
Item Response Theory	20
Validity	20
Error of Measurement	15
Correlation	14
Educational Assessment	14
Performance Based Assessment	14
Test Validity	14
Generalizability Theory	13
Mathematics Tests	13
Evaluation Methods	12
Multiple Choice Tests	11
Standard Setting (Scoring)	11
Comparative Analysis	10
Elementary Secondary Education	9
Evaluators	9
Item Analysis	9
Licensing Examinations…	9
Computation	8
More ▼