ERIC - Search Results

Publication Date

In 2025	0
Since 2024	2
Since 2021 (last 5 years)	5
Since 2016 (last 10 years)	22
Since 2006 (last 20 years)	66

Descriptor

Scores	111
Reliability	63
Test Reliability	43
Test Validity	24
Error of Measurement	22
Validity	22
Evaluation Methods	18
Psychometrics	17
Test Construction	14
Interrater Reliability	13
Student Evaluation	12
Higher Education	11
Measurement	11
Measurement Techniques	11
Testing	11
Academic Achievement	10
Achievement Gains	10
Item Response Theory	10
Scoring	10
Statistical Analysis	10
Test Results	10
Teacher Evaluation	9
Test Theory	9
Accountability	8
Accuracy	8

Publication Type

Reports - Descriptive	111
Journal Articles	81
Speeches/Meeting Papers	10
Guides - Non-Classroom	3
Numerical/Quantitative Data	3
Opinion Papers	2
Reports - Evaluative	2
Tests/Questionnaires	2
Guides - General	1
Information Analyses	1

Education Level

Elementary Education	13
Higher Education	12
Elementary Secondary Education	8
High Schools	6
Middle Schools	6
Early Childhood Education	5
Postsecondary Education	5
Secondary Education	5
Grade 3	4
Grade 5	4
Junior High Schools	4
Grade 4	3
Grade 8	3
Primary Education	3
Grade 6	2
Grade 7	2
Grade 9	2
Intermediate Grades	2
Adult Basic Education	1
Adult Education	1
Grade 1	1
Grade 12	1
Grade 2	1

Audience

Researchers	6
Practitioners	3
Administrators	2
Teachers	2

Location

Maryland	2
Pennsylvania	2
United States	2
Australia	1
Belgium	1
Canada	1
China	1
Florida	1
India	1
Israel	1
Philippines	1
South Africa	1
Texas	1
Texas (Austin)	1
Tunisia	1
United Kingdom	1
United Kingdom (England)	1
Vermont	1
Wyoming	1

Laws, Policies, & Programs

No Child Left Behind Act 2001	3
Race to the Top	2
Education for All Handicapped…	1
Education of the Handicapped…	1
Individuals with Disabilities…	1

Showing 1 to 15 of 111 results

Comparative Judgement in Education Research

Peer reviewed

Ian Jones; Ben Davies – International Journal of Research & Method in Education, 2024

Educational researchers often need to construct precise and reliable measurement scales of complex and varied representations such as participants' written work, videoed lesson segments and policy documents. Developing such scales can be resource-intensive and time-consuming, and the outcomes are not always reliable. Here we present…

Descriptors: Educational Research, Comparative Analysis, Educational Researchers, Measurement

Reliable for Whom? Inferring and Reporting Reliability across Diverse Populations

Peer reviewed

Richard S. Balkin; Quentin Hunter; Bradley T. Erford – Measurement and Evaluation in Counseling and Development, 2024

We describe best practices in reporting reliability estimates in counseling research with consideration to precision, generalization, and diverse populations. We provide a historical context to reporting reliability estimates, the limitations of past practices, and new methods to address reliability generalization. We highlight best practices…

Descriptors: Best Practices, Reliability, Counseling, Research

A Computationally Simple Method for Estimating Decision Consistency

Peer reviewed

Wolkowitz, Amanda A. – Journal of Educational Measurement, 2021

Decision consistency (DC) is the reliability of a classification decision based on a test score. In professional credentialing, the decision is often a high-stakes pass/fail decision. The current methods for estimating DC are computationally complex. The purpose of this research is to provide a computationally and conceptually simple method for…

Descriptors: Decision Making, Reliability, Classification, Scores

Modeling Rater Response Processes in Evaluating Score Meaning

Peer reviewed

Lane, Suzanne – Journal of Educational Measurement, 2019

Rater-mediated assessments require the evaluation of the accuracy and consistency of the inferences made by the raters to ensure the validity of score interpretations and uses. Modeling rater response processes allows for a better understanding of how raters map their representations of the examinee performance to their representation of the…

Descriptors: Responses, Accuracy, Validity, Interrater Reliability

Reliability. Improving Literacy Brief: Understanding Screening

Petscher, Y.; Pentimonti, J.; Stanley, C. – National Center on Improving Literacy, 2019

Reliability is the consistency of a set of scores that are designed to measure the same thing. Reliability is a statistical property of scores that must be demonstrated rather than assumed.

Descriptors: Scores, Measurement, Test Reliability, Error Patterns

Responsibilities of Users of Standardized Tests (RUST-4E)

Peer reviewed

Lenz, A. Stephen; Ault, Haley; Balkin, Richard S.; Barrio Minton, Casey; Erford, Bradley T.; Hays, Danica G.; Kim, Bryan S. K.; Li, Chi – Measurement and Evaluation in Counseling and Development, 2022

In April 2021, The Association for Assessment and Research in Counseling Executive Council commissioned a time-referenced task group to revise the Responsibilities of Users of Standardized Tests (RUST) Statement (3rd edition) published by the Association for Assessment in Counseling (AAC) in 2003. The task group developed a work plan to implement…

Descriptors: Responsibility, Standardized Tests, Counselor Training, Ethics

Validity. Improving Literacy Brief: Understanding Screening

Petscher, Y.; Pentimonti, J.; Stanley, C. – National Center on Improving Literacy, 2019

Validity is broadly defined as how well something measures what it's supposed to measure. The reliability and validity of scores from assessments are two concepts that are closely knit together and feed into each other.

Descriptors: Screening Tests, Scores, Test Validity, Test Reliability

Conditional Precision of Measurement for Test Scores: Are Conditional Standard Errors Sufficient?

Peer reviewed

Nicewander, W. Alan – Educational and Psychological Measurement, 2019

This inquiry is focused on three indicators of the precision of measurement--conditional on fixed values of θ, the latent variable of item response theory (IRT). The indicators that are compared are (1) the traditional, conditional standard errors, σ(e_X|θ) = CSEM; (2) the IRT-based conditional standard errors, σ_irt(e_X|θ) = C[subscript…

Descriptors: Measurement, Accuracy, Scores, Error of Measurement

Processes and Procedures for Estimating Score Reliability and Precision

Peer reviewed

Bardhoshi, Gerta; Erford, Bradley T. – Measurement and Evaluation in Counseling and Development, 2017

Precision is a key facet of test development, with score reliability determined primarily according to the types of error one wants to approximate and demonstrate. This article identifies and discusses several primary forms of reliability estimation: internal consistency (i.e., split-half, KR-20, α), test-retest, alternate forms, interscorer, and…

Descriptors: Scores, Test Reliability, Accuracy, Pretests Posttests
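
Note: the internal-consistency estimates named in the abstract above (KR-20, coefficient α) can be illustrated with a minimal sketch. This is not taken from the article; the function name `cronbach_alpha` and the sample data are hypothetical, and the formula is the standard textbook definition, α = k/(k-1) · (1 - Σ item variances / total-score variance). For items scored 0/1 the same formula yields KR-20.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a score matrix.

    scores: list of rows, one row per examinee, one column per item.
    (Hypothetical helper for illustration only.)
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across examinees (population variance).
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the total (summed) score across examinees.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Four examinees, three dichotomous items (made-up data).
sample = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [1, 0, 0]]
print(cronbach_alpha(sample))  # 0.75
```

Because the estimate is a ratio of variances, using sample variance throughout instead of population variance would give the same value.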

Valid and Reliable Assessments. CSAI Update

Center on Standards and Assessments Implementation, 2018

Reliability is a measure of consistency. It is the degree to which student results are the same when they take the same test on different occasions, when different scorers score the same item or task, and when different but equivalent tests are taken at the same time or at different times. Reliability is about making sure that different test forms…

Descriptors: Test Reliability, Test Validity, Student Evaluation, Test Bias

Making Sense of Elementary School Reading Scores. Literacy Leadership Brief

Fitzgerald, Jill; Shanahan, Timothy E. – International Literacy Association, 2020

Reading scores exist for a continuum of purposes, from informal assessment to formal standardized tests. This brief aims to answer the question: What matters most for elementary-grade teachers when thinking about reading scores, and what could policymakers do to help teachers? Three positions worth pursuing in this regard are shared: (1) every…

Descriptors: Reading Achievement, Scores, Elementary School Students, Elementary School Teachers

Dynamic Measurement in Health Professions Education: Rationale, Application, and Possibilities

Peer reviewed

Dumas, Denis; McNeish, Daniel; Schreiber-Gregory, Deanna; Durning, Steven J.; Torre, Dario – AERA Online Paper Repository, 2020

Dynamic Measurement Modeling (DMM) is a psychometric paradigm that uses longitudinal data to estimate students' capacity to learn over the course of an educational program (i.e., growth scores). Here, we provide justification for this approach in health professions education and demonstrate its proof of concept with three time-points of USMLE Step…

Descriptors: Allied Health Occupations Education, Measurement Techniques, Psychometrics, Longitudinal Studies

Phys-MAPS: A Programmatic Physiology Assessment for Introductory and Advanced Undergraduates

Peer reviewed

Semsar, Katharine; Brownell, Sara; Couch, Brian A.; Crowe, Alison J.; Smith, Michelle K.; Summers, Mindi M.; Wright, Christian D.; Knight, Jennifer K. – Advances in Physiology Education, 2019

We describe the development of a new, freely available, online, programmatic-level assessment tool, Measuring Achievement and Progress in Science in Physiology, or Phys-MAPS (http://cperl.lassp.cornell.edu/bio-maps). Aligned with the conceptual frameworks of Core Principles of Physiology, and Vision and Change Core Concepts, Phys-MAPS can be used…

Descriptors: Physiology, Science Instruction, Science Tests, Computer Assisted Testing

A Simple Equation to Predict a Subscore's Value

Peer reviewed

Feinberg, Richard A.; Wainer, Howard – Educational Measurement: Issues and Practice, 2014

Subscores are often used to indicate test-takers' relative strengths and weaknesses and so help focus remediation. But a subscore is not worth reporting if it is too unreliable to believe or if it contains no information that is not already contained in the total score. It is possible, through the use of a simple linear equation provided in…

Descriptors: Scores, Equations (Mathematics), Prediction, Reliability

Stabilizing Subgroup Proficiency Results to Improve the Identification of Low-Performing Schools. Study Snapshot. REL 2023-001

Peer reviewed

Regional Educational Laboratory Mid-Atlantic, 2023

This Snapshot highlights key findings from a study that used Bayesian stabilization to improve the reliability (long-term stability) of subgroup proficiency measures that the Pennsylvania Department of Education (PDE) uses to identify schools for Targeted Support and Improvement (TSI) or Additional Targeted Support and Improvement (ATSI). The…

Descriptors: At Risk Students, Low Achievement, Error of Measurement, Measurement Techniques

Source

Educational and Psychological…	11
Measurement and Evaluation in…	5
Journal of Educational…	4
Educational Measurement:…	3
International Journal of…	3
Structural Equation Modeling:…	3
Applied Measurement in…	2
Applied Psychological…	2
Assessment Update	2
Assessment and Accountability…	2
Assessment for Effective…	2
Diagnostique	2
Journal of Educational and…	2
Multivariate Behavioral…	2
NASSP Bulletin	2
National Center on Improving…	2
New Meridian Corporation	2
Psychometrika	2
Regional Educational…	2
ACT, Inc.	1
AERA Online Paper Repository	1
Academic Medicine	1
Advances in Physiology…	1
Black Issues in Higher…	1
British Journal of…	1

Author

Erford, Bradley T.	3
Feldt, Leonard S.	3
Kolen, Michael J.	3
Raykov, Tenko	3
Davis, John L.	2
Dimitrov, Dimiter M.	2
Fan, Xitao	2
Goldschmidt, Pete	2
Harris, Deborah J.	2
Hays, Danica G.	2
Heritage, Margaret	2
Herman, Joan L.	2
Lee, Won-Chan	2
Pentimonti, J.	2
Petscher, Y.	2
Stanley, C.	2
Thompson, Bruce	2
Vannest, Kimberly J.	2
Abedi, Jamal	1
Asparouhov, Tihomir	1
Ault, Haley	1
Avery, Leanne M.	1
Bachman, Lyle F.	1
Badger, Julia R.	1

Assessments and Surveys

ACT Assessment	2
General Educational…	2
Bracken Basic Concept Scale	1
College Level Examination…	1
Collegiate Assessment of…	1
Dynamic Indicators of Basic…	1
Florida Comprehensive…	1
Graduate Management Admission…	1
International English…	1
Iowa Tests of Basic Skills	1
National Assessment of Adult…	1
North Carolina End of Course…	1
Preliminary Scholastic…	1
Preschool Language Scale	1
Program for International…	1
SAT (College Admission Test)	1
Stanford Achievement Tests	1
Test of English as a Foreign…	1
Texas Assessment of Academic…	1
United States Medical…	1
Work Keys (ACT)	1