Showing 1 to 15 of 210 results
Peer reviewed
Direct link
Bouwer, Renske; Koster, Monica; van den Bergh, Huub – Assessment in Education: Principles, Policy & Practice, 2023
Assessing students' writing performance is essential to adequately monitor and promote individual writing development, but it is also a challenge. The present research investigates a benchmark rating procedure for assessing texts written by upper-elementary students. In two studies we examined whether a benchmark rating procedure (1) leads to…
Descriptors: Benchmarking, Writing Evaluation, Evaluation Methods, Elementary School Students
Baraldi Cunha, Andrea; Babik, Iryna; Koziol, Natalie A.; Hsu, Lin-Ya; Nord, Jayden; Harbourne, Regina T.; Westcott-McCoy, Sarah; Dusing, Stacey C.; Bovaird, James A.; Lobo, Michele A. – Grantee Submission, 2021
Purpose: To evaluate the validity, reliability, and sensitivity of the novel Means-End Problem-Solving Assessment Tool (MEPSAT). Methods: Children with typical development and those with motor delay were assessed throughout the first 2 years of life using the MEPSAT. MEPSAT scores were validated against the cognitive and motor subscales of the…
Descriptors: Problem Solving, Early Intervention, Evaluation Methods, Motor Development
Peer reviewed
Direct link
Lee, Yi-Hsuan; Haberman, Shelby J. – Journal of Educational Measurement, 2021
For assessments that use different forms in different administrations, equating methods are applied to ensure comparability of scores over time. Ideally, a score scale is well maintained throughout the life of a testing program. In reality, instability of a score scale can result from a variety of causes, some of which are expected while others may be…
Descriptors: Scores, Regression (Statistics), Demography, Data
Peer reviewed
PDF on ERIC Download full text
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Peer reviewed
Direct link
Van Meenen, Florence; Coertjens, Liesje; Van Nes, Marie-Claire; Verschuren, Franck – Advances in Health Sciences Education, 2022
The present study explores two rating methods for peer assessment (analytical rating using criteria and comparative judgement) in light of concurrent validity, reliability and insufficient diagnosticity (i.e. the degree to which substandard work is recognised by the peer raters). During a second-year undergraduate course, students wrote a one-page…
Descriptors: Evaluation Methods, Peer Evaluation, Accuracy, Evaluation Criteria
Arielle Boguslav; Julie Cohen – Annenberg Institute for School Reform at Brown University, 2023
Teacher preparation programs are increasingly expected to use data on pre-service teacher (PST) skills to drive program improvement and provide targeted supports. Observational ratings are especially vital, but also prone to measurement issues. Scores may be influenced by factors unrelated to PSTs' instructional skills, including rater standards…
Descriptors: Preservice Teachers, Student Evaluation, Evaluation Methods, Preservice Teacher Education
Peer reviewed
Direct link
Wolkowitz, Amanda A. – Journal of Educational Measurement, 2021
Decision consistency (DC) is the reliability of a classification decision based on a test score. In professional credentialing, the decision is often a high-stakes pass/fail decision. The current methods for estimating DC are computationally complex. The purpose of this research is to provide a computationally and conceptually simple method for…
Descriptors: Decision Making, Reliability, Classification, Scores
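The article's simplified estimator is not reproduced in the abstract, but decision consistency itself has a concrete meaning: the probability that two parallel forms of a test classify the same examinee the same way. As a generic illustration only (a Monte Carlo sketch under a classical parallel-forms model, with all parameters hypothetical, not the author's method):

```python
import random

def estimate_dc(reliability: float, cut: float, n: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of decision consistency: the probability that
    two parallel test forms give the same pass/fail classification."""
    rng = random.Random(seed)
    true_sd = reliability ** 0.5           # Var(T) = reliability, when Var(X) = 1
    err_sd = (1.0 - reliability) ** 0.5    # Var(E) = 1 - reliability
    agree = 0
    for _ in range(n):
        t = rng.gauss(0.0, true_sd)        # examinee's true score
        x = t + rng.gauss(0.0, err_sd)     # observed score, form 1
        y = t + rng.gauss(0.0, err_sd)     # observed score, form 2
        agree += (x >= cut) == (y >= cut)
    return agree / n

# With reliability 0.90 and the cut at the mean, DC comes out around 0.86.
```

Note that DC is lowest when the cut score sits where examinees are densest; moving the cut into a tail raises agreement even at the same reliability.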
Peer reviewed
Direct link
Pere J. Ferrando; David Navarro-González; Fabia Morales-Vives – Educational and Psychological Measurement, 2025
The problem of local item dependencies (LIDs) is very common in personality and attitude measures, particularly in those that measure narrow-bandwidth dimensions. At the structural level, these dependencies can be modeled by using extended factor analytic (FA) solutions that include correlated residuals. However, the effects that LIDs have on the…
Descriptors: Scores, Accuracy, Evaluation Methods, Factor Analysis
Peer reviewed
Direct link
Abdel Azim Zumrawi; Leah P. Macfadyen – Cogent Education, 2023
Student Evaluations of Teaching (SETs) gather crucial feedback on student experiences of teaching and learning and have been used for decades to evaluate the quality of teaching and student experience of instruction. In this paper, we make the case for an important improvement to the analysis of SET data that can further refine its interpretation.…
Descriptors: Likert Scales, Student Evaluation of Teacher Performance, Student Attitudes, Reliability
Peer reviewed
PDF on ERIC Download full text
Emre Zengin; Yasemin Karal – International Journal of Assessment Tools in Education, 2024
This study was carried out to develop a test to assess algorithmic thinking skills. To this end, the twelve steps suggested by Downing (2006) were adopted. Throughout the test development, 24 middle school sixth-grade students and eight experts in different areas took part in the project tasks as needed. The test was given to 252 students…
Descriptors: Grade 6, Algorithms, Thinking Skills, Evaluation Methods
Peer reviewed
Direct link
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Peer reviewed
Direct link
Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021
Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to that of person fit analyses: to identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…
Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making
Peer reviewed
Direct link
Wheadon, Christopher; Barmby, Patrick; Christodoulou, Daisy; Henderson, Brian – Assessment in Education: Principles, Policy & Practice, 2020
Writing assessment is a key feature of most education systems, yet there are limitations with traditional methods of assessing writing involving rubrics. In contrast, comparative judgement appears to overcome the reliability issues that beset the assessment of performance tasks. The approach presented here extends previous work on…
Descriptors: Foreign Countries, Writing Evaluation, Elementary School Students, Evaluation Methods
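Comparative judgement studies like this one typically scale judges' pairwise "which text is better?" decisions with a Bradley-Terry model. As a minimal sketch (not the authors' implementation; the wins matrix is invented), item strengths can be fit with the classic Zermelo iteration:

```python
def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths from a pairwise wins matrix.
    wins[i][j] = number of times item i was judged better than item j."""
    n = len(wins)
    p = [1.0 / n] * n                      # start from equal strengths
    for _ in range(iters):
        new_p = []
        for i in range(n):
            w_i = sum(wins[i])             # total wins for item i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(w_i / denom if denom > 0 else p[i])
        s = sum(new_p)
        p = [x / s for x in new_p]         # normalize each sweep
    return p

# Item 0 beats item 1 in 2 of 3 comparisons, so its strength
# converges to twice item 1's (about [0.667, 0.333]).
strengths = bradley_terry([[0, 2], [1, 0]])
```

The iteration converges whenever every item has at least one win and one loss along some chain of comparisons; adaptive schemes choose the next pairing to make each judgement maximally informative.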
Peer reviewed
Direct link
Bardhoshi, Gerta; Erford, Bradley T. – Measurement and Evaluation in Counseling and Development, 2017
Precision is a key facet of test development, with score reliability determined primarily according to the types of error one wants to approximate and demonstrate. This article identifies and discusses several primary forms of reliability estimation: internal consistency (i.e., split-half, KR-20, α), test-retest, alternate forms, interscorer, and…
Descriptors: Scores, Test Reliability, Accuracy, Pretests Posttests
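The internal-consistency estimates named above have closed forms. As a minimal sketch (example data invented), Cronbach's α can be computed from an items-by-persons score table; with 0/1 item scores each item's variance equals p·q, so the same computation reduces to KR-20:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha. scores: list of persons, each a list of item scores.
    Uses population variances, as in the classical formulas."""
    k = len(scores[0])                     # number of items

    def pvar(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [pvar([row[j] for row in scores]) for j in range(k)]
    total_var = pvar([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Dichotomous (0/1) data: this is exactly KR-20.
binary = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
# cronbach_alpha(binary) → 0.75
```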
Peer reviewed
Direct link
Looney, Marilyn A. – Measurement in Physical Education and Exercise Science, 2018
The purpose of this article was two-fold: (1) to provide an overview of the commonly reported and under-reported absolute agreement indices for continuous data in the kinesiology literature; and (2) to present examples of these indices for hypothetical data, along with recommendations for future use. It is recommended that three types of information be…
Descriptors: Interrater Reliability, Evaluation Methods, Kinetics, Indexes
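One widely reported absolute agreement index for continuous ratings is ICC(2,1), computed from a two-way ANOVA decomposition. A minimal sketch, not drawn from the article's own recommendations, with invented data:

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    ratings: list of subjects (rows), each rated by the same k raters (columns)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)     # subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)     # raters
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)                # mean square, subjects
    msc = ss_cols / (k - 1)                # mean square, raters
    mse = ss_err / ((n - 1) * (k - 1))     # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Perfect agreement between two raters yields ICC = 1.0;
# any disagreement pulls the index below 1.
```

Because ICC(2,1) charges systematic rater differences (the MSC term) against agreement, it is an *absolute* agreement index, unlike consistency-type ICCs that ignore rater mean shifts.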