ERIC - Search Results

Publication Date

In 2025

Descriptor

Interrater Reliability	12
Evaluation Methods	5
Error of Measurement	4
Evaluation Criteria	4
Foreign Countries	3
Scoring	3
Artificial Intelligence	2
Communication Problems	2
Comparative Testing	2
Computer Assisted Testing	2
Evaluative Thinking	2
Grading	2
Measurement Techniques	2
Scoring Rubrics	2
Test Reliability	2
Test Validity	2
Young Adults	2
Academic Achievement	1
Achievement Rating	1
Active Learning	1
Adolescents	1
Advanced Placement Programs	1
Advocacy	1
Affective Behavior	1
At Risk Students	1

Source

International Journal of…	2
American Journal on…	1
Assessment Update	1
British Educational Research…	1
Educational Measurement:…	1
Educational Process:…	1
Journal of Computer Assisted…	1
Language Testing	1
Measurement in Physical…	1
Oxford Review of Education	1
School Psychology Review	1

Publication Type

Journal Articles	12
Reports - Research	12

Education Level

Higher Education	2
Postsecondary Education	2
Secondary Education	2
High Schools	1
Junior High Schools	1
Middle Schools	1

Location

Canada	1
Illinois (Urbana)	1
Ireland	1
Israel	1
Italy	1

Showing all 12 results

Test-Retest and Inter-Rater Reliability for Selected Outcomes from a Wearable 3D Inertial Sensor over Different Stable and Unstable Postural Conditions: A Validation Study

Peer reviewed

Samuel D'Emanuele; Francesca Nardello; Fabrizio Garau; Diego Campaci; Federico Schena; Cantor Tarperi – Measurement in Physical Education and Exercise Science, 2025

The agreement between a wearable inertial sensor (GYKO, G) and the force platform (P) was assessed by evaluating "test-retest" and "inter-rater reliability." Thirty-eight subjects were enrolled; the selected indices of balance were investigated over foot positions and (un)stable conditions. Intraclass correlation coefficient…

Descriptors: Human Posture, Measurement Equipment, Interrater Reliability, Measurement Techniques

Grading Exams Using Large Language Models: A Comparison between Human and AI Grading of Exams in Higher Education Using ChatGPT

Peer reviewed

Jonas Flodén – British Educational Research Journal, 2025

This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…

Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring

The Vague Language Use Scale: Clinical Utility and Psychometrics from Adults with Traumatic Brain Injury

Peer reviewed

Kathryn J. Greenslade; Julia K. Bushell; Emily F. Dillon; Amy E. Ramage – International Journal of Language & Communication Disorders, 2025

Background: Pragmatic communication difficulties encompass many distinct behaviours, including the use of vague and/or insufficient language, a common characteristic following traumatic brain injury (TBI) that negatively impacts psychosocial outcomes. Existing assessments evaluate pragmatic communication broadly, often with only one or two items…

Descriptors: Neurological Impairments, Head Injuries, Language Impairments, Language Tests

Examining the Psychometric Impact of Targeted and Random Double-Scoring in Mixed-Format Assessments

Peer reviewed

Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025

Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…

Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods

Profiling Communication Ability in Dementia: Validation of a New Cognitive-Communication Assessment Tool

Peer reviewed

Suzanna Dooley; Tammy Hopper; Rachael Doyle; Orla Gilheaney; Margaret Walshe – International Journal of Language & Communication Disorders, 2025

Background: Individuals with dementia have communication limitations resulting from cognitive impairments that define the syndrome. Whereas there are numerous cognitive assessments for individuals with dementia, there are far fewer communication assessments. The Profiling Communication Ability in Dementia (P-CAD) was developed to address this gap.…

Descriptors: Communication Skills, Communication Problems, Dementia, Intellectual Disability

Informant Discrepancies in Universal Screening as a Function of Student and Teacher Characteristics

Peer reviewed

Brittany N. Zakszeski; Heather E. Ormiston; Malena A. Nygaard; Kane Carlock – School Psychology Review, 2025

Despite the widespread use of school-based universal screening systems for social, emotional, and behavioral risk, limited research has examined discrepancies in ratings provided by teachers and their secondary students. Using the Social, Academic, and Emotional Behavior Risk Screener (SAEBRS; teacher report) and mySAEBRS (student report) scores…

Descriptors: Middle School Students, Middle School Teachers, Screening Tests, Affective Behavior

Statistically Guided Grading Judgements: Contextualisation or Contamination?

Peer reviewed

Louise Badham – Oxford Review of Education, 2025

Different sources of assessment evidence are reviewed during International Baccalaureate (IB) grade awarding to convert marks into grades and ensure fair results for students. Qualitative and quantitative evidence are analysed to determine grade boundaries, with statistical evidence weighed against examiner judgement and teachers' feedback on…

Descriptors: Advanced Placement Programs, Grading, Interrater Reliability, Evaluative Thinking

Do Source Use Features Impact Raters' Judgment of Argumentation? An Experimental Study

Peer reviewed

Ping-Lin Chuang – Language Testing, 2025

This experimental study explores how source use features impact raters' judgment of argumentation in a second language (L2) integrated writing test. One hundred four experienced and novice raters were recruited to complete a rating task that simulated the scoring assignment of a local English Placement Test (EPT). Sixty written responses were…

Descriptors: Interrater Reliability, Evaluators, Information Sources, Primary Sources

Agency, Advocacy, Positionality: Bringing an Equity Mindset to Higher Education Assessment

Peer reviewed

Beth K. Janetski; Mary K. Thompson – Assessment Update, 2025

The Grand Challenges Project supports global collaborations that inform equitable practices for assessment practitioners in higher education while identifying evidence-informed solutions. One goal named in the project is to leverage assessment findings to increase equity in higher education. This article focuses on the results from the Equity…

Descriptors: Higher Education, Educational Assessment, Global Approach, International Cooperation

AI-Assisted Assessment of Inquiry Skills in Socioscientific Issue Contexts

Peer reviewed

Wen Xin Zhang; John J. H. Lin; Ying-Shao Hsu – Journal of Computer Assisted Learning, 2025

Background Study: Assessing learners' inquiry-based skills is challenging as social, political, and technological dimensions must be considered. The advanced development of artificial intelligence (AI) makes it possible to address these challenges and shape the next generation of science education. Objectives: The present study evaluated the SSI…

Descriptors: Artificial Intelligence, Computer Assisted Testing, Inquiry, Active Learning

Challenges in Using Parent-Reported Bed and Wake Times for Actigraphy Scoring in Rett-Related Syndromes

Peer reviewed

Breanne J. Byiers; Alyssa M. Merbler; Chantel C. Burkitt; Frank J. Symons – American Journal on Intellectual and Developmental Disabilities, 2025

Sleep problems are common in Rett syndrome and other neurogenetic syndromes. Actigraphy is a cost-effective, objective method for measuring sleep. Current guidelines require caregiver-reported bed and wake times to facilitate actigraphy data scoring. The current study examined missingness and consistency of caregiver-reported bed and wake times…

Descriptors: Sleep, Neurodevelopmental Disorders, Psychomotor Skills, Genetic Disorders

Student Reflections on Peer Assessments: Benefits and Challenges in a Mathematics Class

Peer reviewed
PDF on ERIC

Yaniv Biton – Educational Process: International Journal, 2025

Background/purpose: This study addresses the challenge of engaging students in meaningful assessment processes within mathematics education, particularly in pre-university preparatory courses. Peer assessment, a growing pedagogical practice, can enhance learning outcomes, promote critical thinking, and improve collaborative skills. However,…

Descriptors: Foreign Countries, Mathematics Education, College Preparation, College Bound Students
