ERIC Number: EJ1452603
Record Type: Journal
Publication Date: 2024
Pages: 11
Abstractor: As Provided
ISBN: N/A
ISSN: ISSN-1043-4046
EISSN: EISSN-1522-1229
Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains
Advances in Physiology Education, v48 n4 p904-914 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) such as ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. Written assessment grading has traditionally been viewed as a laborious and subjective process; this study therefore evaluated the accuracy and reliability of LLMs in assessing the achievement of learning outcomes across different cognitive domains in a scientific inquiry course on sports physiology. Human graders and three LLMs, GPT-3.5, GPT-4o, and Gemini, were tasked with scoring submitted student assignments according to a set of rubrics aligned with various cognitive domains, namely "Understand," "Analyze," and "Evaluate" from the revised Bloom's taxonomy, as well as "Scientific Inquiry Competency." Our findings revealed that while the LLMs demonstrated some level of competency, they do not yet meet the assessment standards of human graders. Specifically, interrater reliability (percentage agreement and correlation analysis) between human graders was superior to that between two grading rounds of each LLM. Furthermore, concordance and correlation between human and LLM graders were mostly moderate to poor, both for overall scores and across the prespecified cognitive domains. The results suggest a future in which AI could complement human expertise in educational assessment, but they underscore the importance of adaptive learning by educators and of continuous improvement in current AI technologies to fully realize this potential.
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards, Evaluators, Interrater Reliability, Correlation, Grading, Artificial Intelligence, Computer Software, Learning Processes, Team Sports, Physiology, Scientific Research, Computer Assisted Testing, Comparative Analysis, Scoring, Technology Uses in Education, Taxonomy, Scoring Rubrics, Undergraduate Students, Foreign Countries
American Physiological Society. 9650 Rockville Pike, Bethesda, MD 20814-3991. Tel: 301-634-7164; Fax: 301-634-7241; e-mail: webmaster@the-aps.org; Web site: https://bibliotheek.ehb.be:2487/journal/advances
Publication Type: Journal Articles; Reports - Research
Education Level: Higher Education; Postsecondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Location: Singapore
Grant or Contract Numbers: N/A