ERIC Number: EJ1455049
Record Type: Journal
Publication Date: 2024
Pages: 4
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: EISSN-2056-7936
Evaluating Large Language Models for Criterion-Based Grading from Agreement to Consistency
Da-Wei Zhang; Melissa Boey; Yan Yu Tan; Alexis Hoh Sheng Jia
npj Science of Learning, v9 Article 79 2024
This study evaluates the ability of large language models (LLMs) to deliver criterion-based grading and examines how prompt engineering with detailed criteria affects grading. Using well-established human benchmarks and quantitative analyses, we found that even free LLMs can achieve criterion-based grading when given a detailed understanding of the criteria, underscoring the importance of domain-specific understanding over model complexity. These findings highlight the potential of LLMs to deliver scalable educational feedback.
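The record does not spell out the study's prompt-engineering manipulation. As a purely illustrative sketch (not the authors' protocol), the Python snippet below shows one common way to embed a detailed grading rubric in an LLM prompt and request a criterion-referenced score; the rubric, scoring scale, model name (gpt-4o-mini), and OpenAI chat-completions call are all assumptions for illustration.

```python
# Illustrative sketch only: prompting an LLM with a detailed, criterion-based rubric.
# The rubric, scale, and model are hypothetical and NOT taken from the paper.
from openai import OpenAI  # assumes the OpenAI Python SDK v1 chat-completions interface

RUBRIC = """\
Criterion 1 (Thesis, 0-3): 3 = clear, arguable thesis; 0 = no identifiable thesis.
Criterion 2 (Evidence, 0-3): 3 = specific, relevant evidence; 0 = no evidence.
Criterion 3 (Organization, 0-3): 3 = logical paragraph structure; 0 = incoherent.
"""

def build_grading_prompt(rubric: str, student_answer: str) -> str:
    """Embed the full rubric so grading is criterion-referenced rather than impressionistic."""
    return (
        "Grade the student response strictly against the rubric below.\n"
        f"Rubric:\n{rubric}\n"
        "Return one line per criterion as 'Criterion N: score - justification', "
        "then a final line 'Total: X/9'.\n\n"
        f"Student response:\n{student_answer}"
    )

def grade(student_answer: str, model: str = "gpt-4o-mini") -> str:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # low temperature favors grading consistency across runs
        messages=[{"role": "user", "content": build_grading_prompt(RUBRIC, student_answer)}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(grade("Photosynthesis converts light energy into chemical energy ..."))
```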
Descriptors: Artificial Intelligence, Natural Language Processing, Criterion Referenced Tests, Grading, Prompting, Engineering, Benchmarking, Feedback (Response)
Nature Portfolio. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://www.nature.com/npjscilearn/
Publication Type: Journal Articles; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A