ERIC Number: ED661970
Record Type: Non-Journal
Publication Date: 2024-Jul
Pages: 6
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: N/A
Automated Assessment in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses
Sami Baral; Eamon Worden; Wen-Chiang Lim; Zhuang Luo; Christopher Santorelli; Ashish Gurung; Neil Heffernan
Grantee Submission, Paper presented at the International Conference on Educational Data Mining (17th, Atlanta, GA, Jul 2024)
The effectiveness of feedback in enhancing learning outcomes is well documented within Educational Data Mining (EDM). Prior research has explored a variety of methodologies for making feedback to students more effective. Recent developments in Large Language Models (LLMs) have extended their utility in enhancing automated feedback systems. This study explores the potential of LLMs to facilitate automated feedback in math education in the form of numeric assessment scores. We examine the effectiveness of LLMs in evaluating and scoring student responses by comparing three models: Llama, SBERT-Canberra, and GPT-4. Each model must provide a quantitative score for a student's response to an open-ended math problem. We employ Mistral, a version of Llama tailored to math, and fine-tune it to evaluate student responses using a dataset of student responses and teacher-provided scores for middle-school math problems. The SBERT-Canberra model was trained in a similar way, while GPT-4 was used in a zero-shot setting. We evaluate and compare the models' scoring accuracy. This study aims to further the ongoing development of automated assessment and feedback systems and to outline potential future directions for leveraging generative LLMs in building such systems. [This paper was published in: "Proceedings of the 17th International Conference on Educational Data Mining," edited by B. Paaßen and C. D. Epp, International Educational Data Mining Society, 2024, pp. 732-737. Funding for this paper was provided by the U.S. Department of Education's Graduate Assistance in Areas of National Need (GAANN).]
Publication Type: Reports - Research; Speeches/Meeting Papers
Education Level: Junior High Schools; Middle Schools; Secondary Education
Audience: N/A
Language: English
Sponsor: Institute of Education Sciences (ED); National Science Foundation (NSF); Office of Naval Research (ONR) (DOD); National Institutes of Health (NIH) (DHHS); Office of Postsecondary Education (ED); Office of Elementary and Secondary Education (OESE) (ED), Education Innovation and Research (EIR)
Authoring Institution: N/A
IES Funded: Yes
Grant or Contract Numbers: R305N210049; R305D210031; R305A170137; R305A170243; R305A180401; R305A120125; R305R220012; R305T240029; 2118725; 2118904; 1950683; 1917808; 1931523; 1940236; 1917713; 1903304; 1822830; 1759229; 1724889; 1636782; 1535428; N000141812768; R44GM146483; P200A120238; P200A180088; P200A150306; U411B190024; S411B210024; S411B220024
Author Affiliations: N/A