ERIC Number: EJ1441636
Record Type: Journal
Publication Date: 2024
Pages: 11
Abstractor: As Provided
ISBN: N/A
ISSN: ISSN-2320-2653
EISSN: EISSN-2582-1334
The Reliability of Using ChatGPT in Rating EFL Writings
Shanlax International Journal of Education, v12 n4 p49-59 2024
This paper examines the reliability of ChatGPT as a rater of EFL writing by assessing its intra- and inter-rater reliability. Eighty-two compositions were randomly sampled from the Written English Corpus of Chinese Learners and rated by three experienced raters on 'language', 'content', and 'organization'. ChatGPT also rated the samples twice, with an interval between the two ratings, and the average scores were calculated. An independent-samples t-test was conducted to compare the average scores given by ChatGPT and by the human raters. Pearson correlation analyses were conducted between the two sets of overall scores given by ChatGPT to estimate intra-rater reliability, and between the average scores given by ChatGPT and by the human raters to estimate inter-rater reliability. The comparative analysis shows that ChatGPT's scores are similar to those provided by reliable human raters, which suggests that it may be usable for evaluating EFL essays. However, the correlation analyses show that the intra-rater reliability of ChatGPT is not high enough to be acceptable (r = 0.575, p < 0.01) and that the inter-rater reliability is only moderate (r = 0.508, p < 0.01). Moreover, there is no significant relationship between the ChatGPT and human average scores for the 'organization' of the writings (r = 0.181, p > 0.05). It can therefore be concluded that ChatGPT is not a reliable tool for rating and scoring EFL writing with the prompt used in this study. One possible reason for this unreliability is its scoring of the 'organization' of the essays. These findings imply that while ChatGPT has potential as an evaluative tool, its current limitations, particularly in assessing organization, must be addressed before it can be reliably used in educational settings.
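The analyses named in the abstract can be sketched as follows. This is a minimal illustration and not the authors' procedure: the function names (`pearson_r`, `mean_scores`, `independent_t`) are hypothetical, the scores are made-up placeholders rather than data from the study, and the t statistic assumes the pooled-variance (Student) form, which the abstract does not specify.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def mean_scores(*ratings):
    """Average several raters' (or rounds') scores essay by essay."""
    return [sum(column) / len(column) for column in zip(*ratings)]


def independent_t(x, y):
    """Independent-samples t statistic, assuming equal variances (pooled)."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((a - m1) ** 2 for a in x)
    ss2 = sum((b - m2) ** 2 for b in y)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


if __name__ == "__main__":
    # Placeholder scores for five essays; NOT data from the study.
    gpt_round_1 = [10, 12, 9, 14, 11]
    gpt_round_2 = [11, 12, 10, 13, 12]
    human_a = [9, 13, 8, 14, 10]
    human_b = [10, 12, 9, 15, 11]
    human_c = [9, 12, 9, 14, 12]

    gpt_avg = mean_scores(gpt_round_1, gpt_round_2)
    human_avg = mean_scores(human_a, human_b, human_c)

    # Intra-rater reliability: ChatGPT's first round against its second round.
    print("intra-rater r:", pearson_r(gpt_round_1, gpt_round_2))
    # Inter-rater reliability: ChatGPT averages against human averages.
    print("inter-rater r:", pearson_r(gpt_avg, human_avg))
    # Comparison of mean score levels between the two groups of ratings.
    print("t statistic:", independent_t(gpt_avg, human_avg))
```

The two checks answer different questions, which is why the abstract can report both similar scores and weak reliability: the t-test compares overall score levels, while the correlations ask whether essays are ranked consistently across rounds or raters.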
Descriptors: English (Second Language), Second Language Instruction, Writing (Composition), Evaluation Methods, Artificial Intelligence, Written Language, Interrater Reliability, Foreign Countries, College Students
Shanlax International Journals. 66, V.P. Complex, T.P.K. Main Road, Near KVB, Vasantha Nagar, Madurai, Tamil Nadu 625003, India. e-mail: shanlaxjournals@gmail.com; Web site: http://www.shanlaxjournals.in/journals/index.php/education
Publication Type: Journal Articles; Reports - Research
Education Level: Higher Education; Postsecondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Location: China
Grant or Contract Numbers: N/A