Comparison of Generative AI Performance on Undergraduate and Postgraduate Written Assessments in the Biomedical Sciences.

Andrew Williams


Peer reviewed


ERIC Number: EJ1439226

Record Type: Journal

Publication Date: 2024

Pages: 22

Abstractor: As Provided

ISBN: N/A

ISSN: N/A

EISSN: 2365-9440


International Journal of Educational Technology in Higher Education, v21 Article 52 2024

The value of generative AI tools in higher education has received considerable attention. Although many see these tools as valuable learning aids, many are also concerned about academic integrity and about students using them to compose written assessments. This study evaluates and compares the output of three commonly used generative AI tools: ChatGPT, Bing and Bard. Each AI tool was prompted with an essay question from undergraduate (UG) level 4 (year 1), level 5 (year 2), level 6 (year 3) and postgraduate (PG) level 7 biomedical sciences courses. Anonymised AI-generated output was then evaluated by four independent markers according to specified marking criteria and matched to the UK level descriptors of the Frameworks for Higher Education Qualifications (FHEQ). Percentage scores and ordinal grades were given for each marking criterion across the AI-generated papers, inter-rater reliability was calculated using Kendall's coefficient of concordance, and the performance of the generative AI tools was ranked. Across all UG and PG levels, ChatGPT performed better than Bing or Bard in areas of scientific accuracy, scientific detail and context. All AI tools performed consistently well at PG level compared to UG level, although only ChatGPT consistently met levels of high attainment at all UG levels. ChatGPT and Bing did not provide adequate references, while Bing falsified references. In conclusion, generative AI tools are useful for providing scientific information consistent with the academic standards required of students in written assignments. These findings have broad implications for the design, implementation and grading of written assessments in higher education.
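The abstract names Kendall's coefficient of concordance (W) as its measure of inter-rater reliability. The sketch below illustrates how W could be computed for four markers scoring the same set of papers. It is not the author's procedure: the function names and the scores are hypothetical, and it uses the basic formula W = 12S / (m^2 (n^3 - n)) without a tie correction, which may differ from what the study applied.

```python
def rank(scores):
    """Rank scores from 1 (lowest) to n; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score equals the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W for m markers rating the same n papers (no tie correction).

    ratings: one list per marker, each holding that marker's scores
    for the papers in the same order.
    """
    m = len(ratings)
    n = len(ratings[0])
    ranked = [rank(marker_scores) for marker_scores in ratings]
    # Sum of ranks each paper received across all markers.
    totals = [sum(marker_ranks[i] for marker_ranks in ranked) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((total - mean_total) ** 2 for total in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical percentage scores: four markers, three AI-generated papers.
# These numbers are invented for illustration and are not from the study.
hypothetical_scores = [
    [72, 58, 55],
    [68, 60, 52],
    [75, 62, 64],
    [70, 55, 50],
]
print(kendalls_w(hypothetical_scores))  # 0.8125
```

W ranges from 0 (no agreement among markers) to 1 (identical rankings from every marker), so the invented scores above would indicate strong but imperfect agreement.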

Descriptors: Content Area Writing, Artificial Intelligence, Writing Assignments, Biomedicine, Science Education, Technology Uses in Education, Natural Language Processing, Cues, Essays, Undergraduate Study, Graduate Study, Writing Evaluation, Higher Education

BioMed Central, Ltd. Available from: Springer Nature. 233 Spring Street, New York, NY 10013. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-348-4505; e-mail: customerservice@springernature.com; Web site: https://bibliotheek.ehb.be:2129/gp/biomedical-sciences

Publication Type: Journal Articles; Reports - Research

Education Level: Higher Education; Postsecondary Education

Audience: N/A

Language: English

Sponsor: N/A

Authoring Institution: N/A

Grant or Contract Numbers: N/A