Peer reviewed
ERIC Number: EJ1458432
Record Type: Journal
Publication Date: 2025
Pages: 40
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: EISSN-2157-2100
PharmaSimText: A Text-Based Educational Playground filled with RL-LLM Agents That Work Together Even in Disagreement
Bahar Radmehr; Tanja Käser; Adish Singla
Journal of Educational Data Mining, v17 n1 p1-40 2025
There has been a growing interest in developing simulated learners to enhance learning and teaching experiences in educational environments. However, existing works have primarily focused on structured environments relying on meticulously crafted representations of tasks, thereby limiting the learner's ability to generalize skills across tasks. In this paper, we aim to enhance simulated learners' generalization capabilities in less-structured text-based learning environments by integrating Reinforcement Learning (RL) with Large Language Models (LLMs). We investigate three types of agents: (i) "RL-based" agents that utilize natural language for state and action representations, (ii) "LLM-based" agents that leverage the model's general knowledge and reasoning through prompting, and (iii) hybrid "RL-LLM" agents that combine these two strategies to improve agents' performance and generalizability. To support the development of these agents, we introduce PharmaSimText, a novel benchmark developed with expert-evaluated GPT-4 generations derived from a virtual pharmacy environment designed for practicing diagnostic conversations. After experimenting with "RL-based" and "LLM-based" agents using GPT-4 and open-source LLMs, along with a wide range of strategies for combining them, we find that "RL-based" agents are good at completing tasks but poor at asking quality diagnostic questions, whereas "LLM-based" agents are better at asking diagnostic questions but worse at completing tasks. Finally, specific variations of hybrid "RL-LLM" agents enable us to overcome these limitations. Our findings highlight the potential of combining RL- and LLM-based methods to create generalizable agents whose solutions stay close to human ones through the LLM component while remaining faithful to the controlled environment through the RL component. The source code and benchmark are available on GitHub.
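For readers unfamiliar with the hybrid setup sketched in the abstract, the following minimal Python example illustrates one possible way an LLM component could propose or rank candidate actions while a tabular RL policy makes the final choice in a toy text environment. It is an illustrative sketch only; the class and function names (TextEnv, llm_propose, hybrid_episode) are hypothetical and are not taken from the paper or the PharmaSimText release.

    # Hypothetical sketch of a hybrid RL-LLM agent loop; not the authors' implementation.
    import random
    from collections import defaultdict

    class TextEnv:
        """Toy stand-in for a text-based diagnostic environment."""
        def __init__(self):
            self.actions = ["ask about symptoms", "ask about medication", "give diagnosis"]
            self.t = 0
        def reset(self):
            self.t = 0
            return "patient: my child has diarrhea"
        def step(self, action):
            self.t += 1
            done = action == "give diagnosis" or self.t >= 5
            reward = 1.0 if done and action == "give diagnosis" else 0.0
            return f"observation after '{action}'", reward, done

    def llm_propose(state, candidates):
        """Placeholder for an LLM call that would rank candidate actions;
        here it simply returns the candidates unchanged (assumption)."""
        return candidates

    def hybrid_episode(env, q_table, epsilon=0.1, alpha=0.5, gamma=0.9):
        """One episode: the LLM filters/ranks actions, tabular Q-learning picks among them."""
        state, total, done = env.reset(), 0.0, False
        while not done:
            candidates = llm_propose(state, env.actions)
            if random.random() < epsilon:
                action = random.choice(candidates)
            else:
                action = max(candidates, key=lambda a: q_table[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = max(q_table[(next_state, a)] for a in env.actions)
            q_table[(state, action)] += alpha * (reward + gamma * best_next - q_table[(state, action)])
            state, total = next_state, total + reward
        return total

    q = defaultdict(float)
    env = TextEnv()
    for _ in range(20):
        hybrid_episode(env, q)

In this toy version the RL component controls which proposed action is executed and learns from task rewards, while the LLM component constrains the candidate set; the paper explores several such combination strategies.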
International Educational Data Mining Society. e-mail: jedm.editor@gmail.com; Web site: https://jedm.educationaldatamining.org/index.php/JEDM
Publication Type: Journal Articles; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A