Peer reviewed
ERIC Number: ED663552
Record Type: Non-Journal
Publication Date: 2024-Sep-18
Pages: N/A
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
MetaMate: Large Language Model to the Rescue of Automated Data Extraction for Educational Systematic Reviews and Meta-Analyses
Xue Wang; Gaoxiang Luo
Society for Research on Educational Effectiveness
Background: Despite their usefulness, systematic reviews and meta-analyses are time-consuming and labor-intensive (Michelson & Reuter, 2019). Technological advances in recent years have produced tools aimed at streamlining the systematic review and meta-analysis process. Innovations such as Paperfetcher (Pallath & Zhang, 2023), which facilitates automated study searching, MetaReviewer, designed for collaborative data extraction, and machine learning algorithms for automated screening (Campos et al., 2024) represent significant progress toward efficiency. Nevertheless, automated data extraction still faces considerable challenges. Because data extraction is both time-consuming and error-prone, automating this phase with a human in the loop could substantially improve both the efficiency and the effectiveness of the process (Higgins et al., 2023). Although several AI tools aim to automate data extraction (Jonnalagadda et al., 2015; Marshall & Wallace, 2019), their applicability to educational systematic reviews and meta-analyses is limited for two reasons: 1) these tools are primarily validated on research from the medical and public health sectors, which differs significantly from educational research in scope and methodology (e.g., Pradhan et al., 2019); and 2) accessibility remains a barrier, as many tools require a subscription (e.g., DistillerSR), placing them out of reach for many educational researchers. Even within the domain of AI data extraction tools, several inherent limitations further complicate their application: 1) reliance on supervised machine learning algorithms that require large training databases (Chernikova et al., 2024); 2) stopping at Population, Intervention, Comparison, and Outcome (PICO) element detection and classification rather than extracting granular data elements (Wallace et al., 2016); and 3) acting as keyword search engines that lack semantic understanding of study context (Rose et al., 2010). While existing AI tools share these common pitfalls, Large Language Models (LLMs) are emerging as a highly capable technology for powering data extraction (Wadhwa et al., 2023).
Objectives: This project has the following objectives (a sketch of the extraction approach follows the list).
1. Develop an open-access, web-based software tool, MetaMate, that works out of the box and enables efficient (i.e., within seconds) automated data extraction for educational systematic reviewers and meta-analysts.
2. Propose an LLM-based automated data extraction approach that embraces zero-shot learning (i.e., requiring no training database) and text embedding (i.e., requiring no feature engineering).
3. Push beyond keyword search to semantic search, allowing mathematical reasoning during extraction (e.g., adding up the sample sizes of treatment groups) and semantic understanding (e.g., inferring the study design from context even when it is not explicitly stated).
4. Enable fine-grained data extraction beyond PICO element detection and classification, allowing users to extract structured data elements from unstructured text and export them in a ready-to-use format (e.g., a table).
5. Support variable text lengths across conference papers, journal articles, and theses, enabling users to extract data from a study with a single click and reducing extraction time to mere seconds.
6. Engage with the research community to gather feedback and iteratively refine MetaMate, ensuring it meets the practical needs of its users.
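The zero-shot, schema-driven extraction described in Objectives 2-4 can be illustrated with a minimal sketch. The coding schema, prompt wording, `extract_study_data` helper, and choice of model and client are illustrative assumptions, not MetaMate's actual implementation:

```python
# Minimal sketch of zero-shot structured data extraction with an LLM.
# The schema, prompt, and model name are illustrative assumptions only;
# they do not reproduce MetaMate's actual prompts or architecture.
import json
from openai import OpenAI  # assumes the openai Python package is installed

# Hierarchical coding schema: category -> data elements (cf. Objective 4).
SCHEMA = {
    "participant": ["total_sample_size", "treatment_sample_size",
                    "control_sample_size", "female_sample_size"],
    "intervention": ["intervention_type", "duration_weeks", "study_design"],
}

PROMPT = """You are coding a study for an educational meta-analysis.
Extract the following data elements from the text. Reason step by step:
first locate each reported group size, then add treatment-group sizes
to obtain totals; infer the study design from context if it is not
stated explicitly. Report 'NR' for elements that are not recoverable.
Return a JSON object matching this schema: {schema}

Study text:
{text}"""

def extract_study_data(study_text: str, model: str = "gpt-4o") -> dict:
    """Zero-shot extraction: no training database, no feature engineering."""
    client = OpenAI()  # reads the API key from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": PROMPT.format(schema=json.dumps(SCHEMA),
                                            text=study_text)}],
        response_format={"type": "json_object"},  # structured, table-ready output
    )
    return json.loads(response.choices[0].message.content)
```

Because the output is a JSON object keyed by the schema, a caller could load the results of many studies directly into a table for downstream meta-analytic use.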
To the best of our knowledge, MetaMate is the first automated, LLM-empowered data extraction tool for educational systematic reviews and meta-analyses. Objectives 1-5 will be met prior to SREE 2024; we hope to gather feedback, in part, from the SREE community and the field it represents.
Research Design: The development of MetaMate was informed by a comprehensive five-step process aimed at delivering an optimal user experience for researchers engaged in systematic reviews and meta-analyses: (1) identifying the data elements required for extraction; (2) conceptualizing MetaMate's hierarchical schema for extracting data at a high level of granularity (e.g., participant → sample size → female sample size); (3) developing exhaustive prompt engineering with a novel chain-of-thought (CoT) approach tailored to data extraction for meta-analyses; (4) validating MetaMate's effectiveness through comparative quantitative evaluation against human coders; and (5) reiterating steps 2-4 until the desired user experience and test outcomes were achieved. We used data from the authors' own meta-analysis projects (see Tables 2-3) to benchmark MetaMate's performance against human coders. Using MetaMate is straightforward: simply upload a PDF file of a study, and MetaMate extracts and returns the relevant information within seconds. To facilitate review of MetaMate, the following anonymous demo link can be used to test the tool: https://teaching-noticeably-elk.ngrok-free.app.
Results: We identified 20 data elements from the Participant and Intervention categories of the PICO framework: 10 participant-related and 10 intervention-related. To evaluate MetaMate's performance, we referenced the coding outcomes of 32 empirical studies of interventions designed to enhance learner autonomy. Table 1 outlines the coding scheme used to extract the 20 data elements from these studies. Each study was coded independently by two human coders; when disagreement arose, a third coder was involved to resolve the conflict until consensus was reached. Across the 20 data elements, the average agreement rate between MetaMate and the human coders was 91.25% (a sketch of this agreement metric follows the abstract). Figure 1 presents the agreement rates for each of the 20 data elements; as it shows, agreement exceeded 80% for most elements. Tables 2 and 3 present the detailed coding results for participant-related and intervention-related data elements, respectively. Table 4 showcases instances in which MetaMate exhibited mathematical reasoning and semantic comprehension. Potential reasons for discrepancies between MetaMate and the human coders are explored in Table 5; these insights pave the way for future enhancements to MetaMate.
Conclusions: MetaMate leverages LLMs for automated data extraction in educational research synthesis. Despite the limited number of studies on which the tool was tested, MetaMate demonstrated an overall agreement rate of 91.25% with human coders. As open-access, web-based software, MetaMate has the potential to empower educational researchers by significantly reducing the time and effort required for data extraction.
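For context on the reported agreement figures, per-element agreement between tool output and human consensus codes can be computed as in the minimal sketch below. The function name, variable names, and data layout are assumptions for illustration; the abstract does not specify how agreement was operationalized beyond exact-match rates:

```python
# Illustrative computation of per-element and average agreement rates
# between tool codes and human consensus codes (data layout is assumed:
# one dict of element -> coded value per study).
def agreement_rates(tool_codes: list[dict], human_codes: list[dict],
                    elements: list[str]) -> dict[str, float]:
    """Fraction of studies on which the tool matches the human consensus,
    computed separately for each data element."""
    rates = {}
    for element in elements:
        matches = sum(t[element] == h[element]
                      for t, h in zip(tool_codes, human_codes))
        rates[element] = matches / len(tool_codes)
    return rates

# Average agreement across all 20 data elements (e.g., 0.9125 -> 91.25%):
# average = sum(rates.values()) / len(rates)
```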
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Publication Type: Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)
Grant or Contract Numbers: N/A