The Quality of Educational Systematic Reviews: A Methodological Overview of Reviews.

Dylan Dachet; Coline Legros; Marta Pellegrini; Ariane Baye

Background: A systematic review is a secondary research method that follows a predetermined protocol to identify, synthesize, and assess the quality of primary studies that meet pre-established eligibility criteria in order to answer a specific research question (Higgins & Green, 2011; Higgins & Thomas, 2019). It uses systematic and transparent procedures (Alexander, 2020) to minimize bias (Li Wan Po, 1998), with the aim of drawing reliable conclusions and supporting policy and practice decision-making. In recent years, the number of published systematic reviews has steadily increased (Bastian et al., 2010; Smela et al., 2023). However, this increase has not always been accompanied by an improvement in the quality of the procedures used. Previous research assessing the quality of systematic reviews published in the educational sciences has shown that (a) search strategies are mostly not comprehensive (e.g., lacking gray literature searches or hand searching) (Hew et al., 2021; Sanchez de Ribera et al., 2020); (b) quality appraisal of primary studies is rare and not always based on common standards (e.g., WWC guidelines, Risk of Bias criteria) (Maeda et al., 2022); and (c) characteristics of primary studies are incompletely reported (Gaias et al., 2020; Nelson et al., 2022). However, these reviews focused either on a specific topic (e.g., effect of flipped classrooms, effect of mathematics interventions for students with learning disabilities) or on a single quality criterion (e.g., description of study samples). Our research aims to provide more comprehensive information on the quality of systematic reviews published in the field of education.
Research Questions: 1. To what extent do systematic reviews in education meet the reporting standards of PRISMA-2020 (Page et al., 2021) and the methodological features included in AMSTAR-2 (Shea et al., 2017)? 2. Are there differences in the quality of systematic reviews published between 2000 and 2022?
Research Design: A Methodological Overview of Reviews (MOR) (Biondi-Zoccai, 2016) was conducted to assess the quality of systematic reviews published from 2000 onward. An MOR aims to "include multiple systematic reviews and assess their quality, without, however trying to associate it with their results" (Papageorgiou & Biondi-Zoccai, 2016, p. 59). The protocol was pre-registered on OSF: https://osf.io/j3978/?view_only=5657b2093e2045c2b609da80f7289300.
Data Collection and Analysis: A bibliographical search was carried out in ERIC and in four education journals focused on research synthesis (i.e., Review of Educational Research, Review of Research in Education, Educational Research Review and Review of Education). We included reviews that (a) synthesized primary studies conducted in a training or educational setting and (b) clearly stated that they were systematic reviews. Both qualitative and quantitative systematic reviews were eligible. We located 3,349 records, of which 1,586 met our eligibility criteria and were included. The selection process was carried out double-blind, with an overall agreement of 85%. Disagreements were resolved through discussion between the two reviewers and a third, external reviewer. Since we were not able to assess all 1,586 reviews, we used a two-step stratified random sampling strategy. We placed records in five blocks based on publication year (i.e., 2000-2004, 2005-2009, 2010-2014, 2015-2019, 2020-2022) and in two categories based on the journal's focus (i.e., records published in specialized journals vs. other journals indexed in ERIC). For each block of years, we selected 30 reviews published in specialized journals and 30 reviews published in other journals. This sampling strategy allowed us to include 234 reviews.
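The stratified draw described above (five publication-year blocks crossed with two journal categories, 30 reviews per cell) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used: the record keys ("id", "year", "specialized"), the function names, the fixed seed, and the rule that a cell with fewer than 30 records is taken whole are all hypothetical, and the sketch stratifies in a single pass rather than reproducing the two steps mentioned in the text.

```python
import random
from collections import defaultdict

# Year blocks taken from the text; everything else here is illustrative.
YEAR_BLOCKS = [(2000, 2004), (2005, 2009), (2010, 2014), (2015, 2019), (2020, 2022)]


def year_block(year):
    """Return the (start, end) block containing `year`, or None if out of range."""
    for start, end in YEAR_BLOCKS:
        if start <= year <= end:
            return (start, end)
    return None


def stratified_sample(records, per_stratum=30, seed=0):
    """Draw up to `per_stratum` records at random from each stratum.

    A stratum is a (year block, journal category) pair. `records` is a list
    of dicts with hypothetical keys "id", "year", and "specialized"
    (True for a journal focused on research synthesis).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for rec in records:
        block = year_block(rec["year"])
        if block is not None:  # records outside 2000-2022 are dropped
            strata[(block, rec["specialized"])].append(rec)

    sample = []
    for key in sorted(strata):
        pool = strata[key]
        # Assumption: a stratum with fewer than `per_stratum` records is taken whole.
        k = min(per_stratum, len(pool))
        sample.extend(rng.sample(pool, k))
    return sample
```

With ten strata, a full draw gives 300 records; the sketch returns fewer whenever a stratum holds fewer than 30 eligible records.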
We extracted review characteristics relevant to quality, based on PRISMA (e.g., searching, eligibility criteria, risk of bias assessment) and AMSTAR-2 (e.g., number of databases, backward search, characteristics of primary studies). Two independent reviewers carried out data extraction for 33% of the included articles. The overall agreement reached 85% on PRISMA items and 86% on AMSTAR-2 items.
Results: Of the 234 articles included (listed in Appendix 1), 103 were published in specialized journals and 131 were published in other journals indexed in ERIC. Table 1 reports review characteristics related to research design, research question (whether reviews focused on the effectiveness of an intervention), and whether a meta-analysis was conducted. The number of reviews assessed varies by criterion, as some criteria apply only to meta-analyses, to RCTs/QEDs, or to effectiveness questions. Figures 1 to 3 provide a broader overview of our main results. First, our results highlight concerns about the reporting and comprehensiveness of search strategies. On the one hand, 35% of the included reviews do not describe the information sources used and 87% do not report the full search strings. On the other hand, only 1% of these reviews fully met the AMSTAR-2 criterion relating to the quality of the search strategy, because of the limited use of expert consultation, gray literature searches, backward searches, and trial searches (see Table 2). Furthermore, double-blind procedures were used in only a minority of the included reviews, both during the selection process (26%) and during the extraction process (41%). Our MOR also shows that the description of included studies often lacks detail. Only 27% of the reviews provided an assessment of the quality of the primary studies included. Furthermore, only 38% of the 128 included systematic reviews on the effectiveness of an intervention described the included studies according to the PICOS framework.
Only eight reviews provided information on protocol registration prior to conducting the review. Finally, we observed positive changes over time for some criteria (e.g., comprehensiveness of the search strategy, use of double-blind procedures). However, declines in quality were also observed for others (e.g., reporting of the full search strings, conducting a quality appraisal).
Conclusions: The results of this MOR show that systematic reviews in education often do not meet several of the quality criteria considered. Changes over time will be discussed in more depth in light of the increasing use of systematic reviews.