ERIC Number: ED659738
Record Type: Non-Journal
Publication Date: 2023-Sep-30
Pages: N/A
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Which Observational Study Designs Routinely Produce Causal Estimates That Approximate Experimental Estimates within Narrow Bounds?: A Meta-Analysis of 46 Educational Design Experiments
Thomas Cook; Mansi Wadhwa; Jingwen Zheng
Society for Research on Educational Effectiveness
Context: A perennial problem in applied statistics is the inability to justify strong claims about cause-and-effect relationships without full knowledge of the mechanism determining selection into treatment. Few research designs other than the well-implemented random assignment study meet this requirement. Researchers have proposed partial solutions that rely on non-randomized comparison groups and baseline covariates. Despite their limitations for unbiased causal inference in practice, observational studies are needed to compare outcomes between non-randomly formed treatment and comparison groups. To measure potential bias, researchers conduct within-study comparisons (WSCs) that compare causal estimates from observational studies and experiments testing the same hypothesis with the same treatment group, outcome, estimand, and other design particulars (LaLonde 1986; Cook et al. 2008).

Objective: This paper conducts a meta-analysis of 46 WSCs that compares both mean bias and its variance across eight feasible observational designs. These designs result when three design elements are factorially combined: (1) the presence or absence of a pretest measure of the outcome, (2) whether the non-equivalent comparison group is local to the treatment group, and (3) whether at least four other covariates are available. Thus, one design includes all three elements, another includes none, and the remaining six include any one or any two.

Research Design: The sample consists of 46 WSC studies that focus on educational interventions and outcomes. We evaluate standardized bias estimates at three levels: contrast, hypothesis, and study. Contrast-level bias refers to the bias estimate for each outcome, population, and analytical choice. In this study, we analyzed 603 contrasts from 46 WSCs for which both standardized bias estimates and approximate standard errors were available. For the remaining contrasts, we could obtain only standardized bias estimates, as the papers lacked the information needed to calculate the standard error associated with the bias estimate. Hypothesis-level bias is the average of contrast-level bias estimates in a specific outcome domain; a study can contain multiple hypotheses. A hypothesis-level analysis asks, for example, how the intervention affected the full study sample's math performance, rather than its effects in particular locations or target groups (Friedlander and Robins 1995). Our sample contains 247 hypothesis-level bias estimates. Study-level bias is the average bias across all hypothesis-level estimates within a single study, an approach commonly used in previous syntheses of WSC results, such as Chaplin et al. (2018) and Coopersmith et al. (2022). This paper uses the frequentist multilevel model proposed by Pustejovsky and Tipton (2021), which employs random-effects meta-regression implemented through the R package metafor and cluster-robust variance estimation (CRVE) implemented through the R package clubSandwich. This approach is most appropriate here because it allows each design subgroup to have its own estimate of average bias and of variation in bias at the contrast, hypothesis, and study levels. Using this within-subgroup variation, random effects are computed for each of the eight design subgroups, and the bias calculated for a design subgroup is independent of the other subgroups. Standard errors are clustered at the study level.
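For concreteness, a model of this kind can be fit in R roughly as follows. This is a minimal sketch, not the paper's actual code: the data frame `wsc`, its column names, and the assumed within-study sampling correlation r = 0.8 are all hypothetical.

```r
library(metafor)       # rma.mv: multilevel random-effects meta-regression
library(clubSandwich)  # CRVE and working covariance matrices

# wsc: hypothetical data frame with one row per contrast and columns
#   study, hypothesis, contrast -- identifiers for the three levels
#   bias   -- standardized bias estimate for the contrast
#   vi     -- squared standard error of the bias estimate
#   design -- factor with the eight design-subgroup labels

# Working model: block-diagonal sampling covariance matrix, assuming
# correlation r among contrasts drawn from the same study
V <- impute_covariance_matrix(vi = wsc$vi, cluster = wsc$study, r = 0.8)

# Subgroup-specific average bias (no intercept, so one coefficient per
# design subgroup) with nested random effects at the study, hypothesis,
# and contrast levels
fit <- rma.mv(bias ~ 0 + design, V = V,
              random = ~ 1 | study / hypothesis / contrast,
              data = wsc)

# Cluster-robust (CR2) standard errors, clustered at the study level
coef_test(fit, vcov = "CR2", cluster = wsc$study)
```

The paper additionally lets the variation in bias differ across the eight subgroups; one simple way to mimic that under this sketch would be to fit the model separately within each design subgroup.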
Results: Our subgroup correlated and hierarchical effects analysis reveals that the average standardized bias lies within 0.1 standard deviations of the experimental estimate for all eight design subgroups (see Figure 1), and is at or close to 0 for several of them. The variance around the average bias reflects how consistently studies produce treatment effect estimates comparable to those from randomized trials. Design subgroup D, which includes none of the three key design elements, has the largest standard deviation. In contrast, design subgroup L_P_C, which contains all three key design elements, namely a local comparison group, a pretest measure of the outcome, and a set of rich covariates, has the smallest standard deviation and consistently performs well across different specifications and levels of analysis. Figure 2 compares the predicted percentages of effect sizes that are close to unbiased at the three levels of analysis. We find that two design subgroups perform consistently well across all three levels: for subgroups L_P_C and D_P_C, 90% or more of studies deliver effect sizes within 0.1 standard deviations of the experimental estimate. Both subgroups contain a pretest as well as rich covariates, and one also utilizes a local comparison group. Two more subgroups show promising results: the subgroup of studies that use a pretest and rich covariates (D_P_C) and the subgroup that uses a local comparison group in conjunction with a pretest (L_P). Over 75% of studies using these design combinations deliver effect sizes within 0.1 standard deviations of the experimental estimate at all three levels of analysis.

Conclusions: In summary, our analysis reveals that the average bias falls within 0.1 standard deviations of the experimental estimate for all eight observational study designs. Moreover, as more design elements are incorporated, the standard deviation of bias tends to decrease, and when all three elements are utilized, there is a high likelihood (at least 90 percent) that bias will fall within 0.1 standard deviations of the experimental estimate. Notably, quasi-experiments that include a pretest measure of the outcome generally outperform other designs in reducing bias, and a pretest plus four other covariates is as effective as a quasi-experiment with all three elements. Based on a meta-analysis of bias estimates from 46 studies, our findings indicate (1) which quasi-experimental designs are preferable and (2) which ones consistently yield causal estimates that closely align with those from a comparable randomized experiment.
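As a back-of-the-envelope check on the Figure 2 quantities, if bias within a subgroup is treated as approximately normal with mean mu and standard deviation tau, the predicted share of effect sizes within 0.1 standard deviations of the experimental estimate follows directly from the normal CDF. The mu and tau values below are illustrative only, not the paper's estimates.

```r
# Predicted share of bias estimates falling within +/- `bound` SD,
# assuming bias ~ Normal(mu, tau); mu and tau are illustrative only
share_within <- function(mu, tau, bound = 0.1) {
  pnorm(bound, mean = mu, sd = tau) - pnorm(-bound, mean = mu, sd = tau)
}

share_within(mu = 0,    tau = 0.05)  # ~0.95: a low-variance subgroup
share_within(mu = 0.02, tau = 0.15)  # ~0.49: a high-variance subgroup
```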
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Publication Type: Information Analyses; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)
Grant or Contract Numbers: N/A