Peer reviewed
ERIC Number: ED663257
Record Type: Non-Journal
Publication Date: 2024-Sep-20
Pages: N/A
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: N/A
LOOL: Flexible & Robust Estimator of Heterogeneous Treatment Effects
Duy Pham; Kirk Vanacore; Adam Sales; Johann Gagnon-Bartsch
Society for Research on Educational Effectiveness
Background: Education researchers typically estimate average program effects with regression; when they are interested in heterogeneous effects, they include interaction terms in the model. Such models quantify the influence of each covariate on the effect, and support inference about it, through interaction coefficients and their associated p-values or confidence intervals. However, while these models produce interpretable estimates, they also constrain the estimates to be linear. More recently, some researchers have examined incorporating machine learning (ML) methods -- off-the-shelf or otherwise -- for more flexible estimation. While more precise than interaction models, these approaches' estimates are often uninterpretable -- especially with powerful but non-parametric methods like causal forests [Athey and Wager, 2019]. This work proposes a two-stage procedure to estimate heterogeneous treatment effects, dubbed the "Leave-One-Out Learner" (LOOL), an extension of the "Leave-One-Out Potential Outcomes" (LOOP) estimator first described by Wu and Gagnon-Bartsch [2018]. Purpose: In the first stage, we estimate and impute individual treatment effects (ITEs) with the LOOP estimator, using flexible methods like random forests [Breiman, 2001]. In the second stage, we regress these estimates on the covariates -- using a parametric model like simple least squares or Lasso regression [Tibshirani, 1996] -- to produce interpretable, parameterized estimates of the conditional average treatment effect (CATE), from which researchers can draw inferences without any additional step. The first stage allows the LOOL to be more flexible than interaction-based regressions, which restrict predictions to be linear, while the second stage allows it to be more interpretable than non-parametric methods like causal forests [Athey and Wager, 2019].
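The two stages can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: to run with the standard library only, it swaps the random forest for leave-one-out arm means in the first stage and uses a single-covariate least-squares line in the second. The function names `loop_ite_estimates` and `least_squares_line`, the simulated data, and the true effect 1 + 2x are all hypothetical. The first stage follows the LOOP form (Y_i - m_hat_i) * U_i, where U_i = 1/p for treated units and -1/(1 - p) for control units, and m_hat_i is imputed without using unit i.

```python
import random


def loop_ite_estimates(y, z, p):
    """Stage 1: leave-one-out ITE estimates (Y_i - m_hat_i) * U_i.

    Illustrative simplification: each potential outcome is imputed with the
    leave-one-out mean of its arm rather than with a random forest.
    """
    n_t = sum(z)
    n_c = len(y) - n_t
    if n_t < 2 or n_c < 2:
        raise ValueError("need at least two units in each arm")
    sum_t = sum(yi for yi, zi in zip(y, z) if zi == 1)
    sum_c = sum(y) - sum_t
    tau_hat = []
    for yi, zi in zip(y, z):
        # Arm means computed from the other n - 1 units only.
        mu_t = (sum_t - yi) / (n_t - 1) if zi == 1 else sum_t / n_t
        mu_c = sum_c / n_c if zi == 1 else (sum_c - yi) / (n_c - 1)
        m_hat = (1 - p) * mu_t + p * mu_c
        u = 1 / p if zi == 1 else -1 / (1 - p)
        tau_hat.append((yi - m_hat) * u)
    return tau_hat


def least_squares_line(x, t):
    """Stage 2: simple least-squares fit of t on x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    t_bar = sum(t) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxt = sum((xi - x_bar) * (ti - t_bar) for xi, ti in zip(x, t))
    slope = sxt / sxx
    return t_bar - slope * x_bar, slope


# Simulated experiment (hypothetical): the true CATE is tau(x) = 1 + 2x.
rng = random.Random(0)
n, p = 4000, 0.5
x = [rng.random() for _ in range(n)]
z = [1 if rng.random() < p else 0 for _ in range(n)]
y0 = [xi + rng.gauss(0, 0.5) for xi in x]
y1 = [y0i + 1 + 2 * xi for y0i, xi in zip(y0, x)]
y = [y1i if zi == 1 else y0i for y1i, y0i, zi in zip(y1, y0, z)]

tau_hat = loop_ite_estimates(y, z, p)
intercept, slope = least_squares_line(x, tau_hat)
print(f"estimated CATE line: {intercept:.2f} + {slope:.2f} * x")
```

The individual estimates in `tau_hat` are noisy, which is expected; only their regression on the covariate is meant to be read. A more flexible first-stage model would change how `mu_t` and `mu_c` are imputed but not the structure of either stage.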
Moreover, if the experimental design is valid and the modeling process satisfies the leave-one-out requirement outlined by Wu and Gagnon-Bartsch [2018], the first-stage ITE estimates from the LOOP estimator will be unbiased. In turn, this robustness gives researchers a great deal of flexibility: they can choose whichever model best suits their needs in the second stage without worrying about carrying over and further amplifying potential bias from the first. In addition, as recent studies [Sales et al., 2018, 2023; Gagnon-Bartsch et al., 2023] have shown, researchers can further boost the precision of estimates by incorporating auxiliary data from outside the experiment. Data: Our data came from Decker-Woodrow et al. [2023], which examined the effectiveness of three computer-based learning platforms (CBLPs) -- From Here to There (FH2T), DragonBox, and ASSISTments [Heffernan and Heffernan, 2014]. The study examined four conditions: FH2T and DragonBox -- where assigned students simply worked with the respective CBLPs -- and Immediate Feedback and Active Control -- which were both administered through ASSISTments. In the latter two conditions, students could either request assistance in the form of hints or only receive them upon submitting their answers. The original sample included 3,612 students. However, due to COVID-19, only 1,852 students (approximately 51.2% of the sample) completed the posttest. The data are available upon request. Methodology: Let the potential outcomes $Y_i(1)$ and $Y_i(0)$ represent the outcome value $Y_i$ that subject $i \in \{1, \ldots, n\}$ would have exhibited if assigned to treatment or control, respectively [Rubin, 1974; Splawa-Neyman et al., 1990]. Subject $i$'s ITE is thus $\tau_i = Y_i(1) - Y_i(0)$.
If treatment $Z$ is randomly assigned, with $\mathbb{P}(Z_i = 1) = p_i$, then $Y_i U_i$ is an unbiased estimator of $\tau_i$, where $U_i = 1/p_i$ if $Z_i = 1$ and $U_i = -1/(1-p_i)$ if $Z_i = 0$. Unfortunately, the large sampling variance of $Y_i U_i$ renders it impractical as an estimator of $\tau_i$. Wu and Gagnon-Bartsch [2018] modified this quantity to obtain a more efficient estimate $\hat{\tau}_i = (Y_i - \hat{m}_i) U_i$, where $\hat{m}_i$ is an estimate of $m_i = (1-p_i) Y_i(1) + p_i Y_i(0)$. This estimate is unbiased if $\hat{m}_i \perp\!\!\!\perp U_i$ and thus $\hat{m}_i \perp\!\!\!\perp Z_i$. As its name suggests, the LOOP estimator achieves this by cross-fitting potential outcome models $\mu_i^T(\cdot)$ and $\mu_i^C(\cdot)$ separately for each observation $i$, using the $n - 1$ observations $j \neq i$. Moreover, $\hat{\tau}_i$ will also be precise if $\hat{\mu}_i^T(\cdot)$ and $\hat{\mu}_i^C(\cdot)$ are accurate. Consider the definition of the CATE: $\tau(x) = \mathbb{E}[Y_i(1) - Y_i(0) \mid X_i = x] = \mathbb{E}[\tau_i \mid X_i = x]$ [Imbens and Rubin, 2015]. If, somehow, the $\tau_i$ were observed, one could estimate the CATE by regressing them on the covariates $X$, thus estimating $\mathbb{E}[\tau_i \mid X_i = x]$. However, since we observe only one of $Y_i(1)$ or $Y_i(0)$, depending on $Z_i$, $\tau_i$ is unobservable.
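The unbiasedness claims above can be checked with a short derivation. The sketch below treats the potential outcomes as fixed and takes expectations over the random assignment $Z_i$; the last line additionally assumes, as stated in the text, that $\hat{m}_i$ is independent of $U_i$.

```latex
\begin{align*}
\mathbb{E}[Y_i U_i]
  &= p_i \cdot \frac{Y_i(1)}{p_i} + (1 - p_i) \cdot \frac{-\,Y_i(0)}{1 - p_i}
   = Y_i(1) - Y_i(0) = \tau_i, \\
\mathbb{E}[U_i]
  &= p_i \cdot \frac{1}{p_i} - (1 - p_i) \cdot \frac{1}{1 - p_i} = 0, \\
\mathbb{E}[\hat{\tau}_i]
  &= \mathbb{E}[Y_i U_i] - \mathbb{E}[\hat{m}_i U_i]
   = \tau_i - \mathbb{E}[\hat{m}_i]\,\mathbb{E}[U_i]
   = \tau_i .
\end{align*}
```

Subtracting $\hat{m}_i$ therefore leaves the expectation unchanged; its role is only to reduce the variance of the estimate.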
Nevertheless, we can instead use the LOOP estimator's accurate ITE estimates to estimate $\tau(x)$, since $\mathbb{E}[\hat{\tau}_i \mid X_i = x] = \mathbb{E}[\tau_i \mid X_i = x]$. Results: Table 1 shows regression models estimating the CATE of each treatment relative to Active Control with the LOOL. We first used a random forest in the LOOP estimator to estimate the ITEs. We then regressed these estimates on six covariates to estimate the CATE: the student's pretest score, their mathematics anxiety, whether they started the school year remotely, whether they were under an Early Intervention Program Plan (EIP), whether they received accommodations in the form of either an IEP or Section 504, and whether English is their second or foreign language (ESOL). The results replicated the main findings in Decker-Woodrow et al. [2023]: FH2T and DragonBox had positive average effects, and the effect of FH2T increased with pretest scores. Moreover, our results suggest that the effects of FH2T and DragonBox were lower for ESOL students, and may have been negative. Table 2 shows an interaction model estimating the posttest, in which the coefficients of the treatment and its interactions are analogs of those in Table 1. With only one exception, the standard errors of the coefficients from the LOOL are all smaller. This result suggests the LOOL can produce more precise estimates than a strictly linear interaction model. Conclusion: The estimation of heterogeneous effects is crucial in education research, where the student body is often diverse, with differing strengths, weaknesses, and educational needs. While precise and accurate estimates are always desirable, interpretable estimates provide researchers with more information to better understand underlying causal heterogeneity. We believe the LOOL can serve as a sweet spot between flexibility and interpretability in estimation.
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Publication Type: Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)
Grant or Contract Numbers: N/A
Author Affiliations: N/A