Peer reviewed
ERIC Number: ED659618
Record Type: Non-Journal
Publication Date: 2023-Sep-29
Pages: N/A
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
The Implications of Propensity Score Augmentation for Generalization
Wendy Chan; Jimin Oh; Chen Li; Jiexuan Huang; Yeran Tong
Society for Research on Educational Effectiveness
Background: The generalizability of a study's results continues to be at the forefront of concerns in evaluation research in education (Tipton & Olsen, 2018). Over the past decade, statisticians have developed methods, mainly based on propensity scores, to improve generalizations in the absence of random sampling (Stuart et al., 2011; Tipton, 2013). Propensity scores model the sample selection process as a function of observable characteristics, or covariates, that both affect sample selection and moderate treatment effects (Kern et al., 2016). Once estimated, propensity scores are used to match or reweight the study sample so that the resulting sample is compositionally similar (based on the covariates) to individuals in the inference population. When certain core assumptions hold, propensity score-based methods can provide bias-reduced estimates of population average treatment effects (PATEs; Chan, 2017). A core assumption needed for propensity scores is that the sample and population share common support (Tipton, 2014). This assumption requires that the propensity scores include all treatment effect moderators that also affect sample selection. Additionally, every school in the sample must have at least one comparable school in the inference population, where comparability is based on the covariates used in the propensity scores. While the notion of common support centers on treatment effect moderators, these variables are generally difficult to identify, primarily because of the small study samples seen in practice (Tipton, 2014; Tipton et al., 2017). Although the true treatment effect moderators may be unknown, there may be covariates that are correlated with the true moderators, leaving open the question of whether including such proxy moderators affects estimation and inferences in generalization studies.
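The reweighting idea can be sketched in a short simulation. This is an illustrative example, not the authors' code: the single covariate, the logistic selection model, and the effect sizes below are all hypothetical, and a real analysis would estimate the selection probabilities from data rather than treat them as known.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of 20,000 schools with one covariate X that both
# moderates the treatment effect and drives selection into the study.
N = 20_000
X = rng.normal(size=N)

# Unit-level treatment effect tau_i = 2 + X_i, so the PATE is about 2.
tau = 2.0 + X

# Selection favors low-X schools (logistic selection model, assumed known here).
pi = 1.0 / (1.0 + np.exp(2.0 * X))   # P(selected | X)
S = rng.random(N) < pi               # sample indicator

pate = tau.mean()                    # target estimand
naive = tau[S].mean()                # unweighted sample ATE: biased downward

# Inverse-probability weights w_i = 1 / pi_i reweight the sample so that its
# covariate composition matches the population (requires common support).
ipw = np.average(tau[S], weights=1.0 / pi[S])

print(f"PATE={pate:.2f}  naive={naive:.2f}  IPW={ipw:.2f}")
```

Because selection depends on X, the unweighted sample mean understates the PATE; reweighting by 1/pi removes that compositional bias.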
Focus of Study: The purpose of the current study is to assess the effects of including proxy moderators, specifically weak proxies (variables that are weakly correlated with the true moderators), on inferences for generalization. The motivation for this focus is two-fold. First, many publicly available data sources include aggregate variables (at the school or district level), but the true moderators may be at the classroom or student level. As a result, aggregate variables may only be weakly correlated with the true moderators. Second, when researchers have access to a potentially large set of weak proxies, an important question is whether including them in the propensity score model and analysis yields any benefit for estimation. This has implications for researchers when various types of covariates on a population are accessible and decisions must be made about which ones to include in a generalization analysis.
Research Design: We conducted a simulation study to examine the effects of propensity score augmentation with weak proxies. The simulation varied two main parameters: (i) covariate type, based on the correlation with the true moderators of treatment impact, and (ii) the number of weak proxy moderators included in the propensity score model. For each iteration, we generated nine covariates X_i, i = 1, ..., 9, of which {X_1, X_2, X_3} were the true treatment moderators, {X_4, X_5, X_6} were strongly correlated with the moderators with an average correlation of ρ̄ = 0.8, and {X_7, X_8, X_9} were moderately correlated with the moderators with ρ̄ = 0.4. All covariates were simulated as Normal variables. In addition to the nine covariates, we simulated sets of covariates that had average correlations |ρ̄| < 0.10 with the true moderators. Together, we examined three scenarios in which the propensity score model contained p = 11, 41, and 81 weak proxies in addition to the strongly and moderately correlated proxies {X_4, ..., X_9}.
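One standard way to generate a proxy with a target correlation ρ to a standard-Normal moderator is to mix the moderator with independent noise, X_proxy = ρ·X_mod + √(1 − ρ²)·ε. The sketch below uses that construction; the sample size, seed, and the choice to tie every weak proxy to X_1 are assumptions, since the abstract does not spell out the data-generating process.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000

def make_proxy(moderator, rho, rng):
    """Standard-Normal proxy with population correlation `rho` to `moderator`."""
    noise = rng.normal(size=moderator.shape)
    return rho * moderator + np.sqrt(1.0 - rho**2) * noise

# True treatment moderators X1-X3 (standard Normal, as in the abstract).
moderators = rng.normal(size=(3, n))

strong = [make_proxy(m, 0.8, rng) for m in moderators]            # X4-X6
moderate = [make_proxy(m, 0.4, rng) for m in moderators]          # X7-X9
weak = [make_proxy(moderators[0], 0.05, rng) for _ in range(11)]  # 11 weak proxies

print(np.corrcoef(moderators[0], strong[0])[0, 1])  # close to 0.8
```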
Results and Conclusion: We applied four estimators of the population average treatment effect (PATE) to assess the effects of propensity score augmentation with weak proxies on their performance. Table 1 provides a description of each estimator. Table 2 provides summary statistics of the propensity score distributions in the simulated samples and populations. Note that the results are based on the realistic scenario in which the true moderators {X_1, X_2, X_3} are unknown and not part of the estimated propensity scores. Table 3 provides the average absolute bias and root mean squared error (RMSE) of each estimator, where both measures are computed relative to the true simulated PATE. Table 2 illustrates that as more weak proxies are added to the propensity score model, the median and maximum estimated propensity scores in the sample increase and the overall skewness of the sample distribution decreases. In contrast, the median propensity score in the population decreases with additional weak proxies, dropping to 0.009 in the model with 81 weak proxies, accompanied by an overall increase in skewness. This implies that as more weak proxies were added, some population schools had estimated propensity scores much smaller than average. The changes in skewness were more pronounced in the sample (a decrease of 40%) than in the population (an increase of 7%) between the first row (no weak proxies) and the last row (81 total weak proxies). Table 3 shows that as the propensity score model was augmented with weak proxies, the average bias and RMSE consistently increased for IPW, followed by TMLE. In the final row, with 81 (total) weak proxies, the average RMSE increased by 23% and 21% for IPW and TMLE, respectively. The effects on subclassification were smaller: its average RMSE increased by 15% when the number of weak proxies increased from 41 to 81.
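The two performance measures in Table 3 are straightforward to compute from a vector of simulation estimates. A minimal helper, with made-up numbers for illustration:

```python
import numpy as np

def abs_bias_and_rmse(estimates, true_pate):
    """Absolute bias and RMSE of simulation estimates around the true PATE."""
    estimates = np.asarray(estimates, dtype=float)
    bias = abs(estimates.mean() - true_pate)
    rmse = np.sqrt(np.mean((estimates - true_pate) ** 2))
    return bias, rmse

# Toy check: three estimates centred at 2.1 when the true PATE is 2.0.
bias, rmse = abs_bias_and_rmse([2.0, 2.1, 2.2], true_pate=2.0)
print(round(bias, 3), round(rmse, 3))  # 0.1 0.129
```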
For BART, the results were inconsistent, though the average bias and RMSE were higher when the model included weak proxies than when it did not. This is potentially due to the increased variability introduced by including weak proxies in the analysis. These results highlight two main implications. First, adding weak proxies changes the propensity score distributions for both the sample and the population, shifting their medians and skewness. Second, augmenting the propensity score model with weak proxies affects the bias of estimators like IPW and TMLE the most, although the rate at which the bias changed did not necessarily scale with the number of weak proxies included. Additionally, estimators that do not depend on the propensity score, like BART, are also potentially impacted when weak proxies are incorporated in the analysis.
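For concreteness, here is a minimal version of one propensity-score-dependent estimator, subclassification: stratify on the estimated propensity score and weight sample stratum means by population stratum shares. The five-stratum default and the handling of empty strata are assumptions made for this sketch; Table 1 in the report describes the estimators actually used.

```python
import numpy as np

def subclassification_pate(tau_sample, ps_sample, ps_population, n_strata=5):
    """PATE estimate: average sample treatment effects within propensity-score
    strata, then weight by each stratum's share of the population."""
    # Stratum boundaries from population propensity-score quantiles.
    cuts = np.quantile(ps_population, np.linspace(0, 1, n_strata + 1))[1:-1]
    s_strata = np.digitize(ps_sample, cuts)
    p_strata = np.digitize(ps_population, cuts)

    estimate = 0.0
    for k in range(n_strata):
        in_k = s_strata == k
        if not in_k.any():
            # An empty sample stratum signals a common-support problem;
            # here we simply skip it (one of several possible choices).
            continue
        share = np.mean(p_strata == k)  # population share of stratum k
        estimate += share * np.mean(np.asarray(tau_sample)[in_k])
    return estimate
```

With a constant treatment effect the estimator returns that constant, since the stratum shares sum to one; bias appears only when effects vary across strata that the sample covers unevenly.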
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Publication Type: Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)
Grant or Contract Numbers: N/A