ERIC Number: ED658556
Record Type: Non-Journal
Publication Date: 2022-Sep-23
Pages: N/A
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
An Aggregation Scheme for Increased Power in Primary Outcome Analysis
Timothy Lycurgus; Ben B. Hansen
Society for Research on Educational Effectiveness
Background: Efficacy trials in education often possess a motivating theory of change: how and why should the desired improvement in outcomes occur as a consequence of the intervention? In scenarios with repeated measurements, certain subgroups may be more or less likely to manifest a treatment effect; the theory of change (TOC) provides guidance as to which are best situated to benefit. Our method, Power-maximizing Weighting for Repeated-measurements with Delayed-effects (PWRD aggregation), converts the TOC into a test statistic with greater asymptotic relative efficiency, and thus greater statistical power, than extant alternatives when the TOC proves true. We illustrate the benefit of our method through its application to a cluster-randomized trial testing the efficacy of BURST[R]: Reading (BURST), a program designed to assist elementary students falling behind in reading ability. The TOC behind BURST holds program benefits to be delayed and non-uniform, experienced only after a student's performance stalls. As a result, the treatment effect should not be constant. Rather, its effects should be greatest following points when student learning has stalled. Accordingly, beginning from separate estimates of the average treatment effect for different subgroups and occasions of follow-up, PWRD aggregation combines these effect estimates with attention to both their expected sizes and their mutual correlations, which are deduced from the intervention's environing TOC. Purpose: The objective behind PWRD aggregation is to increase the asymptotic relative efficiency, and thus improve the statistical power, of primary outcome analysis compared to extant alternatives when the governing theory of change is accurate. Research Design: In education settings, students often enter and exit the study at various points, so researchers frequently have different numbers of observations on each student.
For example, BURST examined a reading intervention for early elementary students across four years. Table 1 illustrates our design for the first of four cohorts. Data sources for similarly structured efficacy trials will incorporate a similar design, with varying numbers of observations on any given student. Thus, the method chosen to handle repeated measurements is of great importance not only in BURST but in other longitudinal settings as well. Perhaps the simplest way to make use of repeated measurements is to fit a linear model predicting student-year observations from independent variables identifying the cohort and time of follow-up; if we assume these observations are independent, then each observation receives equal weight. Thus, we refer to this method as "flat weighting". While simple, flat weighting is inefficient because student-year observations are not independent. Instead, many education researchers fit a mixed effects model, allowing for clustering within students, classrooms, or schools. But despite the ensuing improvement in efficiency, the mixed effects model fails to take into account which subgroups and occasions of follow-up are best situated to exhibit a treatment effect. This is especially true when effects are delayed rather than instantaneous, as students only receive the intervention once their reading performance stalls. We see this phenomenon arise in Table 2 for the first cohort of students participating in BURST. We observe larger differences between treatment and control means the longer students participate in the study. In later years, students have had a greater opportunity to require supplemental instruction, and thus to benefit. PWRD aggregation uses the TOC to deduce which subgroups are most likely to benefit and then weights the separate effect estimates for each subgroup in a manner that will best allow a researcher to detect an effect of the intervention using the aggregated statistic.
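As a point of comparison for the methods discussed above, the "flat weighting" baseline can be sketched as a single least-squares fit in which every student-year observation counts equally. All data and parameter values below are simulated placeholders, not BURST data:

```python
import numpy as np

# Toy student-year data (illustrative only): outcome y, treatment
# indicator z, and year of follow-up t for pooled student-years.
rng = np.random.default_rng(1)
n = 200
t = rng.integers(1, 4, size=n)        # year of follow-up (1-3)
z = rng.integers(0, 2, size=n)        # treatment assignment
y = 50 + 2 * t + 0.5 * z * t + rng.normal(0, 5, size=n)

# "Flat weighting": one OLS fit in which every student-year observation
# receives equal weight, ignoring dependence within students/schools.
X = np.column_stack([np.ones(n), t, z])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, year_slope, treat_coef = coef
```

A mixed-effects alternative would instead add random intercepts for schools or students; the point of the contrast is that neither approach uses the TOC to decide which observations are informative.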
PWRD Aggregation: Define $\delta_{gt}$ as the parameter representing the intention-to-treat (ITT) effect for cohort $g$ during year of follow-up $t$: $\delta_{gt} = E(Y_{gt(Z=1)} - Y_{gt(Z=0)} \mid G=g, T=t)$, where $Z=1$ and $Z=0$ denote assignment to the treatment and control, respectively. Now take some hypothesis test of the null that $\delta_{\mathrm{agg}} = \sum_{g,t} \omega_{gt}\,\delta_{gt}$ is zero or negative against the alternative that $\delta_{\mathrm{agg}} > 0$. Here, $\omega_{gt}$ represents the aggregation weight attached to $\delta_{gt}$. If the following assumptions about the theory of change hold, PWRD aggregation will improve power to detect an effect compared to common alternatives: (i) individuals who receive supplemental instruction as a result of the intervention at time $j$ receive an effect $\tau \geq 0$ at some point between $j$ and when they exit the study; (ii) individuals who do not receive supplemental instruction are unaffected; (iii) the effect $\tau$ received by individual $i$ at time $j$ is retained in full throughout the duration of the study. Under these conditions, the following weights $\omega$ maximize the asymptotic relative efficiency of the aggregated test statistic: $\omega \propto p\,\Sigma^{-1}$, where $p_{gt}$ is the probability that an individual in cohort $g$ is eligible to receive the supplemental instruction by year of follow-up $t$, and $\Sigma$ denotes the expected relative covariances between the ITT estimates. For more details, see the appendix. Results: We use a simulation design mirroring BURST to compare power under PWRD aggregation with power under flat weighting, a mixed model with school-level random effects, and a method that solely analyzes students when they exit the study.
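The weight formula above can be sketched numerically. The eligibility probabilities, covariance matrix, and per-year ITT estimates below are illustrative placeholders, not values from the paper:

```python
import numpy as np

# Hypothetical inputs for three follow-up years of one cohort:
# p[t] = P(eligible for supplemental instruction by year t), and
# Sigma = expected relative covariances of the per-year ITT estimates.
p = np.array([0.25, 0.55, 0.80])
Sigma = np.array([[1.0, 0.5, 0.3],
                  [0.5, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])

# Power-maximizing weights: omega proportional to Sigma^{-1} p
# (equivalently p Sigma^{-1}, since Sigma is symmetric).
omega = np.linalg.solve(Sigma, p)
omega /= omega.sum()            # normalize for interpretability

# Aggregate illustrative per-year ITT estimates into one statistic.
delta_hat = np.array([0.02, 0.08, 0.15])
delta_agg = omega @ delta_hat
```

With eligibility rising over follow-up, the later years, where more students have had a chance to stall and receive supplemental instruction, receive the largest weights, which is the intended behavior under the TOC.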
Initial outcomes are generated for all students as follows: $Y_{ijk} = \beta_0 + \beta_1\,\mathrm{Grade}_{ijk} + \mu_k + \epsilon_{ijk}$, with $\mu_k = \gamma_0 + u_k$ and $u_k \sim N(0, \xi)$. The outcome $Y_{ijk}$ of student $i$ in year of follow-up $j$ at school $k$ is a function of the grade of the student and the random intercept of the school at which the student is enrolled, $\mu_k$. Then we impose artificial treatment effects on students assigned to treatment schools in a manner consistent with BURST's TOC. Figure 1 presents our findings. It is apparent that PWRD aggregation provides far greater power than other methods while maintaining proper Type I error rates. This holds across both small and large effect sizes. Under relaxed conditions allowing for interference within schools, PWRD aggregation provides similar gains in power. Conclusion: In this paper, we present an aggregation scheme that maximizes power when the theory of change holds. We demonstrate this through a simulation study mirroring a large-scale randomized trial for a reading intervention.
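The outcome-generating model in the simulation can be sketched as follows. All parameter values are placeholders rather than the paper's settings, and $\xi$ is treated as a variance (an assumption; the abstract does not specify):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not the paper's values).
beta0, beta1 = 50.0, 5.0        # intercept and grade slope
gamma0, xi = 0.0, 4.0           # random-intercept mean and variance
sigma_eps = 10.0                # residual standard deviation
n_schools, per_school = 40, 25

# School-level random intercepts: mu_k = gamma0 + u_k, u_k ~ N(0, xi).
mu = gamma0 + rng.normal(0.0, np.sqrt(xi), size=n_schools)

# One student-year observation per student, for simplicity.
school = np.repeat(np.arange(n_schools), per_school)
grade = rng.integers(1, 5, size=school.size)      # grades 1-4
eps = rng.normal(0.0, sigma_eps, size=school.size)
y = beta0 + beta1 * grade + mu[school] + eps
```

A full replication of the design would then assign schools to treatment and add delayed, stall-triggered effects to treated students before comparing the competing analysis methods.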
Descriptors: Educational Change, Educational Research, Intervention, Efficiency, Measurement Techniques, Evaluation Methods, Randomized Controlled Trials, Reading Programs, Reading Instruction, Program Effectiveness, Elementary School Students, Reading Ability, Reading Achievement, Error of Measurement
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Publication Type: Reports - Research
Education Level: Elementary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)
Grant or Contract Numbers: N/A