Peer reviewed
ERIC Number: ED663037
Record Type: Non-Journal
Publication Date: 2024-Sep-19
Pages: N/A
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Content Relatability and Test Score Disparities: Evidence from Texas
Steven Lee; Matthew Schaelling
Society for Research on Educational Effectiveness
Background: Inequality along racial and economic dimensions is well documented and widespread in educational contexts. Achievement gaps are observed among children as early as primary school and are especially notable in standardized testing (Fryer & Levitt, 2004; Fryer & Levitt, 2013; Bond & Lang, 2013). In response, some observers and policymakers have called for a deeper understanding of this testing gap and the mechanisms that produce these differences. While some have suggested abandoning standardized testing entirely, others insist that measurement is essential to accountability and progress, especially regarding racial and economic equality in education. If standardized tests fail to accurately measure the achievement of students across all backgrounds, understanding the exact mechanisms by which this happens is crucial. One possible pathway is the degree to which educational content is "relatable" to certain groups of students, whereby students learn or perform better when they encounter concepts or topics with which they are familiar or in which they hold interest. Several papers in educational psychology show that interest in a topic can affect performance on reading comprehension tests and that interests diverge by race and gender (Bray & Barron, 2004; Asher, 1979). Other existing research in this area focuses on the representation of different identity groups in educational materials, such as children's literature (Adukia et al., 2023).

Objective: In this research, we measure the impact of a student's relatability to a reading passage on standardized test performance. Complementing existing research on the representation of identity groups in educational content, we consider the possibility that different racial/ethnic groups have divergent experiential knowledge of, or interest in, topics that may appear in educational content.
After establishing a relationship between relatability and test performance, we measure the extent to which relatability contributes to observed racial/ethnic test gaps. We hypothesize that relatability affects test performance, which implies that the relatability of test content contributes to test score gaps between racial or ethnic groups.

Setting: We analyze end-of-year standardized tests for primary school students in Texas from 2013 to 2019, focusing on the reading comprehension section of the test for 3rd to 8th graders. For each grade and year, we use item-level responses from every test taker and the full text of the tested reading passages.

Population: Our student sample consists of more than 13 million student-exam observations. The plurality of students in our population are Hispanic (46%), while 34% are non-Hispanic white and 15% are non-Hispanic Black (see Table 1).

Research Design: We first develop a novel measure of "content relatability" of a piece of text for a racial/ethnic group, constructed using survey data on demographic differences from the American Time Use Survey together with natural language processing models. We apply this measure to our sample of reading comprehension passages, which yields the relatability of each passage to each racial/ethnic group represented in our student sample. To identify the causal effect of this relatability measure on student performance, our estimation strategy relies only on the portion of relatability that is conditionally random across test takers and tests. We argue that including race-grade fixed effects and passage-level fixed effects is sufficient for identification in our context, following closely the estimation framework laid out in Borusyak et al. (2022).
Results: We find that increased relatability of a reading comprehension passage causally raises student performance on questions connected to the passage. In our preferred regression specification, we estimate that a standard deviation increase in relatable topics in a passage leads to a 1.7 percentage point increase in the probability of a correct answer on that passage (see Table 2). This effect is equivalent to a 0.07 standard deviation increase in passage-level performance and a 0.05 standard deviation increase in test scores. Since test makers ultimately select passages for inclusion in exams, we consider a variety of placebo outcomes with the same model specification, including prior student performance and demographic composition, and find no effect. We next investigate the extent to which content relatability contributes to racial disparities in test scores. We first show non-parametrically that, at the passage level, relatability differences across groups are correlated with performance differences across groups (see Figure 1). Using our formal regression estimates, we find that equalizing content relatability across groups would lead to a 4% smaller Black-white test gap and a 4% smaller Hispanic-white test gap. We then construct counterfactual test scores under the passage relatability levels that most nearly equalize relatability across racial groups. We show that 1% of Black students in elementary school (grades 3 to 5) could have achieved a higher state-determined reading comprehension standard had they taken a test with more racially equal relatability. Overall, we counterfactually estimate that over 11,000 Black students and 15,000 Hispanic students during our sample period were designated to be at a lower reading comprehension level due to relatability differences.

Conclusions: Our results have implications both for test writers and education policymakers.
First, they highlight that, in order to write balanced assessments, test makers should take into account not only the identities of characters but also the general content of the passage or question itself, as we show that this may influence performance. Second, when policymakers consider outcome differences along demographic dimensions, one additional component to examine is the standardized tests used to calculate those differences. However, we also note that the contribution of test construction to the gaps we find is both non-negligible and modest; that is, it cannot explain a substantial portion of why Black and Hispanic students on average perform lower on tests than white students.
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Publication Type: Reports - Research
Education Level: Elementary Education; Junior High Schools; Middle Schools; Secondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)
Identifiers - Location: Texas
Grant or Contract Numbers: N/A