Volume 61, Issue 1 p. 329-351
RESEARCH ARTICLE
Open Access

The impact of intervention modality on students' multiplication fact fluency

Kourtney R. Kromminga

Corresponding Author

Department of Educational Psychology, University of Minnesota, Minneapolis, Minnesota, USA

Intervention Services and Leadership in Education Department, Wichita State University, Wichita, Kansas, USA

Correspondence Kourtney R. Kromminga, Wichita State University, 1845 Fairmount St., Hubbard Hall Box 142, Wichita, KS 67260, USA.

Email: [email protected]

Robin S. Codding


Department of Applied Psychology, Northeastern University, Boston, Massachusetts, USA


Abstract

Although an abundance of evidence-based mathematics interventions is available to support students' mathematics fluency, research suggests that they are seldom implemented within schools. Moreover, national data suggest that most students in the United States are not meeting grade-level expectations in mathematics. This underperformance, coupled with limited implementation of evidence-based practices, emphasizes the need to provide schools with effective, efficient, and resource-friendly interventions to support students' mathematics achievement. The purpose of this study was to compare the effects of reciprocal peer tutoring, iPad-delivered flashcards, and a combined intervention condition on the multiplication fluency of 72 third-grade students enrolled in two public schools in the midwestern United States using a randomized pretest–posttest design. Students received 10 min of intervention after their core instruction 4 days per week for 5 weeks. Results suggest that there were few differences between treatment groups on both proximal and distal measures; moreover, all groups demonstrated significant growth from pre- to posttest on proximal measures. Assessment response rates across modalities and student engagement are discussed along with implications for research and practice.

Practitioner points

  • Students across all treatment groups (peer tutoring, iPad flashcards, and a combined intervention) improved their multiplication fact fluency.

  • There were few differences between groups at posttest on proximal measures of multiplication fluency and distal measures of mathematics achievement.

  • Students across treatment groups scored consistently higher in digits correct per minute on the paper-pencil curriculum-based measure (CBM) than on the iPad-delivered CBM.

1 INTRODUCTION

Students in the United States perform below students from other developed countries in basic mathematics computation according to the 2015 Trends in International Mathematics and Science Study (TIMSS; NCES, 2015). On the National Assessment of Educational Progress (NAEP) in mathematics, students are expected to perform at or above the "proficient" achievement level, defined as demonstrating competency over challenging subject matter and indicating "solid academic performance" (NCES, 2021). The performance of fourth-grade students in the United States on the 2022 NAEP mathematics assessment indicates that 64% of these students did not score at or above the proficient level (NCES, 2022). These scores indicate that most students in the United States are unable to demonstrate competence in mathematics on this assessment, and the data have consistently demonstrated this level of underachievement since the 1990s (NCES, 2022). Given that mastery of basic number operations is a prerequisite for success in more advanced mathematics topics (e.g., rational number knowledge, algebraic principles, and word problem solving), these outcome data highlight a critical need for students to receive more support to become proficient with foundational mathematics skills (Fuchs et al., 2016; Marcelino et al., 2017). Moreover, a recent study found that computational proficiency at the beginning of grade 4 had a direct impact on students' mathematics problem solving at the end of fourth grade (Kaskens et al., 2022).

Despite an abundance of evidence-based interventions shown to improve student performance with whole number operations (Codding et al., 2011; Powell et al., 2017), survey results indicate that these interventions are not being implemented in schools due to numerous barriers, including a lack of (a) time to implement, (b) intervention materials, (c) personnel, (d) adequate training, and (e) fit within the curriculum (Long et al., 2016). This dearth of implementation, coupled with the widespread chronic underachievement of US students in mathematics, highlights the importance of providing a menu of effective classwide interventions. However, few studies have compared evidence-based practices that share similar instructional design principles to determine whether the practices produce similar student gains. Specifically, both technology-mediated and reciprocal peer tutoring (RPT) interventions have yielded moderate to large effects on students' whole number knowledge (Alegre et al., 2019; Petersen-Brown et al., 2019). However, little is known about how these options compare to one another. Previous researchers have highlighted the importance of such comparisons to provide critical information about (a) instructional design principles, (b) impacts on student performance, and (c) resources needed for implementation (Petersen-Brown et al., 2019). The purpose of the present study is to compare the effects of three classwide interventions (i.e., a peer-mediated intervention, a technology-mediated intervention, and a combined condition) across technology- and paper-pencil-delivered assessments of one aspect of whole number operations: multiplication fact fluency.

1.1 Importance of whole number knowledge

Mathematics fluency is the automatic recall of basic mathematics facts, measured by determining whether student responding is rapid and accurate (National Council of Teachers of Mathematics [NCTM], 2014). From a theoretical perspective, building fluent responding is a prerequisite for the retention, maintenance, and generalization of the skill to other academic tasks (Maki et al., 2021). The instructional hierarchy (Haring et al., 1978), a learning hierarchy describing how a learner develops a new skill, is supported by research showing that proficiency with basic facts is a significant predictor of later mathematics achievement, such as rational number knowledge, algebraic principles, and word problem solving (Fuchs et al., 2016; Marcelino et al., 2017). Specifically, research indicates that proficiency with basic facts plays a fundamental role in the acquisition of higher-level mathematics by reducing the cognitive load of advanced mathematics problem-solving (Price et al., 2013). Moreover, the development of basic fact fluency is associated with positive educational outcomes such as attending college, solving complex mathematics problems, and interpreting abstract mathematics principles (Fuchs et al., 2016; Marcelino et al., 2017). The critical importance of basic fact fluency, coupled with the pervasiveness of student underachievement in mathematics, calls for intervention at the classwide level.

Within a multitiered system of support (MTSS) framework, tier 1 (i.e., universal instruction) includes the delivery of an evidence-based core curriculum and effective instructional practices to all students to ensure that most students (i.e., about 80%) achieve proficiency (McMaster & Fuchs, 2016). Unfortunately, commonly used mathematics curricula do not provide sufficient opportunities to practice for students to become proficient (Doabler et al., 2012), which may contribute to the persistent underachievement of students in the United States (NCES, 2022). Solutions to this problem begin by addressing foundational skill gaps during core instruction. When school-level performance falls below 80% proficiency or displays low overall levels of mathematics performance, one of the first steps is to provide classwide interventions that target foundational skill gaps (VanDerHeyden et al., 2012). Thus, educators need to supplement core curricula with evidence-based interventions that provide adequate, explicit practice opportunities and can be applied in the classroom (Long et al., 2016; VanDerHeyden & Codding, 2020).

1.2 Evidence-based interventions

The use of carefully constructed practice opportunities and immediate feedback for improving student mathematics performance is supported by decades of research (e.g., Codding et al., 2011; Twyman & Sota, 2016). Technology- and peer-mediated interventions allow students multiple opportunities to practice mathematics facts with immediate, corrective feedback (Klingbeil et al., 2020; Tamim et al., 2011). Moreover, these evidence-based interventions can be embedded into regular classroom routines during core instruction (Klingbeil et al., 2020; Tamim et al., 2011). Both approaches offer opportunities to differentiate instruction, require few materials, and demand minimal training for successful implementation. The alleviation of these implementation barriers, coupled with the positive impacts on student performance across skills, makes these two interventions not only beneficial for students but also feasible for classroom teachers to implement.

1.2.1 Peer tutoring

Peer tutoring is a peer-mediated intervention in which student dyads work together to complete a learning task (Klingbeil et al., 2020). Peer tutoring provides frequent opportunities for students to practice and verbalize learning with a peer (Fuchs & Fuchs, 2017; Klingbeil et al., 2020). RPT, in which each student within a dyad serves as both tutor and tutee, has shown great promise in promoting the mathematics achievement of students in grades K through 12 (Alegre et al., 2019; Klingbeil et al., 2020). Moreover, research indicates that peer tutoring may benefit students across a range of demographic characteristics (e.g., those with low socioeconomic status, emergent bilinguals, and students receiving special education services), with effect sizes ranging from small to moderate across student populations (Alegre et al., 2019; Klingbeil et al., 2020). In addition to the academic benefits of peer tutoring, previous research indicates that students who take part in peer tutoring interventions spend more time academically engaged (Bowman-Perrott et al., 2014; Klingbeil et al., 2020). This increased engagement has been correlated with higher scores at posttest (Bowman-Perrott et al., 2014).

1.2.2 iPad-delivered flashcard drill

Technology-delivered instruction is now commonplace in the classroom with the growing popularity of touch devices (e.g., tablets and iPads) and Chromebooks (Petersen-Brown et al., 2019). The NCTM (2014) has called on professionals to incorporate technology into mathematics teaching for students at all grade levels to alleviate implementation barriers, provide individualized targeted instruction, and increase opportunities to practice. Technology-mediated flashcard drill, specifically, incorporates simple components such as opportunities to practice and feedback without increasing demands on teachers. Moreover, the use of technology-mediated interventions has been found to have a positive effect on student mathematics performance (Cheung & Slavin, 2013; Delgado et al., 2015; Zhang et al., 2015).

Despite the knowledge that technology-mediated drill has positive impacts on student performance, little is known about how iPad technology compares to traditional methods (e.g., peer-mediated, teacher-mediated, or self-managed interventions). Although some researchers argue that all technology is created equal because it is merely a delivery mechanism for instructional practices (Twyman & Sota, 2016), others believe it is important to examine the effects of different types of technology to ensure that the effects on student performance match those of traditional methods (Duhon et al., 2012; Petersen-Brown et al., 2019). Moreover, a systematic review of 32 studies found that students using technology-mediated mathematics interventions were more engaged than those in a control condition or a traditional method of intervention delivery (Fabian et al., 2016). Higher engagement has also been linked to greater improvement in mathematics scores when using technology to practice (Bowman-Perrott et al., 2014; Evans et al., 2015; O'Malley et al., 2014). One potential explanation for the increased engagement is that technology-mediated interventions may incorporate games and goal setting. The widespread use of technology in schools, coupled with the limited research comparing its impact on student performance with traditional evidence-based practices, makes examining the effects of technology-mediated interventions critically important to ensure student success.

1.3 Comparing evidence-based practices

Although the effects of RPT and technology-mediated drill and practice are well documented, few studies have compared the effects of the two intervention modalities on student computation outcomes. Researchers have stressed the importance of comparing research-based interventions (Petersen-Brown et al., 2019). Comparing novel practices such as technology-mediated interventions to existing evidence-based practices is more informative than a no-treatment or business-as-usual control condition because this comparison can provide information regarding how new practices perform in an applied setting (Petersen-Brown et al., 2019). There is also value in determining how a new practice compares to practices known to be beneficial (Kilgus et al., 2016; Petersen-Brown et al., 2019). Researchers have suggested that when comparing interventions, it is critical to ensure that they are as similar as possible to create rigorous research conditions that reduce the likelihood of drawing inappropriate conclusions (Petersen-Brown et al., 2019). As stated above, technology-delivered drill and RPT share similar core components: the presentation of a stimulus (e.g., a mathematics fact), a response (e.g., verbal or typed), and immediate, corrective feedback.

1.4 Assessment modality

When assessing the effectiveness of technology-based interventions, it is critical to evaluate the modality of the assessment used to measure the intervention outcomes (Duhon et al., 2012). Assessment across modalities refers to measuring students' skills in a different modality than the one in which the intervention is provided (Duhon et al., 2012). Prior research has indicated that when an intervention is delivered by computer but the progress monitoring tool is paper-pencil, skill transfer is not always observed (Duhon et al., 2012; Rich et al., 2017). Additionally, Rich and colleagues (2017) targeted the subtraction fact fluency of second graders who participated in one of three interventions (i.e., explicit timing [paper practice], computer-based drill and practice, or 4 days of computer drill and 1 day of explicit timing [combined practice]). They found that for students in the combined condition, 1 day per week of paper-based practice (i.e., explicit timing) was sufficient to facilitate generalization across assessment types. These findings have implications for evaluating student outcomes because they suggest that alignment between intervention modality and assessment tool matters. Specifically, the finding that a combined technology and paper-based intervention facilitated generalization across assessment measures has great practical value because students often encounter both paper-based and technology-mediated assessments at the classroom, state, national, and international levels. Interventions that generalize to both assessment modalities could give students the practice necessary to transfer their learning across modalities, increasing their performance across assessments and improving the accuracy with which school professionals make decisions about student performance.

1.5 Purpose

The present study seeks to expand the existing literature by comparing the effects of three classwide interventions (i.e., RPT, iPad-delivered flashcards, and a combination of the two) on students' multiplication fluency. Consistent with previous research by Rich and colleagues (2017), a combined condition was included to determine whether the effects of an intervention using multiple modalities could be replicated. Two proximal measures of multiplication fact fluency were included, one delivered via iPad and the other via paper-pencil, and two distal measures of mathematics achievement were used to examine the impacts of these interventions on student mathematics performance broadly. Additionally, this study sought to examine students' response rates across assessment modality and treatment group on the proximal measures. We hypothesized that students' response rate would be slower when typing than when writing, based on previous research showing that students type more slowly than they write (Hensley et al., 2017). Student academic engagement data were also collected to assess whether there are differences between the two selected interventions, because previous research has suggested that both RPT and technology-mediated interventions promote student engagement (Evans et al., 2015; O'Malley et al., 2014). The research questions were as follows:
  1. What are the differences between reciprocal peer tutoring, iPad-delivered flashcards, and combined intervention conditions on proximal measures of students' multiplication fluency and distal measures of mathematics operations?

  2. What are the differences in the growth of the three treatment conditions (i.e., reciprocal peer tutoring, iPad-delivered flashcards, and a combined intervention) from pre- to posttest on proximal measures of multiplication fluency and distal measures of mathematics operations?

  3. What are the differences in students' multiplication fluency between the iPad-delivered and paper-pencil proximal measures?

  4. What are the generalized effects of single-modality (i.e., technology or RPT) and mixed-modality interventions across assessment modalities (i.e., iPad and paper-pencil)?

  5. What are the differences in students' active engagement and off-task behaviors when using the iPad or participating in RPT?

  6. What are the differences in student social validity of iPad fact practice compared with RPT practice?

2 METHOD

2.1 Participants and setting

Participants were third-grade students recruited from the five third-grade classrooms across two elementary schools within a suburban school district located in the Midwest United States. The school district served predominantly white (76.3%) students who were not eligible for free and reduced-price lunch (82.8%). Upon receiving support from the University of Minnesota's IRB and the participating schools' administrators, the first author invited all five third-grade classrooms across the two schools to participate in the present study (one from School A and four from School B). Of the five classrooms, three teachers agreed to participate. Data collection was completed in the first 3 months of 2020. Passive parental consent forms were sent home with all 87 students in the three classrooms and were collected by the classroom teachers upon return. Of the 87 possible participants, parental consent and student assent were obtained for 76 students. Two participants moved out of the district after the start of the study and two consistently missed intervention sessions to receive special education services. Therefore, the final sample comprised 72 participants (the classrooms had 31, 24, and 17 participants). Based on the results of a statistical sample size calculator (Erdfelder et al., 1996) for an analysis of covariance (ANCOVA) model with fixed and interaction effects, the desired sample size was 88 participants to achieve power of 80% with an α level of .0125 and an effect size of f = 0.40. Accordingly, the recruited sample was slightly underpowered.
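The reported power analysis can be approximately reproduced with a small numerical search over the noncentral F distribution. This is a hedged sketch (the function name and search loop are ours, not the authors'): it treats the test as a one-way fixed-effects omnibus F test with three groups, α = .0125, power = .80, and Cohen's f = 0.40, and should land in the neighborhood of the 88 participants the authors report.

```python
# Sketch: smallest total N for a one-way fixed-effects ANOVA omnibus F test.
# Inputs mirror the study's power analysis: f = 0.40, alpha = .0125, power = .80, 3 groups.
from scipy.stats import f as f_dist, ncf

def required_n(effect_f=0.40, alpha=0.0125, power=0.80, groups=3):
    n = groups + 1
    while True:
        df1, df2 = groups - 1, n - groups
        lam = effect_f ** 2 * n                  # noncentrality parameter
        fcrit = f_dist.ppf(1 - alpha, df1, df2)  # critical F under the null
        if 1 - ncf.cdf(fcrit, df1, df2, lam) >= power:
            return n
        n += 1
```

With 72 recruited participants falling below this target, the study's description of the sample as slightly underpowered follows directly.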

2.1.1 Participant demographics

The participants were 52.1% male, and the majority were White (81.94%), with the remaining students identified as Black (9.72%), Asian American (5.56%), and Latinx/Hispanic (2.78%). Most of the participants did not qualify for free or reduced-price lunch (80.8%), were not learning English as a second language (93.2%), and were not eligible to receive special education services (87.8%). Students were randomly assigned to one of three treatment conditions: (a) iPad, (b) RPT, or (c) combined. Students were randomized by assigning each a number and entering the numbers into a random list generator (random.org). The iPad condition included 24 students, the RPT condition included 23 students, and the combined condition included 25 students.
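The assignment procedure can be sketched in a few lines. The seed and ID labels below are illustrative assumptions, and `random.shuffle` stands in for the random.org list generator the authors used:

```python
import random

random.seed(2020)  # illustrative seed; the study used random.org
students = [f"S{i:02d}" for i in range(1, 73)]  # the 72 consented participants
random.shuffle(students)  # stand-in for the random list generator

# Split the shuffled list into the three reported condition sizes.
conditions = {
    "iPad": students[:24],       # 24 students
    "RPT": students[24:47],      # 23 students
    "combined": students[47:],   # 25 students
}
```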

2.1.2 Instructional setting

All teachers were white females with master's degrees in education who were between 25 and 30 years old and had 2–6 years of teaching experience. The school district did not employ an evidence-based core mathematics curriculum; rather, teachers from each grade level compiled lessons and taught grade-level content aligned with Common Core State Standards. Each classroom had a 50-min daily mathematics block, which included 30 min of instruction and 20 min of practice comprising a combination of IXL Learning on the Chromebook, teacher-developed learning stations, and/or independent seat work. During the intervention phase, students participated in their assigned intervention during the 20-min practice block. Chromebooks were used for various activities across subjects, including mathematics. Classroom teachers reported that, on average, students spent approximately 90% of their instructional time in mathematics using traditional methods (i.e., teacher-directed instruction, worksheets, and other non-technology activities); students spent the remaining 10% engaged in various programs on their Chromebooks.

2.1.3 Intervention setting

At the time of the study, the first author was a fourth-year school psychology doctoral student who carried out all intervention procedures. The 20-min sessions were conducted in the students' classroom during time reserved for mathematics. The classrooms were organized similarly to mathematics "centers," a structure commonly used in classrooms: students using the iPads were on one side of the room and those practicing with their peers were on the other. This arrangement minimized disruption to students in the iPad condition, who were practicing independently, because the peer tutoring group talked during their practice (e.g., reciting mathematics facts, giving answers as tutee, and providing corrective feedback as tutor).

2.2 Measures

The pre- and posttest measures used for all students consisted of two proximal measures of multiplication fact fluency and two distal measures of mathematics achievement. The proximal measures were three forms of a 2-min paper-pencil curriculum-based measure (CBM) and three forms of a 2-min iPad-delivered CBM. The median score for each student on each assessment was used. The two distal measures were the mathematics fluency and mathematics calculation subtests of the Woodcock-Johnson Fourth Edition, Tests of Achievement (WJ-IV; Schrank et al., 2014). The primary dependent variable in this study was students' scores in digits correct per 2 min (DC2M) on the paper-pencil CBM and the iPad-delivered CBM at pretest and posttest. Additionally, students were observed during the intervention phase using momentary time sampling to measure their academic engagement and off-task behavior. Lastly, a social validity measure was administered to students at posttest to gather information related to the students' acceptability of the intervention procedures.

2.2.1 Proximal and distal mathematics measures

The proximal and distal measures of multiplication fluency were administered to address research questions one through four. First, the students’ scores on all mathematics measures at posttest were used to examine the differences between groups. Second, students’ scores from pre- to posttest on all mathematics measures were used to examine differences in the rate of improvement between groups. Third, the posttest scores on each mathematics measure were examined to determine whether a mixed modality or single modality intervention resulted in greater generalization across assessment modalities. Lastly, the students’ scores were compared at pre- and posttest to examine differences in student fluency on paper-pencil and iPad delivered CBMs (e.g., proximal measures).

2.2.2 Paper-pencil CBMs

Six single-skill multiplication CBM probes with numerals from 0 to 12 were obtained from AIMSweb (Shinn, 2004). Three probes were administered at pretest and the remaining three forms were administered at posttest, with the median score among the three forms at each time point serving as the final score for each participant. The use of multiple probes has been shown to provide a more accurate score and improve reliability across alternate forms of a CBM (Methe et al., 2015). Standardized administration and scoring instructions were followed (Shinn, 2004). Alternate-forms reliability of the paper-pencil CBM multiplication probes for this study was 0.82 on average (range, r = .63 to .95). The test-retest and alternate-forms reliability coefficients for mathematics CBMs have been found to be moderate (Solomon et al., 2020; Strait et al., 2015). Criterion validity of single-skill CBM measures ranges from 0.82 to 0.85 (Strait et al., 2015).
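The median-of-three scoring rule is simple to express. In this sketch (the function name is ours), each probe is assumed to have already been scored in digits correct:

```python
from statistics import median

def final_cbm_score(probe_scores):
    """Median digits-correct score across the three alternate forms at one time point."""
    if len(probe_scores) != 3:
        raise ValueError("expected exactly three probe scores")
    return median(probe_scores)
```

Using the median rather than the mean keeps a single unusually easy or hard form from distorting a student's final score.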

2.2.3 iPad CBMs

The iPad application (app) used to administer the iPad CBMs was Timed Test (FormSoft Group LTD, 2012). The paid version of Timed Test included multiplication facts and cost $4.99. The platform enabled the first author to (a) set up a timed test for multiple users, (b) view users' previous test scores and answers, and (c) set the test time, number of problems, and desired operands included in the test. For the present study, the first author entered each participant's name in the app and set the test settings to be consistent with the paper-pencil CBM probes: the timer was set to 2 min, the problems were presented vertically, and each test included 84 multiplication facts with operands between 0 and 12. Students were instructed to tap begin and answer as many multiplication problems as possible in 2 min using the on-screen number pad and their finger. If students did not know the answer to a problem, they could skip it by swiping from right to left on the screen. The app automatically stopped the students after 2 min and provided a summary of each completed assessment, which the first author printed and scored using the same procedures used for the paper-pencil CBM, described above. The mean alternate-form reliability of the iPad-delivered CBM computed for this study was 0.81 (range, r = .68 to .89). On average, concurrent validity between the paper-pencil CBMs and iPad CBMs in this study was .738 (range, r = .55–.87).

2.2.4 WJ-IV (Schrank et al., 2014)

Pretesting and posttesting included the mathematics fluency and calculation subtests of the WJ-IV. The calculation subtest is untimed and consists of mathematics problems of progressive difficulty, ranging from simple addition to advanced calculus. The fluency subtest is a 3-min timed test that includes single-digit addition, subtraction, and multiplication facts. Standardized administration and scoring procedures were used for both subtests. Students' scores were computed as the total number of correct problems. Reliability of the calculation and fluency subtests is 0.91 and 0.95, respectively, for children ages 7–11 (McGrew et al., 2014). The concurrent validity of these mathematics subtests with the KTEA-3 and WIAT-III ranges from 0.68 to 0.77 for children ages 6–13 (McGrew et al., 2014).

2.2.5 Systematic direct observation

The fifth research question, aimed at examining the academic engagement and off-task behavior of students across treatment conditions, was addressed using the Behavioral Observation of Students in Schools (BOSS; Shapiro, 1996). The BOSS uses both partial-interval and momentary time sampling to categorize students' behavior as engaged time (i.e., passive and active engaged time) or off-task behavior (i.e., off-task motor, off-task verbal, and off-task passive). At the end of each interval, the observer recorded whether the student (iPad condition) or pair of students (RPT condition) was engaged or off-task. Observations lasted the duration of the 10-min intervention session and used 15-s intervals. Active engaged time was defined as time spent participating with the intervention materials appropriately (e.g., peer tutors practicing flashcards with one tutor and one tutee, or students typing answers to facts on the iPad). Passive engaged time was defined as time spent on tasks not directly related to solving a mathematics problem (e.g., waiting for a partner to count the number correct, shuffling cards, or reviewing the summary of a practice session on the iPad). Off-task motor was defined as any movement away from the designated practice spot (e.g., going to the bathroom, sharpening a pencil, wandering around the room); off-task verbal was defined as any talking not directly related to the task of solving mathematics facts (e.g., chatting with a partner, talking to another student); and off-task passive was defined as any behavior not related to the intervention procedures (e.g., staring into space, changing settings on the iPad, drawing on the whiteboard). The first author used a random list generator to identify a pair of students within each class to observe during each session. This resulted in all dyads being observed at least once and multiple groups being observed twice. Each observation session was coded as the student engaging in either RPT or iPad-delivered flashcards, meaning that students in the combined group were categorized by whichever intervention they engaged with at the time of the observation.
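With 15-s intervals across a 10-min session, each observation yields 40 coded intervals. A sketch of the percentage summary such coding produces (the category labels here are our shorthand, not the BOSS form's exact codes):

```python
from collections import Counter

def engagement_summary(codes, interval_s=15, session_min=10):
    """Percent of momentary-time-sampling intervals falling in each behavior category."""
    expected = session_min * 60 // interval_s  # 40 intervals per 10-min session
    if len(codes) != expected:
        raise ValueError(f"expected {expected} interval codes, got {len(codes)}")
    counts = Counter(codes)
    return {code: 100 * count / expected for code, count in counts.items()}
```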

2.2.6 Acceptability survey

Assessing student acceptability of intervention procedures is important because acceptability can improve intervention outcomes, increase motivation, and build self-efficacy (Eckert et al., 2017; Mautone et al., 2009; Schunk, 1996; Stipek, 1996). The sixth research question, aimed at examining the social validity of the interventions according to student ratings, was examined using the Kid's Intervention Profile (KIP; Eckert et al., 2017). The KIP was adapted for this study and used to evaluate student acceptability of the intervention conditions. The KIP can be used with any academic intervention, and its items are written for easy customization to different intervention procedures (e.g., How much do you like [insert specifics of intervention]; Eckert et al., 2017). Each item included a 5-point Likert scale with the response options: not at all, a little bit, some, a lot, and very, very much. Psychometric data for the measure include alternate-form reliability (r = .82 to .95). To score the KIP, the sum of the 8 items is calculated, where responses of "not at all" equal 1 and responses of "very, very much" equal 5. Two items (3 and 8) are reverse-scored, meaning that for these items a response of "not at all" equals 5 and "very, very much" equals 1. Total scores range from 8 to 40, with a score of 24 or higher indicating that the student finds the intervention acceptable (Eckert et al., 2017).
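The scoring rule above can be sketched directly. The label-to-point mapping follows the response options listed; the function name and response format are our illustrative assumptions:

```python
# Likert labels mapped to points; items 3 and 8 are reverse-scored.
SCALE = {"not at all": 1, "a little bit": 2, "some": 3, "a lot": 4, "very, very much": 5}
REVERSE_ITEMS = {3, 8}

def score_kip(responses):
    """responses: dict mapping item number (1-8) to a Likert label.

    Returns (total, acceptable): total in 8-40; 24 or higher indicates acceptability.
    """
    total = 0
    for item in range(1, 9):
        raw = SCALE[responses[item]]
        total += (6 - raw) if item in REVERSE_ITEMS else raw
    return total, total >= 24
```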

2.3 Procedures

Pretesting occurred on two different school days in each participant's classroom. Paper-pencil CBMs and iPad CBMs were delivered to the participants by the first author. Each student was randomly assigned to either complete the three iPad CBMs before the paper-pencil CBMs or vice versa to control for order effects. Both the iPad and paper-pencil CBM probes were administered in a single session lasting 20 min (i.e., 2 min for each probe plus time for instructions and a 3-min stretch break after completing three probes). The classroom teachers assisted the first author and monitored students during the assessment. The second day of pretesting was reserved for the WJ-IV fluency and calculation subtests. The fluency subtest was administered first (3 min), and the remaining time (approximately 20 min) was reserved for students to work on the calculation subtest.

2.3.1 Training

The day before commencement of the intervention procedures, the first author worked with participating students in small groups of seven to eight to demonstrate and model their assigned intervention procedures (iPad, RPT, or both). During this 15-min training, the first author: (a) reviewed expectations and intervention procedures, (b) introduced the materials to the students and instructed them on how to use each appropriately, (c) demonstrated the intervention procedures, and (d) supervised student practice with the intervention procedures and provided immediate corrective feedback as necessary.

2.3.2 Intervention conditions

Table 1 lists the intervention components and active ingredients for the iPad and RPT conditions. Intervention sessions occurred over 5 consecutive weeks with four sessions per week. Sessions lasted 20 min, including: (a) 5 min of introduction and preparation; (b) 10 min of practice; and (c) 5 min of cleanup. During the 10-min practice portion, students practiced basic multiplication facts using the assigned procedures (i.e., iPad or RPT). To reward appropriate behavior, such as respecting others and property, students in every condition could earn up to five stickers during each intervention session, recorded on a chart (Bowman-Perrott et al., 2016). Each student who received 18 of 20 stickers for the week earned a small prize, such as a pencil, pen, eraser, small toy, or class game time.

Table 1. Core intervention components between iPad and peer tutoring conditions.

iPad intervention procedures:
  • Practiced independently
  • Student typed response
  • Vertical presentation of facts
  • Multiplication facts 0–12 practiced
  • Practiced solving facts for 10 min
  • Visual error correction and feedback
  • Summary of number of correct facts

Peer tutoring intervention procedures:
  • Practiced with peer
  • Student wrote and verbalized response
  • Random presentation of facts
  • Multiplication facts 0–12 practiced
  • Practiced solving facts for 5 min; acted as tutor to peer for 5 min
  • Verbal error correction and feedback
  • Completed “what I know” chart

Active ingredients: cumulative review, drill and practice, feedback, motivation, student verbalizations.
  • Note: The combined condition is not included in this table because it engaged in both of the above procedures, using the iPad three times per week and peer tutoring one time per week.

RPT condition

Using procedures consistent with previous research (Fantuzzo & Ginsburg-Block, 1998; Klingbeil et al., 2020), students were assigned to peer pairs based on their scores on the paper-pencil CBM at pretest. Specifically, the class was listed in order from highest to lowest score and the list was split in half (a higher-performing half and a lower-performing half); the top-scoring student in the higher-performing half was then partnered with the top-scoring student in the lower-performing half, and so on (i.e., in a group of 10 students ranked from highest to lowest score, student 1 was matched with student 6 and student 5 was matched with student 10). Each student in the pair had the opportunity to practice as both tutor and tutee (i.e., each student practiced for 5 min in one role and then switched roles with their partner for the final 5 min). The RPT condition consisted of six components: (a) the tutor presents each card to the tutee; (b) the tutee writes the answer on a small whiteboard and shows it to the tutor; (c) the tutor gives corrective feedback and praise to the tutee; (d) steps a–c continue through the deck, and the pair keeps practicing until the timer sounds; (e) the cards are shuffled and the pair switches roles (i.e., the tutor becomes the tutee); and (f) steps a–d are repeated with the new tutor and tutee. At the beginning of each session, the first author set a timer for 5 min and peer pairs began practice with one student acting as the tutor and one as the tutee. With each presentation of a fact by the tutor, the tutee would write a response on their whiteboard. If the tutee could not answer within 3 s or answered incorrectly, the tutor provided immediate, corrective feedback (e.g., “nice try, 2 × 5 is 10”) and placed the fact card in the box labeled “what I am working on” on a laminated paper mat. If the tutee answered correctly within 3 s, the tutor provided praise and confirmed the answer (e.g., “great job, 3 × 3 is 9”) and placed the card in the “what I know” box on the mat.
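The split-half pairing rule can be sketched directly; the function and the example scores below are hypothetical, but the example reproduces the 10-student matching described above:

```python
# Hypothetical sketch of the pairing rule: rank students by pretest CBM
# score, split the ranked list in half, and match position k of the top
# half with position k of the bottom half.

def split_half_pairs(scores):
    """scores: dict of student -> pretest score; returns (higher, lower) pairs."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    half = len(ranked) // 2  # assumes an even-sized class for simplicity
    return list(zip(ranked[:half], ranked[half:]))

# Ten invented students with descending scores: student 1 pairs with
# student 6, and student 5 pairs with student 10, as described above.
scores = {f"s{i}": 100 - i for i in range(1, 11)}
pairs = split_half_pairs(scores)
print(pairs[0])  # ('s1', 's6')
```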

iPad condition

The Mental Math Cards Challenge app (McNamara, 2017) was selected by the first author with the following considerations: (a) ease of use; (b) cost (i.e., free); (c) inclusion of immediate, corrective feedback; and (d) drill and practice procedures that mirrored those of the RPT condition. The difficulty of the application was left at the default of easy (facts with operands 0–12), the number of practice questions per deck was set to 40 (the highest option), and auto-advance on incorrect answers was turned off (allowing corrective feedback to be provided). When the interventionist announced that the 10-min timer had started, students selected multiplication using the touch screen and then selected practice. The app's flashcard interface displayed a basic multiplication fact in large print in the center of the screen. Below the fact was a number pad where students could type their answer. The top right corner showed a running total of the number of problems the student had answered correctly and incorrectly, and the left corner displayed the question number out of 40, indicating the student's progress through the deck. If the student answered correctly, a green check mark appeared over the fact and the word “correct” appeared beneath the answer, and the app proceeded to the next fact. If the student answered incorrectly, a red “x” appeared over the fact along with the words “The answer is […].” To move to the next fact, the student pressed “next.” After the student completed the first 40 problems, the app provided “practice statistics” summarizing: (a) the number of questions completed, (b) the number of correct and incorrect answers, and (c) accuracy.

Combined condition

Consistent with procedures from Rich and colleagues (2017), students in the combined condition followed the procedures for the iPad intervention 3 days per week. On the fourth day, each pair followed the procedures for the RPT intervention.

2.3.3 Posttest

Posttesting occurred over 2 days and consisted of the same procedures as pretest. In addition, the students completed the acceptability survey at the end of the first day. The classroom teachers also completed a two-item survey that included a Likert scale ranging from 1 (not very likely) to 5 (very likely). The first item asked how likely the teacher was to use technology-delivered flashcards in their classroom in the future and the second question asked how likely they were to use reciprocal peer-tutoring in their classroom in the future.

2.4 Inter-scorer agreement (ISA) and procedural fidelity

ISA was assessed for 20% of sessions and was calculated by dividing the number of digits agreed upon by the total number of digits attempted and multiplying by 100. A fourth-year school psychology doctoral student completed ISA. The mean ISA was 99.65% (range, 86%–100%) for the paper-pencil CBM and 99.82% (range, 93%–100%) for the iPad CBM. The mean ISA across the WJ-IV subtests was 99.15% (range, 86%–100%).

Procedural fidelity was assessed for both interventionist procedures and student intervention procedures. Fidelity of interventionist procedures was assessed by the classroom teachers for 20% of sessions. Teachers and interventionists discussed intervention procedures before implementation, and a six-item checklist detailing the procedural steps was used to monitor fidelity. Procedural fidelity was calculated by summing the number of correctly implemented steps, dividing by the total number of steps on the implementation checklist, and multiplying by 100. Mean treatment adherence for the interventionist was 100% (range, 83%–100%). Fidelity of student procedures was assessed by the first author using a checklist detailing the procedural steps of each intervention. One pair of students was randomly selected for a fidelity check before the start of the intervention sessions, and one pair of students was assessed in each classroom on each day of intervention. Mean treatment adherence for students was 99% in the iPad condition (range, 67%–100%), 99% in the RPT condition (range, 92%–100%), and 99% in the combined condition (range, 92%–100%).

2.5 Research design and analysis

A pretest–posttest experimental design with random assignment across three conditions was used. Descriptive statistics for the four mathematics measures, including means, standard deviations, and correlations, were calculated. Additionally, data were examined using a one-way analysis of variance (ANOVA) and independent samples t-tests to evaluate pretest differences between groups on each measure and order effects between participants who were tested on the iPad first versus those who completed the paper-pencil measures first, at both pre- and posttest on the CBM measures.

To determine whether there were differences between treatment conditions (RQ 1) and whether the effects of the intervention modality generalized to cross-modality assessments (RQ 4), four one-way between-groups ANCOVA models (i.e., one for each measure) were conducted. The independent variable in each model was the treatment condition (i.e., iPad, peer tutoring, or combined) and the dependent variable was the posttest score on each measure. Participants' pretest scores on each respective measure served as the covariate. The assumptions of the chosen inferential statistic, ANCOVA, were checked. Because multiple comparisons were conducted (i.e., one ANCOVA model for each of the four measures), the Benjamini–Hochberg (B-H) procedure was used. This method was chosen for its utility in controlling the false discovery rate and its power in limiting Type I errors (Palejev & Savov, 2019). Additionally, the B-H procedure allows the user to sequentially compare the observed p values for each dependent variable, based on their importance or expected level of effect (Palejev & Savov, 2019). The B-H corrected level of significance for each dependent variable is as follows: iPad CBM (p = .03), paper-pencil CBM (p = .02), WJ-IV fluency subtest (p = .01), and WJ-IV calculation subtest (p = .001).
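For readers unfamiliar with the correction, a minimal sketch of the Benjamini–Hochberg step-up procedure follows; the p values in the example are invented and do not correspond to the study's analyses:

```python
# Minimal sketch of the Benjamini-Hochberg step-up procedure for m p values
# at false discovery rate q. Illustrative only; the p values are invented.

def benjamini_hochberg(pvals, q=0.05):
    """Return indices of hypotheses rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p value falls under its B-H threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k = rank
    # Reject the k hypotheses with the smallest p values.
    return sorted(order[:k])

rejected = benjamini_hochberg([0.010, 0.013, 0.030, 0.533], q=0.05)
print(rejected)  # [0, 1, 2]
```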

To determine whether there were differences in students' growth across the four mathematics measures (RQ 2) and whether there were differences in the fluency of student responding across assessment modalities on the proximal measures (RQ 3), the rate of improvement (ROI) of each group on the two proximal CBM measures was calculated. Each group's pretest score was subtracted from its posttest score, and the difference was divided by the number of weeks of intervention, yielding an average change in DC2M per week for each group on each measure.
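The ROI calculation can be expressed directly. The worked example below uses the iPad condition's paper-pencil CBM means reported later in Table 3; the function name is ours:

```python
# Sketch of the rate-of-improvement (ROI) calculation: group mean gain
# from pretest to posttest divided by the number of weeks of intervention.

def rate_of_improvement(pretest_mean, posttest_mean, weeks):
    """Average change in DC2M per week of intervention."""
    return (posttest_mean - pretest_mean) / weeks

# iPad condition, paper-pencil CBM (Table 3): pretest M = 52.36,
# posttest M = 75.17, over 5 weeks of intervention.
roi = rate_of_improvement(52.36, 75.17, 5)
print(round(roi, 2))  # 4.56
```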

Independent samples t-tests were conducted to evaluate differences in engagement and off-task behavior between groups (RQ 5). Each observation session was categorized as taking place during either iPad or peer-tutoring practice; accordingly, observations of the combined group were categorized by the type of practice the student was engaged in at the time of the observation. Mean scores and standard deviations across treatments were used to evaluate the social validity of the intervention procedures (RQ 6).

3 RESULTS

There was a strong correlation between the paper-pencil CBM and iPad CBM (r = .87) and a moderate correlation between the paper-pencil CBM and the WJ-IV fluency subtest (r = .57) and WJ-IV calculation subtest (r = .50). There was a strong correlation between the iPad CBM and the WJ-IV fluency subtest (r = .64) and a moderate correlation between the iPad CBM and the WJ-IV calculation subtest (r = .54). Lastly, there was a strong correlation between the WJ-IV fluency and calculation subtests (r = .66).

The results of a one-way ANOVA indicate that the three treatment conditions were not significantly different at pre-test: iPad CBM (F (2,67) = 1.79, p = .175), paper-pencil CBM (F (2,66) = 0.90, p = .41), WJ-IV fluency subtest (F (2,67) = 0.25, p = .778), or WJ-IV calculation subtest (F (2,68) = 0.77, p = .469).

Results of independent samples t-tests indicated no order effects: there were no significant differences between students who received the iPad CBM first at pretest and those who received it second (t(58.55) = 0.20, p = .841). The same was true for the paper-pencil CBM at pretest (t(60.31) = 0.50, p = .618), the iPad CBM at posttest (t(49.87) = −0.13, p = .899), and the paper-pencil CBM at posttest (t(47.6) = −0.02, p = .984).

3.1 Between group differences on mathematics measures

There were no violations of the assumptions of ANCOVA. Visual analysis of the Q-Q plots for each measure was not indicative of a violation of the normality assumption. The relationship between the dependent variable and the covariate was best fit by a linear model for each measure. The homogeneity of variances assumption was tested with Levene's test; for each measure the test was not significant and therefore not indicative of a violation. The homogeneity of regression slopes was tested by examining the significance of the interaction between the independent variable and the covariate; none of these interactions were significant, indicating that the homogeneity of regression slopes assumption was not violated.

Table 2 includes the results of the ANCOVA models across the four assessments. The first four ANCOVAs include treatment group as the main effect with pretest as a covariate. With regard to whether there are differences between the treatment groups on the mathematics measures, the results indicate that the main effect for treatment was significant for the iPad CBM at the B-H corrected α level of p = .03. Post hoc comparisons using the B-H corrected α indicate a significant difference between the iPad condition and the combined condition on the iPad CBM (M difference = 11.93, SE = 4.65, p = .013), favoring the iPad condition. There was not a significant difference between the iPad and RPT conditions (M difference = 10.49, SE = 4.78, p = .032) under the B-H corrected α, nor between the RPT and combined conditions (M difference = 1.44, SE = 4.83, p = .767). The remaining three ANCOVAs indicated that the main effect for treatment group was not significant for the paper-pencil CBM, WJ-IV fluency, or WJ-IV calculation assessments.

Table 2. ANCOVA main effect of treatment condition.

Measure                F test             ηp²    p
Paper-pencil CBM       F(2, 66) = 0.64    0.02   .533
iPad CBM               F(2, 65) = 3.90    0.11   .025*
WJ-IV Fluency          F(2, 66) = 0.47    0.02   .630
WJ-IV Calculation      F(2, 67) = 1.96    0.06   .150
  • Abbreviations: ANCOVA, analysis of covariance; CBM, curriculum-based measure; WJ-IV, Woodcock-Johnson Fourth Edition, Tests of Achievement.
  • * Significant at the Benjamini–Hochberg corrected α level associated with the assessment.

3.2 Within group differences on mathematics measures

Table 3 includes the means, standard deviations, and ROI for each treatment group across the four mathematics measures at pre- and posttest. ROI data indicate that differences were observed between treatment groups across the four measures from pre- to posttest. On the paper-pencil CBM, the combined group (ROI = 5.16) had a higher ROI than the iPad (ROI = 4.56) and RPT (ROI = 4.01) groups. On the iPad CBM, the iPad group (ROI = 5.53) had a higher ROI than the peer tutoring (ROI = 3.40) and combined (ROI = 3.21) groups. On the WJ-IV fluency, the iPad group (ROI = 1.02) had a higher ROI than the combined (ROI = 0.73) and RPT (ROI = 0.53) groups. Lastly, on the WJ-IV calculation, the combined group (ROI = 0.28) had a higher ROI than the RPT group (ROI = 0.16). Of practical significance, on the paper-pencil CBM the ROIs for the iPad condition (4.56 DC2M), RPT condition (4.01 DC2M), and combined condition (5.16 DC2M) all surpassed the expected ROI on this measure for students receiving no intervention; according to Pearson (2011), the expected ROI for third-grade students on the mathematics measure is a mean of 0.72 DC2M per week.

Table 3. Within condition statistics across measures.
Pretest, mean (SD) Posttest, mean (SD) Rate of improvement (ROI)a
Paper-pencil CBM
iPad Condition 52.36 (16.36) 75.17 (27.18) 4.56
Peer Tutoring Condition 41.67 (16.42) 61.73 (21.86) 4.01
Combined Condition 45.17 (18.47) 70.97 (29.31) 5.16
iPad CBM
iPad Condition 34.89 (19.23) 62.55 (23.01) 5.53
Peer Tutoring Condition 31.93 (13.49) 48.95 (19.55) 3.40
Combined Condition 41.07 (17.11) 57.13 (26.70) 3.21
WJ-IV Fluency
iPad Condition 46.33 (22.25) 51.43 (15.60) 1.02
Peer Tutoring Condition 46.95 (11.53) 49.59 (13.93) 0.53
Combined Condition 51.52 (21.03) 55.17 (16.54) 0.73
WJ-IV Calculation
iPad Condition 27.86 (11.40) 26.50 (4.56) −0.27
Peer Tutoring Condition 24.95 (3.76) 25.77 (3.13) 0.16
Combined Condition 26.52 (2.92) 27.91 (3.01) 0.28
  • Abbreviations: CBM, curriculum-based measure; WJ-IV, Woodcock-Johnson Fourth Edition, Tests of Achievement.
  • a Rate of improvement is calculated by subtracting the pretest score from the posttest score and dividing the result by the number of weeks of intervention.

3.3 Fluency rates across assessment modality

Figure 1 includes a depiction of the rate of fluency across CBM modalities by group. When examining whether there is a difference in the rate of student responding across paper-pencil and iPad CBMs, all three groups demonstrated consistently higher rates of fluency on the paper-pencil measures at both pre- and posttest. The RPT and combined group demonstrated greater improvement in their multiplication fluency on the paper-pencil CBM at posttest than at pretest whereas the iPad group had similar rates of responding across both the paper-pencil and iPad CBMs with a slightly higher ROI (+0.97 DC2M per week) on the iPad CBM.

Figure 1. Curriculum-based measure (CBM) fluency rates across groups. DC2M, digits correct per 2-min.

A paired samples t-test was used to examine differences in scores on the two CBM measures at both pre- and posttest. Results indicate that participants' pretest scores on the iPad and paper-pencil CBMs were significantly different (t(67) = 6.53, p < .001). Additionally, participants' posttest scores differed significantly between the iPad and paper-pencil CBMs (t(67) = 9.88, p < .001). These results indicate that at both pre- and posttest, participants produced more DC2M on the paper-pencil CBM than on the iPad CBM.

3.4 Generalization of interventions across assessment modality

When examining the generalization of single- and multi-modality interventions across assessment modalities, both the RPT and iPad groups demonstrated growth from pre- to posttest on the proximal measures, indicating that the results of a single-modality intervention may generalize to a cross-modality assessment. However, both groups demonstrated higher rates of improvement on the matched-modality assessment, which suggests that matching intervention and assessment modality may be preferable for demonstrating maximum benefits of the intervention. In terms of skill generalization, all three treatment groups demonstrated improvements in their scores on the WJ-IV fluency subtest, which indicates that the effects of these multiplication fluency interventions may generalize to fact fluency across operations (addition, subtraction, multiplication, and division). However, on the most distal measure of mathematics achievement, the combined and RPT conditions demonstrated small improvements in scores from pre- to posttest, which suggests there may be a benefit to combined or RPT practice over iPad practice for generalizing skills to more distal measures of mathematics performance.

3.5 Student engagement and off-task behavior

Students in the RPT condition had significantly higher rates of engaged time (M = 95.67%, SD = 9.18) than students in the iPad condition (M = 87.38%, SD = 18.43; t(72.59) = −2.70, p = .009). Moreover, students in the RPT condition had significantly lower rates of off-task behavior (M = 10.64%, SD = 14.35) than those in the iPad condition (M = 27.91%, SD = 27.98; t(73.59) = 3.68, p < .001).

3.6 Intervention acceptability

Students in all three treatment conditions reported high acceptability ratings. While no significant differences were observed, the iPad condition had the highest mean acceptability (M = 31.61, SD = 6.24), followed by the RPT condition (M = 29.76, SD = 7.04) and the combined condition (M = 29.22, SD = 7.15). All three teachers indicated that they were highly likely to use both RPT and technology-delivered flashcards in the future.

4 DISCUSSION

The present study sought to address existing gaps in the literature by examining the effects of iPad-delivered flashcards, RPT, and a combined intervention condition on proximal measures of multiplication fluency delivered via paper-pencil and iPad. No prior published studies, to our knowledge, have made this comparison. Additionally, the present study examined the impact of these interventions on distal measures of mathematics achievement. Another purpose was to examine whether there were differences in participants' response rates across assessment modality and treatment group. Previous research has suggested that typing is slower than writing, and it was therefore hypothesized that students would produce fewer digits correct per minute on the iPad CBM than on the paper-pencil CBM (Hensley et al., 2017). Lastly, previous research has found that both RPT and technology-mediated interventions increase student engagement relative to control conditions (Bowman-Perrott et al., 2014; Evans et al., 2015; O'Malley et al., 2014). Therefore, the present study evaluated student engagement and off-task behavior across conditions.

The findings of the present study suggest that all three interventions were beneficial: each group yielded significant gains from pre- to posttest on both versions of the proximal CBMs (iPad and paper-pencil). Although socially meaningful gains were made on distal measures of mathematics achievement, the pre- to posttest differences on those measures were not statistically significant. The interventions were perceived as acceptable by teachers and students, and students in the peer-tutoring condition were significantly more likely to be academically engaged and less off-task than students in the iPad condition, albeit students across all groups were on-task an average of 91.1% (range, 31.3%–100%) of the intervals observed. The practical implication of these findings is that schools have multiple acceptable and effective options that can be applied in the classroom context to supplement core curricula and facilitate student proficiency in mathematics (Long et al., 2016).

4.1 CBMs

On the paper-pencil CBM, all groups improved significantly from pre- to posttest, though there were no differences between groups at posttest. The lack of differences between groups is inconsistent with previous studies, which found that a paper-pencil condition outperformed technology and combined conditions on a proximal paper-pencil measure (Duhon et al., 2012; Rich et al., 2017). It is possible that because practice time was held constant (10 min), the potency of opportunities to practice in the iPad condition was higher than in the RPT condition, where students spent 5 min as the tutee and 5 min as the tutor. Previous research suggests that acting as both tutor and tutee is beneficial for students (Haas et al., 2022); however, it is unclear whether the two roles are equally beneficial. This, coupled with the possibility that the combined group could generalize their learning from the iPad to paper-pencil during the day of RPT practice, may have contributed to the slightly greater ROI observed for the combined group relative to both the RPT and iPad groups on the paper-pencil CBM. Additionally, the ROI on this measure for all groups surpassed the expected growth of 0.72 DC2M per week on mathematics computation measures with no intervention (Pearson, 2011). Solomon et al. (2020) found higher average mathematics fluency growth rates per session (1.62 digits correct per minute). In the present study, students received four sessions per week and the dependent variable was digits correct per 2 min (DC2M), which means that a rate of improvement of approximately 12.96 DC2M per week would be consistent with the results of other interventions targeting math fact fluency (Solomon et al., 2020). On the two CBM measures, no group reached an ROI of 12.96 DC2M or higher. Therefore, while the present study produced higher ROIs than would be expected for students not receiving intervention, the ROIs were not as high as those found by Solomon et al. (2020). The lower ROIs observed here could be due to many factors, including student characteristics, levels of initial fluency, or the large set size used in the included interventions.

On the iPad CBM, all intervention conditions produced significant growth from pre- to posttest. However, a significant difference favoring the iPad condition was observed between the iPad and combined groups. In addition, on the iPad CBM, students in the iPad condition had an ROI greater than 5 DC2M per week, while students in the combined and RPT conditions had ROIs greater than 3 DC2M per week. Given that students in the iPad condition yielded the highest ROI of all conditions and performed significantly higher than those in the combined condition on the iPad CBM, it is possible that their typing speed improved as a result of the repeated practice using the iPad. Another explanation may be that the iPad CBM was the measure most proximal to the iPad intervention, given the matched modality. This is consistent with results from Duhon et al. (2012) and Rich and colleagues (2017), who found that a technology-mediated intervention yielded the highest scores on a technology-delivered CBM. Although there were no significant differences on the iPad CBM between the RPT and iPad conditions, this could be due to insufficient power given the small sample size. Lastly, for both the paper-pencil and iPad CBMs, pretest was a significant covariate of posttest scores and explained much of the variance in posttest scores (62.3% and 57.4%, respectively). These results are consistent with research suggesting that students' initial skill level is a significant predictor of posttest outcomes (Clarke et al., 2019).

Students across all conditions had higher rates of DC2M when completing the paper-pencil CBM than the iPad CBM at both pre- and posttest. The results of the present study support previous research which has found that student response times are slower when typing than writing (Hensley et al., 2017; VanDerHeyden et al., 2022). Another potential reason for this difference could be because the students have more experience with writing than typing; while the participating classrooms utilized technology, the time spent using it was much lower than time spent in traditional activities (e.g., worksheets, paper-pencil tests) involving writing.

4.2 Generalization

Despite the growth noted for students in each condition on the paper-pencil proximal measure, this performance did not generalize to the WJ-IV fluency subtest, which reflects a broad measure of mathematics facts across all four operations. Given participants' above-average performance at pretest, it is possible that students had little room to grow on the assessment, suggesting a ceiling effect. Similarly, no generalized effects were observed for the RPT and iPad groups on the WJ-IV calculation subtest. However, the results of a paired samples t-test indicated significant growth from pre- to posttest for students in the combined condition, though these effects were not significantly different from the other treatment groups in the ANCOVA model. It is possible that the sample size was too small to detect a difference, or that there truly was no difference between conditions. The lack of gains may also reflect the nature of the WJ-IV calculation subtest, a more distal mathematics measure comprising problems ranging from simple addition to advanced calculus. Regardless, these scores are of practical significance and provide support for the use of multiple modalities, consistent with previous research finding that computer-assisted instruction is more effective as a supplement to traditional instruction than as a replacement (Burns et al., 2012; Cheung & Slavin, 2013; Musti-Rao & Plati, 2015).

4.3 Systematic direct observation

Students participating in the RPT intervention were significantly more engaged than those using the iPad and, correspondingly, students using the iPad were significantly more off-task than those participating in RPT. No other studies have compared iPad conditions to RPT, although Bryant et al. (2015) found that a technology-delivered intervention was more engaging for students than teacher-delivered small-group and worksheet practice. The lack of identified differences in posttest scores between groups, despite the differences in engagement, could be due to a lack of statistical power, or the similarities between the interventions' active ingredients may have lessened the impact of the differences in engagement.

When examining the list of active ingredients across the iPad and RPT intervention conditions, the only difference is that students working independently on the iPad typed multiplication facts into the app, whereas students in RPT verbalized and wrote their answers. Taiwo (2017) demonstrated a functional relationship between student verbalization of mathematics problem solving and mathematics proficiency. These data suggest that the RPT group may have had an advantage in verbalizing their responses; however, this did not translate to differences in growth on the included measures. Perhaps other variables discussed above, such as the potency of opportunities to practice, influenced the growth of the RPT condition. In other words, the combined group may have both received a higher number of opportunities to respond while working on the iPad and been more engaged during the peer-tutoring intervention, perhaps producing the small difference in ROI on the paper-pencil CBM between the peer-tutoring and combined conditions.

4.4 Intervention acceptability

Each group of students rated their respective intervention as being highly acceptable. These results are consistent with other studies that have examined student acceptability of technology-delivered interventions (Kromminga & Codding, 2020; Musti-Rao & Plati, 2015) and RPT interventions (Greene et al., 2018). There were no significant differences between the conditions in terms of student acceptability, indicating that any of the three intervention conditions might be an acceptable option for students. The classroom teachers also reported that they were very likely to use both the technology-delivered and RPT interventions in their classrooms in the future. Researchers have found that interventions are generally more acceptable if they are not excessively time consuming, have limited or no negative side effects, and are minimally intrusive (Silva et al., 2020). Therefore, it is possible that the limited time burden on the teachers for both intervention procedures, the short period of class time required to complete the intervention (e.g., 20 min), and the interventions' fit within typical classroom practices led to high acceptability among classroom teachers.

4.5 Limitations and future research

As with all research, these results must be interpreted in light of several limitations. First, the sample size was smaller than the power analysis indicated would be necessary to detect an effect; replications with larger samples are needed. Second, the present study did not include a control condition. While this limits the claims we could make about the impact of the interventions relative to no treatment, researchers have stressed the importance of including active treatment comparisons, which provide more information about how two treatments compare in terms of instructional design principles, impacts on student performance, and resources required for implementation (Petersen-Brown et al., 2019). Future research may consider including both an active control and a business-as-usual control condition to provide comprehensive support for the effectiveness of the treatments. Third, the current sample of third-grade students was predominantly white students without disabilities from a middle-class suburb in the Midwestern United States. This limits the generalizability of the results to other grade levels, races or ethnicities, disability statuses, urbanicities, geographical regions, or schools with different instructional practices. Future research should replicate this study with different populations of students to determine whether these practices could benefit other groups of students. Fourth, due to limitations in how the facts could be presented in the iPad condition (i.e., all 91 facts randomized), the peer condition also practiced daily with the full set of facts randomized. This was done to keep the intervention conditions similar for comparison. However, it is often recommended that facts be practiced in small sets of unknown facts rather than all at once (Burns et al., 2016). Therefore, it is possible that the interventions' impacts on student outcomes would be greater if smaller sets of mathematics facts were practiced daily. Future research may also consider examining student growth throughout the intervention rather than only change from pre- to posttest, because growth may not have been linear, and may examine whether students' baseline level of performance results in differential intervention effects across modalities. Another limitation is that the first author served as the interventionist for each of these classrooms; future studies may wish to examine the effectiveness of teacher implementation of these class-wide interventions. Finally, no interobserver agreement data were collected on the measure of student engagement, and the first author both administered and scored the CBMs, meaning they were not blinded.

4.6 Implications for practice

The results of this study suggest that all students improved their multiplication fluency from pretest to posttest on proximal measures and that there were few differences between groups. That said, on the iPad CBM, students in the iPad condition significantly outperformed students in the combined condition, potentially illustrating the impact of matching assessment and intervention modality on student outcomes. A significant difference in engagement was also found, favoring the RPT intervention. Taken together, these results have practical meaning given that RPT, iPad-delivered flashcards, or the combination of the two were beneficial for improving the multiplication fact fluency of these third-grade students and were highly acceptable to both the students and teachers.

ACKNOWLEDGMENTS

The contents of this manuscript were developed under a grant from the US Department of Education, # H325D160016. However, those contents do not necessarily represent the policy of the US Department of Education, and you should not assume endorsement by the Federal Government. Project Officer, Sarah J. Allen, PhD.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflict of interest.

ETHICS STATEMENT

All procedures outlined in this manuscript were approved by the University of Minnesota, Twin Cities institutional review board. All participants had written informed parental consent and verbal student assent to participate.

DATA AVAILABILITY STATEMENT

Data are available upon request.