Evidence for a simplicity principle: teaching common complex grapheme-to-phonemes improves reading and motivation in at-risk readers
Abstract
This study examines the effects of teaching common complex grapheme-to-phoneme correspondences (GPCs) on reading and reading motivation for at-risk readers using a randomised control trial design with taught intervention and control conditions. One reading programme taught children complex GPCs ordered by their frequency of occurrence in children's texts (a ‘simplicity principle’). The other reading programme taught children word usage. Thirty-eight students participated in the 9-week programme of 30 supplemental small group sessions. Participants in the complex GPC group performed significantly better at post-tests with generally large value-added effect sizes (Cohen's d) at both by-participant and by-item for spelling, d = 1.85, d = 1.16; word recognition with words containing taught GPCs, d = 0.96, d = 0.95; word recognition, d = 0.79, d = 0.61, and reading motivation, d = 0.34, d = 0.56. These findings suggest that the simplicity principle aids in structuring maximally effective supplemental phonic interventions.
In English, legal combinations of 26 letters (graphemes) represent the 44 smallest speech sounds in the language (phonemes) often enough to allow the majority of word pronunciations to be assembled from these component grapheme-to-phoneme correspondence (GPC) rules. However, as is widely documented, English contains many words that violate common spelling rules. These words are amongst the most commonly occurring in children's texts (e.g., one, two, three, the, go, he). Furthermore, a single grapheme may be pronounced in multiple ways (e.g., consider ‘ch’ in ‘chat’, ‘school’, ‘Charlotte’ and ‘stomach’), and up to nine phoneme pronunciations may exist for a single grapheme (Gontijo, Gontijo & Shillcock, 2003).
In response to such observations, some researchers have assumed that distinct classes of ‘regular’ and ‘exception’ words exist and that word-specific spelling knowledge is brought to bear in reading exception words (most explicitly modelled in the dual-route cascaded model of reading; Coltheart, Rastle, Perry, Langdon & Ziegler, 2001). Other researchers argue that extensive complex rules exist in the English alphabetic system; Gontijo et al. (2003), for example, argue that approximately 461 GPC rules characterise the English language. A third theoretical position is that statistically defined distributional orthographic and/or semantic–morphemic processes aid acquisition (e.g., Grainger & Ziegler, 2011; Seidenberg & McClelland, 1989).
Given the complexity of English orthography and the theoretical diversity of models, how far should educators go in teaching GPCs to beginner readers in the simplest yet most efficient manner? The current study attempts to shed some light on this fundamental question. A starting point is the existing research on the nature and effectiveness of phonics teaching, which is overviewed below.
Research on phonics instruction
Phonics instruction involves teaching children to apply GPCs to assemble the pronunciations of words. Phonics programmes often involve explicit and direct teacher-led instruction of GPCs and of strategies for blending them to derive pronunciations (e.g., Foorman, Francis, Fletcher, Schatschneider & Mehta, 1998). The evidence for phonics in a general sense is often seen as being strong. Several systematic reviews and statistical meta-analyses on this issue summarising the literature over several decades now exist, in English at least (Ehri, 2003; Ehri, Nunes, Stahl & Willows, 2001; McArthur et al., 2012; National Reading Panel, National Institute of Child Health & Human Development (NRP), 2000; Torgerson, Brooks & Hall, 2006). All such reviews show at least some positive effects for phonics interventions. As an example of these, Ehri et al. (2001) chose all studies containing treatment phonics programmes and control no-phonics programmes. A significant but relatively modest effect of d = 0.41 from 38 treatment-control studies was found overall.
In evaluating this work, it should be noted that many of the studies included in the review lacked treatment controls (that is, controls were often unseen ‘business as usual’ classrooms). Furthermore, many of the studies in this review were not randomised control trial (RCT) studies. Because both of these factors can confound designs, they were explored post hoc by Ehri et al. (2001), who found few marked differences across study types. On the other hand, Suggate (2010) has reported larger effects in quasi-experimental studies. Furthermore, reviews including only RCT studies (e.g., Torgerson et al., 2006) conclude that while phonics in some form appears to ‘work’, the evidence base itself is still relatively weak, methodologically speaking.
Other limitations of phonics intervention research include the lack of specificity and details in the interventions and the lack of measurement of the transferability of the interventions to other reading skills (e.g., McArthur et al., 2012). Arguably, another major limitation of much phonics research is that phonics researchers have often included additional nonphonic literacy interventions, leaving it unclear whether phonics per se improves reading and spelling. McArthur et al. (2012) found only 11 published studies in the whole literature to date that met their selection criteria: (i) the study specified the use of randomisation, quasi-randomisation or minimisation to assign participants to phonics and control interventions; (ii) it taught phonics alone; and (iii) it was directed towards struggling readers in English. From these 11 studies, the results suggested that phonics interventions were effective for improving some reading skills including word reading accuracy and letter–sound knowledge. As only two of the 11 studies measured spelling or fluency, there was not enough evidence to tell whether phonics interventions were effective for promoting these skills. Phonics interventions focus on teaching grapheme-to-phoneme relationships, whereas spelling requires the reverse phoneme-to-grapheme relationship. Therefore, more studies are needed on the transferability of phonics training to the spelling of taught GPCs and to spelling abilities more generally.
It is also unclear from existing evidence either for how long or how much phonics should be taught (e.g., McArthur et al., 2012; Torgerson et al., 2006). Some systematic reviews have cautiously suggested diminishing returns for extended interventions (e.g., NRP, 2000), although others have suggested that there is simply not enough of the best quality data available to decide the issue (e.g., Torgerson et al., 2006). Relevant to this debate, a recent systematic review by Suggate (2010) suggests that phonics interventions are only effective in the earliest years of children's elementary school careers with apparent diminishing returns for phonics programmes after Grade 1 and that other language-related interventions are more effective in later elementary years. These data are also consistent with the consensus view that there exist substantial minorities of ‘treatment nonresponders’ who struggle to master phonics (e.g., Torgerson et al., 2006).
One issue that perhaps clouds this debate about the optimal duration of phonics programmes concerns the content of such programmes. Diminishing returns might be obtained for some longer interventions because they cover the same (perhaps relatively) limited range of GPCs and phonic activities as shorter interventions, just over longer periods. As such, they may be redundant (and may be perceived as such by students themselves, affecting motivation). Viewing this issue from the distinct perspective of the orthographic complexity of English as highlighted at the beginning of this paper, theoretically, children could learn approximately 461 GPCs to have a ‘complete’ GPC decoding system (Gontijo et al., 2003). This might suggest a role for teaching at least some more ‘complex’ GPCs to children who are sometimes taught only the 20–30 most common rules.
Phonics and the simplicity principle
One recent approach to research that takes up this issue of the nature and number of GPCs that should be taught in phonics programmes is the work of Vousden, Solity and colleagues (Solity & Vousden, 2009; Vousden, Ellefson, Solity & Chater, 2011). Vousden et al. (2011) first generated a large database of words found in 685 contemporary children's books read by children aged 5–7 years in schools in the United Kingdom that later served as an independent variable. GPCs within these words were then coded by the frequency with which these units occurred in all of these children's texts. Vousden et al. (2011) found that 64 of the most frequently occurring GPCs accounted for the same word types in both children's and adult texts. This frequency-coded GPC list was then used alongside the highest-frequency exception words to model their optimal explanatory power in predicting the total proportion of words in children's texts. Predictably, where few GPCs were known, few words could be read. There was suboptimality at the other extreme as well, because teaching additional GPCs after a certain level itself showed diminishing returns: the number of new words that become readable with each additional but decreasingly frequent GPC rule declines. This modelling is the empirical basis of the simplicity principle for reading: the principled selection of the optimally efficient GPC units that lead to greatest generalisation and usage in reading words in children's books.
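The logic of this modelling can be illustrated with a minimal sketch. It ranks GPCs by how often they occur across word tokens and then computes the proportion of tokens that become decodable as the top-ranked GPCs are learned one at a time. The toy corpus, its GPC segmentations and the function names are hypothetical and serve only as an illustration; exception words and the actual Vousden et al. (2011) database are not modelled.

```python
from collections import Counter

# Hypothetical toy corpus: each word token is listed with the GPCs needed to
# decode it. The segmentations are illustrative only.
CORPUS = [
    ("cat", ["c", "a", "t"]),
    ("cat", ["c", "a", "t"]),
    ("at", ["a", "t"]),
    ("chat", ["ch", "a", "t"]),
    ("night", ["n", "igh", "t"]),
    ("catch", ["c", "a", "tch"]),
]


def rank_gpcs(corpus):
    """Return GPCs ordered from most to least frequent across all tokens."""
    counts = Counter(gpc for _, gpcs in corpus for gpc in gpcs)
    # Sort by descending count, then alphabetically so ties are deterministic.
    return [g for g, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def coverage_curve(corpus):
    """Proportion of word tokens decodable after learning the top-k GPCs."""
    ranked = rank_gpcs(corpus)
    curve = []
    for k in range(1, len(ranked) + 1):
        known = set(ranked[:k])
        # A token is readable only if every GPC it contains is known.
        readable = sum(1 for _, gpcs in corpus if set(gpcs) <= known)
        curve.append((k, readable / len(corpus)))
    return curve


if __name__ == "__main__":
    for k, proportion in coverage_curve(CORPUS):
        print(f"top {k} GPCs: {proportion:.2f} of tokens readable")
```

In this toy corpus, the three most frequent GPCs already make half of the tokens readable, whereas each of the four remaining low-frequency GPCs unlocks at most one further token.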
In subsequent simulations, Vousden, Ellefson, Chater and Solity (2010) compared the content of commonly used UK phonics programmes with the one they derived from the simplicity principle. Figure 1 shows the simulation of the curvilinear relationship between the number of GPCs (number of correspondences) that are available and the percentage of words addressed in real texts that can then (in principle) be read with this rule knowledge. The modelled curves imply that learning the simplicity principle's GPCs is more profitable than learning the GPCs covered in other programmes (Read Write Inc., Jolly Phonics, Letters and Sounds). Furthermore, the order of the simplicity principle list was arguably more optimal than the order learned in other programmes, as the simplicity principle's correspondences learned at each stage led to more accurate pronunciation earlier in simulations than the other programmes did.

To date, the work of Vousden et al. (2011) has involved conceptual and statistical modelling research. There is no behavioural evidence from child or adult readers. The current study thus seeks to examine the impact of phonic approaches taking advantage of the simplicity principle in a practical classroom setting. Here, we seek to explore whether students benefit from learning the extended list of ‘common complex’ GPC units suggested by the simplicity principle. To this end, the current study will look at teaching these GPC units such as ‘a_e’, ‘pp’, ‘tch’, ‘igh’, ‘ed’. Given findings from this review showing the limited number of studies with the strongest designs (i.e., RCTs with taught controls, clearly specifying content, only teaching phonics), these issues were also addressed in the current methodology. Additionally, as there remains a debate about whether older children benefit from phonic programmes compared with younger children, the intervention was undertaken and evaluated in Grades 1 and 2 classrooms.
Reading motivation
One of the reasons teaching students GPC units is effective is that students gain independence in reading (Beck & Juel, 1995). Self-report studies suggest that fluent readers are also very aware of the effectiveness of sounding out words (Beech, 2010). If the simplicity principle-based approach provides the optimal number of GPCs students will likely encounter in text, as it generalises quickly to a large number of words, then students will be able to read more words at a faster pace and achieve independence more quickly. It is known that achievement in reading is strongly predictive of reading self-concept and academic self-concept (Chapman, Tunmer & Prochnow, 2000; Daki & Savage, 2010). Therefore, the present study also examines the direct impact of the simplicity principle on students' reading motivation and self-reported reading strategies.
In summary, the current study addresses the causal effects of a simplicity principle-based phonics programme on reading skills and reading motivation in an intervention study with RCT and treatment control in Grades 1 and 2 (equivalent to Years 1 and 2). In the experimental condition, children were taught the complex GPCs from Solity and Vousden (2009), whereas the treatment control condition had lessons aimed at improving word usage and word meaning.
Hypothesis 1. The simplicity principle predicts that participants in the complex GPC group will perform better than those in the word usage group on the reading and spelling assessments at post-test.
Hypothesis 1A. Spelling. Teaching graphemes (letters) to phonemes (sounds) could transfer to hearing the phoneme and recognising the corresponding grapheme. Therefore, it is predicted that teaching complex GPCs will lead to higher scores on spelling assessments.
Hypothesis 1B. Word recognition of taught grapheme-to-phoneme correspondences. Participants in the complex GPC group are expected to perform better than those in the word usage group on word recognition items containing complex GPCs taught in the lessons. Moreover, if this knowledge generalises, the complex GPC group should also be more accurate than the word usage group at recognising written words that are read aloud.
Hypothesis 2. Students in the complex GPC group will report higher reading motivation following the reading programme compared with students in the word usage group.
Hypothesis 3. Students will rely more on strategies taught in their respective condition after the intervention than before the intervention, as reflected in their self-reports of reading strategy use.
Method
Participants
A first-grade class (n = 20, 6 boys, 14 girls) and a second-grade class (n = 18, 7 boys, 11 girls) from a regular public sector English school in Quebec, Canada, participated in this study. All English schools in Quebec offer dual language instruction dividing language arts into distinct English and French classes. Invitations to participate in this study were sent to multiple English school boards in Quebec, but this school was particularly eager to receive reading help for their students. At the initial meeting, the principal and the teachers informed us about their concern for the reading levels of the students and were very interested to have the study conducted at their school. All students were sent a parental consent letter informing parents about the nature of the study. Parental consent was given for all the students in both classes, but parental questionnaires were returned by only 74% of the sample. Responses showed that all respondents spoke to their children in both French and English but read primarily English text to their children.
Measures
Raven's Coloured Progressive Matrices
The test is designed to measure test takers' reasoning ability and general intelligence using 36 multiple choice questions. Each item asks test takers to identify the missing block in the pattern. The internal reliability of this test at pre-test (Cronbach's alpha) was 0.63.
Phonemic blending assessment
A group assessment was given to evaluate whether students have or have not acquired the ability to blend phonemes independent of print. The phonemic blending test by Pennington Publishing (www.penningtonpublishing.com) was chosen because of its efficiency in gathering information on the entire class at once and took 10 minutes to administer and complete. The researcher would read: ‘I will give you the sounds of a word very slowly and two word choices. You tell me which word is formed from the sounds – the first or the second word I say. The sounds are /c/ /r/ /ī/ /d/. Is the word 1) light or 2) cried?’ Other examples of items are great versus skate, street versus please, most versus nose and used versus huge. There were 10 test items with varying word length and nine items were consonant–vowel–consonant (CVC) words. The internal reliability of this test within this sample (Cronbach's alpha) was 0.68.
Spelling test
The spelling test consisted of nine words that were taught explicitly in the reading programme and contained GPCs that were not included in the Group Reading Assessment and Diagnostic Evaluation (GRADE) assessments. The purpose of this test was to check whether students knew the target words before the reading programme began and whether they learned the words following the reading programme (see Appendix A for spelling list).
At pre-test, participants were also asked to use the word in a sentence to assess knowledge of word usage. All students were able to use the words correctly in sentences demonstrating that they were familiar with the words. In addition, all students were able to use the words in sentences correctly during the programme so it was not necessary to re-test them on this knowledge at post-test. Consequently at post-test, participants were only asked to write the word and not use it in a sentence. The internal reliability of this test at pre-test (Cronbach's alpha) was 0.74.
Group Reading Assessment Diagnostic Evaluation level 1
The Group Reading Assessment and Diagnostic Evaluation is a norm-referenced and developmentally based assessment of reading skills. The word recognition assessment was selected from the GRADE because it assesses the directional effect of learning grapheme to corresponding phoneme unit. Level 1 was chosen on the recommendation of the second-grade class's teacher, as the level 2 assessment was perceived as too difficult for the students in the class. Out-of-level norms provided for the test were thus used. The standardised assessment yields a stanine score for this subtest alone.
For the word recognition test, participants were asked to identify the word read by the examiner from a choice of four visually and/or phonologically similar words. The test consisted of 20 word sets. None of the words on the test were explicitly taught in the reading programme so the test evaluates transferability of the knowledge gained. Only version A of GRADE level 1 was used because 11 of the 20 words contained GPCs that were taught in the reading programme, whereas version B assesses only eight GPCs taught in the reading programme. In addition, version B only reassesses one GPC from version A. The internal reliability of this test at pre-test (Cronbach's alpha) was 0.78.
Reading Self-Concept Scale
This pencil and paper scale devised by Chapman and Tunmer (1995) measures the reading subcomponent of academic self-concept and was used to assess reading motivation. The original scale consists of 30 items representing three related aspects of reading self-concept: 10 perceptions of competence in reading items, 10 perceptions of difficulty with reading items and 10 attitudes towards reading items. However, many of the questions were difficult for the young students to comprehend when the original assessment was administered at pre-test. For five items, participants were confused and consistently asked for clarification, causing the instructions to deviate from the standardised procedures. In order to create a more reliable assessment, rather than changing items that caused confusion, these specific items were removed. In addition, another five items that lowered the internal reliability of the test (i.e., Cronbach's alpha was less than 0.70) were also removed. Items removed were questions 1, 5, 14, 15, 16, 18, 19, 20, 21 and 24. The revised assessment had 20 items (six competence in reading items, seven perceptions of difficulty with reading items and seven attitudes towards reading items), which were administered at post-test and used in all of the analyses.
The assessment was administered to the whole class, and the researcher read aloud each question as participants responded accordingly to a 5-point Likert-style scale. The assessment took approximately 30 minutes to administer. The internal reliability of this revised test at pre-test (Cronbach's alpha) was 0.86.
Strategy question
The strategy question was taken from Beech's (2010) paper identifying the strategies children use when they come across an unfamiliar word in text. The researcher read aloud the question: ‘What do you do when you are reading a book or something, and you come across a word that you can't read?’ Then the researcher read out seven options that Beech (2010) found were most common responses among students: (i) I try to sound it out; (ii) I think of a word that's like it; (iii) I ask my teacher; (iv) I ask a friend; (v) I use a dictionary; (vi) I work it out from the other words around it; and (vii) I just skip it and go on. Participants responded by circling one of the options that best reflects the strategy they use.
Procedure and design
The reading programme took place in the first half of the school year with the Grade 2 students and then in the second half of the year with the Grade 1 students. The procedures for both grades were identical, with only minor changes in the books read and the words taught noted in Appendix B.
Forming groups
After the initial assessments, participants were matched with another participant in the same class with similar scores in order of predetermined importance on the assessments: phonemic blending assessment, word recognition test from the GRADE, spelling test and reading motivation. Within the pair, participants were randomly assigned number 1 for complex GPC group or number 2 for word usage group within each classroom using an online random number generator (www.random.org). One participant in Grade 2 was not present at the initial assessment and was not included in the original formation of the groups. This student was randomly assigned to the word usage group. Each randomised group was split into two more groups creating four smaller groups consisting of four to five students used for programme instruction.
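The matching and within-pair randomisation described above can be sketched as follows. This is a minimal illustration under stated assumptions: students are matched by sorting on their score tuples in priority order and pairing neighbours, which simplifies the matching actually performed, and Python's `random` module stands in for the online random number generator. The student labels and scores are hypothetical.

```python
import random


def matched_pair_assignment(students, seed=0):
    """Pair students with similar scores, then randomise within each pair.

    `students` maps a label to a tuple of pre-test scores in priority order
    (blending, word recognition, spelling, motivation).
    """
    rng = random.Random(seed)
    # Sorting on the score tuple compares the highest-priority measure first.
    ordered = sorted(students, key=lambda name: students[name], reverse=True)
    groups = {"complex_gpc": [], "word_usage": []}
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)  # random order decides who gets which condition
        groups["complex_gpc"].append(pair[0])
        groups["word_usage"].append(pair[1])
    if len(ordered) % 2:  # odd class size: assign the unpaired student at random
        groups[rng.choice(list(groups))].append(ordered[-1])
    return groups


# Hypothetical class of six students with made-up pre-test scores.
CLASS = {
    "A": (10, 15, 3, 80),
    "B": (10, 14, 2, 75),
    "C": (8, 12, 2, 82),
    "D": (8, 11, 1, 70),
    "E": (6, 9, 0, 65),
    "F": (5, 9, 1, 77),
}

if __name__ == "__main__":
    print(matched_pair_assignment(CLASS, seed=1))
```

Because each pair contributes exactly one student to each condition, the two groups stay balanced on the matched measures whatever the random draw.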
Reading programme
The reading programme took place outside of the classroom for 20 minutes per group. Each group would take turns to participate in the programme during their first period in homeroom. There were three to four sessions a week for nine consecutive weeks, with a total of 30 sessions (see Appendix C for a visual representation of the first lesson).
Using the Ranked List of Grapheme–Phoneme Mappings of Vousden et al. (2011), words with those GPCs were compiled from lists of the most commonly occurring English words and/or the most commonly misspelled words in elementary grades according to lists from Fitzgerald (1932), Graham, Harris and Loynachan (1993) and Johnson (1950). A few words such as ‘school’ and ‘noise’ were selected because students would frequently ask how to spell those words at the end of sessions when they had to construct their own sentences using the target word. The reading programme only focused on the top 36 complex GPCs on the following assumptions: (i) students in Grade 2 already have a solid grasp of the singleton GPCs and do not need to go over them; and (ii) going beyond the top 64 GPCs would be unprofitable as they rarely occur in text (Solity, 2012).
Shared components of both interventions
Each session contained one to two target words. The experimenter began the session by asking participants: ‘How would you spell (target word)? Try your best’. Participants would write down how they think the word is spelled in their notebook as best as they could. The researcher then wrote the word on a whiteboard. The word was erased from the board, and participants were given a bag filled with letters to build the target word. This step allowed participants to physically manipulate and form the words having just seen the correct spelling on the board, reinforcing their understanding of the spelling of the word. The word was written on the board again, and participants compared their spelling with the board and corrected the spelling if needed. The built word remained in front of them for the remainder of the lesson.
Next, the researcher read a picture book containing the target word (see Appendix B for list of books corresponding to the words). This showed participants how the word is used in a rich context as suggested by Solity and Vousden (2009). The books were selected from a compiled list of the 500 most circulated English juvenile titles from the Toronto Public Library children's and youth services by librarians with the date range from January 2011 until 3 June 2012 (Toronto Public Library, senior staff personal communication, 27 July 2012). The books were chosen for each lesson based on a high occurrence of the specific target word and GPC unit. The Grade 2 students highly favoured books written by Robert Munsch, and therefore, most of the books in the lessons were Robert Munsch books. The Grade 1 students on the other hand had trouble following long stories, often losing interest or forgetting what the target word of the day was; therefore some shorter books were chosen where feasible. After finishing the book, on the basis of their assigned condition, one half of the groups took part in complex GPC activities, whereas the other half undertook word usage activities.
Complex grapheme-to-phoneme correspondence condition
The target GPC within the target word was introduced to the participants by writing the target GPC in red and the rest of the letters in black on a whiteboard. The researcher explained to the participants where the GPC is normally located within words (i.e., at the beginning, middle and/or end of a word). For example, for the word ‘says’, the researcher read, ‘Now the word says has a spelling pattern, “s”. Normally what sound does “s” make? That's right “ssss”. But when “s” is at the end of a verb or an action word like “says” it makes a “z” sound’.
The participants then pulled out the GPC from the word they had built at the beginning of the session and laid it in front of themselves. The researcher then gave each participant a passage from the book containing words with the GPC, and each participant identified the words containing the GPC. The identified words were then written on the whiteboard, and all the participants read aloud each word.
Word usage condition (control)
Lessons in this condition focused on the usage of the target word. For example, for the target word says the researcher read: ‘We use the word says when another person is speaking: Franklin says, Mary says, He says, She says. But there is no “s” at the end of “says” for I say, You say, or if more than one person is talking, the boys say, the girls say. Let's finish these sentences, when should we use says and when should we use say?’ Examples used were ‘Franklin _____; She _____; I _____; You ______; The boys____; The girls ____’. The experimenter then provided sentence activities for participants to further their understanding of the word usage. ‘Let's look at these sentences, when should we use says and when should we use say. Ben ____ hello. I ____ goodbye. She ______ she is cold’.
To end off each session, all participants wrote a sentence using the target word in a notebook provided to them at the beginning of the reading programme. After 10 words were taught, a review session took place using games on spelling, word usage and word bingo. There were three review sessions of 10 words, and a final review session on the last day to go over all of the words.
Informal classroom observations
Before the reading programme began, the researcher and the classroom teacher had informal discussions on the reading material the teacher covered in class. The teacher described weekly routines such as printing and cursive writing, review of letters and sounds in the alphabet (as there were students who transferred over from the French programme and would confuse the names for letters in French with names of English letters), grammar lessons, students reading one book of their choice aloud to the class and monthly spelling tests. A few of the words on the spelling tests did overlap with words taught in the intervention, such as school and know, but overall, the list of words did not overlap with the words taught in the intervention. The randomised within-class design meant that any teacher effects were equal across conditions.
In addition, as each group finished their daily session and the next group was lining up, approximately 2–3 minutes per switch, the researcher made note of the lessons taking place in the class. Altogether, more than 4 hours of informal observations took place over the course of the programme. It was noted that the teacher would teach singleton GPC units, and occasionally, some complex GPC units for silent letters such as ‘kn’ in ‘know’. In the classroom, the GPC units were also taught on the basis of individual words rather than as a theme for multiple words. For example, for the spelling list of October, the teacher said, ‘bat, the “a” makes the /ah/ sound, and leaves, the “ea” has the /ee/ sound’. In contrast, the reading programme teaches the GPC units as a theme, in which an entire lesson would be on the ‘ea’ as /ee/ sound.
Results
Preliminary data analyses
All data for the main analyses were first screened for skewness and kurtosis. These preliminary analyses showed no deviations from normality in the data sets, which were therefore suitable for parametric tests. The next steps before conducting the main analyses were sample description and verification. Using the GRADE, which is a standardised test, a stanine score was calculated corresponding to the raw scores. This confirmed that both samples were at-risk readers: Grade 2 (M = 2.89, standard deviation [SD] = 0.90) and Grade 1 (M = 3.40, SD = 1.93). In addition, the raw scores for Raven's Coloured Progressive Matrices revealed that both samples were intellectually average, with both achieving the 50th percentile for their respective age group (Grade 2, M = 19.79; Grade 1, M = 16.29).
One of the assumptions essential to suitability of the simplicity principle-based intervention was that students in both grades have already gained the ability to blend sounds, as the reading programme builds on known phonemic skills. This assumption was explored using the phonemic blending assessment with items varying in word length and in mostly CVC words (for example, great versus skate, street versus please, most versus nose). The test requires students to achieve 80% correct to be considered having mastered phonemic blending. In the Grade 2 sample, 14 out of 17 participants (one student did not complete this assessment) scored greater than 80%. In the Grade 1 sample, 14 out of 20 participants scored greater than 80%, showing that generally the sample's initial phonic abilities were at the right level for the intervention.
Inferential analyses
Independent samples t-tests were first conducted on pre-test scores with conditions as the independent variable. The Grade 2 results showed the groups were not significantly different from one another at pre-test (spelling test, t(16) = 1.20, p = .25; word recognition, t(16) = .28, p = .78, and reading motivation, t(16) = 1.33, p = .20), which means the two groups were comparable. Similarly, the groups for the Grade 1 sample were also comparable (spelling test, t(18) = 1.21, p = .24; word recognition, t(18) = .41, p = .69, and reading motivation, t(18) = .48, p = .64). The means and SDs for all the raw scores are shown in Table 1.
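For readers who wish to check these comparisons, the following minimal sketch computes an independent-samples t statistic (equal variances assumed) from group means and SDs. It uses the Grade 1 pre-test values reported in Table 1; the group size of 10 per condition is an assumption inferred from the class size (n = 20) and the reported degrees of freedom (18), and the function name is hypothetical.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic (equal variances assumed) and its df."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df


# Grade 1 pre-test summaries from Table 1: (M, SD) for complex GPC, then word usage.
# n = 10 per group is an assumption inferred from n = 20 and df = 18.
GRADE1_PRETEST = {
    "spelling": ((2.4, 0.97), (1.9, 0.88)),
    "word recognition": ((13.0, 3.4), (12.2, 5.22)),
    "reading motivation": ((81.4, 15.22), (78.7, 9.11)),
}

for name, ((m1, sd1), (m2, sd2)) in GRADE1_PRETEST.items():
    t, df = pooled_t(m1, sd1, 10, m2, sd2, 10)
    print(f"{name}: t({df}) = {t:.2f}")
```

Under these assumptions the sketch gives t(18) = 1.21, 0.41 and 0.48 for spelling, word recognition and reading motivation, matching the Grade 1 values reported above.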
Test | Grade | Pre-test complex GPC M | Pre-test complex GPC SD | Pre-test word usage M | Pre-test word usage SD | Post-test complex GPC M | Post-test complex GPC SD | Post-test word usage M | Post-test word usage SD | Cohen's d for complex GPC | Cohen's d for word usage | Cohen's d for value added
---|---|---|---|---|---|---|---|---|---|---|---|---
By-participant | | | | | | | | | | | |
Phonemic Blending | 1 | 8.6 | 2.55 | 8.1 | 2.08 | — | — | — | — | — | — | —
Phonemic Blending | 2 | 8.33 | 1.5 | 9 | 2.07 | — | — | — | — | — | — | —
Phonemic Blending | Total | 8.47 | 2.03 | 8.55 | 2.08 | — | — | — | — | — | — | —
Spelling | 1 | 2.4 | 0.97 | 1.9 | 0.88 | 4.3 | 1.49 | 2.1 | 1.37 | 2.04 | 0.22 | 1.82
Spelling | 2 | 1.11 | 1.05 | 2.11 | 2.26 | 6.56 | 2.01 | 4.44 | 2.4 | 3.28 | 1.4 | 1.88
Spelling | Total | 1.76 | 1.01 | 2.01 | 1.57 | 5.43 | 1.75 | 3.27 | 1.86 | 2.66 | 0.81 | 1.85
Word Reading specific | 1 | 6.4 | 2.01 | 6.1 | 3.31 | 9.4 | 1.58 | 6.9 | 2.85 | 1.13 | 0.3 | 0.83
Word Reading specific | 2 | 7.78 | 2.49 | 8.11 | 2.67 | 10.56 | 0.53 | 8.11 | 2.52 | 1.08 | 0 | 1.08
Word Reading specific | Total | 7.09 | 2.25 | 7.11 | 2.99 | 9.98 | 1.06 | 7.51 | 2.69 | 1.11 | 0.15 | 0.96
Word Reading | 1 | 13 | 3.4 | 12.2 | 5.22 | 16.8 | 2.62 | 13.9 | 4.53 | 1.3 | 0.57 | 0.73
Word Reading | 2 | 15.67 | 2.92 | 16.11 | 3.69 | 18.78 | 1.09 | 16.33 | 3.24 | 0.94 | 0.07 | 0.84
Word Reading | Total | 14.34 | 3.16 | 14.16 | 4.46 | 17.79 | 1.86 | 15.12 | 3.89 | 1.12 | 0.32 | 0.79
Reading Motivation | 1 | 81.4 | 15.22 | 78.7 | 9.11 | 79.6 | 11.62 | 82.5 | 7.61 | −0.15 | 0.31 | −0.46
Reading Motivation | 2 | 73.89 | 14.38 | 82.78 | 13.8 | 90.78 | 7.66 | 81.11 | 14.06 | 1.2 | −0.12 | 1.14
Reading Motivation | Total | 77.65 | 14.8 | 80.74 | 11.46 | 85.19 | 9.64 | 81.81 | 10.84 | 0.53 | 0.1 | 0.34
By-item | | | | | | | | | | | |
Spelling | 1 | 2.67 | 3.57 | 2.11 | 3.62 | 4.78 | 2.33 | 2.33 | 2.83 | 0.59 | 0.06 | 0.53
Spelling | 2 | 1.11 | 1.97 | 2.11 | 1.54 | 6.56 | 2.01 | 4.44 | 2.74 | 3.11 | 1.33 | 1.78
Spelling | Total | 1.89 | 2.77 | 2.11 | 1.62 | 5.67 | 2.17 | 3.39 | 2.79 | 1.85 | 0.7 | 1.16
Word Reading specific | 1 | 5.91 | 2.47 | 5.82 | 1.99 | 8.27 | 1.67 | 6.36 | 1.91 | 1.06 | 0.24 | 0.82
Word Reading specific | 2 | 6.73 | 1.9 | 6.73 | 1.62 | 8.55 | 1.04 | 6.64 | 1.63 | 1.03 | −0.05 | 1.08
Word Reading specific | Total | 6.32 | 2.19 | 6.28 | 1.81 | 8.41 | 1.36 | 6.5 | 1.77 | 1.05 | 0.1 | 0.95
Word Reading | 1 | 6.5 | 2.54 | 6.1 | 2.2 | 8.4 | 1.6 | 6.95 | 1.9 | 0.8 | 0.36 | 0.44
Word Reading | 2 | 7.05 | 1.79 | 7.25 | 1.59 | 8.45 | 1.05 | 7.35 | 1.6 | 0.83 | 0.06 | 0.77
Word Reading | Total | 6.78 | 2.17 | 6.68 | 1.9 | 8.43 | 1.33 | 7.15 | 1.75 | 0.82 | 0.21 | 0.61
Reading Motivation | 1 | 38 | 8.53 | 36.85 | 10.73 | 39.8 | 6.42 | 41.25 | 7.75 | 0.19 | 0.46 | −0.27
Reading Motivation | 2 | 33.25 | 7.66 | 37.25 | 5.5 | 40.85 | 4.27 | 36.5 | 5.6 | 1.27 | −0.12 | 1.38
Reading Motivation | Total | 35.63 | 8.1 | 37.05 | 8.12 | 40.33 | 5.35 | 38.88 | 6.68 | 0.73 | 0.17 | 0.56
- GPC, grapheme-to-phoneme correspondence.
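The pre-test comparability checks can be retraced from the summary statistics in Table 1. The following is a minimal sketch, not the authors' analysis script. It assumes equal-variance (pooled) independent-samples t-tests and group sizes of 10 per condition in Grade 1 and 9 per condition in Grade 2; these sizes are inferred from the reported degrees of freedom and the post-test counts in Table 2 rather than stated directly. The function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic (equal variances assumed) from summary statistics.

    Returns the t value and its degrees of freedom.
    """
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error, df


# Pre-test spelling, by-participant rows of Table 1.
# Group sizes (10 and 9 per condition) are an assumption, see the note above.
t_grade1, df_grade1 = pooled_t(2.4, 0.97, 10, 1.9, 0.88, 10)
t_grade2, df_grade2 = pooled_t(1.11, 1.05, 9, 2.11, 2.26, 9)

# The text reports t(18) = 1.21 for Grade 1 and t(16) = 1.20 for Grade 2.
print(f"Grade 1: t({df_grade1}) = {abs(t_grade1):.2f}")
print(f"Grade 2: t({df_grade2}) = {abs(t_grade2):.2f}")
```

The sign of t depends only on which condition is entered first, so the sketch prints absolute values, as the text does.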
A series of 2 (Grade) × 2 (Condition) × 2 (Time: pre-test and post-test) mixed model ANOVAs were then used to compare the test scores of the two grades and two conditions at pre-test and post-test, both by-participant and by-item. The by-item analyses evaluated participants' responses to individual test items to ensure the effects were not the result of a few participants' scores. Likewise, the by-participant analyses checked that the effects were not due to only a few items on the tests. Together, the two types of analyses reduce the probability of type II error.
The first analysis examined the impact of grade and condition on reading skills, with two subhypotheses on specific components of reading. The second analysis examined the hypothesis on reading motivation using the self-reports from the revised version of the Reading Self-Concept Scale questionnaire (Chapman & Tunmer, 1995). The final analysis investigated the change in strategy use with a chi-square test on the strategies reported pre-programme and post-programme.
Main analyses
Spelling
There was no main effect for condition (by-participant, F(1,34) = 3.87, ns, or by-item, F(1,32) = 1.45, ns) or for grade (by-participant, F(1,34) = 3.30, ns, or by-item, F(1,32) = .47, ns). However, there was a significant main effect for time in the by-participant and by-item analyses, F(1,34) = 133.80, p < .001, η2 = .80, and F(1,32) = 52.33, p < .001, η2 = .62, respectively.
There was no grade × condition interaction (by-participant, F(1,34) = .67, ns, or by-item, F(1,32) = .31, ns), but there was a significant condition × time interaction (by-participant, F(1,34) = 31.74, p < .001, η2 = .48, and by-item, F(1,32) = 12.80, p < .001, η2 = .29). Participants in the complex GPC condition (by-participant, M = 5.43, SD = 1.75, by-item, M = 5.67, SD = 2.17) performed better at post-test than those in the control group (by-participant, M = 3.27, SD = 1.86, by-item, M = 3.39, SD = 2.79). There was also a significant grade × time interaction (by-participant, F(1,34) = 44.21, p < .001, η2 = .57, and by-item, F(1,32) = 15.17, p < .001, η2 = .32), with the Grade 2 students (by-participant, M = 5.50, SD = 2.21, by-item, M = 5.50, SD = 2.38) performing better at post-test than the Grade 1 students (by-participant, M = 3.20, SD = 1.43, by-item, M = 3.56, SD = 2.58). There was no three-way interaction among time × grade × condition by-participant, F(1,34) = 2.73, ns, or by-item, F(1,32) = .77, ns.
Effect sizes for means were calculated using Cohen's (1988) standard equation for mean differences of the total sample (post-test–pre-test)/(pooled pre-test SD) (refer to Table 1 for effect sizes). The value-added difference in effect sizes (defined as the difference between the effect size for the complex GPC group and that for the word usage controls) was large in both the by-participant analysis, d = 1.85, and the by-item analysis, d = 1.16.
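The effect-size calculation described above can be illustrated with one row of Table 1. This is a sketch under a stated assumption: the text does not say how the pre-test SDs were pooled, and the code takes the simple mean of the two conditions' pre-test SDs, which happens to reproduce the Table 1 entries for the row used here. The helper names `gain_d` and `value_added_d` are hypothetical.

```python
def gain_d(pre_mean, post_mean, pooled_pre_sd):
    """Pre-to-post gain expressed in pooled pre-test SD units."""
    return (post_mean - pre_mean) / pooled_pre_sd


def value_added_d(gpc, control):
    """Difference between the intervention and control gain effect sizes.

    Each argument is a dict with keys 'pre_mean', 'pre_sd' and 'post_mean'.
    """
    # Assumption: "pooled" is taken as the simple mean of the two pre-test SDs.
    pooled_pre_sd = (gpc["pre_sd"] + control["pre_sd"]) / 2
    d_gpc = gain_d(gpc["pre_mean"], gpc["post_mean"], pooled_pre_sd)
    d_control = gain_d(control["pre_mean"], control["post_mean"], pooled_pre_sd)
    return d_gpc, d_control, d_gpc - d_control


# Word Reading specific, Grade 1, by-participant row of Table 1.
complex_gpc = {"pre_mean": 6.4, "pre_sd": 2.01, "post_mean": 9.4}
word_usage = {"pre_mean": 6.1, "pre_sd": 3.31, "post_mean": 6.9}

d_gpc, d_control, d_added = value_added_d(complex_gpc, word_usage)
# Table 1 lists 1.13, 0.3 and 0.83 for this row.
print(f"{d_gpc:.2f} {d_control:.2f} {d_added:.2f}")
```

The Total rows in Table 1 appear to be averages of the two grade-level values rather than a separate calculation on pooled data, so the sketch is applied to a single grade.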
Word recognition for words with taught grapheme-to-phoneme correspondences
There was no main effect for condition (by-participant, F(1,34) = 3.04, ns, or by-item, F(1,40) = 4.25, ns), but there was a main effect for grade by-participant only, F(1,34) = 4.17, p = .05, η2 = .11, not by-item, F(1,40) = 1.44, ns. There was a significant main effect for time (by-participant, F(1,34) = 24.68, p < .001, η2 = .42, and by-item, F(1,40) = 17.67, p < .001, η2 = .31). There was no interaction for grade × condition (by-participant, F(1,34) = .06, ns, or by-item, F(1,40) = .002, ns), but there was a significant condition × time interaction (by-participant, F(1,34) = 14.14, p = .001, η2 = .29, and by-item, F(1,40) = 11.42, p = .002, η2 = .22), with participants in the complex GPC condition (by-participant, M = 9.98, SD = 1.06, by-item, M = 8.41, SD = 1.36) performing better on post-test than those in the control (by-participant, M = 7.51, SD = 2.69, by-item, M = 6.50, SD = 1.77). There was no interaction for grade × time (by-participant, F(1,34) = .60, ns, by-item, F(1,40) = 1.15, ns). There was no three-way interaction among time × grade × condition by-participant, F(1,34) = .19, ns, or by-item, F(1,40) = .007, ns.
Parallel effect sizes for means were calculated using Cohen's (1988) standard equation for mean differences of the total sample (post-test–pre-test)/(pooled pre-test SD) (refer to Table 1 for effect sizes). The value-added difference in effect sizes was large, d = 0.96, in by-participant analysis and by-item analysis, d = 0.95.
Word recognition (all test words)
There was no main effect for condition (by-participant, F(1,34) = 1.74, ns, or by-item, F(1,76) = 3.62, ns). There was a main effect for grade by-participant, F(1,34) = 6.45, p = .02, η2 = .16, but not by-item, F(1,76) = 2.21, ns. However, there was a significant main effect for time (by-participant, F(1,34) = 27.78, p < .001, η2 = .45, and by-item, F(1,76) = 30.24, p < .001, η2 = .29). There was no interaction for grade × condition (by-participant, F(1,34) = .15, ns, or by-item F(1,76) = .43, ns), but there was a significant condition × time interaction (by-participant, F(1, 34) = 8.86, p = .005, η2 = .21, and by-item, F(1,76) = 9.25, p = .003, η2 = .11), with participants in the complex GPC group (by-participant, M = 17.79, SD = 1.86, by-item, M = 8.43, SD = 1.33) performing better on post-test than those in the control (by-participant, M = 15.12, SD = 3.89, by-item, M = 7.15, SD = 1.75). There was no interaction between grade × time, by-participant, F(1,34) = 1.67, ns, and by-item, F(1,76) = 2.62, ns. There was no three-way interaction among time × grade × condition by-participant, F(1,34) = .22, ns, or by-item, F(1,76) = .11, ns.
Effect sizes for means were calculated using Cohen's (1988) standard equation for mean differences of the total sample (post-test–pre-test)/(pooled pre-test SD) (refer to Table 1 for effect sizes). The value-added difference in effect sizes was large, d = 0.79, in by-participant analysis and by-item analysis, d = 0.61.
Reading motivation
There was no main effect for condition (by-participant, F(1,34) = .002, p = .97, η2 < .001, or by-item, F(1,76) < .001, p = .99, η2 < .001) or for grade (by-participant, F(1,34) = .24, ns, and by-item, F(1,76) = 1.92, ns). The main effect for time was not significant by-participant, F(1,34) = 3.89, ns, but was significant by-item, F(1,76) = 18.89, p = .001, η2 = .20.
There was also no interaction for grade × condition (by-participant, F(1,34) = .006, ns, or by-item, F(1,76) = .01, ns), for condition × time (by-participant, F(1,34) = 2.20, ns, or by-item, F(1,76) = 3.67, ns) or for grade × time (by-participant, F(1,34) = 2.29, p = .14, η2 = .06, or by-item, F(1,76) = .05, ns). However, there was a three-way interaction among time × grade × condition (by-participant, F(1,34) = 7.65, p = .009, η2 = .18, and by-item, F(1,76) = 12.30, p < .001, η2 = .15): only in Grade 2 did those in the complex GPC condition (by-participant, M = 90.78, SD = 7.66, by-item, M = 40.85, SD = 4.27) report higher reading motivation at post-test than those in the control condition (by-participant, M = 81.11, SD = 14.06, by-item, M = 36.50, SD = 5.60).
Effect sizes for means were calculated using Cohen's (1988) standard equation for mean differences of the total sample (post-test–pre-test)/(pooled pre-test SD) (refer to Table 1 for effect sizes). The value-added difference in effect sizes was medium in both the by-participant analysis, d = 0.34, and the by-item analysis, d = 0.56. For the Grade 2 sample alone, the value-added difference in effect sizes was large (by-participant, d = 1.14; by-item, d = 1.38).
In a final analysis, we sought to explore more precisely the source of the observed gains in reading motivation. The Chapman & Tunmer (1995) RSQ measure used here assesses perceptions of competence in reading, perceptions of difficulty with reading and attitudes towards reading. Because reading motivation is a multidimensional construct, an academic intervention such as the one reported here might affect one or several of these aspects of motivation, so the impact of the simplicity-based intervention on each motivation subscale was evaluated.
To evaluate this possibility, the mixed model ANOVA reported in the previous section, grade (1 versus 2) × condition (GPC versus word usage) × time (pre-test versus post-test), was rerun for reading motivation, breaking the composite score down into its three component dimensions of attitude, competence and difficulty. This analysis revealed a main effect of reading motivation, F(5,170) = 30.63, p < .001, η2 = .47. The only other effect reaching significance was a reading motivation × condition × grade interaction, F(5,170) = 4.44, p = .001, η2 = .12. Post hoc t-tests confirmed that the only significant increase from pre-test to post-test was on the difficulty subscale for Grade 2 children in the complex GPC condition, indicating that students in Grade 2 in the complex GPC group felt reading was less difficult at post-test.
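The η2 values reported in these analyses appear consistent with partial eta squared recovered from each F ratio and its degrees of freedom. The text does not state which variant of η2 was used, so the following is a sketch under that assumption, and the function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    explained = f_value * df_effect
    return explained / (explained + df_error)


# Main effect of time on spelling, by-participant: F(1,34) = 133.80, reported eta2 = .80.
print(f"{partial_eta_squared(133.80, 1, 34):.2f}")

# Main effect of reading motivation in the subscale analysis:
# F(5,170) = 30.63, reported eta2 = .47.
print(f"{partial_eta_squared(30.63, 5, 170):.2f}")
```

The second call shows why the effect degrees of freedom matter: with df = 5 the F ratio has to be multiplied by 5 before it is compared with the error degrees of freedom.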
Strategy question
At pre-test, four responses were selected (Table 2); however, at post-test, all 38 students selected ‘sound it out’ as the strategy of their choice. Because there were not enough participants to analyse each independent reading strategy (I sound it out, and I think of a word that's like it) versus dependent strategies (I ask someone else, and I skip it), the labels were regrouped to I sound it out versus others. Furthermore, because of the small cell number, strategy use could not be examined by condition or by grade.
Strategy | Grade | Pre-test frequency, complex GPC | Pre-test frequency, word usage | Pre-test frequency, total | Post-test frequency, complex GPC | Post-test frequency, word usage | Post-test frequency, total
---|---|---|---|---|---|---|---
I try to sound it out | 2 | 5 | 6 | 11 | 9 | 9 | 18
I try to sound it out | 1 | 9 | 9 | 18 | 10 | 10 | 20
I think of a word that's like it | 2 | 2 | 1 | 3 | 0 | 0 | 0
I think of a word that's like it | 1 | 0 | 0 | 0 | 0 | 0 | 0
I ask someone else | 2 | 1 | 0 | 1 | 0 | 0 | 0
I ask someone else | 1 | 1 | 1 | 2 | 0 | 0 | 0
Skip it | 2 | 1 | 2 | 3 | 0 | 0 | 0
Skip it | 1 | 0 | 0 | 0 | 0 | 0 | 0
- GPC, grapheme-to-phoneme correspondence.
Instead, a 2 × 2 chi-square contingency table analysis was conducted to determine whether strategy use (sound it out versus others) was contingent on time (pre-test versus post-test). This test is appropriate because both variables are categorical, and the analysis examines whether those variables are independent of or dependent on each other.
Strategy use and time were found to be significantly related, Pearson χ2(1, N = 76) = 9.69, p = .002, Cramér's V = 0.36. Although caution is needed in interpreting the significant result because the groups were very small, on the basis of the raw scores there is an apparent change from pre-test to post-test, as all participants reported using the sound it out strategy at post-test.
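The reported effect size follows from the chi-square statistic and the sample size. As a brief sketch (the function name `cramers_v` is hypothetical), Cramér's V for an r × c table is the square root of χ2 divided by N times the smaller of r − 1 and c − 1:

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramér's V for an n_rows x n_cols contingency table."""
    smaller_dimension = min(n_rows - 1, n_cols - 1)
    return math.sqrt(chi_square / (n * smaller_dimension))


# Reported values: chi-square = 9.69, N = 76, 2 x 2 table; the text gives V = 0.36.
print(f"{cramers_v(9.69, 76, 2, 2):.2f}")
```

For a 2 × 2 table the smaller dimension is 1, so V reduces to the square root of χ2/N.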
Discussion
The present study sought to evaluate the effects of a simplicity principle-based reading intervention (Solity & Vousden, 2009; Vousden et al., 2011) in an RCT design with a comparable taught control condition, with children who had generally mastered basic phonemic blending but were considered at-risk readers. Results from both by-participant and by-item analyses provided clear evidence that simplicity principle-based complex GPC instruction led to greater improvement than word usage instruction on all post-test assessments: spelling, word recognition of taught GPCs, word recognition, reading motivation and children's self-reported strategy use. Both complex GPC and word usage groups were taught exactly the same words in interventions, were randomised within classroom to conditions and did not differ from each other as groups at pre-test, so the post-test effects reported most likely reflect the differences between interventions on outcomes. As noted in the review earlier, there exist to date no behavioural data on the impact of the simplicity principle on reading and spelling. The present results suggest that such approaches can have large positive effects on these measures for young at-risk readers in Grades 1 and 2.
Generally, these effects were similar in these two grades, both in significance and in effect size. Where these effects differed (primarily for motivation), effects were larger in Grade 2 than in Grade 1. There is thus no evidence of the diminishing returns reported in some reviews (NRP, 2000; Suggate, 2010) for this additional attention to phonics, at least in the age ranges studied here. One reason for this may be that we taught additional ‘common-complex’ GPCs pitched to add to children's existing basic GPC knowledge and blending skills and that such GPCs were maximally useful for children in allowing them to read many words in children's books.
More specifically, the results for spelling also suggest that the effect of teaching common-complex GPCs is bidirectional. That is, students in the complex GPC group were taught GPCs in one direction only. Thus, for them to perform better on a spelling test probably requires applying this information in a phoneme-to-grapheme direction: transfer in a direction that was not explicitly taught. More generally, there is limited reliable evidence from RCT studies of phonics on spelling (McArthur et al., 2012). The effect of intervention on spelling here replicates the findings from the only two studies meeting the criteria for McArthur's review (Lovett, Warren-Chaplin, Ransby & Borden, 1990; Savage & Carless, 2005). Some evidence from the results for reading also suggests transfer effects. The analysis on specific items from the GRADE that contained taught and untaught GPCs demonstrated that students in the complex GPC group were able to transfer the material to new words. This similarly adds to the rather modest existing evidence from RCT studies of phonics programmes in poor readers on transfer effects (McArthur et al., 2012) and, indeed, on reading generally.
Turning more specifically to motivation and self-reported strategy use, as predicted by the simplicity principle (Vousden et al., 2011), the generalisability of learning efficient GPC units led to increased reading motivation but only for Grade 2 students. It was argued earlier that as a result of learning GPCs with a high rate of returns (GPCs that are maximally useful for reading children's texts), students would see their efforts pay off quickly, leading to a higher sense of reading confidence and enjoyment in reading and to greater self-reported use of decoding strategies at post-tests. These predictions were generally supported, with the exception that children in Grade 1 did not show growth in motivation. Why might this be? It may be that children's reading self-concept in Grade 1 is less closely linked to attainment. There is evidence that reading self-concept is a consequence of reading (Chapman & Tunmer, 1995; Daki & Savage, 2010) and so may be less amenable to change in Grade 1 poor readers. If so, it is all the more encouraging that the Grade 2 poor readers showed improvements both in reading and reading motivation, because negative self-concepts as a consequence of poor reading in Grade 2 have been noted (Chapman & Tunmer, 1995). Additionally, we were able to confirm that the effect of our complex GPC intervention on motivation was specific in nature, with perceptions of the difficulty of reading being reduced at post-test for the older children in Grade 2. Overall, such patterns suggest that phonics programmes based on the simplicity principle have impacts not only on reading ability but also on corresponding perceptions of task difficulty and reading motivation.
A caveat must be added to the exploration of children's self-reported strategy use: because of the small cell sizes (fewer than five in three of the four self-report categories), the original hypothesis could not be examined, so instead, the strategy of sounding words out was compared with all of the others combined. Although there was a significant change from pre-test to post-test, caution is needed when interpreting the results because all students in the sample reported at post-test that they ‘sound out unfamiliar words’. The lack of differences between the groups at post-test could be attributed to a number of reasons. From in-class observations, it was noted that the teacher would encourage students to try sounding out words on their own rather than asking the teacher for help. Students were very familiar with the strategy of ‘sounding out words’, would use this strategy as a default because of repeated instruction and had not yet come across other strategies such as word usage. It is also possible that the self-report was not sensitive enough to detect small differences in strategy usage, and a direct measurement of strategy use may be more sensitive. Nevertheless, this finding is broadly consistent with Beech's (2010) findings.
Before closing, some potential limitations of the research should be considered. One limitation of the study is the small sample size drawn from one school, which can raise concerns about generalisability. The sample, however, was typical of at-risk readers and also had quite typical nonverbal abilities. The school from which children were drawn was in all respects a typical local school from a local school board, not a specialist school. It was atypical only in having a large number of poor readers and, having identified this, in being willing to take part in research to ameliorate such problems. Small sample sizes, while leading to reduced power (especially for the by-grade analyses), probably did not compromise the capacity to detect the main effects of interventions of central interest. This is because the results yielded medium to large robust value-added effect sizes for interventions on all outcome measures that always reached conventional significance.
One reason for these clear main effects is that the present study, while modest in size, used a methodologically rigorous and tightly controlled design. Students were matched and randomly assigned to either the experimental or the control group. The taught control intervention was comparable in all but the key differential aspects to the experimental intervention, reducing possible Hawthorne and related effects and controlling for teacher effects. The main intervention included no additional intervention beyond phonics. Finally, analyses were conducted both by-item and by-participant to ensure results were not due to a few items on the assessments or to a few participants in the sample.
Another potential limitation of the study was the reading motivation questionnaire. Problems during pre-assessments led to concerns about the reliability of these specific results. Items requiring clarification were eliminated as they deviated from the standardised procedures. Additionally, reliability analyses were conducted on pre-test results in order to remove items that reduced the assessment's Cronbach's alpha. However, the control group received the same tests, and if the conditions had no impact on reading motivation, then there should be no difference between groups' post-test scores. Yet, there was a significant difference between the groups, along with large effect sizes, suggesting that the intervention conditions did impact children's reading motivation selectively.
An additional limitation was that the GPCs were selected from a list developed from children's books in the United Kingdom. There are occasional differences between Canadian and UK pronunciations, and it is unclear whether the order would be the same for a database of Canadian books. Before replicating this study, a database should be created of Canadian books, and GPC units should be analysed in the same way as Vousden et al. (2011). Finally, future larger-scale teacher-delivered reading intervention studies might profitably include both reading comprehension and fluency measures.
In conclusion, the present study suggests that supplemental small group teaching of common-complex GPCs selected on the basis of a simplicity principle (Vousden et al., 2011) provided significant benefits to reading, spelling and reading motivation. This study also adds to the limited number of RCTs with taught controls and clearly specified phonics-only content (McArthur et al., 2012). The present study might thus suggest a virtuous circle of early intervention effects on attainment and reading self-perceptions. Finally, in another sense, this study may be part of a virtuous circle of research, wherein intervention studies are used to cyclically test specific theories of causal claims in reading, which in turn lead to better interventions (Snowling & Hulme, 2011).
Biographies
Victoria Chen holds a Master's degree from McGill University in Human Development and is commencing her PhD studies at Queen's University, Ontario.
Robert S. Savage is an Associate Professor at McGill University. He obtained his degrees from Oxford and Cambridge Universities and his PhD from the University of London in 1998. He has published nearly 80 research articles in international journals on children's literacy. He has recently published research on school-based assessment and preventative early intervention projects for reading and spelling problems.