Volume 46, Issue 2 p. 150-165

The Stroop revisited: a meta-analysis of interference control in AD/HD

Rosa Van Mourik

1 Department of Clinical Neuropsychology, Vrije Universiteit, Amsterdam, The Netherlands

Jaap Oosterlaan

1 Department of Clinical Neuropsychology, Vrije Universiteit, Amsterdam, The Netherlands

Joseph A. Sergeant

1 Department of Clinical Neuropsychology, Vrije Universiteit, Amsterdam, The Netherlands

First published: 07 September 2004
Citations: 214
R. van Mourik, Department of Clinical Neuropsychology, Vrije Universiteit, Van der Boechorststraat 1, 1081 BT Amsterdam, The Netherlands; Email: R.van.mourik@psy.vu.nl

Abstract

Background: An inhibition deficit, including poor interference control, has been implicated as one of the core deficits in AD/HD. Interference control is clinically measured by the Stroop Colour-Word Task. The aim of this meta-analysis was to investigate the strength of an interference deficit in AD/HD as measured by the Stroop Colour-Word Task and to assess the role of moderating variables that could explain the results. These moderating variables included: methods of calculating the interference score, comorbid reading and psychiatric disorders, AD/HD-subtypes, gender, age, intellectual functioning, medication, and sample size.

Methods: Seventeen independent studies were located including 1395 children, adolescents, and young adults, in the age range of 6–27 years. A meta-analysis was conducted to assess the effect sizes for the scores on the word and the colour card as well as the interference score.

Results: Children with AD/HD performed more poorly on all three dependent variables. The effect sizes for word reading (d = .49) and colour naming (d = .58) were larger and more homogeneous than the effect size for the interference score (d = .35). The method used to calculate the interference score strongly influenced the findings for this measure. When interference control was calculated as the score on the colour card minus the score on the colour-word card, no differences were found between AD/HD groups and normal control groups.

Discussion: The Stroop Colour-Word Task, in standard form, does not provide strong evidence for a deficit in interference control in AD/HD. However, the Stroop Colour-Word Task may not be a valid measure of interference control in AD/HD and alternative methodologies may be needed to test this aspect of the inhibitory deficit model in AD/HD.

Numerous authors have highlighted the role of executive dysfunction in Attention Deficit/Hyperactivity Disorder (AD/HD) (Pennington & Ozonoff, 1996; Sergeant, Geurts, & Oosterlaan, 2002). A key process in executive functioning is response inhibition (Barkley, 1997). Barkley (1997) distinguished three interrelated processes believed to constitute response inhibition: (1) inhibiting a prepotent response, (2) stopping an ongoing response, and (3) interference control. The Stroop Colour-Word Task (Stroop, 1935; for a review, see MacLeod, 1991) is widely used as a measure of interference control in studies with AD/HD groups and is recommended as part of a psychological test battery in clinical settings (Doyle, Biederman, Seidman, Weber, & Faraone, 2000). Given both the clinical and research interest in the Stroop Colour-Word Task with respect to AD/HD, we report here a quantitative meta-analysis of studies that compare children with AD/HD and normal controls on the Stroop Colour-Word Task, as opposed to a head-count (Sergeant et al., 2002).

The standard Stroop Colour-Word Task (Golden, 1978) consists of three conditions, represented by three different cards. There are different versions, but in the AD/HD literature, the ‘Golden’ version is the most widely used. On the first card, the ‘word’ card, the speed of word reading is measured: the subject is required to read rows containing four different colour words (red, green, yellow and blue) printed in black ink and presented in random order. On the second card, the colour card, the subject has to name the colours of either rows of four Xs or blocks that are printed in the colours red, green, yellow, and blue. This condition measures colour-naming speed. On the third card, the colour-word card, the subject is required to name the ink colour of colour words whose meaning does not match the ink; for example, the colour-word red may be presented in blue ink. On the third card, the distracter is the colour meaning of the word. Interference occurs when the to-be-named colour differs from the to-be-ignored word (incongruent). This causes response conflict (Posner & DiGirolamo, 1998). On all three cards, the subject is timed for 45 seconds and the number of correct responses is counted. All stimuli are presented in a random order and the subject is required to name them as quickly as possible. A lower score on the colour-word card, in the presence of normal scores on the word and colour card, reflects the interference effect (Golden, 1978).

Generally, there are two different theoretical models to explain the interference effect in the Stroop Colour-Word Task: sequential models and parallel models. In sequential explanations, processing in one stage must be completed (or almost completed) before the next stage can begin. Interference is supposed to occur only at the response stage. Sequential theories have not been very successful in explaining all the effects found with different task manipulations (MacLeod, 1991). Parallel theories emphasise the speed of processing and the automaticity of word reading and colour naming. Cohen, Dunbar, and McClelland (1990) have developed a parallel model that states that the two features of the stimuli on the colour-word card, word (meaning) and colour, are processed simultaneously. The relative automaticity of the two dimensions (word processing and colour processing) determines the direction and the degree of interdimensional interference (MacLeod & MacDonald, 2000). In this account, the two processes run in parallel through activation moving along pathways of different strength in the system. The degree of automaticity is a function of the strength of each pathway. The difference between this parallel model and previous parallel models is that interference can also occur during the processing of the stimuli and not simply at the end of a ‘horse race’.

Neuro-imaging studies show that the brain areas that are active while subjects perform Stroop-like tasks include the anterior cingulate cortex (Adleman et al., 2002; Bush, Luu, & Posner, 2000; Carter, Mintun, & Cohen, 1995b; Pardo, Pardo, Janer, & Raichle, 1990; Peterson et al., 1999, 2002), a region of the frontal cortex associated with the frontal executive networks (Posner & DiGirolamo, 1998). Other regions that consistently show differential increases in activation for the incongruent condition compared to a neutral control condition are: the frontal polar cortex (Bench et al., 1993; Carter et al., 1995b), the lateral prefrontal cortex (Peterson et al., 1999; Zysset, Müller, Lohmann, & von Cramon, 2001), the inferior frontal regions (Adleman et al., 2002; Chung Leung, Skudlarski, Gatenby, Peterson, & Gore, 2000; Peterson et al., 1999), and the inferior parietal lobule (Adleman et al., 2002; Carter et al., 1995b; Peterson et al., 1999). Note that neuro-imaging studies differ with respect to the neutral control condition (for example, coloured Xs, coloured neutral words or congruent colour-words) and that the tasks used were paced rather than self-paced, as conducted in the clinical Stroop card procedure. The neural networks that are activated when subjects perform Stroop-like tasks are also considered to be implicated in AD/HD. The frontal cortex in particular has been hypothesised to play a major role in AD/HD pathology (for a review, see Barkley, Grodzinsky, & DuPaul, 1992). Therefore, it might seem surprising that the search for a deficit in interference control in AD/HD, as measured by Stroop-type tasks, has yielded conflicting findings (Nigg, 2001).

These conflicting findings can be explained in at least three possible ways. First, rapid naming deficiencies have also been observed in children with AD/HD (Tannock, Martinussen, & Frijters, 2000). Thus, a lower score on the CW-card may also be due to slower rapid naming instead of poorer interference control. Not all studies that reported deficits in interference control in AD/HD controlled for reading ability (the word condition) or naming speed (the colour condition). Second, estimates of children with AD/HD who also have a comorbid reading disorder range from 25% to 40% (Dykman & Ackerman, 1991; Semrud-Clikeman et al., 1992). If a child cannot read well, it is probably easier to ignore the word meaning on the colour-word card. This could lead to relatively faster responses on the colour-word card in children with AD/HD who are comorbid for reading disorder as compared to children with AD/HD without a reading disorder. However, this is not always the case. Children with a reading disability actually show more interference than normal controls in some studies (Everatt, Warne, Miles, & Thomson, 1997; Helland & Asbjornsen, 2000). This suggests that a deficit in interference control might not be specific to AD/HD. Third, an alternative explanation for the conflicting results might be that children with AD/HD often have other comorbid disorders such as a disruptive disorder or an anxiety disorder (Angold, Costello, & Erkanli, 1999). Results may be confounded by the high comorbidity of AD/HD with other psychiatric disorders. Inhibition deficits have also been found in comorbid disruptive disorders (Oosterlaan, Logan, & Sergeant, 1998), whereas anxiety disorders have been associated with an increased ability to inhibit (Oosterlaan, 2001). Thus, the presence of rapid naming difficulties, comorbid reading, or psychiatric disorders might have affected the interference scores found in the various studies.

Since the interference score is also determined by reading and rapid naming ability, the first goal of this meta-analysis is to test if children with AD/HD have lower word or colour scores, indicating rapid naming, and/or reading problems. A second goal is to determine the strength of an interference deficit in AD/HD. Third, we will examine the influence of eight possible moderating factors. These moderating variables include: methods of calculating the interference score, comorbid reading and psychiatric disorders, AD/HD-subtypes, gender, age, intellectual functioning, the use of medication, and sample size. To assess if a deficit in interference control is specific to AD/HD, we will compare AD/HD groups with reading disorder and psychiatric disorder groups. Furthermore, the AD/HD inattentive subtype is compared with the AD/HD combined subtype. This issue is theoretically important because of the discussion on the validity of the distinction between these subtypes. AD/HD inattentive subtype and AD/HD hyperactive/impulsive subtype have been characterised as distinct, unrelated disorders (Milich, Balentine, & Lynam, 2001). Barkley (1997) explicitly states that his behavioural inhibition model of AD/HD refers only to the AD/HD combined and hyperactive/impulsive subtypes but not to the AD/HD inattentive subtype. On average, children with AD/HD have a lower IQ than their normal peers (Barkley, 1997). We wish to test if differences on the Stroop Colour-Word Task might be partly attributable to differences in IQ. The possible moderating effect of sample size is assessed to ascertain if the meta-analytic results are influenced by a publication bias.

Method

Description of the studies

This meta-analysis covers 17 studies published between 1990 and 2002. Table 1 summarises the main features of these studies. In column 1, the authors are listed. Column 2 shows the subject groups and the number of subjects in each of these groups. Column 3 provides the mean age and the age range for each of the groups. Information on the IQ of the children is summarised in column 4. Column 5 describes how the different studies deal with various possible moderators, including reading and psychiatric disorders, AD/HD-subtypes, gender, age, IQ, and medication. Column 6 summarises the main results. Column 7 contains remarks on the study. The studies were located in PubMed, PsycInfo, Science Direct, Web of Science, and Picarta. We combined search terms related to the Stroop Colour-Word Task (such as Stroop, interference, executive) with search terms related to AD/HD (such as AD/HD, hyperactive, attention). The reference lists of published articles were used to locate other relevant studies.

Table 1. Summary of articles on AD/HD and the Stroop Colour-Word Task
Study Participants Age IQ Confounding Variables Results Remarks
1. Golden & Golden, 2002 43 LD (=RD)
43 NC
43 PD*
43 AD/HD
(AD/HD- subtypes:
 24 AD/HD-C; 14 AD/HD-H; 5 AD/HD-I)
10.0
9.9
9.4
9.9
Range:
6–15
Not Confounding:
 – Excluded: comorbidity: groups mutually exclusive
 – Matched on: All groups are matched with the LD group on: age, gender, education, ethnicity
 – Statistically not different: age, education
Not Reported: medication, IQ, AD/HD- subtype
Interference: AD/HD = PD = NC > LD
Colour naming: AD/HD = PD = NC > LD
Word reading: AD/HD = PD = NC > LD
Colour-Word: NC > AD/HD = PD = LD
PD = 6 CD, 11 ODD; 1 Mood Disorder NOS; 5 Anxiety Disorder; 18 Adjustment Disorder; 2 PTSD
2. Nigg et al., 2002 69 AD/HD-C
35 AD/HD-I
51 NC
9.6
9.9
9.7
Range:
7–12
101.5
104.9
109.4
Not Confounding:
 – Excluded: Autism, Tourette, Depression, and Bipolar Disorder
 – Matched on: age, gender, recruitment source
 – Tested as a covariate: IQ, RD, ODD, CD, medication
 – Statistically not different:
AD/HD-subtype, gender
Interference: AD/HD = NC
Colour naming: NC > AD/HD
Word reading: NC > AD/HD
Colour-Word: NC > AD/HD
AD/HD-I = AD/HD-C
More data than presented in the article are used. In calculating the overall effect size, only AD/HD-C and NC are included.
3. Rucklidge & Tannock, 2002 35 AD/HD
12 RD
24 AD/HD + RD
37 NC
15.2
15.1
14.9
15.0
Range: 13–16
102.2
99.9
101.1
111.0
Not Confounding:
 – Excluded: medication
 – Matched on: age
 – Statistically not different: gender
 – Tested as a covariate: ODD/CD, PD, IQ, SES
Confounding:
Statistically different: comorbid RD
Not Reported:
 – AD/HD-subtypes
Interference: AD/HD = RD = AD/HD + RD
Colour naming: NC > AD/HD, AD/HD + RD
Word-reading: NC > RD, AD/HD + RD and AD/HD > AD/HD + RD
Colour-Word: NC, AD/HD, RD > AD/HD + RD and NC > AD/HD
4. Scheres et al., in press 18 AD/HD
20 NC
(AD/HD-subtypes:
 8 AD/HD-I, 10 AD/HD-C)
9.3
9.9
Range:
8–12
99.6
104.8
Not Confounding:
 – Excluded: girls, medication, PD
 – Statistically not different: AD/HD-subtype
Possible Confounding:
 – ODD/CD: symptoms correlated high with AD/HD symptoms
Confounding:
 – Tested as a covariate: age, IQ
Not Reported:
 – RD
Interference: AD/HD > NC
AD/HD-I = AD/HD-C
A selection of subjects from the original study were analysed because not all children performed the Stroop.
5. Schmitz et al., 2002 10 AD/HD-H
10 AD/HD-I
10 AD/HD-C
60 NC
14.4
14.1
14.1
13.8
Range:
12–16
91.3
87.8
85.8
92.9
Not Confounding
 – Excluded: medication
 – Statistically not different: gender, IQ
Confounding:
 –Tested as a covariate: education, age, SES
 –Statistically different: AD/HD-subtype
Not Reported: – ODD/CD, PD, RD
Word reading: AD/HD = NC
Colour-Word: NC, AD/HD-H, AD/HD-C >  AD/HD-I
In calculating the effect size for word reading, only the AD/HD-C and NC are included.
6. Reeve & Schandler, 2001 10 AD/HD
10 NC
15.3
15.2
Range:
12–17
98.9
100.6
Not Confounding
 –Excluded: Medication, LD, ODD, CD
 – Matched on: age, gender
Not Reported: – PD, AD/HD-subtype
Interference: AD/HD > NC
Colour naming: NC > AD/HD
Word-reading: NC = AD/HD
Colour-Word: NC > AD/HD
7. Seidman, et al., 2001 21 AD/HD + 2 LD*
32 AD/HD + AD
16 AD/HD + RD
79 AD/HD
127 NC
14.3
13.8
15.2
15.1
15.1
Range:
6–17
96.1
105.4
102.2
113.3
118.0
Not Confounding:
 – Excluded: girls
 – Tested as a covariate: IQ, age, PD, ODD/CD, SES
 – Statistically not different: medication
Confounding:
 – Statistically different: RD and AD
Not Reported: AD/HD-subtype
Interference: AD/HD + RD > NC
Colour naming: NC > AD/HD + LD; AD/HD > AD/HD + LD > AD/HD + 2 LD
Word reading: NC > AD/HD > AD/HD + LD; AD/HD + AD > AD/HD + 2 LD
Colour-Word: NC > AD/HD > AD/HD + LD, AD/HD + 2 LD
2 LD = combined reading and arithmetic disorder
8. Spalletta et al., 2001 8 AD/HD
8 NC
(7 AD/HD-C, 1 AD/HD-I)
9.4
9.0
Range:
6–14
Not Confounding
 – Excluded: PD, CD, medication
 – Matched on: age, gender, education.
Not Reported: IQ, ODD, RD, AD/HD-subtype
Colour naming: NC > AD/HD
Colour-Word: NC > AD/HD
PET-study
9. Willcutt et al., 2001 121 NC
93 RD
52 AD/HD
48 RD + AD/HD
10.7
10.4
10.8
10.6
Range:
8–16
113.3
100.1
101.1
99.2
Not Confounding
 – Excluded: Medication
 – Tested as a covariate: IQ, Reading ability, gender, ODD/CD
Confounding:
 – Statistically different: age (older AD/HD more impaired), RD
Not Reported: AD/HD-subtype
Interference: NC = AD/HD = RD = AD/HD + RD
Colour naming: NC = AD/HD > AD/HD + RD
Word reading: NC = AD/HD > RD, AD/HD + RD
Colour-Word: NC = AD/HD = RD > AD/HD + RD
10. Perugini, et al., 2000 21 AD/HD
22 NC
9.6
9.2
Range:
6–12
110.1
114.2
Not Confounding:
 – Excluded: girls, medication
 – Statistically not different: age, IQ, ODD/CD
Not Reported: – PD, RD, AD/HD-subtype
Colour-Word: AD/HD = NC
11. Seidman, et al., 2000 40 AD/HD
116 NC (sibling of AD/HD child)
118 NC (no sibling with AD/HD)
15.5
15.5
15.0
Range:
6–27
108.4
111.5
113.2
Not Confounding:
 – Tested as a covariate: gender, PD, ODD/CD, LD, IQ, SES
 – Statistically not different: medication
Confounding: – {LD, PD, SES} only on word reading
Not Reported: – Age, AD/HD-subtype.
Interference: AD/HD = NC
Colour naming: NC > AD/HD
Word reading: NC > AD/HD*
Colour-Word: NC > AD/HD
(No differences between the NC - groups)
*If controlled for PD, LD, SES: AD/HD = NC on Word reading
12. Semrud-Clikeman et al., 2000 10 AD/HD
11 NC
12.9
15.1
Range:
8–18
120.5
125.4
Not Confounding:
 – Excluded: Girls, PD, LD
 – Matched on: age and IQ
Possible Confounding:
 – Medication
Not Reported:
 – ODD/CD, AD/HD–subtype
Colour naming: NC > AD/HD
Word reading: NC > AD/HD
Colour-Word: NC > AD/HD
MRI study
13. Houghton et al., 1999 32 AD/HD-I
62 AD/HD-C
28 NC
10.5
9.9
10.2
Range:
6–12
V: 107
P: 119
V:100
P:111
V:107
P:116
Not Confounding:
 – Excluded: PD, LD, medication
 – Tested as a covariate: gender, age
Confounding:
 – Statistically different: AD/HD-subtype
Interference: NC = AD/HD-I = AD/HD-H
Colour naming: NC > AD/HD-C
Word reading: NC > AD/HD-C, AD/HD-I
Colour-Word: NC > AD/HD-C
In calculating the overall Effect size, only AD/HD-C and NC are included.
14. Seidman et al., 1997 43 AD/HD
36 NC
11.4
11.9
Range:
6–17
106.0
112.1
Not Confounding:
 – Excluded: boys
 – Matched on: age, SES, grade
 – Tested as a covariate: family history, PD, LD
Possible Confounding:
 – Medication
Not Reported:
AD/HD-subtype, IQ
Interference: NC = AD/HD
Colour naming: NC = AD/HD
Word reading: NC = AD/HD
Colour-Word: NC = AD/HD
15. MacLeod & Prior, 1996 12 NC
12 AD/HD
10 CD
12 PD*
15.5
14.5
15.8
15.6
Range:
12–18
112.5
107.2
103.6
106.4
Not Confounding:
 – Excluded: PD and CD: groups mutually
exclusive, medication.
 – Tested as a covariate: IQ and age
Not Reported: gender, RD, AD/HD-subtype
Interference: AD/HD, CD, PD > NC
*PD = 3 anorexia, 4 schizophrenia, 3 depression, 2 school refusals
16. Grodzinsky & Diamond, 1992 66 AD/HD
34 young
32 old
64 NC
30 young
34 old
7.6
10.2
7.5
10.4
Range:
6–11
112.9
110.1
112.9
115.5
Not Confounding:
 – Excluded: girls, LD, PD medication, AD/HD-I
 – Statistically not different: age, IQ, motor performance
Possible Confounding:
 – Tested as a covariate: SES
Not Reported: – ODD/CD
Colour naming: NC > AD/HD
Word reading: NC > AD/HD
Colour-Word: NC > AD/HD
17. Lufi et al., 1990 29 AD/HD
21 PD*
20 NC
12.7
13.2
12.7
Range:
9–16
Not Confounding:
 – Excluded: girls, medication
Not Reported: – RD, PD, ODD/CD, AD/HD-subtype, IQ
Interference: AD/HD > PD
Colour naming: NC = AD/HD = PD
Word reading: AD/HD > PD
Colour-Word: NC = AD/HD = PD
*PD = various conduct and anxiety disorders
  • Note: For the interference score, > indicates more interference. For the other measures, > indicates a better score, that is, a faster response time.
  • Index of abbreviations:
  • AD = Arithmetic Disorder
  • AD/HD = Attention Deficit/ Hyperactivity Disorder
  • AD/HD-I = predominantly inattentive subtype
  • AD/HD-H = predominantly hyperactive subtype
  • AD/HD-C = combined subtype
  • CD = Conduct Disorder
  • IQ = Intelligence Quotient
  • LD = Learning disorders (reading disorder or arithmetic disorder)
  • NC = Normal Controls
  • ODD = Oppositional Defiant Disorder
  • PD = Psychiatric Disorders (other than AD/HD, ODD and CD)
  • RD = Reading Disorder
  • SES = Social Economic Status.

To be included in the meta-analysis, studies had to meet the following criteria: (1) the studies contained at least one AD/HD group and a comparison group of normal control children, (2) studies had to use the standard Stroop Colour-Word Task, and (3) for the interference score: studies were required to use one of two methods (described in the following section) to calculate the interference score. Where studies did not report the interference score (Schmitz et al., 2002; Perugini, Harvey, Lovejoy, Sandstrom, & Webb, 2000) or used an interference score other than the two considered in this meta-analysis (Willcutt et al., 2001), attempts were made to contact the primary author to obtain the group means and the standard deviations of the group means in order to allow computation of the C–CW interference score. This meta-analysis reports on the results of 15 studies for the colour and word score; furthermore, meta-analytic results for the interference score pertain to 14 studies. With a single exception (Reeve & Schandler, 2001), all studies in this meta-analysis used DSM criteria (DSM-III-R; American Psychiatric Association, 1987; DSM-IV; APA, 1994) to establish a diagnosis of AD/HD. More specifically, for the studies included in the meta-analysis, 291 children were diagnosed as AD/HD using DSM-III-R criteria and 306 children using DSM-IV criteria.

Studies were excluded if the same subject data were also (partly) published in another study. Therefore, we excluded the following studies: Barkley et al. (1992); Seidman et al. (1995); Seidman, Biederman, Faraone, Weber, and Oullette (1997); and the study by Grodzinsky and Barkley (1999). We excluded studies that were published before 1990 (Cohen, Weiss, & Minde, 1972; Boucugnani & Jones, 1989; Gorenstein, Mammato, & Sandy, 1989) for one or both of the following reasons. First, these studies did not use DSM-III-R or DSM-IV criteria. Second, some studies did not report the findings for the interference score and we were unable to locate the primary author of older studies in order to obtain the data that allow computation of the interference score.

Computerised versions of the Stroop Colour-Word Task are not comparable with the standard version in terms of control condition, response mode, and Stroop stimuli. The studies by Carter, Krener, Chaderjian, Northcutt, and Wolfe (1995a), Miller, Kavcic, and Leslie (1996) and Gaultney, Kipp, Weinstein, and McNeill (1999) used a computerised Stroop. These studies were excluded from the present meta-analysis.

Dependent variables

This meta-analysis focused on the following three dependent variables derived from the Stroop Colour-Word Task:

  1. The number of words named correctly in 45 seconds on the word card: a rough indication of reading ability and rapid naming.

  2. The number of colours named correctly in 45 seconds on the colour card: an indication of speed of colour processing and rapid naming.

  3. The interference score: the measure of interference control in the Stroop Colour-Word Task. This measure quantifies how much slower colour naming becomes when word reading interferes with colour naming. Two widely used methods of calculating the interference score are available. The first method controls only for colour naming, the second for both colour naming and word reading.

In the first method (the classical method), the score derived from the colour-word card was subtracted from the score on the colour card (Hammes, 1971). In the second method (Golden, 1978), correction for colour naming and word reading was achieved as follows. First, a colour-word (CW) score was predicted. This predicted score was subtracted from the uncorrected raw CW score. The predicted CW score can be calculated either by using a regression formula or by a theoretical formula (Golden, 1978). The regression formula is based on a mean score corrected for the subjects’ age and education. The theoretical model suggests that the time to read one colour-word is actually the time to read one word followed by the time to name one colour. The following formula (Golden, 1978) can be deduced from the theoretical model:
\[ \text{Interference} = CW - CW_{\text{predicted}}, \qquad CW_{\text{predicted}} = \frac{W \times C}{W + C} \]
Here W is the score on the word card, C is the score on the colour card, and CW is the score on the colour-word card.

The interference score is positive when a subject is able to inhibit word reading, and negative when word reading actively interferes with the colour naming process. The two methods used to predict a CW score yield highly comparable results (r = .96), suggesting that these two methods are interchangeable. Both methods will be referred to as the ‘Golden’ method, since Golden (1978) proposed both methods.
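The two scoring methods can be illustrated with a short sketch. It assumes that the theoretical prediction takes the commonly cited form predicted CW = (W × C) / (W + C), which follows from adding the time needed per word (45/W seconds) to the time needed per colour (45/C seconds). The function names and the example scores are hypothetical and are not taken from any of the included studies.

```python
def predicted_cw(w: float, c: float) -> float:
    """Predicted colour-word score under the theoretical model (assumed form)."""
    if w <= 0 or c <= 0:
        raise ValueError("word and colour scores must be positive")
    # Time per colour-word item = time per word + time per colour,
    # so items completed in the same interval = (W * C) / (W + C).
    return (w * c) / (w + c)


def golden_interference(w: float, c: float, cw: float) -> float:
    """Raw CW score minus predicted CW score.

    Positive: more items named than predicted (word reading was inhibited).
    Negative: fewer items named than predicted (word reading interfered).
    """
    return cw - predicted_cw(w, c)


def classical_interference(c: float, cw: float) -> float:
    """Colour score minus colour-word score; larger values mean more interference."""
    return c - cw


# Hypothetical raw scores (items correct in 45 seconds).
w_score, c_score, cw_score = 100.0, 70.0, 40.0
print("Golden interference:", golden_interference(w_score, c_score, cw_score))
print("Classical interference:", classical_interference(c_score, cw_score))
```

Note that the two scores run in opposite directions: a larger classical score means more interference, whereas a larger (more positive) ‘Golden’ score means less interference than predicted.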

We compared the ‘Golden’ method (irrespective of how the CW score was predicted) with the classical method (C score − CW score, Hammes, 1971) as far as was possible with the available data. If no interference score was reported, the classical interference score was calculated using the raw mean data. We estimated the group standard deviation (SD) with the following formula:
\[ SD_{C-CW} = \sqrt{SD_{C}^{2} + SD_{CW}^{2} - 2\, r_{C,CW}\, SD_{C}\, SD_{CW}} \]

In this formula, r_{C,CW}, that is, r(C score − CW score), represents the average correlation between the number of correct responses on the colour card and the number of correct responses on the colour-word card. The r(C score − CW score) was set at .7 (C. J. Golden, personal communication, 25 February 2003).
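As a minimal sketch of this estimation step, the code below assumes the standard expression for the standard deviation of a difference between two correlated scores, SD = sqrt(SD_C² + SD_CW² − 2·r·SD_C·SD_CW), with r defaulting to the value of .7 given in the text. The function name and the example values are hypothetical.

```python
import math


def sd_of_difference(sd_c: float, sd_cw: float, r: float = 0.7) -> float:
    """Estimated SD of the C - CW difference score for one group.

    sd_c and sd_cw are the group SDs on the colour and colour-word cards;
    r is the assumed correlation between the two scores.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    variance = sd_c ** 2 + sd_cw ** 2 - 2 * r * sd_c * sd_cw
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))


# Hypothetical group SDs, not taken from any included study.
print(sd_of_difference(12.0, 9.0))
```

Because a higher assumed correlation yields a smaller estimated SD, and hence a larger standardised effect, the choice of r = .7 directly affects the resulting effect sizes.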

Statistical analyses

Analyses were conducted using a computer program developed by Borenstein and Rothstein (1999). The effect sizes (in terms of Cohen's d) for the word, colour and interference score were calculated for each study separately. An overall effect size was computed by weighting all the effect sizes with the sample size of the study. Following Cohen's guidelines, effect sizes of .20, .50, and .80 were used as thresholds to define small, medium, and large effects, respectively (Cohen, 1988). To test if the variability in effect sizes exceeded that expected from sampling error alone, a test of heterogeneity was conducted (Borenstein & Rothstein, 1999).
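The effect size computation can be sketched as follows. This is an illustration only: it assumes Cohen's d with a pooled standard deviation and a simple sample-size-weighted mean, which may differ in detail from the weighting used by the Borenstein and Rothstein program. All names and input values are hypothetical.

```python
import math


def cohens_d(mean_nc, sd_nc, n_nc, mean_adhd, sd_adhd, n_adhd):
    """Cohen's d (pooled SD); positive values favour the normal control group."""
    pooled_var = ((n_nc - 1) * sd_nc ** 2 + (n_adhd - 1) * sd_adhd ** 2) / (
        n_nc + n_adhd - 2
    )
    return (mean_nc - mean_adhd) / math.sqrt(pooled_var)


def weighted_mean_d(ds, ns):
    """Overall effect size, weighting each study's d by its total sample size."""
    if len(ds) != len(ns) or not ds:
        raise ValueError("ds and ns must be non-empty and of equal length")
    return sum(d * n for d, n in zip(ds, ns)) / sum(ns)


def cohen_label(d):
    """Classify |d| using the .20 / .50 / .80 thresholds (Cohen, 1988)."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"


# Hypothetical study: controls name 50 items on average, the AD/HD group 45.
d = cohens_d(50.0, 10.0, 20, 45.0, 10.0, 20)
print(d, cohen_label(d))
```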

Since it is unreasonable to assume that all of the heterogeneity in the effect sizes of the studies can be explained, the possibility of ‘residual heterogeneity’ must be acknowledged in the statistical analysis. The appropriate analysis is, therefore, a ‘random effects’ rather than a ‘fixed effects’ meta-regression model (Thompson & Higgins, 2002). In the ‘fixed effects’ model, it is assumed that all studies are derived from a common population, and the only source of variation between the studies is random error. With a sufficiently large sample, this error will approach zero and the estimates of the effect sizes reflect together the true combined effect size. In the ‘random effects’ model, it is assumed that the effect sizes may differ because the subject characteristics vary from one study to another. When an attempt is made to combine data, two sources of variance need to be dealt with: random error and variance that reflects real differences between the populations from which subjects are sampled. A fixed effects analysis estimates the assumed common effect, whereas a random effects analysis estimates the mean of a distribution of effects across studies. If residual heterogeneity exists, a random effects analysis appropriately yields wider confidence intervals for the combined effect size than a fixed effects analysis (Thompson & Higgins, 2002).
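The contrast between the two models can be made concrete with a small sketch. The text does not state which estimator was used for the between-study variance; the code below assumes the DerSimonian-Laird moment estimator and a common large-sample approximation for the variance of d, purely for illustration. All function names and data are hypothetical.

```python
import math


def d_variance(d, n1, n2):
    """Approximate sampling variance of Cohen's d (large-sample formula)."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def pool(ds, vs, tau2=0.0):
    """Inverse-variance pooled effect and its standard error.

    tau2 = 0 gives the fixed effects estimate; tau2 > 0 adds between-study
    variance to every study's variance (random effects).
    """
    ws = [1.0 / (v + tau2) for v in vs]
    mean = sum(w * d for w, d in zip(ws, ds)) / sum(ws)
    return mean, math.sqrt(1.0 / sum(ws))


def dersimonian_laird_tau2(ds, vs):
    """Between-study variance (tau^2) and the heterogeneity statistic Q."""
    ws = [1.0 / v for v in vs]
    fixed_mean, _ = pool(ds, vs)
    q = sum(w * (d - fixed_mean) ** 2 for w, d in zip(ws, ds))
    c = sum(ws) - sum(w * w for w in ws) / sum(ws)
    df = len(ds) - 1
    return max(0.0, (q - df) / c), q


# Three hypothetical studies with 20 subjects per group.
effects = [0.1, 0.5, 0.9]
variances = [d_variance(d, 20, 20) for d in effects]
tau2, q = dersimonian_laird_tau2(effects, variances)
print("fixed: ", pool(effects, variances))
print("random:", pool(effects, variances, tau2))
```

When the estimated between-study variance is greater than zero, the random effects standard error exceeds the fixed effects one, which is what produces the wider confidence intervals described above.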

The random combined effect sizes were calculated for the word, colour, and interference score of studies (1) comparing AD/HD groups without comorbid reading disorder and AD/HD groups with comorbid reading disorder, (2) comparing AD/HD groups and reading disorder groups, (3) comparing AD/HD groups and psychiatric control groups, and (4) comparing AD/HD primarily inattentive subtype groups and AD/HD combined subtype groups. Note that for the third comparison, it was not possible to calculate the effect sizes for the word and colour score because too few studies were available. Age, IQ, and sample size were correlated with the effect sizes for the word, colour, and interference score. The correlations for age and IQ were weighted for the relative number of subjects in the study (Stevens, 1996). The two methods for calculating the interference score were compared using a Wilcoxon rank order test. Alpha was set at .05 for all analyses.
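One way to weight a moderator correlation by study size is a weighted Pearson correlation, sketched below. This is an assumption about the general approach, not a reproduction of the procedure in Stevens (1996); the function name and the data are hypothetical.

```python
import math


def weighted_correlation(xs, ys, weights):
    """Pearson correlation between xs and ys, with each pair weighted by its study size."""
    if not (len(xs) == len(ys) == len(weights)) or len(xs) < 2:
        raise ValueError("need at least two studies and equal-length inputs")
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    cov = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    var_y = sum(w * (y - mean_y) ** 2 for w, y in zip(weights, ys))
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical moderator analysis: mean age per study versus effect size,
# weighted by the number of subjects in each study.
mean_ages = [9.5, 10.6, 12.8, 15.1]
effect_sizes = [0.60, 0.45, 0.30, 0.35]
study_sizes = [120, 314, 50, 108]
print(weighted_correlation(mean_ages, effect_sizes, study_sizes))
```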

Results

The results of the three dependent measures of interest: the word score, the colour score, and the interference score are summarised in Tables 2 to 4.

Table 2. Comparison of the effect sizes for the word score in AD/HD studies
  • Note: Positive effect sizes indicate better performance for normal controls as compared to children with AD/HD. Two different age groups are included for the Grodzinsky & Diamond study (1992). The random combined effect size is reported in the last row.
Table 3. Comparison of the effect sizes for the colour score in AD/HD studies
  • Note: Positive effect sizes indicate better performance for normal controls as compared to children with AD/HD. Two different age groups are included for the Grodzinsky & Diamond study (1992). The random combined effect size is reported in the last row.
Table 4. Comparison of the effect sizes for the interference score in AD/HD studies
  • Note: Positive scores indicate better performance for normal controls as compared to children with AD/HD. Separate combined effect sizes are shown in the 6th row for the C−CW score, in the 16th row for the ‘Golden’ score and in the last row for both methods combined.

The combined random effect size for the word condition (Table 2) was .49 and significant (p < .001). This effect is close to Cohen's standard for a medium effect size. The effect sizes for the word condition were heterogeneous (p = .003), indicating that there were large variations in the magnitude of the difference between children with AD/HD and normal controls. Two effect sizes were close to zero (Golden & Golden, 2002; Schmitz et al., 2002) and 14 effect sizes were positive.

The combined random effect size for the colour condition, .58 (see Table 3), was significant (p < .001) and corresponds to a medium effect size. Again, effect sizes were heterogeneous (p = .003). All the effect sizes for the colour condition were positive, which means that only the magnitude of the effect varied between studies. This indicates that in all studies, normal controls performed better than children with AD/HD in the colour naming condition.

The combined random effect size of the variable of primary interest, the interference score, was .35 (see Table 4) and significant (p = .004). This is considered a small effect size. The effect sizes for the interference scores were heterogeneous (p < .001). One effect size for the interference score was negative (Perugini et al., 2000). In eight studies, the effect sizes were around zero (Golden & Golden, 2002; Houghton et al., 1999; Nigg, Blaskey, Huang-Pollock, & Rappley, 2002; Rucklidge & Tannock, 2002; Seidman et al., 1997; Seidman, Biederman, Monuteaux, Weber, & Faraone, 2000; Seidman, Biederman, Monuteaux, Doyle, & Faraone, 2001; Willcutt et al., 2001) and in five studies the effect sizes were positive (Lufi et al., 1990; MacLeod & Prior, 1996; Reeve & Schandler, 2001; Scheres et al., in press; Spalletta et al., 2001). This indicates that studies report inconsistent results for the difference between children with AD/HD and their normal peers with regard to the interference score.

Moderating variables

Methods for calculating the interference score. The overall effect size for the interference score (.35) was calculated with the interference score as reported by the authors. The effect size for the C − CW interference score was .26, which was not significant and was heterogeneous (p = .04), while the effect size for the Golden interference score was .40, which was significant (p = .01) but heterogeneous (p < .001).

If both the raw mean data and the Golden score were available, a C − CW score was computed and compared with the Golden score. This was done for seven studies (Houghton et al., 1999; Lufi et al., 1990; Nigg et al., 2002; Reeve & Schandler, 2001; Scheres et al., in press; Seidman et al., 2000, 2001). The random combined effect size for the C − CW interference score was −.003, not significant and homogeneous. The effect size for the Golden interference score was .29, significant (p = .03) but heterogeneous (p = .02). No significant difference was found between these two methods (Z = −1.69, p = .09), but this result suggests a trend towards a larger effect size for the Golden score than for the C − CW score.
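The p value reported for the method comparison can be checked from the Z statistic alone. The sketch below assumes a two-sided test against the standard normal distribution; the function name is hypothetical.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p value for a standard normal test statistic z."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Z value reported for the Golden versus C - CW comparison.
p = two_sided_p_from_z(-1.69)
print(f"p = {p:.2f}")  # prints "p = 0.09"
```

Under these assumptions the result agrees with the reported p = .09, which falls short of the conventional .05 threshold.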

Table 1 indicates how the studies deal with the moderating variables described in the following section.

Reading disorder and psychiatric disorders. The meta-analytic results of the group comparisons between (1) AD/HD and AD/HD with comorbid reading disorder, (2) AD/HD and reading disorder, and (3) AD/HD and psychiatric disorders are presented in Table 5.

Table 5. Random combined effect sizes for the word, colour and interference score in studies with AD/HD groups, reading disorder groups and psychiatric disorder groups
Group comparison      Sample size   Studies   Word            Colour          Interference
                                              d       p       d       p       d       p
AD/HD − AD/HD + RD    254           3         .96a    .00     .54a    .00     −.26b   .42
AD/HD − RD            278           3         .64a    .00     .29a    .06     −.32a   .02
AD/HD − PD            160           3         –       –       –       –       −.36b   .28
  • Note: Dashes indicate that the effect size was not calculated. Positive effect sizes indicate better performance for the AD/HD group as compared to the other groups. d = random combined effect size, RD = reading disorder, PD = various psychiatric disorders.
  • aHomogeneous effect. bHeterogeneous effect.

As can be seen in Table 5, children with AD/HD perform better at word reading and colour naming than children with AD/HD and a comorbid reading disorder, but the two groups do not differ significantly in the interference score. Compared with children with a reading disorder, children with AD/HD perform better at word reading and show a trend towards better colour naming, but they have a reduced resistance to interference. There is no significant difference in interference control between children with AD/HD and children with various psychiatric disorders.

AD/HD-Subtypes. Three studies (Houghton et al., 1999; Nigg et al., 2002; Scheres et al., in press) compared children with AD/HD inattentive subtype (AD/HD-I) and AD/HD combined subtype (AD/HD-C) and found no differences. Meta-analytic results, however, reveal a small but significant and homogeneous combined random effect size of −.35 (p = .02) for the interference score. This effect size indicates that children with AD/HD-I have less resistance to interference than children with AD/HD-C. The subtypes did not differ significantly in the time to read words or the time to name colours (combined random effect sizes: −.14, ns, and .21, ns, respectively), and these effects were homogeneous.

Gender. Research has failed to find a substantial difference in the Stroop Colour-Word Task dependent measures between men and women at any age (MacLeod, 1991), although women may be somewhat faster, especially in naming colours. In this meta-analysis, the proportion of boys and girls was approximately equal across AD/HD groups and the normal control groups. Hence, there is no reason to suspect an influence of gender on the dependent variables.

Age. The interference effect begins early in the school years, rising to its highest level around grades 2 to 3 as reading skills develop (Schiller, 1966). Cognitive control is still developing after grades 2 and 3 with an accompanying improvement in interference control. No developmental changes have been reported until approximately 60 years, at which age interference control begins to decrease (Comalli, Wapner, & Werner, 1962).

No significant correlations were found between the effect sizes for each of the dependent variables and mean age. Thus, it seems that the differences on the Stroop Colour-Word Task between children with AD/HD and their normal peers remain the same across the age range studied here.

Intellectual functioning. The difference in IQ scores between AD/HD groups and normal control groups did not correlate significantly with the effect sizes for the word score, r(14) = −.31, ns, the colour score, r(13) = −.20, ns, or the interference score, r(12) = .11, ns.

Medication. Methylphenidate (MPH) is the most common pharmacological treatment for children with AD/HD (Greenhill, Halperin, & Abikoff, 1999; MTA Cooperative Group, 1999). Recently, it has been shown that MPH improves colour naming and word reading, but that it has no effect on response interference (Bedard, Ickowicz, & Tannock, 2002). There were too few studies using the Stroop Colour-Word Task to analyse the effects of medication in this meta-analysis.

Sample size. Sample size correlated negatively with the effect sizes for the dependent variables in this meta-analysis. The correlation was strong and significant for the colour score, r(16) = −.68, p < .01, and the interference score, r(14) = −.60, p = .02, but not significant for the word score, r(16) = −.42, ns. This means that studies with larger samples tend to report small effect sizes, while studies with small samples tend to report large effect sizes. These correlations may reflect the difficulty of publishing small-sample studies that report no group differences.
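The significance pattern of these correlations can be reproduced from the reported r values. The sketch below assumes that the number in parentheses after r is the degrees of freedom (n − 2), as is conventional, and that the tests are two-sided at the .05 level; the function name is hypothetical, and the critical values are standard tabled values of Student's t.

```python
import math


def t_from_r(r, df):
    """t statistic for testing a Pearson correlation r against zero with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


# Two-sided critical t values at alpha = .05 from standard tables.
T_CRIT_05 = {14: 2.145, 16: 2.120}

# Correlations between sample size and effect size, as reported above.
reported = {
    "colour": (-0.68, 16),
    "word": (-0.42, 16),
    "interference": (-0.60, 14),
}

for name, (r, df) in reported.items():
    t = t_from_r(r, df)
    significant = abs(t) > T_CRIT_05[df]
    print(f"{name}: t({df}) = {t:.2f}, significant at .05: {significant}")
```

Under these assumptions, the colour and interference correlations exceed the critical value and the word correlation does not, which matches the pattern reported in the text.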

Discussion

Impairments in interference control have been implicated as one of the core deficits in AD/HD (Barkley, 1997). The Stroop Colour-Word Task has been frequently used to demonstrate this deficit and as an aid in clinical diagnosis. Seventeen independent studies, encompassing large groups of children, were analysed to determine the degree of this deficit in interference control in children with AD/HD compared with normal controls. The role of the following moderator variables was assessed: comorbid reading and psychiatric disorders, AD/HD-subtypes, gender, age, IQ and sample size. The results reported here indicate that a deficit in interference control, as measured with the Stroop Colour-Word Task, is either absent or very small in children with AD/HD and depends heavily on the method of calculation. Children with AD/HD had lower word reading and colour naming scores than normal controls. Comorbid reading disorder was found to have a negative impact on colour naming and word reading, but there was no consistent effect on the interference score. Compared with children with a reading disorder, children with AD/HD had a better word and colour score, but a lower interference score. There was no significant difference between children with AD/HD and children with various psychiatric disorders on the interference score. A small difference was found in interference control between the AD/HD-subtypes: children with predominantly inattentive subtype had poorer control over interference than the children with AD/HD-combined subtype. No effects of gender, age, and IQ were noted, but the correlations between the effect sizes and sample size suggest a publication bias.

Study limitations

The negative correlation between sample size and the effect sizes for the colour and interference scores may be an indication of a publication bias. Small studies with significant results will probably be published more often than small studies with no significant results (for a review, see Rosenthal, 1979).

Some children in the AD/HD-inattentive subtype group may be just one hyperactivity symptom below the threshold for the AD/HD-combined subtype, or may formerly have had the AD/HD-combined subtype and have outgrown one or two symptoms of hyperactivity/impulsivity over time. The distinction between the AD/HD-inattentive subtype and the AD/HD-combined subtype may therefore be confounded by contamination of the inattentive subtype with subthreshold combined subtype cases.

The comparisons between children with AD/HD and children with various psychiatric disorders, between children with AD/HD with and without a comorbid reading disorder, and between children with AD/HD and children with a reading disorder are each based on only three studies. Thus, the results pertaining to these group comparisons should be interpreted with caution.

No significant correlations were found between, on the one hand, age and IQ, and, on the other, the effect sizes for the word, colour, and interference scores. However, these correlations probably underestimate the associations that would be found if this analysis were conducted using data at a subject level. Furthermore, one study (Scheres et al., in press) found that covarying for age (and IQ) reduced the differences between children with AD/HD and their normal peers.

What is the best method to calculate interference control?

This meta-analysis shows that the method of calculating interference is crucial to the interpretation of the results. When interference is calculated by subtracting the CW score from the C score, there is no difference in interference control between children with AD/HD and normal controls: because children with AD/HD are slower than normal controls on both cards (C card and CW card), the difference score does not distinguish the groups. The Golden method is better at differentiating children with AD/HD from normal controls than the classical C − CW score. It should be borne in mind, however, that the interference score proposed by Golden (1978) is based on a comparison of an estimated CW score with the actual CW score. This estimate rests on the assumption that the time to read one colour-word is the time to read one word followed by the time to name one colour. This assumption corresponds with older, sequential explanations of the Stroop effect, in which processing in one stage must be completed (or almost completed) before processing in the next stage may begin. Neuroimaging research on the Stroop Colour-Word Task instead supports the notion that Stroop stimuli are processed in parallel in a network of brain areas (Atkinson, Drysdale, & Fulham, 2003; Ukai et al., 2002; West & Alain, 1999). The theoretical model on which the formula is founded therefore does not stand on strong ground. For this reason, we suggest that the traditional method of calculating the interference score may be a ‘purer’ measure of interference.
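The two scoring methods can be made concrete with a short sketch. It assumes that W, C and CW are the numbers of items completed on each card within the same fixed time limit, and it uses the commonly cited form of Golden's prediction, which follows from the additive-time assumption described above: if one colour-word item takes the time of one word plus the time of one colour, the predicted CW score is (W × C) / (W + C). The function names and the example scores are hypothetical.

```python
def difference_interference(c, cw):
    """Classical difference score: colour-card score minus colour-word-card score."""
    return c - cw


def golden_predicted_cw(w, c):
    """Predicted CW score under the additive-time assumption.

    With a time limit T, one word takes T / w and one colour takes T / c,
    so one colour-word item takes T / w + T / c, giving
    T / (T / w + T / c) = (w * c) / (w + c) items within the limit.
    """
    return (w * c) / (w + c)


def golden_interference(w, c, cw):
    """Golden-style interference score: actual CW minus predicted CW."""
    return cw - golden_predicted_cw(w, c)


# Hypothetical raw scores (items completed per card), for illustration only.
w, c, cw = 100, 75, 45
print(f"C - CW score: {difference_interference(c, cw)}")
print(f"predicted CW: {golden_predicted_cw(w, c):.2f}")
print(f"Golden score: {golden_interference(w, c, cw):.2f}")
```

Note that the two scores run in opposite directions in this sketch: a larger C − CW value means more interference, whereas a larger Golden-style value means the child completed more CW items than predicted. The Golden-style score also depends on the word score, which the difference score ignores.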

Do children with AD/HD have a reduced resistance to interference?

This meta-analysis suggests that there is little support for a deficit in interference control in AD/HD, as measured by the Stroop Colour-Word Task. The fact that no deficit in interference control was observed using the traditional method to calculate interference, and that children with AD/HD-inattentive subtype may have less resistance to interference than children with AD/HD-combined subtype, does not support the inhibition deficit hypothesis (Barkley, 1997; Pennington & Ozonoff, 1996), which pertains to the AD/HD combined subtype in particular.

Results of other studies, using a different design to measure interference control, are mixed. Scheres et al. (in press) and Jonkman et al. (1999) measured interference control with a Flanker Task and found an interference effect on errors. Cornoldi et al. (2001) found that children with AD/HD had difficulties in controlling interference related to working memory. When a computerised version of the Stroop was used, Carter et al. (1995a) found a difference in reaction time between children with AD/HD and normal controls, while Gaultney et al. (1999) did not find such an effect.

An interesting finding emerges from fMRI studies of interference in which AD/HD groups and normal control groups were compared on brain activation during a ‘counting Stroop’ (Bush et al., 1999) and a ‘go-nogo’ task (Durston et al., 2003). Activation patterns indicated that normal adults activated the anterior cingulate cortex, specifically its cognitive division (Bush et al., 1999), and that normal children activated fronto-striatal regions (Durston et al., 2003). In contrast, adults with AD/HD failed to activate the anterior cingulate cortex, and children with AD/HD failed to activate fronto-striatal regions. In both studies, the AD/HD groups relied on a more diffuse network of regions, although in the study by Bush et al. (1999) no performance differences were observed between the control group and the AD/HD group. Bush and colleagues interpreted these findings as demonstrating that adults with AD/HD may compensate for impairments by recruiting a different and less responsive pathway. Based on the card version of the Stroop Colour-Word Task alone, one cannot conclude that children with AD/HD have no deficit in interference control, because results from other interference tasks and imaging research indicate that AD/HD is related to problems in interference control. The fact that this is not shown by the card version of the Stroop Colour-Word Task may indicate that the deficit is not generalised but context dependent.

Rapid naming

Interference scores need to be controlled for at least colour naming. If this is not done, differences on the CW card may also reflect differences in rapid naming. Deficiencies on the W, C, and CW cards have been related to abnormalities in brain structure: Semrud-Clikeman et al. (2000) demonstrated that poorer performance on all three cards of the Stroop Colour-Word Task was significantly related to reversed asymmetry of the caudate. Thus, slower retrieval of colour names and slower reading speed may be an indication of abnormalities in brain structure in AD/HD, and it is therefore important to assess these deficits in AD/HD. Slow processing speed is frequently reported in children with AD/HD compared to normal controls (e.g., Mason, Humphreys, & Kent, 2003; Sergeant, Oosterlaan, & van der Meere, 1999). This general slowing has been interpreted as reflecting a ‘non-optimal activation state’ (for reviews, see Sergeant & Van der Meere, 1990, 1991; Sergeant et al., 1999; Van der Meere, 1996). Other evidence that children with AD/HD may be less able than their normal peers to maintain the state required for optimal task performance can be derived from the work of Leth-Steensen, King Elbaz, and Douglas (2000). Their results confirmed that the slower mean reaction times of boys with AD/HD were due not to a generalised slowing of all responses but to a greater proportion of abnormally slow responses, as shown earlier by Sergeant (1988). Children with AD/HD may be less able than their normal peers to maintain a stable reaction time over trials. This result can explain the slower naming and reading speeds and is consistent with the hypothesis that AD/HD involves a non-optimal activation state. Unfortunately, the present data do not allow this theoretical explanation to be tested. Future studies should address this issue.

Clinical practice and future research

Based on this meta-analysis, we cannot recommend the Stroop Colour-Word Task in its standard form for use in clinical practice in AD/HD. Another reason to advise against the use of the Stroop Colour-Word Task in clinical practice is its low negative predictive power: a normal score can be obtained despite the fact that the child has AD/HD (Doyle et al., 2000; Grodzinsky & Barkley, 1999). The predictive validity can be improved when used in combination with other executive tests (Perugini et al., 2000). Therefore, if the Stroop Colour-Word Task is used in clinical practice, it should always be used in combination with other executive function tests.
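Negative predictive power can be stated as a simple proportion: of all children who obtain a normal score, the fraction who in fact do not have the disorder. The sketch below uses the standard definition; the function name and the counts are hypothetical and are not taken from the cited studies.

```python
def negative_predictive_power(true_negatives, false_negatives):
    """Proportion of normal test scores that come from children without the disorder."""
    total_normal_scores = true_negatives + false_negatives
    if total_normal_scores == 0:
        raise ValueError("no normal scores to evaluate")
    return true_negatives / total_normal_scores


# Hypothetical counts: 60 children without AD/HD and 40 children with AD/HD
# all obtain a normal Stroop score.
print(negative_predictive_power(true_negatives=60, false_negatives=40))  # prints 0.6
```

The more often children with AD/HD obtain a normal score (false negatives), the lower this proportion becomes, which is the weakness described above.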

The interference score cannot differentiate between children with AD/HD and children with various other psychiatric disorders. The interference score can differentiate between children with AD/HD and children with a reading disorder. This difference probably reflects the fact that reading is less automatised in children with a reading disability. Word reading will thus interfere less with colour naming on the CW-card in children with a reading disorder.

A better alternative for research and clinical use may be a ‘trial-by-trial’ computerised version of the Stroop Colour-Word Task. Perlstein, Carter, Barch, and Baird (1999) demonstrated that a trial-by-trial version of the Stroop Colour-Word Task was more sensitive to attentional pathology. A second advantage is that a computer allows response times and response variability to be measured with high accuracy. A computerised Stroop Colour-Word Task and variations on this task have already been used in various studies (Bush et al., 1999; Carter et al., 1995a; Gaultney et al., 1999; Miller, Kavic, & Leslie, 1996).

Conclusion

The results obtained with the Stroop Colour-Word Task do not provide strong evidence for a core deficit in interference control in AD/HD. This result argues against current theoretical models, which emphasise inhibitory control deficits in AD/HD (Barkley, 1997; Pennington & Ozonoff, 1996). Studies using other measures of interference control, however, do provide evidence in favour of the interference control deficit hypothesis, which suggests that there might be a subtler and contextually dependent interference deficit in AD/HD. Interestingly, in this meta-analysis, rapid naming deficiencies are more pronounced in AD/HD than a deficit in interference control. Should we reject the Stroop Colour-Word Task in its standard form if we want to investigate interference control in children with AD/HD? Our answer to this question is affirmative. The Stroop Colour-Word Task is not a gold standard for demonstrating an interference deficit in AD/HD.

Acknowledgements

We would like to thank Dr Joel T. Nigg and Dr Russell A. Barkley for their helpful comments on an earlier version of this manuscript and for providing us with data. We are also most grateful to Dr Larry J. Seidman, Dr Gail M. Grodzinsky, Dr Elisabeth Harvey, and Dr Stephan Houghton for providing us with all the necessary information for conducting the meta-analysis. We also would like to thank Dr Charles Golden for providing us with information about the Stroop Colour-Word Task.

Abbreviations:

  • AD/HD: Attention Deficit/Hyperactivity Disorder
  • C: Colour
  • CD: Conduct Disorder
  • CW: Colour-Word
  • ODD: Oppositional Defiant Disorder
  • W: Word

Footnotes

  • * References marked with an asterisk indicate studies included in the meta-analysis.