Culture and listeners’ gaze responses to stuttering
Abstract
Background: It is frequently observed that listeners demonstrate gaze aversion to stuttering. This response may have profound social/communicative implications for both fluent and stuttering individuals. However, there is a lack of empirical examination of listeners’ eye gaze responses to stuttering, and it is unclear whether cultural background plays a role in regulating listeners’ eye gaze response to stuttering.
Aim: To examine listeners’ eye gaze responses to stuttering speech relative to fluent speech in three cultural groups.
Methods & Procedures: Eighteen African-American, 18 European-American and 18 Chinese adults were audiovisually presented with three stuttering and three fluent speech samples while an eye-tracking device simultaneously recorded their gaze behaviours. The targets of listeners’ eye gaze included four regions of interest (ROIs) on the speaker's face: eyes, nose, mouth and outside (i.e., everything else). Listeners’ per cent of gaze time, gaze fixation count and average duration of gaze fixation were analysed with repeated-measures ANOVAs regarding each ROI as functions of the speaker's fluency status and listeners’ cultural background.
Outcomes & Results: When observing stuttering speech, listeners tended to reduce gaze fixation duration on the speaker's eyes and increase their gaze time on the mouth. However, different from the two American groups, the Chinese group reduced their gaze time on the speaker's mouth. In addition, the Chinese participants’ gaze behaviours were more focused on the ROI of outside, whereas the two American groups showed a similar focus on the ROIs of eyes and mouth.
Conclusions & Implications: All groups of listeners responded to stuttering with gaze aversions mainly contributed to by a reduction in gaze fixation duration rather than gaze fixation number. This pattern of gaze aversion suggests that stuttering oppresses listeners with an emotional and/or cognitive overload. Attention shift and compensation strategies for speech signal degradation may also account for listeners’ gaze responses to stuttering. Cultural differences in eye gaze responses to stuttering were observed mainly between Chinese and American listeners.
What this paper adds
What is already known on the subject
Listeners tend to show eye gaze aversion to stuttering, which may have strong social/communicative implications for both listeners and speakers.
What this study adds
This paper indicates that listeners' gaze aversion in response to stuttering is more likely the result of a reduction in gaze fixation duration rather than in the number of gaze fixations, implying that observing stuttering is a heavy burden for listeners, emotionally and/or cognitively. Attention shift and compensation strategies for speech perception may also contribute to the eye gaze change. Cultural difference was mainly observed between Chinese and American participants.
Introduction
Stuttering is a speech disorder characterized by involuntary disruptions in the normal rhythmic flow of speech (Perkins 1990, Bloodstein and Bernstein-Ratner 2008). The primary behavioural manifestations, which are usually observed when a child starts to stutter, comprise sound prolongations, syllable repetitions, and audible or inaudible postural fixations (van Riper 1982, Kalinowski and Saltuklaroglu 2006). Along with the development of stuttering, persons who stutter (PWS) usually devise various secondary behaviours, such as head jerking, tongue protruding, gaze aversion, eye blinking, arm swinging, foot stomping, etc. The form and severity of these associated features of stuttering may vary across individual PWS, and may change over time (van Riper 1982, Bloodstein and Bernstein-Ratner 2008). Stuttering is usually estimated to have an incidence of about 5% and a prevalence of approximately 1% (van Riper 1982, Bloodstein and Bernstein-Ratner 2008). Although there is a lack of data about its prevalence and incidence outside Europe and North America, it is generally believed that stuttering affects people from different tribes, societies and cultures indiscriminately.
Numerous studies have suggested that stuttering is a social disorder with strong social/emotional involvements. The social/emotional elements of stuttering are manifest in its symptoms, therapeutic efforts to treat stuttering and listeners’ responses to stuttering. For example, PWS frequently report that their stuttering worsens when they talk to an audience, on the telephone or with an authoritative figure (van Riper 1982, Bloodstein and Bernstein-Ratner 2008). Strong negative feelings such as anxiety, embarrassment, shame, guilt, fear, etc., were reportedly experienced by PWS before and during moments of stuttering (Sheehan 1958, van Riper 1982, Craig et al. 2003, Blumgart et al. 2010). These negative emotional feelings have been the core elements of stuttering modification programmes (van Riper 1973), and have been suggested to regulate PWS's choice of stuttering therapy and maintenance of therapeutic techniques (Plexico et al. 2005, 2009a, 2009b, Daniels et al. 2006). Numerous studies indicated that PWS are perceived to have negative personality traits, such as being introverted, nervous, insecure, tense, shy, reticent, afraid, anxious, passive and sensitive, by various groups of people (Woods and Williams 1976, Crowe and Walton 1981, St. Louis and Lass 1981, Hurst and Cooper 1983a, 1983b, White and Collins 1984, Cooper and Cooper 1985, Cooper and Rustin 1985, Horsley and Fitzgibbon 1987, Lass et al. 1989, Hulit and Wirtz 1994, Cooper and Cooper 1996, Rami et al. 2003, Langevin et al. 2009). Consequently, stuttering has a negative, long-lasting impact on the quality of life for PWS (Klompas and Ross 2004, Craig et al. 2009, Zhang et al. 2009, Koedoot et al. 2011), including children and adolescents who stutter (Chun et al. 2010).
Among the social/emotional elements of stuttering, the dynamic interaction between PWS and listeners during communication is of special interest. It reflects the in vivo responses to stuttering in both listeners and speakers, and may help to reveal the nature of the negative perceptions toward stuttering. One measure of such dynamics is eye contact, or mutual gaze, which occurs when two people are looking at each other's eyes (Argyle and Cook 1976). It has been long known that human gaze behaviours have communicative and emotional effects. For example, making eye contact can facilitate interpersonal communication (Grumet 1983, Krantz et al. 1983, Vertegaal et al. 2001), deliver social and emotional content (Bailenson et al. 2003, Grossmann et al. 2008), increase one's attractiveness (Kampe et al. 2001), indicate love and intimacy (Goldstein et al. 1976, Cordell and McGahan 2004), facilitate gender categorization (Macrae et al. 2002), facilitate the perception of emotions involved with approach motivation (e.g., anger and joy) but not avoidance-motivated emotions (e.g., fear and sadness) (Adams and Kleck 2003), and have an effect on observers’ emotional responses together with facial expressions (Bayliss et al. 2007). On the contrary, avoiding eye contact may have negative implications for these social/emotional aspects.
Averting eye contact is one typical secondary stuttering behaviour (van Riper 1982). It is also among listeners’ most common responses to stuttering (Kamhi 2003). However, there is very limited empirical evidence regarding listeners’ gaze responses to stuttering, probably because of technological difficulty. Rosenberg and Curtiss (1954) had observers examine college students’ eye contact, hand movement and other body movement when having a conversation with a person simulating stuttering with repetition and minimal body movement, and a normally fluent speaker, respectively. They reported that listeners showed reduced eye contact along with depressed hand and other body movement when talking to the simulated PWS as compared with the fluent speaker. Recently, Bowers et al. (2010) used an infrared eye-tracking device to compare listeners’ gaze fixation on various facial features (e.g., eyes, nose and mouth) of one male speaker who demonstrated either stuttering speech or fluent speech. Their results indicated that listeners showed significantly less eye contact when observing stuttering speech compared with fluent speech.
Questions remain about whether and how cultural backgrounds influence listeners’ gaze responses to stuttering. Culture is known to play a role in both visual processing and eye contact. For example, unlike in European and North American cultures, direct eye contact denotes disrespect to a superior, or a challenge to authority, in Africa and Eastern Asia (Cheng 1990, Terrell and Jackson 2002). Many experimental studies focused on contrasting Easterners with Westerners in their visual processing patterns (Kitayama et al. 2003, Hedden et al. 2008). Various authors (Chua et al. 2005, van Gompel et al. 2007, Blais et al. 2008, Boduroglu et al. 2009) suggested that Asians tend to have a holistic visual perception and Westerners an analytical visual perception, and that the differences were attributable to cultural dimensions such as individualism versus collectivism.
This experiment aimed to examine the role of culture in listeners’ eye gaze responses to stuttering speech. Specifically, listeners’ gaze behaviours (e.g., gaze time, number and duration of gaze fixations) on the speaker's eyes, mouth and nose, as well as other facial areas and background when they observed stuttering speech relative to fluent speech were compared across three different cultural backgrounds. The following research questions were asked:
- Is listeners’ eye gaze response to stuttering speech significantly different from that to normally fluent speech?
- Are there significant cross-group differences in listeners’ eye gaze responses toward stuttering?
Previous empirical studies suggested that listeners spend less time fixated on the stuttering speaker's eyes (Bowers et al. 2010). It was expected that, regardless of their cultural background, participants would make less eye contact (e.g., fewer eye gazes, shorter gaze duration) with the stuttering speaker as compared with the fluent speaker. Given previous studies of visual processing and eye contact, it was also expected that culture would exert a significant impact on listeners’ eye gaze response to stuttering speech. Specifically, Chinese and African-Americans were expected to show fewer and shorter eye gazes on the stuttering speaker as compared with European-Americans.
Results from this study may provide a detailed description of the eye gaze behaviours in listeners when having conversations with PWS, shed light on the formation and development of listeners’ negative perceptions toward PWS, and provide clues about the form and severity of the social punishment a PWS endures in different cultural settings.
Methods
Participants
Sixty-four participants were recruited via word of mouth at East Carolina University at Greenville, NC, USA. All participants identified themselves as African-American or European-American or Chinese. All Chinese participants were born in mainland China and came to the United States after 18 years of age. Participants self-reported to have normal hearing and normal or corrected vision. Participants were excluded if they had a previous diagnosis of speech, language, hearing or cognitive difficulty, or formal training about fluency disorders. Ten participants were excluded from the analysis because of software glitches, excessive eye closure or problems in experimental operation. The final sample for statistical analysis included 18 African-Americans (12 females and six males; age range = 20–50 years, mean = 29.28 years, SD = 8.30 years), 18 European-Americans (14 females and four males; age range = 19–54 years, mean = 26.61 years, SD = 9.55 years), and 18 Chinese (11 females and seven males; age range = 22–45 years, mean = 29.22 years, SD = 6.27 years).
Stimuli and apparatus
The stimuli included six 60-s segments of recorded speech samples. They were recorded in a multimedia studio by professional multimedia staff at East Carolina University. Each recording showed a speaker's upper torso (from head to shoulder) placed in the centre with a solid maroon background (figure 1). Three adult Caucasian male PWS provided the speech samples, with each producing one stuttering and one fluent speech sample. When recording, the speakers verbally produced the texts displayed in a teleprompter close to the camera. They were instructed to gaze toward the camera and limit their head movement while reading. Their stuttering speech samples contained ostensible primary and secondary stuttering behaviours (e.g., sound prolongation, syllabic repetition, eye blinking, lip tremor, tongue protrusion, etc.). Their fluent speech was induced through repeated practice and/or by using an in-the-ear altered auditory feedback device. From each recording of the speech sample, a 60-s segment was selected to serve as the stimulus. Two graduate student clinicians independently rated the fluency of each speech sample using the Stuttering Severity Instrument, Fourth Edition (Riley 2009). One stuttering sample was rated as moderate and the two other stuttering samples as severe, with average scores of 26.5, 31.5 and 33.0, respectively. The ratings of stuttering severity from the two students were highly correlated (Pearson's r= 0.99). On the same scale, students reported that all three fluent speech samples sounded natural and did not have stuttering behaviours.

Figure 1. Look zone configuration. Squares represent the look zones of the eyes, nose and mouth, respectively. Other parts of the image, including other facial features, shoulders and background, are categorized into the region of interest (ROI) of outside.
The six video clips were arranged in a counterbalanced order with the digram-balanced Latin square design (Wagenaar 1969) and burned onto six DVDs so that each DVD contained a different presentation order of the six videos.
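As an illustration of the counterbalancing described above, the following minimal sketch builds a 6 × 6 digram-balanced Latin square. The source does not say how the square was constructed; the sketch assumes the standard Williams-type construction for an even number of conditions, and the function name `digram_balanced_square` is hypothetical. The integers 0 to 5 stand for the six video clips, and each row could correspond to the presentation order on one DVD.

```python
def digram_balanced_square(n):
    """Return an n x n digram-balanced Latin square (n must be even).

    Assumption: Williams-type construction. Each condition appears once
    per row and once per column, and every ordered pair of different
    conditions occurs exactly once as immediate neighbours in a row.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("this simple construction needs an even n >= 2")

    # First row interleaves low and high indices: 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1

    # Every later row shifts each entry of the first row by one (mod n).
    return [[(c + shift) % n for c in first] for shift in range(n)]


if __name__ == "__main__":
    for order in digram_balanced_square(6):
        print(order)
```

With six rows, every clip is immediately preceded by each of the other five clips exactly once across the six orders, which is the property that controls for first-order carry-over effects.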
Participants’ gaze behaviours were measured using an eye-tracking device, ASL model D6 (Applied Science Laboratories, Bedford, MA, USA). This desktop-mounted eye-tracking device consists mainly of a control unit and a remote camera with a video head tracker. It uses ‘bright pupil’ technology to capture listeners’ eye gaze: the camera projects infrared light into a participant's eye, receives the light reflection and transmits the data to the control unit, which calculates the centres of the pupil and corneal reflection and determines the gaze point (i.e., where on the display screen the participant is looking). The video head tracker provides compensation for head movement. In our laboratory the eye-tracking device was connected to two IBM-compatible personal computers (Dell model GX280). Computer 1 ran the software GazeTracker 8.0 (Eye Responses Technologies, Inc., Charlottesville, VA, USA), which presented the video signals on a 19-inch LCD display of 1024 × 768 pixels (Dell model 1905FP; display 1) and delivered the audio signals to a pair of Harman-Kardon multimedia speakers located at each side of display 1. The remote camera of ASL model D6 was placed under the bottom of display 1. Computer 2 ran the software EyeTrac 6 (Applied Science Laboratories) to control the eye-tracking device, including the control unit and the camera.
Participants’ gaze point data (i.e., three-dimensional coordinates on display 1) were acquired from their left eye at a sampling rate of 60 Hz. These data were later sent to computer 1 to be overlaid as a set of crosshairs on the stimuli for offline analysis with GazeTracker.
Procedure
The research protocol was approved by the University Medical Center Institutional Review Board of East Carolina University. Participants were briefed about the study and signed the informed consent form. They then completed a questionnaire survey regarding their perceptions toward a hypothetical PWS and a hypothetical fluent speaker. Afterwards, they were seated in front of display 1 at a distance of about 24 inches (i.e., 60 cm) with their eyes approximately level with the centre of display 1. The experimenter instructed them to watch the presented stimuli while keeping their head stable. The video head tracker was activated to compensate for their head movement. Prior to stimuli presentation, a nine-point calibration routine was performed with participants’ left eye to make sure the captured eye gaze point coordinates matched the target points on the screen. Afterwards, listeners’ eye gaze behaviours were recorded while the stimuli were presented with an inter-stimulus interval of 5 s, during which display 1 turned black. Participants then completed another questionnaire about their perceptions toward PWS; the results of the two questionnaires are reported by Zhang (2010). To ensure the validity of the recorded signal, the experimenter let the participants examine the recorded eye gaze behaviours as indicated by crosshairs embedded into the stimuli video segments; manual shifting was conducted when participants pointed out shifting of the recorded gaze points.
Analysis
For each speaker, three mutually exclusive, static rectangular look zones, pertaining to the eyes, nose and mouth, were manually defined with GazeTracker (figure 1 and table 1). For the same speaker, whether he was producing a stuttering or fluent speech sample, these look zones were of the same shape and area. Therefore, four regions of interest (ROIs) were marked: eyes, nose, mouth and outside (i.e., areas other than the eyes, nose and mouth, including the background, neck, ears, hair, etc.), and these ROIs were independent of the speaker's fluency status.
Speaker | Eyes width | Eyes height | Eyes area | Nose width | Nose height | Nose area | Mouth width | Mouth height | Mouth area |
---|---|---|---|---|---|---|---|---|---|
1 | 260 | 90 | 23 400 | 150 | 100 | 15 000 | 150 | 60 | 9000 |
2 | 240 | 85 | 20 400 | 160 | 90 | 14 400 | 160 | 60 | 9600 |
3 | 250 | 90 | 22 500 | 160 | 90 | 14 400 | 160 | 60 | 9600 |
- Note: The above information is based on a 1024 × 768 pixel display. For each speaker, the look zones of the eyes, mouth and nose were of the same shape and area, whether the speaker was speaking fluently or stuttering. Unit: pixel.
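To make the ROI assignment concrete, the sketch below classifies a single gaze point into one of the four ROIs using static rectangular look zones. The widths and heights are those listed for speaker 1 in table 1; the zone positions on the 1024 × 768 display are not given in the source and are invented here purely for illustration, as are the names `LookZone`, `SPEAKER1_ZONES` and `classify`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LookZone:
    """A static rectangular look zone, in display pixels."""
    name: str
    left: int    # hypothetical position, not reported in the source
    top: int     # hypothetical position, not reported in the source
    width: int   # from table 1
    height: int  # from table 1

    def contains(self, x, y):
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


# Sizes follow table 1 (speaker 1); positions are assumptions chosen so
# that the three zones do not overlap (mutually exclusive look zones).
SPEAKER1_ZONES = (
    LookZone("eyes", left=382, top=250, width=260, height=90),
    LookZone("nose", left=437, top=340, width=150, height=100),
    LookZone("mouth", left=437, top=440, width=150, height=60),
)


def classify(x, y, zones=SPEAKER1_ZONES):
    """Return the ROI name for a gaze point; anything else is 'outside'."""
    for zone in zones:
        if zone.contains(x, y):
            return zone.name
    return "outside"
```

Because the zones are fixed per speaker, the same `classify` call would apply to both the stuttering and the fluent sample of that speaker, which mirrors the statement that the ROIs were independent of fluency status.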
Three measures were selected to analyse listeners’ gaze responses during the stimuli presentation: per cent of time spent on the ROI (PT), gaze fixation count on the ROI (FC), and average fixation duration on the ROI (AFD). PT was based on the recording of gaze point, and the other two on gaze fixations. Gaze fixation is a more meaningful measure of visual attention than gaze point (Applied Sciences Laboratories 2008), and was defined by the GazeTracker software run on computer 1 by three criteria: a minimum of three gaze points; a minimum of 0.2 s; and the diameter of the circle surrounding the gaze points, in pixels, not exceeding 40 (Applied Sciences Laboratories 2008). Roughly, PT was correlated with the product of FC and AFD.
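The three measures can be sketched in code as follows. This is not the GazeTracker algorithm, which the source does not describe beyond the three fixation criteria; it is a simple dispersion-style approximation in which a fixation is a run of consecutive gaze points whose largest pairwise distance stays within 40 pixels (used here as a stand-in for the diameter of the surrounding circle). The function names and the `classify` callback, which maps a screen coordinate to an ROI name, are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 60     # sampling rate reported in the text
MIN_POINTS = 3          # criterion 1: at least three gaze points
MIN_DURATION_S = 0.2    # criterion 2: at least 0.2 s
MAX_DIAMETER_PX = 40    # criterion 3: spread of at most 40 pixels
ROIS = ("eyes", "nose", "mouth", "outside")


def _close_window(window):
    """Return (centroid_x, centroid_y, duration_s), or None if too short."""
    duration = len(window) / SAMPLE_RATE_HZ
    if len(window) < MIN_POINTS or duration < MIN_DURATION_S:
        return None
    cx = sum(x for x, _ in window) / len(window)
    cy = sum(y for _, y in window) / len(window)
    return cx, cy, duration


def detect_fixations(points):
    """Group consecutive (x, y) gaze points into fixations."""
    fixations, window = [], []
    for point in points:
        # Close the current window once a point is too far from any member.
        if window and any(math.dist(point, q) > MAX_DIAMETER_PX for q in window):
            fixation = _close_window(window)
            if fixation:
                fixations.append(fixation)
            window = []
        window.append(point)
    fixation = _close_window(window)
    if fixation:
        fixations.append(fixation)
    return fixations


def roi_measures(points, classify):
    """Return (PT, FC, AFD) dictionaries keyed by ROI name."""
    counts = {roi: 0 for roi in ROIS}
    for x, y in points:
        counts[classify(x, y)] += 1
    # PT uses every raw gaze point, whether or not it belongs to a fixation.
    pt = {roi: (100.0 * counts[roi] / len(points) if points else 0.0)
          for roi in ROIS}

    fc = {roi: 0 for roi in ROIS}
    total = {roi: 0.0 for roi in ROIS}
    for cx, cy, duration in detect_fixations(points):
        roi = classify(cx, cy)
        fc[roi] += 1
        total[roi] += duration
    afd = {roi: (total[roi] / fc[roi] if fc[roi] else 0.0) for roi in ROIS}
    return pt, fc, afd
```

Gaze points that never settle into a fixation still count toward PT but not toward FC or AFD, which is one reason PT only roughly tracks the product of FC and AFD.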
First, each measure was averaged across the stuttering or fluent speech samples for each ROI (i.e., eyes, mouth, nose and outside); the data were then normalized for inferential analysis: PT data underwent an arcsine transformation, and FC and AFD data a square-root transformation (Moore and McCabe 2002). The transformed gaze response data were considered as functions of culture (i.e., group) and fluency of the speaker (i.e., fluent or stuttering). Each type of data was examined using repeated-measures analyses for each ROI with PASW for Windows (Version 17).
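The two normalizing transformations can be sketched as below. The source names an "arcsine transformation" without giving a formula; the sketch assumes the common arcsine square-root form applied to the proportion, with the result in radians, and the authors' exact variant and units may differ. The function names are hypothetical.

```python
import math


def arcsine_transform(percent):
    """Arcsine square-root transform of a percentage (0-100), in radians.

    Assumption: the common form asin(sqrt(p)) with p as a proportion.
    """
    p = percent / 100.0
    if not 0.0 <= p <= 1.0:
        raise ValueError("percent must lie between 0 and 100")
    return math.asin(math.sqrt(p))


def sqrt_transform(value):
    """Square-root transform for non-negative counts or durations."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return math.sqrt(value)
```

For example, a participant with 25 fixations on an ROI would contribute a transformed fixation count of 5.0, and a PT of 50% would map to π/4 under the assumed arcsine form.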
Results
Means and standard errors of PT, FC and AFD are displayed in tables 2–4.
Group | Eyes, fluent | Eyes, stuttering | Mouth, fluent | Mouth, stuttering | Nose, fluent | Nose, stuttering | Outside, fluent | Outside, stuttering |
---|---|---|---|---|---|---|---|---|
African-American | 36.51 (7.78) | 37.83 (8.28) | 18.44 (5.84) | 27.31 (6.71) | 21.53 (2.42) | 18.97 (2.43) | 33.55 (4.23) | 25.99 (3.35) |
Chinese | 24.39 (7.78) | 27.09 (8.28) | 35.40 (5.84) | 32.91 (6.71) | 18.37 (2.42) | 18.91 (2.43) | 26.54 (4.23) | 24.69 (3.35) |
European-American | 49.53 (7.78) | 49.57 (8.28) | 20.28 (5.84) | 25.24 (6.71) | 28.65 (2.42) | 29.35 (2.43) | 17.09 (4.23) | 14.09 (3.35) |
- Note: Data were square-root transformed. Columns give the region of interest (ROI) and the speaker's fluency status; values are means with standard errors in parentheses.
Per cent of time
Repeated-measures ANOVAs indicated that fluency had a significant effect on the ROIs of mouth and outside: for the mouth, F(1, 51) = 4.716, p= 0.035, η2= 0.09, φ= 0.57; for the outside, F(1, 51) = 5.733, p= 0.020, η2= 0.10, φ= 0.65. These results indicated that, compared with fluent speech, stuttering speech attracted more gaze time from listeners on the speaker's mouth, and simultaneously reduced their time gazing at the speaker's hair, neck, background, etc.
Group had a significant effect on PT for the nose, F(2, 51) = 7.566, p= 0.001, η2= 0.23, φ= 0.93. The group differences in the PT on the eyes and outside were not significant. However, post-hoc pair-wise comparisons with least-significant-difference (LSD) adjustment indicated that, compared with European-Americans, Chinese participants spent less time on a speaker's eyes (p= 0.039) and nose (p < 0.001), but more time on the outside (p= 0.031); African-American participants spent less time on a speaker's nose (p= 0.011) compared with European-American participants.
The interaction of fluency by group showed a significant effect on the mouth, F(2, 51) = 4.867, p= 0.012, η2= 0.16, φ= 0.78. This result indicated that when listening to stuttering speech compared with fluent speech, both groups of American participants increased, whereas Chinese participants reduced their gaze time on a speaker's mouth.
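As a consistency check on the effect sizes reported above, the sketch below recovers an effect size from an F statistic and its degrees of freedom. It assumes that the reported η2 values are partial eta squared, which the source does not state explicitly; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic.

    Uses eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    Assumption: the eta-squared values in the text are partial.
    """
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # (label, F, df_effect, df_error) taken from the PT results above.
    reported = [
        ("fluency, mouth", 4.716, 1, 51),
        ("fluency, outside", 5.733, 1, 51),
        ("group, nose", 7.566, 2, 51),
        ("fluency x group, mouth", 4.867, 2, 51),
    ]
    for label, f_value, df1, df2 in reported:
        print(label, round(partial_eta_squared(f_value, df1, df2), 3))
```

Under this assumption the recomputed values fall within about 0.01 of the reported η2 for each of the four PT effects, which suggests the reported statistics are internally consistent.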
Average fixation duration
On average, listeners’ gaze fixations on the speaker's eyes were shorter when listening to the stuttering speech compared with the fluent speech.
Both fluency and group showed significant effects on the eyes: for fluency, F(1, 51) = 4.570, p= 0.037, η2= 0.08, φ= 0.56; for group, F(2, 51) = 3.394, p= 0.041, η2= 0.12, φ= 0.61. Group was also significant on the nose and outside: for the nose, F(2, 51) = 3.382, p= 0.042, η2= 0.12, φ= 0.61; for the outside, F(2, 51) = 3.187, p= 0.050, η2= 0.11, φ= 0.58. Post-hoc pair-wise comparisons with LSD adjustment demonstrated that, compared with European-Americans, Chinese participants showed significantly shorter AFD on the eyes (p= 0.016) and nose (p= 0.017). Compared with African-Americans, Chinese participants showed significantly longer AFD on the outside (p= 0.021).
The interaction of group by fluency was found to be non-significant.
Fixation count
No significant effects of fluency, group or their interaction were found. However, post-hoc pair-wise comparisons with LSD adjustment indicated that Chinese participants showed significantly more fixations on the mouth (p= 0.037) than African-Americans.
Group | Eyes, fluent | Eyes, stuttering | Mouth, fluent | Mouth, stuttering | Nose, fluent | Nose, stuttering | Outside, fluent | Outside, stuttering |
---|---|---|---|---|---|---|---|---|
African-American | 4.72 (0.70) | 4.98 (0.70) | 2.90 (0.55) | 3.32 (0.55) | 4.04 (0.30) | 4.00 (0.29) | 2.58 (0.36) | 2.24 (0.34) |
Chinese | 3.95 (0.70) | 3.95 (0.70) | 4.77 (0.55) | 4.45 (0.55) | 4.13 (0.30) | 4.15 (0.29) | 3.08 (0.36) | 2.98 (0.34) |
European-American | 4.92 (0.70) | 4.91 (0.70) | 3.30 (0.55) | 3.55 (0.55) | 4.66 (0.30) | 4.67 (0.29) | 2.55 (0.36) | 2.17 (0.34) |
- Note: Data were square-root transformed. Columns give the region of interest (ROI) and the speaker's fluency status; values are means with standard errors in parentheses.
Discussion
To the best of the authors’ knowledge, this experiment was the first to study the cultural impact on listeners’ eye gaze responses to stuttering speech relative to fluent speech. Significant effects were found in some measures of listeners’ gaze behaviours regarding speakers’ fluency status, listeners’ cultural backgrounds, and the interaction of speakers’ fluency status and listeners’ cultural background.
First, this study demonstrated how fluency made a difference in listeners’ gaze responses. The differences were mainly twofold: (1) listeners had a shorter fixation duration on the eyes of the stuttering speaker relative to the fluent speaker; and (2) when the speaker stuttered, listeners increased their gaze time on the speaker's mouth, and reduced gaze time on the background. The first change in listeners’ gaze behaviours might be interpreted as gaze aversion, which is frequently observed in listeners when having face-to-face conversation with PWS (van Riper 1982, Kamhi 2003). However, judged by the similar number of gaze fixations listeners made on the speaker's eyes, it seemed that listeners tried to maintain the same pattern of gaze responses on the eyes of both stuttering and fluent speakers, but they could not maintain the same duration. This interpretation is consistent with DePaulo and Friedman (1998), who suggested that listeners consciously self-regulate their non-verbal responses to meet their communication norms. Furthermore, listeners’ shorter gaze fixation durations on the stuttering speaker's eyes provided support to the notion that stuttering episodes might be emotionally or cognitively overloading for the perceivers (Kamhi 2003). This information would be useful for charity groups and self-help groups for stuttering regarding their advice for the general public. For example, the advice to ‘maintain natural eye contact’ (Stuttering Foundation of America 2001) could be more specific and suggest listeners maintain a usual gaze fixation duration, since the current study indicated that listeners tend to maintain a similar number of gaze fixations.
Group | Eyes, fluent | Eyes, stuttering | Mouth, fluent | Mouth, stuttering | Nose, fluent | Nose, stuttering | Outside, fluent | Outside, stuttering |
---|---|---|---|---|---|---|---|---|
African-American | 0.70 (0.08) | 0.74 (0.11) | 0.72 (0.06) | 0.82 (0.07) | 0.77 (0.04) | 0.83 (0.07) | 0.63 (0.06) | 0.61 (0.07) |
Chinese | 0.58 (0.08) | 0.52 (0.11) | 0.78 (0.06) | 0.78 (0.07) | 0.68 (0.04) | 0.69 (0.07) | 0.64 (0.06) | 0.64 (0.07) |
European-American | 0.82 (0.08) | 0.79 (0.11) | 0.67 (0.06) | 0.69 (0.07) | 0.76 (0.04) | 0.80 (0.07) | 0.68 (0.06) | 0.58 (0.07) |
- Note: Data were square-root transformed. Columns give the region of interest (ROI) and the speaker's fluency status; values are means with standard errors in parentheses.
The second change in listeners’ gaze behaviours, that they reduced their fixation time on the background and increased their time on the speaker's mouth, probably indicated a shift of the listeners’ attention. That is, when the speaker stuttered, listeners were attracted to the locus of the stuttering behaviours: the mouth. It seemed natural for listeners to shift attention like this when perceiving the ostensible, deviant and struggling behaviours of stuttering in the speakers. An alternative explanation is related to the degradation of speech signals in the stuttering speech. Vatikiotis-Bateson et al. (1998) observed that listeners increased gaze fixations on the speaker's mouth when the speech was coupled with masking noise. They suggested this was because listeners needed to counteract the acoustic signal degradation with the visual input (e.g., the McGurk effect; McGurk and MacDonald 1976). Therefore, since stuttering speech inevitably adds noise to the speech signal by prolonging sounds and repeating syllables and words, listeners need to spend more time looking at the speaker's mouth to seek compensation. Because of the social meaning of making eye contact, it might be interesting to divide listeners into two groups: those who showed significant gaze diversion from the speaker's eyes to the mouth; and those who maintained their gaze on the eyes. Listeners’ gaze change (i.e., when observing fluent speech relative to stuttering speech) was computed using an index of arcsine transformed PT on the speaker's eyes relative to the mouth. For African-Americans, 13 of 18 (72%) participants increased their gaze time on the speaker's mouth when the speaker stuttered; for Chinese, 11 of 18 (61%); and for European-Americans, 10 of 18 (56%). 
These data suggested that in real-life scenarios of face-to-face communications that involve a PWS, some listeners might seem ‘nice’ by maintaining their gaze on the PWS's eyes, whereas many may seem ‘rude’ because of their gaze diversion when the speaker stutters. Encounters with seemingly ‘rude’ listeners may be discouraging and penalizing for a PWS (Kalinowski and Saltuklaroglu 2006); however, the likelihood of such encounters may remain high. Therefore, it is important to inform clinicians who work with PWS, and PWS themselves, about listener behaviours so that they have an evidence-based understanding of listener perception and listener responses to stuttering, foster a realistic expectation of listener responses (e.g., many listeners shift their gaze from a PWS's eyes to mouth when he/she stutters not because of their rudeness, but probably due to a compensation for acoustic noise), and focus on strategies that could enhance PWS's communication efficiency and efficacy.
Secondly, this study provided empirical evidence that listeners from different cultural backgrounds responded to stuttering in different manners. The difference was most evident in the gaze time spent on the speaker's mouth. The two American groups of listeners increased, whereas Chinese listeners reduced, their gaze time on the speaker's mouth when he stuttered. Compared with the Chinese and European-American groups, African-American participants showed the greatest increase in their gaze fixation time on the mouth in reaction to stuttering, driven by increases in both fixation number and fixation duration. It is unclear why these listener groups responded in this way; further research is warranted. Listeners’ cultural background might be involved in the process because of the role of culture in regulating listeners’ attention, affection and behaviour to stuttering (Battle 2002). For example, the oral tradition of African-American culture gives a high appraisal to abilities to produce a continuous verbal utterance and to remain emotionally controllable (Terrell and Jackson 2002). The overt stuttering manifestation seems to be one of the greatest deviations from such cultural preferences, and may attract more attention from African-American listeners. For the Chinese listeners, their linguistic background may make it difficult for them to understand oral English, and may alter their gaze responses when the speaker's intelligibility is degraded by stuttering episodes. Cultural variation in listener gaze responses could be highlighted in public education.
Thirdly, this study demonstrated that listeners’ eye gaze responses were similar between the two American groups, but different between the Chinese and the Americans. Specifically, relative to European-Americans, Chinese listeners spent more time on the speaker's background and less on the eyes and nose. Compared with African-Americans, Chinese listeners had longer fixations on the background, and more fixations on the mouth. In comparison, the two American groups of participants showed similar responses, with the only significant difference being that African-Americans spent less time on the speaker's nose. These findings were generally consistent with previous studies comparing visual processing in Easterners and Westerners, where African-Americans were oftentimes included as Westerners (Nisbett et al. 2001, Nisbett 2003, Nisbett and Miyamoto 2005). Results from the current experiment validated the findings of Chua et al. (2005) and Boduroglu et al. (2009) that Chinese individuals showed a much broader visual attention, whereas Americans focused on focal features (e.g., eyes, mouth) of the image. The current study also corroborated the work of Rayner et al. (2007) that Chinese individuals showed shorter fixation duration in face perception compared with Westerners.
Limitations of the current study should be taken into consideration when extrapolating the results. Firstly, when recording, the speakers were instructed to look at the camera and limit their head movement. This effort was meant to reduce secondary stuttering behaviours such as head jerking and gaze aversion, so that relatively stable, comparable ROIs could be drawn around the speaker's eyes and mouth whether the speaker stuttered or not. However, it probably decreased the severity of the physical concomitants of stuttering, and consequently reduced the intensity of listeners’ responses. In real-life face-to-face communication, stuttering speakers, especially those who stutter severely, might show more frequent, intense and abnormal head movements, and more frequent gaze aversions (Jensen et al. 1986) than the speakers in this study, and listeners’ behavioural responses could reasonably be expected to be more intense and evident. Secondly, the stimuli contained both fluent and stuttering speech segments from the same speakers. This arrangement was intended to reduce systematic bias caused by differences between stuttering and fluent speakers’ facial features and voice qualities. In so doing, however, one problem was introduced: when listeners first saw a speaker stutter severely in one speech sample, and then speak with normal fluency in another, they might have become confused and allocated more attention to listening. This problem might have reduced the difference between listeners’ responses to stuttering and fluent speech samples. A possible solution would be to present different speech samples (i.e., stuttering or fluent) of the same speakers to different listener groups and measure their gaze responses. Thirdly, great individual variation was observed in listeners’ gaze responses to stuttering.
Such individual variation seems to be inherent in eye gaze responses, probably arising from both individual characteristics (Ellsworth and Ludwig 1972, Argyle and Cook 1976) and cultural background (Chua et al. 2005, Rayner et al. 2007, Blais et al. 2008). One may also argue that, because the Chinese participants had lived in the United States for a certain period and their values, behaviours and emotional responses may have been altered (Tsai 1999), their eye gaze responses may not be representative of the Chinese population. However, previous research suggested that one's eye gaze pattern is established before reaching adulthood (Argyle and Cook 1976, Phan et al. 2010). Therefore, because these Chinese participants left China after they were 18 years of age, the change of cultural environment is unlikely to have had a major impact on their gaze pattern.
Future studies will compare gaze responses in listeners from other major cultural backgrounds (e.g., Japanese, Mexicans, Africans). Of special interest will be the gaze responses in normally developing children relative to adult listeners, the parents of children who stutter relative to the parents of normally fluent children, and listeners who stutter relative to normally fluent listeners. A number of permutations may be achieved by controlling the speaker's stuttering severity, non-verbal stuttering behaviours and presentation channels (e.g., video only, audio only, and audiovisual). In addition, the gender of either speakers or listeners could play an interesting role in dyadic communication: females may have less negative perceptions of PWS than males (Burley and Rinaldi 1986), though findings are not consistent (Patterson and Pring 1991), and might differ from males in their eye gaze behaviours (Bayliss et al. 2005). These future studies would provide better descriptions of the dynamics between PWS and listeners, provide clues to the origin of listeners’ negative perceptual and behavioural responses to PWS, shed light on how PWS develop stuttering behaviours alongside their efforts to regulate their disorder, and help charities and self-help groups for stuttering to offer more concrete advice to children and to both male and female adults on the optimal ways of communicating with PWS.
Acknowledgements
Declaration of interest: The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.