Towards a Native OPERA Hypothesis: Musicianship and English Stress Perception

Running head: MUSIC-TO-LANGUAGE TRANSFER 1
William Choi
Academic Unit of Human Communication, Development, and Information Sciences,
The University of Hong Kong
Choi, W. (2021). Towards a native OPERA hypothesis: Musicianship and English stress
perception. Language and Speech. Advance online publication.
doi:10.1177/00238309211049458.
Address for correspondence: Room 765, Meng Wah Complex, The University of Hong Kong,
Pokfulam, Hong Kong; willchoi@hku.hk
Abstract
Musical experience facilitates speech perception. French musicians, to whom stress is foreign,
have been found to perceive English stress more accurately than French non-musicians. This
study investigated whether this musical advantage also applies to native listeners. English
musicians and non-musicians completed an English stress discrimination task and two control
tasks. With age, non-verbal intelligence and short-term memory controlled, the musicians
exhibited a perceptual advantage relative to the non-musicians. This perceptual advantage was
equally potent for both trochaic and iambic stress patterns. In terms of perceptual strategy, the two
groups showed differential use of acoustic cues for iambic but not trochaic stress. Collectively,
the results could be taken to suggest that musical experience enhances stress discrimination even
among native listeners. Remarkably, this musical advantage is highly consistent and does not
particularly favour either stress pattern. For iambic stress, the musical advantage appears to stem
from the differential use of acoustic cues by musicians. For trochaic stress, the musical
advantage may be rooted in enhanced durational sensitivity.
Keywords: Music-to-language transfer, stress, music, pitch, rhythm, OPERA
Long-term musical experience facilitates speech perception (Patel, 2011; 2014). Research
has frequently shown that musicians are better able to perceive lexical tones than non-musicians
(e.g., Alexander, Wong, & Bradlow, 2005; Choi, 2020; Kraus & Chandrasekaran, 2010; Zheng
& Samuel, 2018). These findings underpin contemporary theories of cross-domain transfer, the
most notable of which is the OPERA hypothesis (Patel, 2011; 2014). Unfortunately, most
research has only focused on lexical tone perception among non-native listeners. This has led to
research gaps concerning whether the OPERA hypothesis applies to native speech perception
and, more specifically, to other prosodic features such as stress. To ascertain the generalisability
of the OPERA hypothesis, this study investigates (i) whether English musicians outperform
English non-musicians on English stress discrimination. To provide additional insight into
music-to-language transfer, this study further examines (ii) whether the musical advantage is
selective about stress pattern, and (iii) the means by which musical experience enhances native
stress discrimination.
The OPERA hypothesis proposes that musical experience facilitates speech encoding
when five conditions are met: the neural networks for music and speech must overlap
anatomically (Overlap), and the music activities must entail more precise acoustic processing
than speech (Precision), bring about strong positive emotion (Emotion), repeat frequently
(Repetition), and require focused attention (Attention) (Patel, 2011; 2014). Patel’s hypothesis is
well supported by empirical studies, most of which have shown a musical advantage in lexical
tone perception (see Choi, 2020). For example, English musicians identified and discriminated
Mandarin tones more accurately than did English non-musicians (Alexander et al., 2005). Quite
surprisingly, the English musicians even performed on a par with native Mandarin listeners.
English musicians’ perceptual advantage in Mandarin tone discrimination was also evident at the
phrase level (Zheng & Samuel, 2018). Neurophysiologically, French musicians also showed a
larger P3b response to Mandarin tonal and segmental variations than French non-musicians
(Marie, Delogu, Lampis, Belardinelli, & Besson, 2011).
In addition to the apparent acoustic similarities between lexical tones and musical pitch,
stress patterns coincide with the metrical structures in music (Henrich, Alter, Wiese, & Domahs,
2014; Palmer & Kelly, 1992; Patel, 2003; Lerdahl, 2001; see Gandour, 1981; Tong, Choi, &
Man, 2018 for lexical tones). Lexical stress is the relative prominence assigned to a certain
syllable in a word (Teschner & Whitley, 2004). In English, stressed syllables are typically
associated with higher fundamental frequency (f0), longer duration, and higher intensity (Choi,
2021a; Choi, Tong, & Samuel, 2019; Choi, Tong, & Singh, 2017; Fry, 1958; Wang, 2008; Yu &
Andruski, 2010). Unstressed syllables typically exhibit a vowel quality change (e.g., the second
vowel in harmony /ˈhɑməni/ is reduced to /ə/), although this is not necessarily the case (e.g., the
second vowel in import /ˈɪmport/ is not reduced). Similar to English speech, music is
characterised by repeated sequences of stressed and unstressed beats (rhythm; Toussaint, 2005).
Analyses of English- (with stress) and French- (without stress) composed music revealed that the
metrical structures paralleled the composers’ spoken language (Patel & Daniele, 2003a; 2003b).
This finding, together with the commonalities between lexical stress and musical rhythm, gives
rise to the possibility of cross-domain transfer between these two features.
Indeed, musical experience facilitates English stress perception among French listeners.
Unlike English, French does not use lexical stress contrastively (it is a fixed stress language;
Garde, 1968). In an AX discrimination task, French listeners could discriminate lexical stress
contrasts with a very low error rate (3%; Dupoux, Pallier, Sebastian, & Mehler, 1997). In an
event-related potential study, French listeners also showed a mismatch negativity (MMN)
response to stress violations, reflecting pre-attentive stress discrimination (Aguilera, El Yagoubi,
Espesser, & Astésano, 2014). The lack of MMN response in the reverse oddball task further led
to the claim that French listeners had long-term memory traces of stress. Although French
listeners are behaviourally and neurophysiologically sensitive to stress, they struggle to perceive
stress at more abstract perceptual levels. Specifically, French listeners recall stress sequences
with very high error rates (49% and 73%; Dupoux, Peperkamp, & Sebastian-Galles, 2010). Of
direct relevance to the current study is that French listeners’ difficulties in recalling stress
sequences could be mitigated by musical experience (Kolinsky, Cuvelier, Goetry, Peretz, &
Morais, 2009). In particular, French musicians were better able to recall stress sequences than
French non-musicians. This musical advantage was evident at all sequence lengths, which
suggested enhanced perception rather than increased memory span.
Considering the above findings in light of the OPERA hypothesis, musical experience
does facilitate non-native speech perception (e.g., Alexander et al., 2005; Kolinsky et al., 2009;
Zheng & Samuel, 2018; cf. Schellenberg, 2015). Here, a critical question arises as to whether the
OPERA hypothesis is also applicable to native speech perception. Subcortically, English
musicians were more sensitive to English consonantal changes (/ba/, /da/, and /ga/) than English
non-musicians (Parbery-Clark, Tierney, Strait, & Kraus, 2012). In terms of speech prosody,
French musicians exhibited a larger P200 response to metrical violations in naturally produced
French (Marie, Magne, & Besson, 2010). Collectively, these results offer some support to the
notion that musical experience facilitates native speech perception. The current study
hypothesises that English musicians discriminate English stress more accurately than do English
non-musicians. English stress is chosen not only because of the formerly established non-native
musical advantage but also because English stress sensitivity contributes to reading
comprehension among English children (Holliman, Wood, & Sheehy, 2010; 2012; Kolinsky et
al., 2009).
Another way to extend the OPERA hypothesis is to examine the selectivity of musical
advantage. In a recent study, English musicians and non-musicians completed a Cantonese tone
discrimination task and a Cantonese tone sequence recall task (Choi, 2020). In both tasks, the
musicians outperformed the non-musicians only in half of the tonal contexts. This indicated that
musical experience facilitated the perception of only certain Cantonese tones. English stress
contains trochaic and iambic stress patterns (Ladd, 2008). In a trochaic stress pattern, a stressed
syllable precedes an unstressed one (e.g., CAmel) and vice versa for an iambic stress pattern
(e.g., caNAL). Relative to the trochaic stress pattern, the iambic stress pattern is less common
and acquired later by English infants (Cutler, 2014; Jusczyk, Cutler, & Redanz, 1993). Thus, it is
possible that the musical advantage is more pronounced for iambic than trochaic stress patterns.
The current study tests this hypothesis.
A further goal of this study is to explore the means by which musical experience
facilitates native stress discrimination. One possibility is that musical experience alters listeners’
choice of acoustic cues, for which support is drawn from a tone perception study (Choi, 2020). In
the high-rising tone context for which musical advantage was shown, English musicians and
non-musicians attended to different acoustic cues (i.e., f0 contour and f0 onset, respectively).
However, in the low-rising tone context for which musical advantage was absent, the two groups
attended to the same acoustic cue (i.e., f0 contour). As mentioned above, stress is signalled by
f0, duration, and intensity (Choi et al., 2017; 2019; Fry, 1958; Wang, 2008; Yu & Andruski,
2010). It is possible that English musicians and non-musicians attend to different acoustic cues
for stress discrimination. It is also possible that musicians and non-musicians attend to the same
acoustic cues but with different relative weights assigned to each. Drawing on parallels with
cross-linguistic research, Russian and English listeners attended to the same set of acoustic cues
for English stress perception (Chrabaszcz, Winn, Lin, & Idsardi, 2014). However, the Russian
and English listeners showed different weighting patterns among f0, duration, and intensity cues:
the f0 cue was weighted most heavily by the English listeners (f0 > intensity > duration) but least
heavily by the Russian listeners (intensity > duration > f0). Based on the above findings, it is
possible that musical experience drives listeners to rely on a different set of acoustic cues or to
rely differently on the same set of acoustic cues for English stress discrimination.
The main theme of this study is music-to-language transfer. In the literature, correlational
and intervention designs have been frequently adopted. Correlational studies compare musicians
and non-musicians on variables of interest, such as lexical tone sensitivity (e.g., Alexander et al.,
2005; Choi, 2020; Kolinsky et al., 2009; Zheng & Samuel, 2018). As the groups are pre-defined,
the correlational design guarantees that musicians have many years of musical experience. This
is particularly useful for studying cross-domain transfer, as long-term musical experience
induces more prominent plastic changes than does short-term musical experience (Patel, 2011;
2014). However, the standard caveat of correlational design is weak causal inference (see
Corrigall, Schellenberg, & Misura, 2013; Schellenberg, 2015). Intervention studies typically
involve two or three groups, each of which receives music training, music-irrelevant training, or
no training (e.g., Moreno, Marques, Santos, Santos, Castro, & Besson, 2009; Nan et al., 2018).
Clearly, this design permits a stronger causal inference. Nevertheless, laboratory training lasts
only weeks or months, so this design limits the possibility of studying the long-term effect
of musical experience. Correlational and intervention designs have their own merits and
limitations, which makes both types of research necessary. As long-term musical experience is
crucial for music-to-language transfer, the current study adopts a correlational design as a first
step.
The overarching goal of this study is to investigate (1) whether English musicians exhibit
a perceptual advantage in English stress discrimination. To elucidate the potential musical
advantage, this study further examines (2) whether the musical advantage is selective about
stress patterns, and (3) the means by which musical experience enhances English stress
discrimination. Given the possible influence of non-verbal intelligence and short-term memory
on English stress perception, these two constructs were controlled (Choi et al., 2019; see also
Asaridou, Hagoort, & McQueen, 2015; Bidelman, Hutka, & Moreno, 2013; Hutka, Bidelman, &
Moreno, 2015). To this end, participants were also tested on non-verbal intelligence and short-
term memory. To minimise testing time, I adopted two tasks that could provide quick and
reliable estimates of the above constructs among English listeners (Choi, 2020; Choi et al., 2019;
Zheng & Samuel, 2018).
Methods
Participants
Forty native English listeners were recruited at University College London through an
online participant recruitment system. Based on the criteria adopted in previous studies (Choi,
2020; 2021b; Tong et al., 2018), the listeners were assigned to the musician (n = 20) and non-
musician (n = 20) groups. All musicians had received at least seven years of continuous music
training and were able to play their instruments at the time of testing. All non-musicians had
received no more than two years of music training, if any. None of them had received any music
training in the preceding five years, and none was able to play any musical instrument at the time of
testing. Two non-musicians and one musician were excluded from the study due to no-show,
excessive music training (non-musician), and Mandarin learning experience. Thus, there were 19
musicians (5 male, 14 female; Mage = 26.63 years, SD = 5.89 years) and 18 non-musicians (8
male, 10 female; Mage = 32.67 years, SD = 11.60 years) in the final sample.
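As an illustration only, the grouping criteria described above can be written as a simple rule. The sketch below is not the author's screening procedure: the class, its field names, and the function name are hypothetical, and reading "the recent five years" as at least five years since the last lesson is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MusicBackground:
    """Hypothetical self-report record for one listener."""
    years_of_training: float
    continuous: bool                            # training was uninterrupted
    years_since_last_training: Optional[float]  # None if never trained
    can_play_now: bool                          # can play an instrument at testing


def assign_group(b: MusicBackground) -> Optional[str]:
    """Return 'musician', 'non-musician', or None if neither criterion is met."""
    # Musicians: at least seven years of continuous training, still able to play.
    if b.years_of_training >= 7 and b.continuous and b.can_play_now:
        return "musician"
    # Non-musicians: at most two years of training, none in the past five
    # years (assumed cut-off: >= 5), and unable to play at testing.
    no_recent = (b.years_since_last_training is None
                 or b.years_since_last_training >= 5)
    if b.years_of_training <= 2 and no_recent and not b.can_play_now:
        return "non-musician"
    return None
```

A listener who satisfies neither rule (for example, four years of recent training) returns None, which corresponds to the kind of exclusion reported for excessive music training.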
Table 1 summarises the musical experience of the musicians. On average, the musicians
had received 11.63 years of music training (SD = 3.90 years) with a mean onset age of 7.84 years
(SD = 2.89 years). The non-musicians had received 0.90 years of music training (SD = 1.56
years). For the non-musicians who had received music training, their mean onset age of music
training was 12.00 years (SD = 4.86 years). None of the participants in the study reported having
absolute pitch.
English Stress Discrimination Task
Stimuli. Four pairs of real English words, /ˈpɚmɪt - pɚˈmɪt/, /ˈsəspekt - səsˈpekt/, /ˈɪnsɚt
- ɪnˈsɚt/, and /ˈimpɔrt - imˈpɔrt/ (permit, suspect, insert, import) were recorded at a sampling rate
of 48 kHz. All stimuli were naturally produced by two native English speakers (one male and
one female). The recording was made in a sound-shielded booth.
Material Presentation. An AX paradigm was adopted. In each trial, two real words were
audibly presented via Sennheiser HD280 PRO headphones. The inter-stimulus interval was 600
ms. The two real words either carried the same (e.g., /ˈɪnsɚt - ˈɪnsɚt/) or different stress (e.g.,
/ˈɪnsɚt - ɪnˈsɚt/). To prevent the listeners from adopting an ad-hoc acoustic strategy, the two real
words in each trial were produced by speakers of different genders. The voice order was random
within each trial.
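The trial structure can be sketched by crossing the design factors reported in the Procedure (8 stimuli × 2 speaker orders × 2 trial types × 3 repetitions). This is one possible reading, not the author's experiment script: the dictionary keys are hypothetical, a "different" trial is assumed to pair a word with its opposite-stress counterpart, and speaker order is crossed as a design factor rather than drawn at random.

```python
from itertools import product

WORDS = ["permit", "suspect", "insert", "import"]
STRESS_PATTERNS = ["trochaic", "iambic"]          # 4 words x 2 patterns = 8 stimuli
SPEAKER_ORDERS = [("male", "female"), ("female", "male")]
TRIAL_TYPES = ["same", "different"]
REPETITIONS = 3


def build_trials():
    """Enumerate AX trials; the two items always come from different voices."""
    trials = []
    for word, stress, (first_voice, second_voice), trial_type, _rep in product(
        WORDS, STRESS_PATTERNS, SPEAKER_ORDERS, TRIAL_TYPES, range(REPETITIONS)
    ):
        opposite = "iambic" if stress == "trochaic" else "trochaic"
        trials.append({
            "word": word,
            "first_stress": stress,
            # Same trials repeat the stress pattern; different trials flip it.
            "second_stress": stress if trial_type == "same" else opposite,
            "first_voice": first_voice,
            "second_voice": second_voice,
            "trial_type": trial_type,
        })
    return trials
```

Under these assumptions the enumeration yields 96 trials, 48 of which are different trials.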
Procedure. Listeners were asked to judge, as quickly as possible, whether the two real
words carried the same stress. They responded by pushing keyboard buttons ([f] for same, [j] for
different). The accuracy and response time were recorded for each trial. Prior to the experimental
trials, six practice trials with feedback were run. There were 96 trials (8 stimuli × 2 speaker
orders × 2 trial types × 3 repetitions). A sensitivity index (d’) was obtained based on the hits and
false alarms for the same and different trials (see Figure 1). The sample-specific reliabilities were
high (αmusicians = .87, αnon-musicians = .90). This task has also been used successfully to assess
English stress discrimination among English listeners in a previous study (Choi et al., 2019).
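For readers unfamiliar with the sensitivity index, a minimal sketch of a standard d' computation follows. It assumes that a hit is a "different" response on a different trial and a false alarm is a "different" response on a same trial, and it applies a log-linear correction so that rates of 0 or 1 remain finite. The text specifies neither the correction nor the exact d' model, so both are assumptions.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with a log-linear correction."""
    # Adding 0.5 to each count keeps the rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

With 48 same and 48 different trials, a listener responding at chance (24 hits, 24 false alarms) obtains d' = 0, and larger values indicate better discrimination.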
Non-verbal Intelligence Task
This task consisted of 14 multiple-choice questions, all of which required participants to
organise pictures by a logical sequence under time pressure. In each trial, participants were given
30 seconds to choose the picture that best completed the visual pattern described in the question.
One point was awarded for each correct answer. This task has been used successfully in previous
studies to assess English listeners’ non-verbal intelligence (Choi, 2020; Choi et al., 2019; Zheng
& Samuel, 2018). The sample-specific reliabilities were moderate to high (αmusicians = .54, αnon-
musicians = .79).
Short-term Memory Task
This computerised task consisted of a plate displayed at the centre of a touchscreen. The
plate contained four coloured (red, green, blue, and yellow) wedges. On each trial, a sequence of
colours (e.g., yellow-blue-red) was presented. Following the presentation, the participants were
required to reproduce the colour sequence by tapping the corresponding wedges. The sequence
length started at one and increased by one after each correct response. The score started at zero,
and one point was awarded for each correctly reproduced sequence. For example, a participant who
correctly reproduced up to eight sequences would score eight in that round. Each participant
completed five rounds, from which
the median score was obtained. This task has also been used successfully in previous studies to
assess English listeners’ short-term memory (Choi, 2020; Choi et al., 2019; Zheng & Samuel,
2018). As in these previous studies, the sound was turned off so that the measure was
independent of auditory short-term memory. The sample-specific reliabilities were satisfactory to
high (αmusicians = .65, αnon-musicians = .81).
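The scoring rule described above can be illustrated as follows. The sketch assumes that a round ends at the first incorrect reproduction, which the text implies but does not state; the function names and the response lists are hypothetical.

```python
from statistics import median


def round_score(responses):
    """Count correctly reproduced sequences until the first error.

    `responses` is a list of booleans in presentation order
    (True = sequence reproduced correctly).
    """
    score = 0
    for correct in responses:
        if not correct:
            break  # assumed: the round stops at the first error
        score += 1
    return score


def task_score(rounds):
    """Median of the per-round scores (five rounds in the study)."""
    return median(round_score(r) for r in rounds)
```

For example, per-round scores of 8, 5, 6, 7, and 6 give a task score of 6.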
Results
Musical Advantage in Stress Discrimination
To investigate whether the musicians exhibited a perceptual advantage in English stress
discrimination, a one-way analysis of covariance (ANCOVA) was conducted on d’ with group
(musician and non-musician) as the independent variable. Age, non-verbal intelligence, and
short-term memory were controlled (see Table 2; see also Appendix I). As expected, the
ANCOVA revealed a significant group difference, F(1, 32) = 9.62, p < .01, η² = .23, in which the
musicians discriminated English stress more accurately than did the non-musicians (see Figure
2). Correlational analyses further showed that d’ correlated significantly with years of music
training, r(35) = .37, p < .05, but not with onset age of music training, p = .398. This suggests
that for English stress discrimination, the amount of music training received matters more than
the age at which music training started.
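The reported degrees of freedom and effect sizes can be cross-checked with a short calculation. The sketch below assumes that the reported η2 is a partial eta squared recoverable from F and its degrees of freedom, which the text does not state explicitly; the function names are illustrative.

```python
def ancova_error_df(n_participants, n_groups, n_covariates):
    """Error df for a between-subjects ANCOVA: N - groups - covariates."""
    return n_participants - n_groups - n_covariates


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Final sample: 19 musicians + 18 non-musicians, with three covariates
# (age, non-verbal intelligence, short-term memory).
df_error = ancova_error_df(n_participants=37, n_groups=2, n_covariates=3)
eta_d_prime = partial_eta_squared(9.62, 1, df_error)   # group effect on d'
eta_hit_rate = partial_eta_squared(5.94, 1, df_error)  # group effect on hit rate
```

Under this assumption the error df is 32, and the two effect sizes round to .23 and .16, in line with the values reported for d' and for hit rate.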
Selectivity of Musical Advantage
To evaluate the selectivity of the musical advantage, a two-way mixed ANCOVA was
conducted on hit rate with stress type (iambic and trochaic) as the within-subject factor and
group (musician and non-musician) as the between-subjects factor. Age, non-verbal intelligence,
and short-term memory were also controlled. The ANCOVA revealed a significant main effect
of group, F(1, 32) = 5.94, p < .05, η² = .16 (see Figure 3). However, the main effect of stress
type, p = .171, and the interaction between stress type and group, p = .822, were non-significant.
Consistent with the earlier analysis, a clear musical advantage was found. Remarkably, this
musical advantage was highly consistent and did not particularly favour either stress pattern.
In terms of response time, the two-way mixed ANCOVA showed non-significant main
effects of group, p = .822, and stress type, p = .789. Their interaction effect was also non-
significant, p = .607. An analysis of the mean response time across all 96 trials yielded consistent
results (see Appendix II). The lack of a group difference in response time argues against a
speed-accuracy trade-off: the greater accuracy of the musicians over the non-musicians was not
because they had taken longer to respond.
Use of Acoustic Cues by Musicians and Non-musicians
All stimuli were analysed acoustically with Praat 6.0.50 (Institute of Phonetic Sciences,
University of Amsterdam, the Netherlands), yielding the set of acoustic parameters summarised
in Table 3 (see also Appendix III). For each stimulus, the f0, durational, and intensity ratios of
the first to second syllables were obtained (see Table 4).
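The first-to-second-syllable ratios can be illustrated with a small helper. This is a sketch under stated assumptions: the class and field names are hypothetical, the units (Hz, ms, dB) are assumed, the example values are invented for illustration, and the intensity ratio is taken as a plain quotient because the text does not say otherwise.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    """Hypothetical per-syllable measurements (e.g., exported from Praat)."""
    f0_hz: float
    duration_ms: float
    intensity_db: float


def syllable_ratios(first: Syllable, second: Syllable) -> dict:
    """Ratio of first-syllable to second-syllable value for each cue."""
    return {
        "f0": first.f0_hz / second.f0_hz,
        "duration": first.duration_ms / second.duration_ms,
        "intensity": first.intensity_db / second.intensity_db,
    }


# Invented values for illustration only; they are not taken from Table 3.
example = syllable_ratios(Syllable(220.0, 200.0, 70.0),
                          Syllable(200.0, 250.0, 65.0))
```

A ratio above 1 means the first syllable has the larger value on that cue, and a ratio below 1 means the second syllable does.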
To explore the use of acoustic cues by musicians and non-musicians, an acoustic-behavioural
correlational analysis was conducted. All different trials (N = 48) were extracted
from the dataset. Each different trial, as a single entry, contained eight variables: (1) the f0 ratio,
(2) durational ratio, and (3) intensity ratio of the trochaic stress stimulus; (4) the f0 ratio, (5)
durational ratio, and (6) intensity ratio of the iambic stress stimulus; (7) the trial-specific mean
accuracy of the musician group; and (8) the trial-specific mean accuracy of the non-musician
group.
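The trial-level analysis described above amounts to correlating one acoustic ratio with one group's mean accuracy across trials. A minimal sketch of the Pearson coefficient is given below. The function name is illustrative, and the paper does not say which software was used for this step.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)
```

In this setting, `xs` would hold, for instance, the durational ratio of the iambic stimulus in each different trial, and `ys` the corresponding trial-specific mean accuracy of one group.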
Of interest to the study was whether the acoustic parameters (1-6) correlated with the
behavioural accuracies among the musicians and non-musicians (see Table 5). The mean
accuracies of the musicians correlated significantly with the f0 (r = .24, p < .05), durational (r =
-.38, p < .01), and intensity (r = .29, p < .05) ratios of the iambic stress stimuli. For the trochaic
stress stimuli, the mean accuracies of musicians correlated significantly with the durational ratio
(r = -.27, p < .05), but not with the f0 (p = .459) and intensity (r = .23, p = .062) ratios.
The mean accuracies of the non-musicians correlated significantly with the f0 (r = .46, p
< .01), durational (r = -.42, p < .01), and intensity (r = .28, p < .05) ratios of the iambic stress
stimuli. For the trochaic stress stimuli, the mean accuracies of non-musicians correlated
significantly with the durational ratio (r = -.29, p < .05), but not with the f0 (p = .361) and
intensity (r = .23, p = .055) ratios.
Taken together, both groups’ discriminatory abilities were related to (a) the degree of f0,
durational, and intensity variations among the iambic stress stimuli, and (b) the degree of
durational variations among the trochaic stress stimuli. For the trochaic stress stimuli, the
musicians and non-musicians attended mostly to duration. However, for the iambic stress
stimuli, the musicians attended mostly to duration whereas the non-musicians attended mostly to
f0 (see Table 5).
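The statement about which cue each group attended to most can be reproduced by ranking the iambic-stress correlations reported above by absolute size. The sketch uses only the coefficients given in the text. Ranking by |r| is a descriptive summary, not a statistical test of the difference between correlations, and the variable and function names are illustrative.

```python
# Correlations between iambic-stimulus ratios and mean accuracy, as reported.
IAMBIC_R = {
    "musicians": {"f0": 0.24, "duration": -0.38, "intensity": 0.29},
    "non-musicians": {"f0": 0.46, "duration": -0.42, "intensity": 0.28},
}


def rank_cues(correlations):
    """Order cues from strongest to weakest absolute correlation."""
    return sorted(correlations, key=lambda cue: abs(correlations[cue]), reverse=True)


musician_ranking = rank_cues(IAMBIC_R["musicians"])
non_musician_ranking = rank_cues(IAMBIC_R["non-musicians"])
```

On these values, duration ranks first for the musicians and f0 ranks first for the non-musicians, matching the pattern described for the iambic stimuli.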
Discussion
This study endeavoured to investigate (1) whether English musicians exhibit a perceptual
advantage in English stress discrimination, (2) whether the musical advantage is selective about
stress patterns, and (3) the means by which musical experience facilitates English stress
discrimination.
The core result was the presence of a musical advantage in English stress discrimination
among the English listeners. This fits the OPERA hypothesis well. In terms of precision, music
entails more precise metrical processing than speech. Provided that the other four conditions
(Overlap, Emotion, Repetition, and Attention) were met, musical experience enhanced the
English musicians’ sensitivity to English stress. As mentioned above, the OPERA hypothesis has
been widely applied to account for a musical advantage in non-native speech perception (e.g.,
Alexander et al., 2005; Choi, 2020; Kolinsky et al., 2009; Zheng & Samuel, 2018). Consistent
with the previous studies on native consonantal and metrical discrimination, the present result
supports the theoretical view that music-to-language transfer could also occur given relevant
linguistic experience (Marie et al., 2010; Parbery-Clark et al., 2012). Thus, it stands to reason
that the OPERA hypothesis applies to both non-native and native listeners. From a practical
perspective, this theoretical view points towards the potential use of music training to aid native
speech perception (e.g., Moreno et al., 2009; Vidal, Lousada, & Vigário, 2020). For example,
piano training enhanced Mandarin children’s behavioural sensitivities to Mandarin vowels and
neural sensitivities to Mandarin tones (Nan et al., 2018). For English children, English stress
sensitivity is essential for literacy development and poor readers often show deficits in stress
sensitivity (Holliman et al., 2010; 2012). Thus, with the musical advantage in English stress
perception now established, it is important to determine whether music training can improve
English children’s stress perception. Interestingly, English stress sensitivity also contributes to
second language English literacy development among Cantonese children (Choi, Tong, & Cain,
2016; Choi, Tong, & Deacon, 2018). As such, it is also worthwhile to investigate whether
Cantonese musicians show a perceptual advantage in English stress perception.
Remarkably, the musical advantage identified herein was highly consistent across stress
patterns. It was originally hypothesised that musical experience would exert differential effects on
trochaic and iambic stress perception. However, the results clearly showed that musical
experience did not particularly favour either stress pattern. This is in contrast to the recent
finding of a study on English listeners that musical advantage was selective about Cantonese
tones (Choi, 2020). These discrepancies may stem from acoustic or even linguistic differences.
Although Cantonese tones and English stress share f0 as a common acoustic cue, Cantonese
tones are signalled by f0 in a more fine-grained manner (Choi et al., 2019). Whereas Cantonese
tones are largely f0 variations, English stress has other acoustic cues, such as duration and
intensity (Choi, Tong, Gu, Tong, & Wong, 2017; Gandour, 1981; Ladd, 2008; Wang, 2008).
Conceivably, the differences in terms of the selectivity of musical advantage across Cantonese
tone and English stress perception might be due to the acoustic differences between the two
features. Hypothetically, the discrepancies might also arise from linguistic experience: Cantonese
tones were non-native to the musicians but English stress was native to them. As unlikely as this
may seem, future studies that include non-native listeners of English stress are needed to test
this hypothetical account.
Acoustically, the musicians appear to have adopted a different perceptual strategy for
iambic stress. Although the musicians and non-musicians attended to the same acoustic cues,
they relied on these acoustic cues differently. Specifically, non-musicians attended most heavily
to f0 whereas musicians attended most heavily to duration. This is somewhat reminiscent of a
recent finding that, unlike non-musicians who attended to a less effective cue (f0 onset),
musicians attended to a more effective cue (f0 contour) for high-level tone perception (Choi,
2020). Indeed, for the iambic stress stimuli, stressed and unstressed syllables did not differ
significantly in f0, making it a less effective cue than duration and intensity. Considering the
OPERA hypothesis in the current context, it is possible that musical experience had orientated
the musicians to attend more heavily to more effective cues (duration and intensity), thereby
facilitating iambic stress perception. Future studies can further validate these findings by testing
stress perception across different acoustic conditions, e.g., f0-only, duration-only, and intensity-
only (Choi et al., 2019).
By contrast, the musicians and non-musicians adopted the same perceptual strategy for
trochaic stress. Specifically, the acoustic-behavioural correlation analysis of trochaic stress
implied that the two groups attended mainly, if not only, to durational cues.
Neurophysiologically, changes in speech temporal structure elicited the P200 response among
musicians but not non-musicians (Marie et al., 2010). This suggests that musicians have stronger
automatic detection of syllable temporal structure than non-musicians. Based on the literature, it
is believed that long-term musical experience in discerning metrical structures sharpened the
English musicians’ sensitivity to duration (e.g., Skoe & Kraus, 2013). Thus, for trochaic stress,
one plausible explanation for the musical advantage is enhanced sensitivity to duration. This
proposed mechanism is also consistent with the OPERA hypothesis (Patel, 2011; 2014).
Collectively, musical advantage may stem from the differential use of acoustic cues (iambic
stress) and enhanced durational sensitivity (trochaic stress).
In terms of the study’s theoretical contribution, the current findings have several
implications for the OPERA hypothesis. Most importantly, the musical advantage in native stress
discrimination converges with studies on native consonantal and metrical discrimination (Marie
et al., 2010; Parbery-Clark et al., 2012). Taken together, these findings suggest that the OPERA
hypothesis also applies to native speech perception. In contrast to a previous finding on the
selectivity of musical advantage, the musical advantage identified herein was highly consistent
(Choi, 2020). Crucially, the current study further adds that musical advantage is not necessarily
selective, and the OPERA hypothesis can be revised to account for this. Although the OPERA
hypothesis argues that musical experience increases neuronal sensitivity to speech, the present
and previous studies further suggest that musical experience may also alter listeners’ perceptual
strategy (Choi, 2020; see Patel, 2011). This points to a need for the OPERA hypothesis to
incorporate new elements on how musical experience orients musicians to different acoustic
cues.
In terms of the methodological contribution, this study has demonstrated that stress is a
potent feature for investigating cross-domain transfer. As mentioned in the Introduction, most
studies on cross-domain transfer have focused on lexical tone perception, presumably due to its
sharing of an acoustic cue (f0) with musical pitch (e.g., Choi, 2020; Cooper & Wang, 2012;
Marie et al., 2011; Zheng & Samuel, 2018). Crucially, stress has three acoustic correlates (f0,
duration, and intensity) that are used intensively for discerning metrical structures in music
(Henrich et al., 2014; Palmer & Kelly, 1992; Patel, 2003; Lerdahl, 2001). Indeed, the current
study has identified linkages between stress discrimination and musical experience, highlighting
a more potent candidate for studying music-to-language transfer and even language-to-music
transfer. In the latter direction, one interesting question is whether English non-musicians discern
metrical structures more accurately than do French non-musicians, given the presence of stress in
English.
Readers are cautioned that this study is correlational. Like the preponderance of studies
on music-to-language transfer, this study cannot rule out the possibility of gene-environment
interaction (e.g., Alexander et al., 2005; Choi, 2020; Cooper & Wang, 2012; Kolinsky et al.,
2009; Marie et al., 2011; Zheng & Samuel, 2018). Schellenberg (2015) argued that cognitive
abilities and socioeconomic status determine the likelihood of a child receiving music training.
More specifically, Corrigall and colleagues (2013) reasoned that high-functioning children from
high socioeconomic status families were more likely to take music lessons than other children.
As such, music training might only exaggerate pre-existing differences between musicians and
non-musicians. This is contrary to a widely adopted premise that musicians and non-musicians
do not differ systematically prior to musical experience (e.g., Francois & Schön, 2011; Fitzroy &
Sanders, 2013; Shook, Marian, Bartolotti, & Schroeder, 2013). Despite the English musicians
and non-musicians matching on non-verbal intelligence and short-term memory, they may still
have differed in other respects, such as learning motivation and personality traits, some of which
are difficult to control for. Intriguingly, there are numerous reports that musicians exhibit a
memory advantage (George & Coch, 2011; Roden, Grube, Bongard, & Kreutz, 2014; Schulze,
Dowling, & Tillmann, 2012). It might be that the English musicians in the current sample did not
possess this advantage; it is also possible that their cognitive difference was not captured by
the tasks. Ideally, each cognitive construct should have been measured with multiple tasks.
In conclusion, the present study has identified a musical advantage in native stress
discrimination. This finding adds to the body of evidence that musical experience facilitates
native speech perception, in turn suggesting that the OPERA hypothesis also applies to native
listeners (Marie et al., 2010; Parbery-Clark et al., 2012). The musical advantage identified herein
was highly consistent, suggesting that musical advantage is not necessarily selective. The present
results also imply that part of the musical advantage might arise from the differential use of
acoustic cues by the musicians. Despite the standard caveats of correlational studies, the current
study presents theoretically and practically significant findings that I believe will withstand
scrutiny by future intervention studies.
Acknowledgement
I wish to thank Mairéad MacSweeney for her dedicated support. I also appreciate Arthur
Samuel for recording the stimuli and sharing the control tasks. This research was supported by
the Croucher Postdoctoral Fellowship from the Croucher Foundation to William Choi. It was
also supported by the Start-up Research Fund from The University of Hong Kong to William
Choi.
References
Aguilera, M., El Yagoubi, R., Espesser, R., & Astésano, C. (2014). Event-related potential
investigation of initial accent processing in French. Proceedings of Speech Prosody,
383–387.
Asaridou, S. S., Hagoort, P., & McQueen, J. M. (2015). Effects of early bilingual experience
with a tone and a non-tone language on speech–music integration. PLoS ONE, 10(12),
e0144225.
Alexander, J., Wong, P. C. M., & Bradlow, A. R. (2005). Lexical tone perception in
musicians and nonmusicians. Paper presented in INTERSPEECH 2005 Eurospeech,
9th European Conference on Speech Communication and Technology, Lisbon,
Portugal, September 4–8, 2005.
Bidelman, G. M., Hutka, S., & Moreno, S. (2013). Tone language speakers and
musicians share enhanced perceptual and cognitive abilities for musical pitch: Evidence
for bidirectionality between the domains of language and music. PLoS ONE, 8(4),
e60676.
Choi, W. (2020). The selectivity of musical advantage: Musicians exhibit perceptual
advantage for some but not all Cantonese tones. Music Perception, 37(5), 423–434.
Choi, W. (2021a). Cantonese advantage on English stress perception: Constraints and neural
underpinnings. Neuropsychologia, 158, 107888.
Choi, W. (2021b). Musicianship influences language effect on musical pitch perception.
Frontiers in Psychology, 12, 712753.
Choi, W., Tong, X., & Cain, K. (2016). Lexical prosody beyond first-language boundary:
Chinese lexical tone sensitivity predicts English reading comprehension. Journal of
Experimental Child Psychology, 148, 70–86.
Choi, W., Tong, X., & Deacon, H. (2017). Double dissociations in reading comprehension
difficulties among Chinese–English bilinguals and their association with tone awareness.
Journal of Research in Reading, 40(2), 184–198.
Choi, W., Tong, X., Gu, F., Tong, X., & Wong, L. (2017). On the early neural perceptual
integrality of tones and vowels. Journal of Neurolinguistics, 41, 11–23.
Choi, W., Tong, X., & Samuel, A. G. (2019). Better than native: Tone language
experience enhances English lexical stress discrimination in Cantonese–English
bilingual listeners. Cognition, 189, 188–192.
Choi, W., Tong, X., & Singh, L. (2017). From lexical tone to lexical stress: A cross-language
mediation model for Cantonese children learning English as a second language.
Frontiers in Psychology, 8, 492.
Chrabaszcz, A., Winn, M., Lin, C. Y., & Idsardi, W. J. (2014). Acoustic cues to
perception of word stress by English, Mandarin, and Russian speakers. Journal of
Speech, Language and Hearing Research, 57, 1468–1479.
Cooper, A., & Wang, Y. (2012). The influence of linguistic and musical experience on
Cantonese word learning. Journal of the Acoustical Society of America, 131(6),
4756–4768.
Corrigall, K. A., Schellenberg, E. G., & Misura, N. M. (2013). Music training, cognition, and
personality. Frontiers in Psychology, 4, 222.
Cutler, A. (2014). Native Listening: Language Experience and the Recognition of Spoken
Words. Cambridge MA: MIT Press.
Dupoux, E., Pallier, C., Sebastian, N., & Mehler, J. (1997). A distressing “deafness” in French?
Journal of Memory and Language, 36, 406–421.
Dupoux, E., Peperkamp, S., & Sebastian-Galles, N. (2010). Limits on bilingualism
revisited: Stress ‘deafness’ in simultaneous French–Spanish bilinguals. Cognition, 114,
266–275.
Fitzroy, A. B., & Sanders, L. D. (2013). Musical expertise modulates early processing of
syntactic violations in language. Frontiers in Psychology, 3, e603.
Francois, C., & Schön, D. (2011). Musical expertise boosts implicit learning of both musical and
linguistic structures. Cerebral Cortex, 21(10), 2357–2365.
Fry, D. B. (1958). Experiments in the perception of stress. Language and Speech, 1,
205–213.
Gandour, J. (1981). Perceptual dimensions of tone: Evidence from Cantonese. Journal of
Chinese Linguistics, 9, 20–36.
Garde, O. (1968). L’accent. Paris: Presses Universitaires de France.
George, E. M., & Coch, D. (2011). Music training and working memory: An ERP study.
Neuropsychologia, 49(5), 1083–1094.
Henrich, K., Alter, K., Wiese, R., & Domahs, U. (2014). The relevance of rhythmical alternation
in language processing: An ERP study on English compounds. Brain and Language, 136,
19–30.
Holliman, A. J., Wood, C., & Sheehy, K. (2010). The contribution of sensitivity to speech
rhythm and non-speech rhythm to early reading development. Educational Psychology,
30(3), 247–267.
Holliman, A. J., Wood, C., & Sheehy, K. (2012). A cross-sectional study of prosodic sensitivity
and reading difficulties. Journal of Research in Reading, 35(1), 32–48.
Hutka, S., Bidelman, G. M., & Moreno, S. (2015). Pitch expertise is not created equal: Cross-
domain effects of musicianship and tone language experience on neural and behavioural
discrimination of speech and music. Neuropsychologia, 71, 52–63.
Jusczyk, P. W., Cutler, A., & Redanz, N. J. (1993). Infants’ preference for the predominant stress
patterns of English words. Child Development, 64(3), 675–687.
Kolinsky, R., Cuvelier, H., Goetry, V., Peretz, I., & Morais, J. (2009). Music training facilitates
lexical stress processing. Music Perception, 26(3), 235–246.
Kraus, N., & Chandrasekaran, B. (2010). Music training for developmental auditory skills.
Nature Reviews Neuroscience, 11(8), 599–605.
Ladd, D. R. (2008). Intonational Phonology. Cambridge: Cambridge University Press.
Lerdahl, F. (2001). Tonal Pitch Space. Oxford and New York: Oxford University Press.
Marie, C., Delogu, F., Lampis, G., Belardinelli, M. O., & Besson, M. (2011). Influence of
musical expertise on segmental and tonal processing in Mandarin Chinese. Journal of
Cognitive Neuroscience, 23(10), 2701–2715.
Marie, C., Magne, C., & Besson, M. (2010). Musicians and the metric structure of words.
Journal of Cognitive Neuroscience, 23(2), 294–305.
Moreno, S., Marques, C., Santos, A., Santos, M., Castro, S. L., & Besson, M. (2009). Musical
training influences linguistic abilities in 8-year-old children: More evidence for brain
plasticity. Cerebral Cortex, 19(3), 712–723.
Nan, Y., Liu, L., Geiser, E., Shu, H., Gong, C. C., Dong, Q., Gabrieli, J. D. E., & Desimone, R.
(2018). Piano training enhances the neural processing of pitch and improves speech
perception in Mandarin-speaking children. Proceedings of the National Academy of
Sciences, 115(28), 6630–6639.
Palmer, C., & Kelly, M. H. (1992). Linguistic prosody and musical meter in song. Journal of
Memory and Language, 31(4), 525–542.
Parbery-Clark, A., Tierney, A., Strait, D. L., & Kraus, N. (2012). Musicians have fine-tuned
neural distinction of speech syllables. Neuroscience, 219, 111–119.
Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience, 6(7), 674–681.
Patel, A. D. (2011). Why would musical training benefit the neural encoding of
speech? The OPERA hypothesis. Frontiers in Psychology, 2, 142.
Patel, A. D. (2014). Can nonlinguistic musical training change the way the brain processes
speech? The expanded OPERA hypothesis. Hearing Research, 308, 98–108.
Patel, A. D., & Daniele, J. R. (2003a). An empirical comparison of rhythm in language and
music. Cognition, 87, B35–B45.
Patel, A. D., & Daniele, J. R. (2003b). Stress-timed vs. syllable-timed music? A comment on
Huron and Ollen (2003). Music Perception, 21, 273–276.
Roden, I., Grube, D., Bongard, S., & Kreutz, G. (2014). Does music training enhance working
memory performance? Findings from a quasi-experimental longitudinal study.
Psychology of Music, 42(2), 284–298.
Schellenberg, E. G. (2015). Music training and speech perception: A gene-environment
interaction. Annals of the New York Academy of Sciences, 1337, 170–177.
Schulze, K., Dowling, W. J., & Tillmann, B. (2012). Working memory for tonal and atonal
sequences during a forward and a backward recognition task. Music Perception, 29(3),
255–267.
Shook, A., Marian, V., Bartolotti, J., & Schroeder, S. R. (2013). Musical experience influences
statistical learning of a novel language. American Journal of Psychology, 126(1), 95–104.
Skoe, E., & Kraus, N. (2013). Musical training heightens auditory brainstem function during
sensitive periods in development. Frontiers in Psychology, 4, e622.
Teschner, R. V., & Whitley, S. M. (2004). Pronouncing English: A Stress-Based
Approach with CD-ROM. Washington DC: Georgetown University Press.
Tong, X., Choi, W., & Man, Y. Y. (2018). Tone language experience modulates the
effect of long-term musical training on musical pitch perception. Journal of the
Acoustical Society of America, 144(2), 690–697.
Toussaint, G. T. (2005). The geometry of musical rhythm. In M. Kano & X. Tan (Eds.),
Proceedings of the Japan Conference on Discrete and Computational Geometry, 3742,
198–212.
Vidal, M. M., Lousada, M., & Vigário, M. (2020). Music effects on phonological awareness
development in 3-year-old children. Applied Psycholinguistics, 41(2), 299–318.
Wang, Q. (2008). Perception of English stress by Mandarin Chinese learners of
English: An acoustic study (Unpublished doctoral dissertation). British Columbia:
University of Victoria.
Yu, V. Y., & Andruski, J. E. (2010). A cross-language study of perception of lexical
stress in English. Journal of Psycholinguistic Research, 39, 323–344.
Zheng, Y., & Samuel, A. G. (2018). The effects of ethnicity, musicianship, and tone
language experience on pitch perception. Quarterly Journal of Experimental
Psychology, 71(12), 2627–2642.
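Table 1. Musical experience of the musicians

| Participant | Onset age (years) | Amount of music training (years) | First instrument | Second instrument | Third instrument |
|---|---|---|---|---|---|
| M1 | 7 | 11 | Piano | Oboe | - |
| M2 | 12 | 11 | Piano | - | - |
| M3 | 7 | 11 | Piano | Guitar | - |
| M4 | 5 | 10 | Guitar | Keyboard | - |
| M5 | 9 | 12 | Piano | Guitar | Bass |
| M6 | 13 | 10 | Drums | Guitar | - |
| M7 | 9 | 8 | Piano | - | - |
| M8 | 14 | 8 | Piano | Bass | - |
| M9 | 6 | 10 | Piano | Flute | - |
| M10 | 8 | 10 | Flute | - | - |
| M11 | 11 | 9 | Piano | Ukulele | - |
| M12 | 6 | 18 | Piano | Violin | - |
| M13 | 6 | 20 | Clarinet | - | - |
| M14 | 7 | 7 | Piano | - | - |
| M15 | 4 | 20 | Piano | - | - |
| M16 | 9 | 10 | Clarinet | - | - |
| M17 | 6 | 10 | Piano | Guitar | Ukulele |
| M18 | 5 | 16 | Violin | - | - |
| M19 | 5 | 10 | Piano | - | - |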
Table 2. Comparison of age, non-verbal intelligence, and short-term memory between English
musicians and non-musicians.

| Variable | Musicians | Non-musicians | Group difference (p value) |
|---|---|---|---|
| Chronological age in years (SD) | 26.63 (5.89) | 32.67 (11.60) | .052 |
| Non-verbal intelligence (SD) | 9.68 (2.21) | 8.83 (3.29) | .360 |
| Short-term memory (SD) | 7.84 (2.39) | 7.83 (3.76) | .993 |

Note. The maximum possible value of non-verbal intelligence is 14. There are no maximum
possible values for age and short-term memory.
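Table 3. Fundamental frequency, duration, and intensity values of the stimuli.

| Speaker | Stimulus | First syllable F0 (Hz) | First syllable duration (ms) | First syllable intensity (dB) | Second syllable F0 (Hz) | Second syllable duration (ms) | Second syllable intensity (dB) |
|---|---|---|---|---|---|---|---|
| Male | ˈpɚmɪt | 168 | 155 | 69 | 188 | 445 | 58 |
| Male | pɚˈmɪt | 124 | 151 | 67 | 127 | 449 | 62 |
| Male | ˈsəspekt | 218 | 305 | 69 | 483 | 295 | 59 |
| Male | səsˈpekt | 127 | 300 | 67 | 157 | 300 | 65 |
| Male | ˈɪnsɚt | 168 | 124 | 70 | 160 | 476 | 63 |
| Male | ɪnˈsɚt | 117 | 139 | 67 | 158 | 461 | 66 |
| Male | ˈimpɔrt | 163 | 177 | 60 | 103 | 423 | 67 |
| Male | imˈpɔrt | 110 | 224 | 58 | 123 | 376 | 68 |
| Female | ˈpɚmɪt | 119 | 237 | 70 | 209 | 363 | 59 |
| Female | pɚˈmɪt | 228 | 209 | 69 | 261 | 391 | 64 |
| Female | ˈsəspekt | 130 | 332 | 68 | 269 | 268 | 62 |
| Female | səsˈpekt | 219 | 315 | 62 | 245 | 285 | 69 |
| Female | ˈɪnsɚt | 240 | 239 | 69 | 209 | 361 | 60 |
| Female | ɪnˈsɚt | 218 | 204 | 65 | 311 | 396 | 67 |
| Female | ˈimpɔrt | 262 | 290 | 68 | 205 | 310 | 62 |
| Female | imˈpɔrt | 222 | 250 | 63 | 239 | 350 | 68 |

Note. All values are rounded to the nearest integer.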
Table 4. Fundamental frequency, durational, and intensity ratios of the first to second syllables
of the stimuli.

| Stimulus | F0 ratio (Male) | Duration ratio (Male) | Intensity ratio (Male) | F0 ratio (Female) | Duration ratio (Female) | Intensity ratio (Female) |
|---|---|---|---|---|---|---|
| ˈpɚmɪt | 0.89 | 0.35 | 1.21 | 0.57 | 0.65 | 1.19 |
| pɚˈmɪt | 0.97 | 0.34 | 1.09 | 0.87 | 0.53 | 1.08 |
| ˈsəspekt | 0.45 | 1.03 | 1.17 | 0.48 | 1.24 | 1.10 |
| səsˈpekt | 0.81 | 1.00 | 1.03 | 0.89 | 1.11 | 0.89 |
| ˈɪnsɚt | 1.05 | 0.26 | 1.12 | 1.15 | 0.66 | 1.17 |
| ɪnˈsɚt | 0.74 | 0.30 | 1.01 | 0.70 | 0.52 | 0.97 |
| ˈimpɔrt | 1.58 | 0.42 | 0.90 | 1.28 | 0.94 | 1.09 |
| imˈpɔrt | 0.89 | 0.60 | 0.86 | 0.93 | 0.71 | 0.93 |

Note. All ratios are first-to-second syllable and are rounded to two decimal places.
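As a minimal sketch of how the first-to-second syllable ratios in Table 4 relate to the raw values in Table 3, the snippet below divides each first-syllable value by the corresponding second-syllable value. The function and variable names are illustrative and not taken from the study. Because Table 3 reports values rounded to the nearest integer, ratios recomputed this way can differ slightly from those printed in Table 4.

```python
def syllable_ratios(first, second):
    """Return first-to-second syllable ratios for each acoustic cue, rounded to 2 d.p."""
    return {cue: round(first[cue] / second[cue], 2) for cue in first}


# Values copied from Table 3: female-produced trochaic 'permit'.
first_syllable = {"f0": 119, "duration": 237, "intensity": 70}
second_syllable = {"f0": 209, "duration": 363, "intensity": 59}

ratios = syllable_ratios(first_syllable, second_syllable)
print(ratios)  # {'f0': 0.57, 'duration': 0.65, 'intensity': 1.19}
```

For this stimulus the recomputed ratios match the female columns of Table 4.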
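Table 5. Correlations between the F0, duration, and intensity ratios and the trial-specific mean
accuracies of musicians and non-musicians.

| Stress pattern | Acoustic ratio | Musicians’ accuracy | Non-musicians’ accuracy |
|---|---|---|---|
| Iambic stress stimuli | F0 ratio | .24* | .46** |
| Iambic stress stimuli | Duration ratio | -.38** | -.42** |
| Iambic stress stimuli | Intensity ratio | .29* | .28* |
| Trochaic stress stimuli | F0 ratio | ns | ns |
| Trochaic stress stimuli | Duration ratio | -.27* | -.29* |
| Trochaic stress stimuli | Intensity ratio | .23 | .23 |

Note. All values are rounded to two decimal places. ** p < .01; * p < .05. For the two unmarked
intensity-ratio correlations under trochaic stress, p = .062 and p = .055.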
Figure 1. The hit and false alarm rates of musicians and non-musicians in the English stress
discrimination task. Error bars represent 95% confidence intervals.
Figure 2. The mean sensitivity index of musicians and non-musicians in the English stress
discrimination task. Error bars represent 95% confidence intervals.
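For readers unfamiliar with the sensitivity index, the sketch below shows the conventional signal-detection definition, d' = z(hit rate) - z(false alarm rate). It is an illustration under stated assumptions rather than the study's analysis script: the function name, the optional clipping of extreme rates by 1/(2N), and the example rates are all hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate, n_trials=None):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate)."""
    if n_trials is not None:
        # Assumed correction (not confirmed by the source): clip rates away
        # from 0 and 1 by 1/(2N) so that the z-transform stays finite.
        low, high = 1 / (2 * n_trials), 1 - 1 / (2 * n_trials)
        hit_rate = min(max(hit_rate, low), high)
        false_alarm_rate = min(max(false_alarm_rate, low), high)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical listener with 90% hits and 10% false alarms.
print(round(d_prime(0.90, 0.10), 2))
```

A listener who responds identically on "same" and "different" trials (equal hit and false alarm rates) obtains d' = 0, whereas higher values indicate better discrimination.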
Figure 3. The mean hit rate and response time of musicians and non-musicians for trochaic and
iambic stress. Error bars represent 95% confidence intervals.
Appendix I
Analysis of Age and Cognitive Profiles
Correlational analyses showed significant correlations between age and non-verbal
intelligence, r = -.39, p < .05, age and short-term memory, r = -.43, p < .01, and non-verbal
intelligence and short-term memory, r = .46, p < .01. Thus, a multivariate analysis of variance
(MANOVA) was conducted to examine potential group differences. The MANOVA showed a
non-significant main effect of group, p = .181, implying that the groups were matched on these
variables.
To be empirically stringent, independent-samples t-tests were also conducted, as MANOVA has
weak power for detecting differences. Consistent with the MANOVA, the two groups did not differ
significantly in non-verbal intelligence, t(35) = .93, p = .360, or short-term memory, t(35)
= .01, p = .993. However, the non-musicians were marginally older than the musicians, t(35) =
-2.01, p = .052, d = .66. As the perceptual differences between the musicians and non-musicians
were only meaningful if they remained evident when age and general cognitive abilities were
held constant, these three variables were controlled in the main analysis.
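The t and d values above can be approximately recovered from the summary statistics in Table 2. The sketch below assumes a pooled-variance (Student's) independent-samples t-test and group sizes of 19 musicians and 18 non-musicians; these sizes are inferred from the 19 participants listed in Table 1 and the reported 35 degrees of freedom, and are not stated in this section. The helper name is illustrative.

```python
from math import sqrt


def pooled_t_and_d(m1, sd1, n1, m2, sd2, n2):
    """Student's t (equal variances assumed) and Cohen's d from summary statistics."""
    df = n1 + n2 - 2
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    t = (m1 - m2) / (pooled_sd * sqrt(1 / n1 + 1 / n2))
    d = abs(m1 - m2) / pooled_sd
    return t, d, df


# Means and SDs from Table 2; group sizes are an inference (see the lead-in).
N_MUSICIANS, N_NON_MUSICIANS = 19, 18
t_age, d_age, df = pooled_t_and_d(26.63, 5.89, N_MUSICIANS, 32.67, 11.60, N_NON_MUSICIANS)
print(f"age: t({df}) = {t_age:.2f}, d = {d_age:.2f}")  # age: t(35) = -2.01, d = 0.66
```

Under the same assumptions, the Table 2 values for non-verbal intelligence and short-term memory give t(35) of about .93 and .01, in line with the reported statistics.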
Appendix II
Analysis of Response Time Across All Trials
Figure S1 shows the mean response time (collapsed across 96 trials) of musicians and
non-musicians. To examine whether musicians and non-musicians differed in mean response
time across all 96 trials, a one-way ANCOVA was conducted on mean response time with group
(musicians and non-musicians) as the independent variable. Age, non-verbal intelligence, and
short-term memory were controlled. The ANCOVA showed no significant group difference in mean
response time, p = .837.
Figure S1. The mean response time (collapsed across 96 trials) of musicians and non-musicians in
the English stress discrimination task. Error bars represent 95% confidence intervals.
Appendix III
Acoustic Analysis of Gender Differences
As f0, duration, and intensity did not correlate with each other, ps > .05, a separate
three-way ANOVA was conducted on each acoustic parameter, with stress pattern (iambic and
trochaic), syllable status (stressed and unstressed), and gender (male and female) as the
independent variables.
For f0, the three-way ANOVA revealed a significant main effect of gender, F(1, 24) = 6.98,
p < .05, η² = .23, but not of syllable status, p = .922, or stress pattern, p = .411. The interaction
between stress pattern and gender was significant, F(1, 24) = 4.27, p = .05, η² = .15, whereas the
interactions between stress pattern and syllable status, p = .163, and between gender and syllable
status, p = .411, were not. The three-way interaction was also non-significant, p = .687. For the
interaction between stress pattern and gender, pairwise comparisons showed that f0 differed
marginally between trochaics and iambics for the male-produced stimuli, p = .051, but not for the
female-produced stimuli, p = .393.
For duration, the three-way ANOVA showed non-significant main effects of gender, p =
1.00, stress pattern, p = 1.00, and syllable status, p = .714. The interaction between gender and
syllable status was also non-significant, p = .430. However, the interaction between stress pattern
and syllable status was significant, F(1, 24) = 43.03, p < .001, η² = .64, as was the three-way
interaction between gender, stress pattern, and syllable status, F(1, 24) = 7.15, p < .05, η² = .23.
Simple main effect analysis was conducted to unpack the three-way interaction. For male-
produced iambics and trochaics, stressed and unstressed syllables differed significantly in
duration, ps < .01. For female-produced iambics, stressed and unstressed syllables also differed
significantly in duration, p < .01. However, for female-produced trochaics, stressed and unstressed
syllables did not differ significantly in duration, p = .089.
For intensity, the three-way ANOVA revealed a significant main effect of syllable status,
F(1, 24) = 4.44, p < .05, η² = .16, in which stressed syllables had higher intensity than unstressed
syllables. All other main effects and interactions were not significant, ps > .05.
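The effect sizes reported in this appendix appear consistent with partial eta squared recovered from each F statistic and its degrees of freedom, i.e., F × df_effect / (F × df_effect + df_error). The sketch below checks this against the reported values; it is an illustration of that relationship, not a reproduction of the original analysis, and the function and dictionary names are hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, reported effect size) pairs copied from the text; all effects have df = (1, 24).
reported = {
    "f0: gender": (6.98, 0.23),
    "f0: stress pattern x gender": (4.27, 0.15),
    "duration: stress pattern x syllable status": (43.03, 0.64),
    "duration: three-way interaction": (7.15, 0.23),
    "intensity: syllable status": (4.44, 0.16),
}

for effect, (f_value, eta_reported) in reported.items():
    eta = partial_eta_squared(f_value, 1, 24)
    print(f"{effect}: computed {eta:.2f}, reported {eta_reported:.2f}")
```

Each computed value rounds to the effect size given in the text.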
... In the music-to-language transfer literature, most studies pertain to the first fundamental question. For example, they have identified the positive effect of musicianship on tone perception (Choi, 2020;Zheng and Samuel, 2018), stress perception (Choi, 2022a;Kolinsky et al., 2009), and segmental perception (Cooper et al., 2017;Sadakata and Sekiyama, 2011). Based on empirical works, theoreticians have promisingly captured how musicianship enhances perceptual sensitivities to segmental and suprasegmental information (Patel, 2011(Patel, , 2014Tierney and Kraus, 2014). ...
... In both tasks, the musicians outperformed the nonmusicians on some but not all Cantonese tones, suggesting that their musical advantage was selective to some tones rather than general. In addition to tones, English and French musicians also discriminate and recall English stress sequences more accurately than their non-musician counterparts (Choi, 2022a;Kolinsky et al., 2009). Segmentally, English musicians outperformed English non-musicians on discriminating phonemic length vocalic contrasts in Thai language (Cooper et al., 2017). ...
... All participants completed a music background questionnaire (Choi, 2021). On average, the Cantonese musicians received 10.8 years [standard deviation (SD) ¼ 2.9 years] of music training starting from 7.8 years old (SD ¼ 3.5 years), the Cantonese non-musicians received 0.3 years (SD ¼ 0.6 years) of music training starting from 12.3 years old (SD ¼ 6.9 years), the English musicians received 10.4 years (SD ¼ 3.2 years) of music training starting from 9 years old (SD ¼ 3.5 years), and the English non-musicians received 0.4 years (SD ¼ 0.7 years) of music training starting from 11.6 years old (SD ¼ 2.9 years). ...
Article
Full-text available
This study investigated the effect of musicianship on the perceptual integrality of tones and segmental information in non-native speech perception. We tested 112 Cantonese musicians, Cantonese non-musicians, English musicians, and English non-musicians with a modified Thai tone AX discrimination task. In the tone discrimination task, the control block only contained tonal variations, whereas the orthogonal block contained both tonal and task-irrelevant segmental variations. Relative to their own performance in the control block, the Cantonese listeners showed decreased sensitivity index (d') and increased response time in the orthogonal block, reflecting integral perception of tones and segmental information. By contrast, the English listeners performed similarly across the two blocks, indicating independent perception. Bayesian analysis revealed that the Cantonese musicians and the Cantonese non-musicians perceived Thai tones and segmental information equally integrally. Moreover, the English musicians and the English non-musicians showed similar degrees of independent perception. Based on the above results, musicianship does not seem to influence tone-segmental perceptual integrality. While musicianship apparently enhances tone sensitivity, not all musical advantages are transferrable to the language domain.
... Unlike intervention studies, cross-sectional studies focus on musicianship rather than music training, per se (Choi, 2021b;Delogu, Lampis, & Belardinelli, 2010;Zheng & Samuel, 2018). Though musicianship and music training are often used interchangeably, they are not equivalent. ...
... Does OPERA apply to stress perception? Relative to tones, stress has received far less empirical attention (Choi, 2021b;Kolinsky, Cuvelier, Goetry, Peretz, & Morais, 2009). Apart from enriching the literature, the present study focuses on English stress perception considering its practical significance in Cantonese ESL learning. ...
... English stress perception (Choi, 2021b;Kolinsky et al., 2009). French listeners showed mismatch negativity (MMN) response to stress violations, indicating that they are cortically sensitive to stress (Aguilera, Yagoubi, Espesser, & Astésano, 2014). ...
Article
Full-text available
Purpose: This study investigates how Cantonese language experience influences the potential effects of (i) musicianship and (ii) musical ability on English stress perception. Method: The sample contained 124 participants, evenly split into Cantonese musician, Cantonese non-musician, English musician, and English non-musician groups. They completed the English stress discrimination task, English stress sequence recall task, Musical Ear Test, and non-verbal intelligence task. Following the musicianship-based analysis, 44 Cantonese and English listeners were re-assigned to four groups based on their musical ability—Cantonese high musical ability, Cantonese low musical ability, English high musical ability, and English low musical ability groups. Results: Musicianship-based analysis on English stress perception revealed a significant interaction between musicianship and language. Specifically, musicians outperformed non-musicians only among the English but not the Cantonese listeners. By contrast, ability-based analysis showed significant main effects of musical ability and language. For both Cantonese and English listeners, those with a high musical ability outperformed those with a low musical ability. Regardless of musical ability, Cantonese listeners outperformed English listeners. Correlational analyses yielded consistent findings. Conclusions: This study has found cross-sectional evidence that musical ability, but not musicianship, facilitates Cantonese English as a second language (ESL) listeners’ English stress perception. From a theoretical perspective, the current findings motivate two potential additions to the OPERA hypothesis for music-to-language transfer—unsaturation and utilization. Practically, the findings cast doubt on the application of non-perceptual based instrumental music training to enhance Cantonese ESL learners’ perceptual learning of English stress.
... In a follow-up paper on refining the OPERA hypothesis, Patel (2012) raised the possibility that the OPERA hypothesis might apply to rhythmic perception. For music-to-language transfer, there was behavioral and neural evidence of enhanced speech rhythm sensitivity among musicians (Marie et al., 2011;Cason et al., 2015;Magne et al., 2016;Choi, 2021b). This suggested that the OPERA hypothesis also applied to rhythmic perception, at least unidirectionally (i.e., music-to-language). ...
... All English listeners reported that they (iv) were living in the United States, (v) spoke English as a first language, and (vi) had normal hearing. 2 prolific.co Based on the pre-established criteria, musicians were individuals who (a) had received 7 or more years of continuous music training and (b) could play at least one music instrument (Choi, 2020(Choi, , 2021b. Non-musicians were individuals who (c) had never received more than 2 years of music training, (d) had not received any music training in the past 5 years, and (e) could not play any music instrument. ...
... An automatic procedure ensured that the participants were using a computer but not phones or tablets. After giving written consent, the participants filled out a language and music background questionnaire (Choi et al., 2017(Choi et al., , 2019Choi, 2021b). Prior to the Musical Ear Test, the participants could test and adjust the sound volume to their satisfaction (Wallentin et al., 2010). ...
Article
Full-text available
Given its practical implications, the effect of musicianship on language learning has been vastly researched. Interestingly, growing evidence also suggests that language experience can facilitate music perception. However, the precise nature of this facilitation is not fully understood. To address this research gap, I investigated the interactive effect of language and musicianship on musical pitch and rhythmic perception. Cantonese and English listeners, each divided into musician and non-musician groups, completed the Musical Ear Test and the Raven’s 2 Progressive Matrices. Essentially, an interactive effect of language and musicianship was found on musical pitch but not rhythmic perception. Consistent with previous studies, Cantonese language experience appeared to facilitate musical pitch perception. However, this facilitatory effect was only present among the non-musicians. Among the musicians, Cantonese language experience did not offer any perceptual advantage. The above findings reflect that musicianship influences the effect of language on musical pitch perception. Together with the previous findings, the new findings offer two theoretical implications for the OPERA hypothesis—bi-directionality and mechanisms through which language experience and musicianship interact in different domains.
... Whereas past studies only tested whether FWP or PAM-S applied to simple discrimination (e.g., AX, AXB, or ABX tasks; Francis et al., 2008;Kan & Schmid, 2019;So & Best, 2010;, this study also tests whether they apply to more complex perceptual operations, especially as speech perception entails not only discrimination but also abstract phonological processing (Wong & Perrachione, 2007). For example, judging the loudness of two beeps only requires discrimination, but recalling phonemic sequences produced by different speakers requires higher-level perceptual operations of speaker normalization, phonetic/phonological encoding, and memory sequencing (Choi, 2020;2022b;Correia et al., 2015;Dupoux et al., 2010;Kim & Tremblay, 2021). ...
... The inter-stimulus interval was 600ms and each stimulus was followed by one produced by the other gender (e.g., male-female-male-female; or female-male-female-male). These gender variations obstructed direct auditory comparisons (Choi, 2020;2022b;Dupoux et al., 2008). After a sequence was presented, the participants were asked to reproduce the sequence by pressing the associated keys in the correct order (e.g., for /tɛ1-tɛ3-tɛ3-tɛ1/, ...
Article
Full-text available
English listeners often struggle to perceive tones, but some are easier than others. This study examined these phenomena grounded in the Feature Weighing Perspective (FWP) and the Perceptual Assimilation Model for Suprasegmentals (PAM-S). Forty-seven English and Cantonese listeners completed 4,212 trials of Cantonese tone discrimination and sequence recall tasks. The English listeners showed asymmetrical perceptual patterns of discrimination but not sequence recall. Specifically, these English listeners discriminated T1-T5, T3-T5, and T4-T5 more accurately than T1-T4, T3-T4, and T1-T3. However, they recalled the contour tone and level tone sequences with similar accuracies. Results of the discrimination task aligned with the predictions of PAM-S but not FWP. However, results of the sequence recall task did not support PAM-S. Together, these results suggest that PAM-S only applies to simple discrimination, not abstract phonological processing with a high memory load.
... We classified the participants into three groups, i.e., pitched musicians, unpitched musicians, and nonmusicians. Based on the criteria used in previous studies (Choi, 2020, 2022b; Tong et al., 2018), the pitched musicians had at least seven years of continuous piano and/or violin training, less than two years of unpitched percussion training, and could play their instruments at the time of testing. The unpitched musicians had at least seven years of continuous unpitched percussion training, less than two years of pitched music training, and could play their instruments at the time of testing. ...
Article
Full-text available
Different musical instruments have different pitch processing demands. However, correlational studies have seldom considered the role of musical instruments in music-to-language transfer. Addressing this research gap could contribute to a nuanced understanding of music-to-language transfer. To this end, we investigated whether pitched musicians had a unique musical advantage in lexical tone perception relative to unpitched musicians and nonmusicians. Specifically, we compared Cantonese pitched musicians, unpitched musicians, and nonmusicians on Thai tone discrimination and sequence recall. In the Thai tone discrimination task, the pitched musicians outperformed the unpitched musicians and the nonmusicians. Moreover, the unpitched musicians and the nonmusicians performed similarly. In the Thai tone sequence recall task, both pitched and unpitched musicians recalled level tone sequences more accurately than the nonmusicians, but the pitched musicians showed the largest musical advantage. However, the three groups recalled contour tone sequences with similar accuracy. Collectively, the pitched musicians had a unique musical advantage in lexical tone discrimination and the largest musical advantage in level tone sequence recall. From a theoretical perspective, this study offers correlational evidence for the Precision element of the OPERA hypothesis. The choice of musical instrument may matter for music-to-language transfer in lexical tone discrimination and level tone sequence recall.
... The participants were sorted into three groups, that is, pitched musician (n = 15), unpitched musician (n = 13), and non-musician (n = 15). Adopting pre-established criteria in previous studies (Choi, 2021, 2022b, 2022c; Cooper & Wang, 2012), the pitched musicians had at least 7 years of continuous piano and/or violin training, less than 2 years of unpitched percussion training, and could play their instruments at the time of testing. The unpitched musicians had at least 7 years of continuous unpitched percussion training, less than 2 years of pitched musical training, and could play their instruments at the time of testing. ...
Article
Full-text available
The present study investigated the differential effects of pitched and unpitched musicianship on tone identification and word learning. We recruited 44 Cantonese pitched musicians, unpitched musicians, and non-musicians. They completed a Thai tone identification task and seven sessions of Thai tone word learning. In the tone identification task, the pitched musicians outperformed the non-musicians but the unpitched musicians did not. In session 1 of the tone word learning task, the three groups showed similar accuracies. In session 7, the pitched musicians outperformed the non-musicians but the unpitched musicians did not. The results indicate that the musical advantage in tone identification and word learning hinges on pitched musicianship. From a theoretical perspective, these findings support the precision element of the OPERA hypothesis. Broadly, they reflect the need to consider the heterogeneity of musicianship when studying music-to-language transfer. Practically, the findings highlight the potential of pitched music training in enhancing tone word learning proficiency. Furthermore, the choice of musical instrument may matter to music-to-language transfer.
... Specifically, the acoustic cue with a high functional load in L1 lexical distinction is more likely to be utilized to process the L2 perceptual attributes. For example, Cantonese listeners utilize f0 whereas English listeners use a combination of f0, duration, intensity, and vowel quality cues for English stress perception (Choi, 2021c; Lai, 2019; cf. Chrabaszcz et al., 2014 for Mandarin listeners). ...
Article
Full-text available
Can non-natives outperform natives on speech discrimination? Surprisingly, Cantonese listeners discriminated English stress more accurately than did English listeners. To ascertain its generalizability, I further ask whether this Cantonese advantage in English stress discrimination is equally potent across pitch accent and vowel reduction contexts. Sixty Cantonese and English listeners completed four blocks of an English stress discrimination task with varying pitch accent and vowel reduction contexts. In the absence of rising pitch accent pattern and vowel reduction, the Cantonese listeners outperformed the English listeners on English stress discrimination. However, the Cantonese advantage disappeared when either rising pitch accent pattern or vowel reduction was present. When both rising pitch accent pattern and vowel reduction were present, the Cantonese listeners even performed more poorly than the English listeners. The findings underscore two constraints of the Cantonese advantage in English stress discrimination: rising pitch accent pattern and vowel reduction. Based on collective research on non-native advantage in speech perception, the Acoustic-Attentional-Contextual hypothesis is proposed.
... Since non-verbal intelligence correlated with English stress perception in the previous study, it was included as a control variable. Musical pitch sensitivity was also included as a control measure given its possible cross-domain contribution to English stress discrimination (Choi, 2021; Patel, 2011). ...
Article
Full-text available
A prevailing conception of cross-linguistic transfer is that first language experience poses perceptual interference, or at best null effect, on second language speech perception. Surprisingly, a recent study found that Cantonese listeners outperformed English listeners on English stress perception. The present study further evaluated whether segmental variations would constrain the Cantonese advantage on English stress perception. Cantonese and English listeners were tested with both active and passive oddball paradigms in which ERP responses to English stress deviations were elicited. Behaviorally, the Cantonese listeners exhibited a perceptual advantage relative to the English listeners, but the advantage disappeared upon the introduction of segmental variations. Neurophysiologically, segmental variations diminished the P3b amplitudes of the Cantonese but not the English listeners. Collectively, results suggest that segmental variations constrain the Cantonese advantage on English stress perception.
Article
Full-text available
Musical training has been associated with various cognitive benefits, one of which is enhanced speech perception. However, most findings have been based on musicians taking part in ongoing music lessons and practice. This study thus sought to determine whether the musician advantage in pitch perception in the language domain extends to individuals who have ceased musical training and practice. To this end, adult active musicians (n = 22), former musicians (n = 27), and non-musicians (n = 47) were presented with sentences spoken in a native language, English, and a foreign language, French. The final words of the sentences were either prosodically congruous (spoken at normal pitch height), weakly incongruous (pitch was increased by 25%), or strongly incongruous (pitch was increased by 110%). Results of the pitch discrimination task revealed that although active musicians outperformed former musicians, former musicians outperformed non-musicians in the weakly incongruous condition. The findings suggest that the musician advantage in pitch perception in speech is retained to some extent even after musical training and practice is discontinued.
Article
Full-text available
Given its practical implications, the effect of musicianship on language learning has been vastly researched. Interestingly, growing evidence also suggests that language experience can facilitate music perception. However, the precise nature of this facilitation is not fully understood. To address this research gap, I investigated the interactive effect of language and musicianship on musical pitch and rhythmic perception. Cantonese and English listeners, each divided into musician and non-musician groups, completed the Musical Ear Test and the Raven’s 2 Progressive Matrices. Essentially, an interactive effect of language and musicianship was found on musical pitch but not rhythmic perception. Consistent with previous studies, Cantonese language experience appeared to facilitate musical pitch perception. However, this facilitatory effect was only present among the non-musicians. Among the musicians, Cantonese language experience did not offer any perceptual advantage. The above findings reflect that musicianship influences the effect of language on musical pitch perception. Together with the previous findings, the new findings offer two theoretical implications for the OPERA hypothesis—bi-directionality and mechanisms through which language experience and musicianship interact in different domains.
Article
Full-text available
The OPERA Hypothesis theorizes how musical experience heightens perceptual acuity to lexical tones. One missing element in the hypothesis is whether musical advantage is general to all or specific to some lexical tones. To further extend the hypothesis, this study investigated whether English musicians consistently outperformed English nonmusicians in perceiving a variety of Cantonese tones. In an AXB discrimination task, the musicians exhibited superior discriminatory performance over the nonmusicians only in the high level, high rising, and mid-level tone contexts. Similarly, in a Cantonese tone sequence recall task, the musicians significantly outperformed the nonmusicians only in the contour tone context but not in the level tone context. Collectively, the results reflect the selectivity of musical advantage--musical experience is only advantageous to the perception of some but not all Cantonese tones, and elements of selectivity can be introduced to the OPERA hypothesis. Methodologically, the findings highlight the need to include a wide variety of lexical tone contrasts when studying music-to-language transfer.
Article
Full-text available
Music and language engage similar processing mechanisms, including auditory processing and higher cognitive functions, recruiting partially overlapping brain structures. It has been argued that both are related in child development and that linguistic functions can be positively influenced by music training above 4 years of age. In this randomized control study, with a test-training-retest methodology, 44 children (3-4 years old) were assessed with a phonological awareness test, before and after an intervention period of a school year with weekly music classes (experimental group, n = 23) or visual arts classes (control group, n = 21) in kindergarten. In the preassessment there were no significant differences between groups. When comparing pre- and postassessment, results showed significant differences in both groups, but the students in the music classes outperformed the control group, showing larger differences between the beginning and the end of the intervention. Improvement in both groups is expected due to general developmental reasons. However, the fact that children receiving music classes show greater improvement indicates that music lessons have influenced phonological awareness. Our results support the hypothesis that music training may promote language abilities, specifically phonological awareness, prior to the ages previously studied. In recent years, the relation between music and language development has attracted great attention in various research fields, and in particular in linguistics. According to Gerry, Unrau, and Trainor (2012), active music participation seems to influence child development as early as 6 months of age, as participating in classes with appropriate pedagogical methodologies improves social and communicative development between infants and their parents. Later in development, music training and/or music abilities have been found to predict literacy outcomes (phonological awareness [word, syllabic] and phonemic awareness).
Article
Full-text available
While many second language (L2) listeners are known to struggle when discriminating non-native features absent in their first language (L1), no study has reported that L2 listeners perform better than native listeners in this regard. The present study tested whether Cantonese-English bilinguals were better in discriminating English lexical stress in individual words or pseudowords than native English listeners, even though lexical stress is absent in Cantonese. In experiments manipulating acoustic, phonotactic, and lexical cues, Cantonese-English bilingual adults exhibited superior performance in discriminating English lexical stress than native English listeners across all phonotactic/lexical conditions when the fundamental frequency (f0) cue to lexical stress was present. The findings underscore the facilitative effect of Cantonese tone language experience on English lexical stress discrimination.
Article
Full-text available
Significance: Musical training is beneficial to speech processing, but this transfer’s underlying brain mechanisms are unclear. Using pseudorandomized group assignments with 74 4- to 5-year-old Mandarin-speaking children, we showed that, relative to an active control group which underwent reading training and a no-contact control group, piano training uniquely enhanced cortical responses to pitch changes in music and speech (as lexical tones). These neural enhancements further generalized to early literacy skills: Compared with the controls, the piano-training group also improved behaviorally in auditory word discrimination, which was correlated with their enhanced neural sensitivities to musical pitch changes. Piano training thus improves children’s common sound processing, facilitating certain aspects of language development as much as, if not more than, reading instruction.
Article
Full-text available
Language and music are intertwined: music training can facilitate language abilities, and language experiences can also help with some music tasks. Possible language-music transfer effects are explored in two experiments in this study. In Experiment 1, we tested native Mandarin, Korean, and English speakers on a pitch discrimination task with two types of sounds: speech sounds and fundamental frequency (F0) patterns derived from speech sounds. To control for factors that might influence participants' performance, we included cognitive ability tasks testing memory and intelligence. In addition, two music skill tasks were used to examine general transfer effects from language to music. Prior studies showing that tone language speakers have an advantage on pitch tasks have been taken as support for three alternative hypotheses: specific transfer effects, general transfer effects, and an ethnicity effect. In Experiment 1, musicians outperformed non-musicians on both speech and F0 sounds, suggesting a music-to-language transfer effect. Korean and Mandarin speakers performed similarly, and they both outperformed English speakers, providing some evidence for an ethnicity effect. Alternatively, this could be due to population selection bias. In Experiment 2, we recruited Chinese Americans approximating the native English speakers' language background to further test the ethnicity effect. Chinese Americans, regardless of their tone language experiences, performed similarly to their non-Asian American counterparts in all tasks. Therefore, although this study provides additional evidence of transfer effects across music and language, it casts doubt on the contribution of ethnicity to differences observed in pitch perception and general music abilities.
Article
Full-text available
This study investigated how Cantonese lexical tone sensitivity contributed to English lexical stress sensitivity among Cantonese children who learned English as a second language (ESL). Five-hundred-and-sixteen second-to-third grade Cantonese ESL children were tested on their Cantonese lexical tone sensitivity, English lexical stress sensitivity, general auditory sensitivity, and working memory. Structural equation modeling revealed that Cantonese lexical tone sensitivity contributed to English lexical stress sensitivity both directly, and indirectly through the mediation of general auditory sensitivity, in which the direct pathway had a larger relative contribution to English lexical stress sensitivity than the indirect pathway. These results suggest that the tone-stress association might be accounted for by joint phonological and acoustic processes that underlie lexical tone and lexical stress perception.
Article
Long-term musical training is widely reported to enhance music pitch perception. However, it remains unclear whether tone language experience influences the effect of long-term musical training on musical pitch perception. The present study addressed this question by testing 30 Cantonese and 30 non-tonal language speakers, each divided equally into musician and non-musician groups, on pitch height and pitch interval discrimination. Musicians outperformed non-musicians among non-tonal language speakers, but not among Cantonese speakers on the pitch height discrimination task. However, musicians outperformed non-musicians among Cantonese speakers, but not among non-tonal language speakers on the pitch interval discrimination task. These results suggest that the effect of long-term musical training on musical pitch perception is shaped by tone language experience and varies across different pitch perception tasks.
Article
The current study adopted the MMN additivity approach to examine the pre-attentive perceptual integration of vowels and tones. Twenty Cantonese listeners participated in the ERP experiment. Using the passive oddball paradigm, we elicited tone-MMN, vowel-MMN and double-MMN in the speech condition; and fundamental frequency-MMN, formant frequency-MMN and double-MMN in the non-speech condition. In both conditions, the double-MMNs were significantly smaller in amplitude than the sum of single feature MMNs. Morphological comparisons showed no significant difference in the latency and topographic patterns between vowel-MMN and tone-MMN, and marginally significant differences between formant frequency-MMN and fundamental frequency-MMN. Collectively, results reflect the perceptual integration of tones and vowels at the phonological level, and partial integration of fundamental frequency and formant frequency at the auditory level.