All content in this area was uploaded by William Choi on Oct 08, 2021
Running head: MUSIC-TO-LANGUAGE TRANSFER 1
Towards a Native OPERA Hypothesis:
Musicianship and English Stress Perception
William Choi
Academic Unit of Human Communication, Development, and Information Sciences,
The University of Hong Kong
Choi, W. (2021). Towards a native OPERA hypothesis: Musicianship and English stress
perception. Language and Speech. Advance online publication.
doi:10.1177/00238309211049458.
Address for correspondence: Room 765, Meng Wah Complex, The University of Hong Kong,
Pokfulam, Hong Kong; willchoi@hku.hk
Abstract
Musical experience facilitates speech perception. French musicians, to whom stress is foreign,
have been found to perceive English stress more accurately than French non-musicians. This
study investigated whether this musical advantage also applies to native listeners. English
musicians and non-musicians completed an English stress discrimination task and two control
tasks. With age, non-verbal intelligence and short-term memory controlled, the musicians
exhibited a perceptual advantage relative to the non-musicians. This perceptual advantage was
equally potent for both trochaic and iambic stress patterns. In terms of perceptual strategy, the two
groups showed differential use of acoustic cues for iambic but not trochaic stress. Collectively,
the results could be taken to suggest that musical experience enhances stress discrimination even
among native listeners. Remarkably, this musical advantage is highly consistent and does not
particularly favour either stress pattern. For iambic stress, the musical advantage appears to stem
from the differential use of acoustic cues by musicians. For trochaic stress, the musical
advantage may be rooted in enhanced durational sensitivity.
Keywords: Music-to-language transfer, stress, music, pitch, rhythm, OPERA
Towards a Native OPERA Hypothesis:
Musicianship and English Stress Perception
Long-term musical experience facilitates speech perception (Patel, 2011; 2014). Research
has frequently shown that musicians are better able to perceive lexical tones than non-musicians
(e.g., Alexander, Wong, & Bradlow, 2005; Choi, 2020; Kraus & Chandrasekaran, 2010; Zheng
& Samuel, 2018). These findings underpin contemporary theories of cross-domain transfer, the
most notable of which is the OPERA hypothesis (Patel, 2011; 2014). Unfortunately, most
research has only focused on lexical tone perception among non-native listeners. This has led to
research gaps concerning whether the OPERA hypothesis applies to native speech perception
and, more specifically, to other prosodic features such as stress. To ascertain the generalisability
of the OPERA hypothesis, this study investigates (i) whether English musicians outperform
English non-musicians on English stress discrimination. To provide additional insight into
music-to-language transfer, this study further examines (ii) whether the musical advantage is
selective about stress pattern, and (iii) the means by which musical experience enhances native
stress discrimination.
The OPERA hypothesis proposes that musical experience facilitates speech encoding
when five conditions are met: the neural networks for music and speech must overlap
anatomically (Overlap), and the music activities must entail more precise acoustic processing
than speech (Precision), bring about strong positive emotion (Emotion), repeat frequently
(Repetition), and require focused attention (Attention) (Patel, 2011; 2014). Patel’s hypothesis is
well supported by empirical studies, most of which have shown a musical advantage in lexical
tone perception (see Choi, 2020). For example, English musicians identified and discriminated
Mandarin tones more accurately than did English non-musicians (Alexander et al., 2005). Quite
surprisingly, the English musicians even performed on a par with native Mandarin listeners.
English musicians’ perceptual advantage in Mandarin tone discrimination was also evident at the
phrase level (Zheng & Samuel, 2018). Neurophysiologically, French musicians also showed a
larger P3b response to Mandarin tonal and segmental variations than French non-musicians
(Marie, Delogu, Lampis, Belardinelli, & Besson, 2011).
In addition to the apparent acoustic similarities between lexical tones and musical pitch,
stress patterns coincide with the metrical structures in music (Henrich, Alter, Wiese, & Domahs,
2014; Palmer & Kelly, 1992; Patel, 2003; Lerdahl, 2001; see Gandour, 1981; Tong, Choi, &
Man, 2018 for lexical tones). Lexical stress is the relative prominence assigned to a certain
syllable in a word (Teschner & Whitley, 2004). In English, stressed syllables are typically
associated with higher fundamental frequency (f0), longer duration, and higher intensity (Choi,
2021a; Choi, Tong, & Samuel, 2019; Choi, Tong, & Singh, 2017; Fry, 1958; Wang, 2008; Yu &
Andruski, 2010). Unstressed syllables typically exhibit a vowel quality change (e.g., the second
vowel in harmony /ˈhɑməni/ is reduced to /ə/), although this is not necessarily the case (e.g., the
second vowel in import /ˈɪmport/ is not reduced). Similar to English speech, music is
characterised by repeated sequences of stressed and unstressed beats (rhythm; Toussaint, 2005).
Analyses of English- (with stress) and French- (without stress) composed music revealed that the
metrical structures paralleled the composers’ spoken language (Patel & Daniele, 2003a; 2003b).
This finding, together with the commonalities between lexical stress and musical rhythm, gives
rise to the possibility of cross-domain transfer between these two features.
Indeed, musical experience facilitates English stress perception among French listeners.
Unlike English, French does not use lexical stress contrastively (it is a fixed stress language;
Garde, 1968). In an AX discrimination task, French listeners could discriminate lexical stress
contrasts with a very low error rate (3%; Dupoux, Pallier, Sebastian, & Mehler, 1997). In an
event-related potential study, French listeners also showed a mismatch negativity (MMN)
response to stress violations, reflecting pre-attentive stress discrimination (Aguilera, El Yagoubi,
Espesser, & Astésano, 2014). The lack of MMN response in the reverse oddball task further led
to the claim that French listeners had long-term memory traces of stress. Although French
listeners are behaviourally and neurophysiologically sensitive to stress, they struggle to perceive
stress at more abstract perceptual levels. Specifically, French listeners recall stress sequences
with very high error rates (49% and 73%; Dupoux, Peperkamp, & Sebastian-Galles, 2010). Of
direct relevance to the current study is that French listeners’ difficulties in recalling stress
sequences could be mitigated by musical experience (Kolinsky, Cuvelier, Goetry, Peretz, &
Morais, 2009). In particular, French musicians were better able to recall stress sequences than
French non-musicians. This musical advantage was evident at all sequence lengths, which
suggested enhanced perception rather than increased memory span.
Considering the above findings in light of the OPERA hypothesis, musical experience
does facilitate non-native speech perception (e.g., Alexander et al., 2005; Kolinsky et al., 2009;
Zheng & Samuel, 2018; cf. Schellenberg, 2015). Here, a critical question arises as to whether the
OPERA hypothesis is also applicable to native speech perception. Subcortically, English
musicians were more sensitive to English consonantal changes (/ba/, /da/, and /ga/) than English
non-musicians (Parbery-Clark, Tierney, Strait, & Kraus, 2012). In terms of speech prosody,
French musicians exhibited a larger P200 response to metrical violations in naturally produced
French (Marie, Magne, & Besson, 2010). Collectively, these results offer some support to the
notion that musical experience facilitates native speech perception. The current study
hypothesises that English musicians discriminate English stress more accurately than do English
non-musicians. English stress is chosen not only because of the formerly established non-native
musical advantage but also because English stress sensitivity contributes to reading
comprehension among English children (Holliman, Wood, & Sheehy, 2010; 2012; Kolinsky et
al., 2009).
Another way to extend the OPERA hypothesis is to examine the selectivity of musical
advantage. In a recent study, English musicians and non-musicians completed a Cantonese tone
discrimination task and a Cantonese tone sequence recall task (Choi, 2020). In both tasks, the
musicians outperformed the non-musicians only in half of the tonal contexts. This reflected that
musical experience only facilitated the perception of certain Cantonese tones. English stress
contains trochaic and iambic stress patterns (Ladd, 2008). In a trochaic stress pattern, a stressed
syllable precedes an unstressed one (e.g., CAmel), and vice versa for an iambic stress pattern
(e.g., caNAL). Relative to the trochaic stress pattern, the iambic stress pattern is less common
and acquired later by English infants (Cutler, 2014; Jusczyk, Cutler, & Redanz, 1993). Thus, it is
possible that the musical advantage is more pronounced for iambic than trochaic stress patterns.
The current study tests this hypothesis.
A further goal of this study is to explore the means by which musical experience
facilitates native stress discrimination. One possibility is that musical experience alters listeners’
choice of acoustic cues, for which support is drawn from a tone perception study (Choi, 2020). In
the high-rising tone context for which musical advantage was shown, English musicians and
non-musicians attended to different acoustic cues (i.e., f0 contour and f0 onset, respectively).
However, in the low-rising tone context for which musical advantage was absent, the two groups
attended to the same acoustic cue (i.e., f0 contour). As mentioned above, stress is signalled by
f0, duration, and intensity (Choi et al., 2017; 2019; Fry, 1958; Wang, 2008; Yu & Andruski,
2010). It is possible that English musicians and non-musicians attend to different acoustic cues
for stress discrimination. It is also possible that musicians and non-musicians attend to the same
acoustic cues but with different relative weights assigned to each. Drawing on parallels with
cross-linguistic research, Russian and English listeners attended to the same set of acoustic cues
for English stress perception (Chrabaszcz, Winn, Lin, & Idsardi, 2014). However, the Russian
and English listeners showed different weighting patterns among f0, duration, and intensity cues:
the f0 cue was weighted most heavily by the English listeners (f0 > intensity > duration) but least
heavily by the Russian listeners (intensity > duration > f0). Based on the above findings, it is
possible that musical experience drives listeners to rely on a different set of acoustic cues or to
rely differently on the same set of acoustic cues for English stress discrimination.
The main theme of this study is music-to-language transfer. In the literature, correlational
and intervention designs have been frequently adopted. Correlational studies compare musicians
and non-musicians on variables of interest, such as lexical tone sensitivity (e.g., Alexander et al.,
2005; Choi, 2020; Kolinsky et al., 2009; Zheng & Samuel, 2018). As the groups are pre-defined,
the correlational design guarantees that musicians have many years of musical experience. This
is particularly useful for studying cross-domain transfer, as long-term musical experience
induces more prominent plastic changes than does short-term musical experience (Patel, 2011;
2014). However, the standard caveat of correlational design is weak causal inference (see
Corrigall, Schellenberg, & Misura, 2013; Schellenberg, 2015). Intervention studies typically
involve two or three groups, each of which receives music training, music-irrelevant training, or
no training (e.g., Moreno, Marques, Santos, Santos, Castro, & Besson, 2009; Nan et al., 2018).
Clearly, this design permits a stronger causal inference. Nevertheless, laboratory training only
lasts for weeks or months so this design reduces the possibility of studying the long-term effect
of musical experience. Correlational and intervention designs have their own merits and
limitations, which makes both types of research necessary. As long-term musical experience is
crucial for music-to-language transfer, the current study adopts a correlational design as a first
step.
The overarching goal of this study is to investigate (1) whether English musicians exhibit
a perceptual advantage in English stress discrimination. To elucidate the potential musical
advantage, this study further examines (2) whether the musical advantage is selective about
stress patterns, and (3) the means by which musical experience enhances English stress
discrimination. Given the possible influence of non-verbal intelligence and short-term memory
on English stress perception, these two constructs were controlled (Choi et al., 2019; see also
Asaridou, Hagoort, & McQueen, 2015; Bidelman, Hutka, & Moreno, 2013; Hutka, Bidelman, &
Moreno, 2015). To this end, participants were also tested on non-verbal intelligence and short-
term memory. To minimise testing time, I adopted two tasks that could provide quick and
reliable estimates of the above constructs among English listeners (Choi, 2020; Choi et al., 2019;
Zheng & Samuel, 2018).
Methods
Participants
Forty native English listeners were recruited at University College London through an
online participant recruitment system. Based on the criteria adopted in previous studies (Choi,
2020; 2021b; Tong et al., 2018), the listeners were assigned to the musician (n = 20) and non-
musician (n = 20) groups. All musicians had received at least seven years of continuous music
training and were able to play their instruments at the time of testing. All non-musicians had
received no more than two years of music training, if any. None of them had received any music
training in the past five years, and none could play a musical instrument at the time of
testing. Two non-musicians and one musician were excluded from the study due to no-show,
excessive music training (non-musician), and Mandarin learning experience. Thus, there were 19
musicians (5 male, 14 female; Mage = 26.63 years, SD = 5.89 years) and 18 non-musicians (8
male, 10 female; Mage = 32.67 years, SD = 11.60 years) in the final sample.
Table 1 summarises the musical experience of the musicians. On average, the musicians
had received 11.63 years of music training (SD = 3.90 years) with a mean onset age of 7.84 years
(SD = 2.89 years). The non-musicians had received 0.90 years of music training (SD = 1.56
years). For the non-musicians who had received music training, their mean onset age of music
training was 12.00 years (SD = 4.86 years). None of the participants in the study reported having
absolute pitch.
English Stress Discrimination Task
Stimuli. Four pairs of real English words, /ˈpɚmɪt - pɚˈmɪt/, /ˈsəspekt - səsˈpekt/, /ˈɪnsɚt
- ɪnˈsɚt/, and /ˈimpɔrt - imˈpɔrt/ (permit, suspect, insert, import) were recorded at a sampling rate
of 48 kHz. All stimuli were naturally produced by two native English speakers (one male and
one female). The recording was made in a sound-shielded booth.
Material Presentation. An AX paradigm was adopted. In each trial, two real words were
audibly presented via Sennheiser HD280 PRO headphones. The inter-stimulus interval was 600
ms. The two real words either carried the same (e.g., /ˈɪnsɚt - ˈɪnsɚt/) or different stress (e.g.,
/ˈɪnsɚt - ɪnˈsɚt/). To prevent the listeners from adopting an ad-hoc acoustic strategy, the two real
words in each trial were produced by speakers of different genders. The voice order was random
within each trial.
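The trial structure can be sketched as a fully crossed design. The sketch below is illustrative only: it takes the factor counts from the Procedure (8 stimuli × 2 speaker orders × 2 trial types × 3 repetitions) and assumes that the eight stimuli are the four words in each of the two stress patterns, and that each stimulus serves as the first item of a pair. The names `build_trials`, `WORDS`, and the dictionary keys are hypothetical and do not come from the original experiment scripts.

```python
from itertools import product

# The four word pairs named in the Stimuli paragraph.
WORDS = ["permit", "suspect", "insert", "import"]
STRESS_PATTERNS = ["trochaic", "iambic"]
# Assumption: the two speaker orders are male-first and female-first.
SPEAKER_ORDERS = [("male", "female"), ("female", "male")]
TRIAL_TYPES = ["same", "different"]
REPETITIONS = 3


def build_trials():
    """Enumerate every AX trial in the assumed fully crossed design."""
    trials = []
    for word, stress, order, trial_type, rep in product(
        WORDS, STRESS_PATTERNS, SPEAKER_ORDERS, TRIAL_TYPES, range(REPETITIONS)
    ):
        if trial_type == "same":
            second_stress = stress
        else:
            second_stress = "iambic" if stress == "trochaic" else "trochaic"
        trials.append({
            "word": word,
            "first": (stress, order[0]),          # (stress pattern, speaker)
            "second": (second_stress, order[1]),  # always the other speaker
            "type": trial_type,
            "repetition": rep,
        })
    return trials


trials = build_trials()
print(len(trials))                                      # 96 trials in total
print(sum(t["type"] == "different" for t in trials))    # 48 different trials
```

Under these assumptions the enumeration yields 96 trials, half of them different trials, and every pair is spoken by speakers of different genders.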
Procedure. Listeners were asked to judge, as quickly as possible, whether the two real
words carried the same stress. They responded by pushing keyboard buttons ([f] for same, [j] for
different). The accuracy and response time were recorded for each trial. Prior to the experimental
trials, six practice trials with feedback were run. There were 96 trials (8 stimuli × 2 speaker
orders × 2 trial types × 3 repetitions). A sensitivity index (d’) was obtained based on the hits and
false alarms for the same and different trials (see Figure 1). The sample-specific reliabilities were
high (αmusicians = .87, αnon-musicians = .90). This task has also been used successfully to assess
English stress discrimination among English listeners in a previous study (Choi et al., 2019).
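As an illustration of the sensitivity index, the following minimal sketch computes d’ as z(hit rate) − z(false-alarm rate). It assumes that a hit is a "different" response on a different trial and a false alarm is a "different" response on a same trial, and it applies a log-linear correction to avoid infinite z-scores; neither detail is stated in the text, and the function name `d_prime` is hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return d' = z(hit rate) - z(false-alarm rate).

    Assumption: a log-linear correction (add 0.5 to each count and 1 to
    each total) keeps both rates strictly between 0 and 1.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical listener: 40 of 48 different trials detected,
# 8 of 48 same trials wrongly judged as different.
print(round(d_prime(40, 8, 8, 40), 2))
```

A listener responding at chance (equal hit and false-alarm rates) obtains d’ = 0 under this formula, and larger values indicate better discrimination.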
Non-verbal Intelligence Task
This task consisted of 14 multiple-choice questions, all of which required participants to
organise pictures by a logical sequence under time pressure. In each trial, participants were given
30 seconds to choose the picture that best completed the visual pattern described in the question.
One point was awarded for each correct answer. This task has been used successfully in previous
studies to assess English listeners’ non-verbal intelligence (Choi, 2020; Choi et al., 2019; Zheng
& Samuel, 2018). The sample-specific reliabilities were moderate to high (αmusicians = .54, αnon-
musicians = .79).
Short-term Memory Task
This computerised task consisted of a plate displayed at the centre of a touchscreen. The
plate contained four coloured (red, green, blue, and yellow) wedges. On each trial, a sequence of
colours (e.g., yellow-blue-red) was presented. Following the presentation, the participants were
required to reproduce the colour sequence by tapping the corresponding wedges. The sequence
length started at one and increased by one after each correct response. The score started at
zero, and one point was awarded for each correctly reproduced sequence. For example, a
participant who correctly reproduced eight sequences would score eight in that round. Each
participant completed five rounds, from which
the median score was obtained. This task has also been used successfully in previous studies to
assess English listeners’ short-term memory (Choi, 2020; Choi et al., 2019; Zheng & Samuel,
2018). As in these previous studies, the sound was turned off so that the measure was
independent of auditory short-term memory. The sample-specific reliabilities were satisfactory to
high (αmusicians = .65, αnon-musicians = .81).
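The scoring rule can be sketched as follows. This is a minimal illustration, assuming that a round ends at the first incorrectly reproduced sequence (the text does not state the stopping rule); the function names `round_score` and `task_score` are hypothetical.

```python
from statistics import median


def round_score(responses):
    """Count correctly reproduced sequences in one round.

    `responses` holds one boolean per sequence, in presentation order.
    Assumption: the round stops at the first error.
    """
    score = 0
    for correct in responses:
        if not correct:
            break
        score += 1
    return score


def task_score(rounds):
    """Return the median round score across a participant's rounds."""
    return median(round_score(r) for r in rounds)


# Hypothetical participant: five rounds ending after 8, 6, 7, 5, and 9
# correct sequences, respectively.
rounds = [[True] * n + [False] for n in (8, 6, 7, 5, 9)]
print(task_score(rounds))  # 7
```

Taking the median rather than the mean makes the final score less sensitive to a single unusually short or long round.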
Results
Musical Advantage in Stress Discrimination
To investigate whether the musicians exhibited a perceptual advantage in English stress
discrimination, a one-way analysis of covariance (ANCOVA) was conducted on d’ with group
(musician and non-musician) as the independent variable. Age, non-verbal intelligence, and
short-term memory were controlled (see Table 2; see also Appendix I). As expected, the
ANCOVA revealed a significant group difference, F(1, 32) = 9.62, p < .01, η² = .23, in which the
musicians discriminated English stress more accurately than did the non-musicians (see Figure
2). Correlational analyses further showed that d’ correlated significantly with years of music
training, r(35) = .37, p < .05, but not with onset age of music training, p = .398. This suggests
that for English stress discrimination, the amount of music training received matters more than
the age at which music training started.
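The reported degrees of freedom and effect size can be cross-checked with a short calculation. The sketch below assumes that the reported η2 is partial eta squared, which can be recovered from the F ratio as F·df_effect / (F·df_effect + df_error), and that the error degrees of freedom equal the sample size minus the number of groups and covariates. The function names are hypothetical.

```python
def ancova_error_df(n_participants, n_groups, n_covariates):
    """Error degrees of freedom for a one-way ANCOVA."""
    return n_participants - n_groups - n_covariates


def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# 19 musicians + 18 non-musicians, 2 groups, 3 covariates
# (age, non-verbal intelligence, short-term memory).
df_error = ancova_error_df(37, 2, 3)
print(df_error)                                          # 32
print(round(partial_eta_squared(9.62, 1, df_error), 2))  # 0.23
```

Under these assumptions the calculation reproduces the reported F(1, 32) and η2 = .23.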
Selectivity of Musical Advantage
To evaluate the selectivity of the musical advantage, a two-way mixed ANCOVA was
conducted on hit rate with stress type (iambic and trochaic) as the within-subject factor and
group (musician and non-musician) as the between-subjects factor. Age, non-verbal intelligence,
and short-term memory were also controlled. The ANCOVA revealed a significant main effect
of group, F(1, 32) = 5.94, p < .05, η² = .16 (see Figure 3). However, the main effect of stress
type, p = .171, and the interaction between stress type and group, p = .822, were non-significant.
Consistent with the earlier analysis, a clear musical advantage was found. Remarkably, this
musical advantage was highly consistent and did not particularly favour either stress pattern.
In terms of response time, the two-way mixed ANCOVA showed non-significant main
effects of group, p = .822, and stress type, p = .789. Their interaction effect was also non-
significant, p = .607. An analysis of the mean response time across all 96 trials yielded consistent
results (see Appendix II). The lack of a group difference in response time argues against a
speed–accuracy trade-off: the greater accuracy of the musicians over the non-musicians was not
because they had taken longer to respond.
Use of Acoustic Cues by Musicians and Non-musicians
All stimuli were analysed acoustically with Praat 6.0.50 (Institute of Phonetic Sciences,
University of Amsterdam, the Netherlands), yielding the set of acoustic parameters summarised
in Table 3 (see also Appendix III). For each stimulus, the f0, durational, and intensity ratios of
the first to second syllables were obtained (see Table 4).
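The ratio computation can be illustrated with a minimal sketch. The measurement values below are invented for illustration and are not taken from Table 3 or Table 4; the units (Hz, ms, dB) and the names `Syllable` and `first_to_second_ratios` are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    f0_hz: float         # mean fundamental frequency (assumed unit: Hz)
    duration_ms: float   # syllable duration (assumed unit: ms)
    intensity_db: float  # mean intensity (assumed unit: dB)


def first_to_second_ratios(first, second):
    """Return the first-to-second syllable ratio for each acoustic cue."""
    return {
        "f0": first.f0_hz / second.f0_hz,
        "duration": first.duration_ms / second.duration_ms,
        "intensity": first.intensity_db / second.intensity_db,
    }


# Hypothetical trochaic token: the first syllable is higher, longer, and louder.
ratios = first_to_second_ratios(
    Syllable(f0_hz=200.0, duration_ms=300.0, intensity_db=70.0),
    Syllable(f0_hz=160.0, duration_ms=200.0, intensity_db=65.0),
)
print(ratios)
```

On this definition, a ratio above 1 indicates that the first syllable carries more of the cue than the second, and a ratio below 1 indicates the reverse.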
To explore the use of acoustic cues by musicians and non-musicians, an acoustic–
behavioural correlational analysis was conducted. All different trials (N = 48) were extracted
from the dataset. Each different trial, as a single entry, contained eight variables: (1) the f0 ratio,
(2) durational ratio, and (3) intensity ratio of the trochaic stress stimulus; (4) the f0 ratio, (5)
durational ratio, and (6) intensity ratio of the iambic stress stimulus; (7) the trial-specific mean
accuracy of the musician group; and (8) the trial-specific mean accuracy of the non-musician
group.
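The correlational step can be sketched as a Pearson correlation computed across trials between one acoustic ratio and one group's trial-specific mean accuracy. The data below are invented for illustration only, and the function name `pearson_r` is hypothetical; the original analysis software is not specified in the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: durational ratio of the iambic stimulus in each trial,
# paired with the trial-specific mean accuracy of one group.
durational_ratio = [0.45, 0.52, 0.60, 0.71, 0.80]
mean_accuracy = [0.95, 0.90, 0.92, 0.84, 0.80]
print(round(pearson_r(durational_ratio, mean_accuracy), 2))
```

In this invented example the correlation is negative: accuracy is higher on trials whose iambic stimulus has a smaller first-to-second durational ratio.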
Of interest to the study was whether the acoustic parameters (1–6) correlated with the
behavioural accuracies among the musicians and non-musicians (see Table 5). The mean
accuracies of the musicians correlated significantly with the f0 (r = .24, p < .05), durational (r =
-.38, p < .01), and intensity (r = .29, p < .05) ratios of the iambic stress stimuli. For the trochaic
stress stimuli, the mean accuracies of musicians correlated significantly with the durational ratio
(r = -.27, p < .05), but not with the f0 (p = .459) and intensity (r = .23, p = .062) ratios.
The mean accuracies of the non-musicians correlated significantly with the f0 (r = .46, p
< .01), durational (r = -.42, p < .01), and intensity (r = .28, p < .05) ratios of the iambic stress
stimuli. For the trochaic stress stimuli, the mean accuracies of non-musicians correlated
significantly with the durational ratio (r = -.29, p < .05), but not with the f0 (p = .361) and
intensity (r = .23, p = .055) ratios.
Taken together, both groups’ discriminatory abilities were related to (a) the degree of f0,
durational, and intensity variations among the iambic stress stimuli, and (b) the degree of
durational variations among the trochaic stress stimuli. For the trochaic stress stimuli, the
musicians and non-musicians attended mostly to duration. However, for the iambic stress
stimuli, the musicians attended mostly to duration whereas the non-musicians attended mostly to
f0 (see Table 5).
Discussion
This study endeavoured to investigate (1) whether English musicians exhibit a perceptual
advantage in English stress discrimination, (2) whether the musical advantage is selective about
stress patterns, and (3) the means by which musical experience facilitates English stress
discrimination.
The core result was the presence of a musical advantage in English stress discrimination
among the English listeners. This fits the OPERA hypothesis well. In terms of precision, music
entails more precise metrical processing than speech. Provided that the other four conditions
(Overlap, Emotion, Repetition, and Attention) were met, musical experience enhanced the
English musicians’ sensitivity to English stress. As mentioned above, the OPERA hypothesis has
been widely applied to account for a musical advantage in non-native speech perception (e.g.,
Alexander et al., 2005; Choi, 2020; Kolinsky et al., 2009; Zheng & Samuel, 2018). Consistent
with the previous studies on native consonantal and metrical discrimination, the present result
supports the theoretical view that music-to-language transfer could also occur given relevant
linguistic experience (Marie et al., 2010; Parbery-Clark et al., 2012). Thus, it stands to reason
that the OPERA hypothesis applies to both non-native and native listeners. From a practical
perspective, this theoretical view points towards the potential use of music training to aid native
speech perception (e.g., Moreno et al., 2009; Vidal, Lousada, & Vigário, 2020). For example,
piano training enhanced Mandarin children’s behavioural sensitivities to Mandarin vowels and
neural sensitivities to Mandarin tones (Nan et al., 2018). For English children, English stress
sensitivity is essential for literacy development and poor readers often show deficits in stress
sensitivity (Holliman et al., 2010; 2012). Thus, with the musical advantage in English stress
perception now established, it is important to determine whether music training can improve
English children’s stress perception. Interestingly, English stress sensitivity also contributes to
second language English literacy development among Cantonese children (Choi, Tong, & Cain,
2016; Choi, Tong, & Deacon, 2018). As such, it is also worthwhile to investigate whether
Cantonese musicians show a perceptual advantage in English stress perception.
Remarkably, the musical advantage identified herein was highly consistent across stress
patterns. It was originally hypothesised that musical experience would exert differential effects on
trochaic and iambic stress perception. However, the results clearly showed that musical
experience did not particularly favour either stress pattern. This is in contrast to the recent
finding of a study on English listeners that musical advantage was selective about Cantonese
tones (Choi, 2020). These discrepancies may stem from acoustic or even linguistic differences.
Although Cantonese tones and English stress share f0 as a common acoustic cue, Cantonese
tones are signalled by f0 in a more fine-grained manner (Choi et al., 2019). Whereas Cantonese
tones are largely f0 variations, English stress has other acoustic cues, such as duration and
intensity (Choi, Tong, Gu, Tong, & Wong, 2017; Gandour, 1981; Ladd, 2008; Wang, 2008).
Conceivably, the differences in terms of the selectivity of musical advantage across Cantonese
tone and English stress perception might be due to the acoustic differences between the two
features. Hypothetically, the discrepancies might also arise from linguistic experience: Cantonese
tones were non-native to the musicians but English stress was native to them. As unlikely as this
may seem, future studies that include non-native listeners of English stress are needed to test
this hypothetical account.
Acoustically, the musicians appear to have adopted a different perceptual strategy for
iambic stress. Although the musicians and non-musicians attended to the same acoustic cues,
they relied on these acoustic cues differently. Specifically, non-musicians attended most heavily
to f0 whereas musicians attended most heavily to duration. This is somewhat reminiscent of a
recent finding that, unlike non-musicians who attended to a less effective cue (f0 onset),
musicians attended to a more effective cue (f0 contour) for high-rising tone perception (Choi,
2020). Indeed, for the iambic stress stimuli, stressed and unstressed syllables did not differ
significantly in f0, making it a less effective cue than duration and intensity. Considering the
OPERA hypothesis in the current context, it is possible that musical experience had orientated
the musicians to attend more heavily to more effective cues (duration and intensity), thereby
facilitating iambic stress perception. Future studies can further validate these findings by testing
stress perception across different acoustic conditions, e.g., f0-only, duration-only, and intensity-
only (Choi et al., 2019).
By contrast, the musicians and non-musicians adopted the same perceptual strategy for
trochaic stress. Specifically, the acoustic–behavioural correlation analysis of trochaic stress
implied that the two groups attended mainly, if not only, to durational cues.
Neurophysiologically, changes in speech temporal structure elicited the P200 response among
musicians but not non-musicians (Marie et al., 2010). This suggests that musicians have stronger
automatic detection of syllable temporal structure than non-musicians. Based on the literature, it
is believed that long-term musical experience in discerning metrical structures sharpened the
English musicians’ sensitivity to duration (e.g., Skoe & Kraus, 2013). Thus, for trochaic stress,
one plausible explanation for the musical advantage is enhanced sensitivity to duration. This
proposed mechanism is also consistent with the OPERA hypothesis (Patel, 2011; 2014).
Collectively, musical advantage may stem from the differential use of acoustic cues (iambic
stress) and enhanced durational sensitivity (trochaic stress).
In terms of the study’s theoretical contribution, the current findings have several
implications for the OPERA hypothesis. Most importantly, the musical advantage in native stress
discrimination converges with studies on native consonantal and metrical discrimination (Marie
et al., 2010; Parbery-Clark et al., 2012). Taken together, these findings suggest that the OPERA
hypothesis also applies to native speech perception. In contrast to a previous finding on the
selectivity of musical advantage, the musical advantage identified herein was highly consistent
(Choi, 2020). Crucially, the current study further adds that musical advantage is not necessarily
selective, and the OPERA hypothesis can be revised to account for this. Although the OPERA
hypothesis argues that musical experience increases neuronal sensitivity to speech, the present
and previous studies further suggest that musical experience may also alter listeners’ perceptual
strategy (Choi, 2020; see Patel, 2011). This points to a need for the OPERA hypothesis to
incorporate new elements on how musical experience orients musicians to different acoustic
cues.
In terms of the methodological contribution, this study has demonstrated that stress is a
potent feature for investigating cross-domain transfer. As mentioned in the Introduction, most
studies on cross-domain transfer have focused on lexical tone perception, presumably due to its
sharing of an acoustic cue (f0) with musical pitch (e.g., Choi, 2020; Cooper & Wang, 2012;
Marie et al., 2011; Zheng & Samuel, 2018). Crucially, stress has three acoustic correlates – f0,
duration, and intensity – that are used intensively for discerning metrical structures in music
(Henrich et al., 2014; Palmer & Kelly, 1992; Patel, 2003; Lerdahl, 2001). Indeed, the current
study has identified linkages between stress discrimination and musical experience, highlighting
a more potent candidate for studying music-to-language transfer and even language-to-music
transfer. In the latter direction, one interesting question is whether English non-musicians discern
metrical structures more accurately than do French non-musicians, given the presence of stress in
English.
Readers are cautioned that this study is correlational. Like the preponderance of studies
on music-to-language transfer, this study cannot rule out the possibility of gene–environment
interaction (e.g., Alexander et al., 2005; Choi, 2020; Cooper & Wang, 2012; Kolinsky et al.,
2009; Marie et al., 2011; Zheng & Samuel, 2018). Schellenberg (2015) argued that cognitive
abilities and socioeconomic status determine the likelihood of a child receiving music training.
More specifically, Corrigall and colleagues (2013) reasoned that high-functioning children from
high socioeconomic status families were more likely to take music lessons than other children.
As such, music training might only exaggerate pre-existing differences between musicians and
non-musicians. This is contrary to a widely adopted premise that musicians and non-musicians
do not differ systematically prior to musical experience (e.g., Fitzroy & Sanders, 2013; Francois &
Schön, 2011; Shook, Marian, Bartolotti, & Schroeder, 2013). Although the English musicians
and non-musicians did not differ significantly in non-verbal intelligence and short-term memory, they may still
have differed in other respects, such as learning motivation and personality traits, some of which
are difficult to control for. Intriguingly, there are numerous reports that musicians exhibit a
memory advantage (George & Coch, 2011; Roden, Grube, Bongard, & Kreutz, 2014; Schulze,
Dowling, & Tillmann, 2012). It might be that the English musicians in the current sample did not
possess this advantage; it is also possible that their cognitive difference was not captured by
the tasks. Ideally, each cognitive construct should have been measured with multiple tasks.
In conclusion, the present study has identified a musical advantage in native stress
discrimination. This finding adds to the body of evidence that musical experience facilitates
native speech perception, in turn suggesting that the OPERA hypothesis also applies to native
listeners (Marie et al., 2010; Parbery-Clark et al., 2012). The musical advantage identified herein
was highly consistent, suggesting that musical advantage is not necessarily selective. The present
results also imply that part of the musical advantage might arise from the differential use of
acoustic cues by the musicians. Despite the standard caveats of correlational studies, the current
study presents theoretically and practically significant findings that I believe will withstand
scrutiny by future intervention studies.
Acknowledgement
I wish to thank Mairéad MacSweeney for her dedicated support. I also appreciate Arthur
Samuel for recording the stimuli and sharing the control tasks. This research was supported by
the Croucher Postdoctoral Fellowship from the Croucher Foundation to William Choi. It was
also supported by the Start-up Research Fund from The University of Hong Kong to William
Choi.
References
Aguilera, M., El Yagoubi, R., Espesser, R., & Astésano, C. (2014). Event-related potential
investigation of initial accent processing in French. Proceedings of Speech Prosody, 383–
387.
Alexander, J., Wong, P. C. M., & Bradlow, A. R. (2005). Lexical tone perception in
musicians and nonmusicians. Paper presented at INTERSPEECH 2005 – Eurospeech,
9th European Conference on Speech Communication and Technology, Lisbon,
Portugal, September 4–8, 2005.
Asaridou, S. S., Hagoort, P., & McQueen, J. M. (2015). Effects of early bilingual experience
with a tone and a non-tone language on speech–music integration. PLoS ONE, 10(12),
e0144225.
Bidelman, G. M., Hutka, S., & Moreno, S. (2013). Tone language speakers and
musicians shared enhanced perceptual and cognitive abilities for musical pitch: Evidence
for bidirectionality between the domains of language and music. PLoS ONE, 8(4),
e60676.
Choi, W. (2020). The selectivity of musical advantage: Musicians exhibit perceptual
advantage for some but not all Cantonese tones. Music Perception, 37(5), 423–434.
Choi, W. (2021a). Cantonese advantage on English stress perception: Constraints and neural
underpinnings. Neuropsychologia, 158, 107888.
Choi, W. (2021b). Musicianship influences language effect on musical pitch perception.
Frontiers in Psychology, 12, 712753.
Choi, W., Tong, X., & Cain, K. (2016). Lexical prosody beyond first-language boundary:
Chinese lexical tone sensitivity predicts English reading comprehension. Journal of
Experimental Child Psychology, 148, 70–86.
Choi, W., Tong, X., & Deacon, H. (2017). Double dissociations in reading comprehension
difficulties among Chinese–English bilinguals and their association with tone awareness.
Journal of Research in Reading, 40(2), 184–198.
Choi, W., Tong, X., Gu, F., Tong, X., & Wong, L. (2017). On the early neural perceptual
integrality of tones and vowels. Journal of Neurolinguistics, 41, 11–23.
Choi, W., Tong, X., & Samuel, A. G. (2019). Better than native: Tone language
experience enhances English lexical stress discrimination in Cantonese–English
bilingual listeners. Cognition, 189, 188–192.
Choi, W., Tong, X., & Singh, L. (2017). From lexical tone to lexical stress: A cross-language
mediation model for Cantonese children learning English as a second language.
Frontiers in Psychology, 8, 492.
Chrabaszcz, A., Winn, M., Lin, C. Y., & Idsardi, W. J. (2014). Acoustic cues to
perception of word stress by English, Mandarin, and Russian speakers. Journal of
Speech, Language and Hearing Research, 57, 1468–1479.
Cooper, A., & Wang, Y. (2012). The influence of linguistic and musical experience on
Cantonese word learning. Journal of the Acoustical Society of America, 131(6), 4756–
4768.
Corrigall, K. A., Schellenberg, E. G., & Misura, N. M. (2013). Music training, cognition, and
personality. Frontiers in Psychology, 4, 222.
Cutler, A. (2014). Native Listening: Language Experience and the Recognition of Spoken
Words. Cambridge MA: MIT Press.
Dupoux, E., Pallier, C., Sebastian, N., & Mehler, J. (1997). A distressing “deafness” in French?
Journal of Memory and Language, 36, 406–421.
Dupoux, E., Peperkamp, S., & Sebastian-Galles, N. (2010). Limits on bilingualism
revisited: Stress ‘deafness’ in simultaneous French–Spanish bilinguals. Cognition, 114,
266–275.
Fitzroy, A. B., & Sanders, L. D. (2013). Musical expertise modulates early processing of
syntactic violations in language. Frontiers in Psychology, 3, e603.
Francois, C., & Schön, D. (2011). Musical expertise boosts implicit learning of both musical and
linguistic structures. Cerebral Cortex, 21(10), 2357–2365.
Fry, D. B. (1958). Experiments in the perception of stress. Language and Speech, 1,
205–213.
Gandour, J. (1981). Perceptual dimensions of tone: Evidence from Cantonese. Journal of
Chinese Linguistics, 9, 20–36.
Garde, O. (1968). L’accent. Paris: Presses Universitaires de France.
George, E. M., & Coch, D. (2011). Music training and working memory: An ERP study.
Neuropsychologia, 49(5), 1083–1094.
Henrich, K., Alter, K., Wiese, R., & Domahs, U. (2014). The relevance of rhythmical alternation
in language processing: An ERP study on English compounds. Brain and Language, 136,
19–30.
Holliman, A. J., Wood, C., & Sheehy, K. (2010). The contribution of sensitivity to speech
rhythm and non-speech rhythm to early reading development. Educational Psychology,
30(3), 247–267.
Holliman, A. J., Wood, C., & Sheehy, K. (2012). A cross-sectional study of prosodic sensitivity
and reading difficulties. Journal of Research in Reading, 35(1), 32–48.
Hutka, S., Bidelman, G. M., & Moreno, S. (2015). Pitch expertise is not created equal: Cross-
domain effects of musicianship and tone language experience on neural and behavioural
discrimination of speech and music. Neuropsychologia, 71, 52–63.
Jusczyk, P. W., Cutler, A., & Redanz, N. J. (1993). Infants’ preference for the predominant stress
patterns of English words. Child Development, 64(3), 675–687.
Kolinsky, R., Cuvelier, H., Goetry, V., Peretz, I., & Morais, J. (2009). Music training facilitates
lexical stress processing. Music Perception, 26(3), 235–246.
Kraus, N., & Chandrasekaran, B. (2010). Music training for developmental auditory skills.
Nature Reviews Neuroscience, 11(8), 599–605.
Ladd, D. R. (2008). Intonational Phonology. Cambridge: Cambridge University Press.
Lerdahl, F. (2001). Tonal Pitch Space. Oxford and New York: Oxford University Press.
Marie, C., Delogu, F., Lampis, G., Belardinelli, M. O., & Besson, M. (2011). Influence of
musical expertise on segmental and tonal processing in Mandarin Chinese. Journal of
Cognitive Neuroscience, 23(10), 2701–2715.
Marie, C., Magne, C., & Besson, M. (2010). Musicians and the metric structure of words.
Journal of Cognitive Neuroscience, 23(2), 294–305.
Moreno, S., Marques, C., Santos, A., Santos, M., Castro, S. L., & Besson, M. (2009). Musical
training influences linguistic abilities in 8-year-old children: More evidence for brain
plasticity. Cerebral Cortex, 19(3), 712–723.
Nan, Y., Liu, L., Geiser, E., Shu, H., Gong, C. C., Dong, Q., Gabrieli, J. D. E., & Desimone, R.
(2018). Piano training enhances the neural processing of pitch and improves speech
perception in Mandarin-speaking children. Proceedings of the National Academy of
Sciences, 115(28), 6630–6639.
Palmer, C., & Kelly, M. H. (1992). Linguistic prosody and musical meter in song. Journal of
Memory and Language, 31(4), 525–542.
Parbery-Clark, A., Tierney, A., Strait, D. L., & Kraus, N. (2012). Musicians have fine-tuned
neural distinction of speech syllables. Neuroscience, 219, 111–119.
Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience, 6(7), 674–681.
Patel, A. D. (2011). Why would musical training benefit the neural encoding of
speech? The OPERA hypothesis. Frontiers in Psychology, 2, 142.
Patel, A. D. (2014). Can nonlinguistic musical training change the way the brain processes
speech? The expanded OPERA hypothesis. Hearing Research, 308, 98–108.
Patel, A. D., & Daniele, J. R. (2003a). An empirical comparison of rhythm in language and
music. Cognition, 87, B35–B45.
Patel, A. D., & Daniele, J. R. (2003b). Stress-timed vs. syllable-timed music? A comment on
Huron and Ollen (2003). Music Perception, 21, 273–276.
Roden, I., Grube, D., Bongard, S., & Kreutz, G. (2014). Does music training enhance working
memory performance? Findings from a quasi-experimental longitudinal study.
Psychology of Music, 42(2), 284–298.
Schellenberg, E. G. (2015). Music training and speech perception: A gene-environment
interaction. Annals of the New York Academy of Sciences, 1337, 170–177.
Schulze, K., Dowling, W. J., & Tillmann, B. (2012). Working memory for tonal and atonal
sequences during a forward and a backward recognition task. Music Perception, 29(3),
255–267.
Shook, A., Marian, V., Bartolotti, J., & Schroeder, S. R. (2013). Musical experience influences
statistical learning of a novel language. American Journal of Psychology, 126(1), 95–104.
Skoe, E., & Kraus, N. (2013). Musical training heightens auditory brainstem function during
sensitive periods in development. Frontiers in Psychology, 4, e622.
Teschner, R. V., & Whitley, S. M. (2004). Pronouncing English: A Stress-Based
Approach with CD-ROM. Washington DC: Georgetown University Press.
Tong, X., Choi, W., & Man, Y. Y. (2018). Tone language experience modulates the
effect of long-term musical training on musical pitch perception. Journal of the
Acoustical Society of America, 144(2), 690–697.
Toussaint, G. T. (2005). The geometry of musical rhythm. In M. Kano & X. Tan (Eds.),
Proceedings of the Japan Conference on Discrete and Computational Geometry, 3742,
198–212.
Vidal, M. M., Lousada, M., & Vigário, M. (2020). Music effects on phonological awareness
development in 3-year-old children. Applied Psycholinguistics, 41(2), 299–318.
Wang, Q. (2008). Perception of English stress by Mandarin Chinese learners of
English: An acoustic study (Unpublished doctoral dissertation). British Columbia:
University of Victoria.
Yu, V. Y., & Andruski, J. E. (2010). A cross-language study of perception of lexical
stress in English. Journal of Psycholinguistic Research, 39, 323–344.
Zheng, Y., & Samuel, A. G. (2018). The effects of ethnicity, musicianship, and tone
language experience on pitch perception. Quarterly Journal of Experimental
Psychology, 71(12), 2627–2642.
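Table 1. Musical experience of the musicians

| Participant | Onset age (years) | Amount of music training (years) | First instrument | Second instrument | Third instrument |
| --- | --- | --- | --- | --- | --- |
| M1 | 7 | 11 | Piano | Oboe | - |
| M2 | 12 | 11 | Piano | - | - |
| M3 | 7 | 11 | Piano | Guitar | - |
| M4 | 5 | 10 | Guitar | Keyboard | - |
| M5 | 9 | 12 | Piano | Guitar | Bass |
| M6 | 13 | 10 | Drums | Guitar | - |
| M7 | 9 | 8 | Piano | - | - |
| M8 | 14 | 8 | Piano | Bass | - |
| M9 | 6 | 10 | Piano | Flute | - |
| M10 | 8 | 10 | Flute | - | - |
| M11 | 11 | 9 | Piano | Ukulele | - |
| M12 | 6 | 18 | Piano | Violin | - |
| M13 | 6 | 20 | Clarinet | - | - |
| M14 | 7 | 7 | Piano | - | - |
| M15 | 4 | 20 | Piano | - | - |
| M16 | 9 | 10 | Clarinet | - | - |
| M17 | 6 | 10 | Piano | Guitar | Ukulele |
| M18 | 5 | 16 | Violin | - | - |
| M19 | 5 | 10 | Piano | - | - |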
Table 2. Comparison of age, non-verbal intelligence, and short-term memory between English
musicians and non-musicians.

| Variable | Musicians | Non-musicians | Group difference (p value) |
| --- | --- | --- | --- |
| Chronological age in years (SD) | 26.63 (5.89) | 32.67 (11.60) | .052 |
| Non-verbal intelligence (SD) | 9.68 (2.21) | 8.83 (3.29) | .360 |
| Short-term memory (SD) | 7.84 (2.39) | 7.83 (3.76) | .993 |

Note. The maximum possible value of non-verbal intelligence is 14. There are no maximum
possible values for age and short-term memory.
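Table 3. Fundamental frequency, duration, and intensity values of the stimuli.

| Stimuli | First syllable F0 (Hz) | First syllable duration (ms) | First syllable intensity (dB) | Second syllable F0 (Hz) | Second syllable duration (ms) | Second syllable intensity (dB) |
| --- | --- | --- | --- | --- | --- | --- |
| Male | | | | | | |
| ˈpɚmɪt | 168 | 155 | 69 | 188 | 445 | 58 |
| pɚˈmɪt | 124 | 151 | 67 | 127 | 449 | 62 |
| ˈsəspekt | 218 | 305 | 69 | 483 | 295 | 59 |
| səsˈpekt | 127 | 300 | 67 | 157 | 300 | 65 |
| ˈɪnsɚt | 168 | 124 | 70 | 160 | 476 | 63 |
| ɪnˈsɚt | 117 | 139 | 67 | 158 | 461 | 66 |
| ˈimpɔrt | 163 | 177 | 60 | 103 | 423 | 67 |
| imˈpɔrt | 110 | 224 | 58 | 123 | 376 | 68 |
| Female | | | | | | |
| ˈpɚmɪt | 119 | 237 | 70 | 209 | 363 | 59 |
| pɚˈmɪt | 228 | 209 | 69 | 261 | 391 | 64 |
| ˈsəspekt | 130 | 332 | 68 | 269 | 268 | 62 |
| səsˈpekt | 219 | 315 | 62 | 245 | 285 | 69 |
| ˈɪnsɚt | 240 | 239 | 69 | 209 | 361 | 60 |
| ɪnˈsɚt | 218 | 204 | 65 | 311 | 396 | 67 |
| ˈimpɔrt | 262 | 290 | 68 | 205 | 310 | 62 |
| imˈpɔrt | 222 | 250 | 63 | 239 | 350 | 68 |

Note. All values are rounded to the nearest integer.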
Table 4. Fundamental frequency, durational, and intensity ratios of the first to second syllables
of the stimuli.

| Stimuli | F0 ratio (Male) | Duration ratio (Male) | Intensity ratio (Male) | F0 ratio (Female) | Duration ratio (Female) | Intensity ratio (Female) |
| --- | --- | --- | --- | --- | --- | --- |
| ˈpɚmɪt | 0.89 | 0.35 | 1.21 | 0.57 | 0.65 | 1.19 |
| pɚˈmɪt | 0.97 | 0.34 | 1.09 | 0.87 | 0.53 | 1.08 |
| ˈsəspekt | 0.45 | 1.03 | 1.17 | 0.48 | 1.24 | 1.10 |
| səsˈpekt | 0.81 | 1.00 | 1.03 | 0.89 | 1.11 | 0.89 |
| ˈɪnsɚt | 1.05 | 0.26 | 1.12 | 1.15 | 0.66 | 1.17 |
| ɪnˈsɚt | 0.74 | 0.30 | 1.01 | 0.70 | 0.52 | 0.97 |
| ˈimpɔrt | 1.58 | 0.42 | 0.90 | 1.28 | 0.94 | 1.09 |
| imˈpɔrt | 0.89 | 0.60 | 0.86 | 0.93 | 0.71 | 0.93 |

Note. All ratios are first-to-second syllable ratios. All values are rounded to two decimal places.
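The first-to-second syllable ratios can be illustrated with a short sketch. It assumes that each ratio is simply the first-syllable value divided by the second-syllable value; the function name `syllable_ratios` and the dictionary `MALE_STIMULI` are hypothetical, and the input values are the rounded figures from Table 3 for four of the male-produced stimuli. Because the inputs are already rounded, some results may differ slightly from Table 4, which was presumably computed from unrounded measurements.

```python
# Four male-produced stimuli copied from Table 3, stored as
# (F0 in Hz, duration in ms, intensity in dB) for the first and second syllable.
MALE_STIMULI = {
    "ˈpɚmɪt": ((168, 155, 69), (188, 445, 58)),
    "pɚˈmɪt": ((124, 151, 67), (127, 449, 62)),
    "ˈsəspekt": ((218, 305, 69), (483, 295, 59)),
    "səsˈpekt": ((127, 300, 67), (157, 300, 65)),
}


def syllable_ratios(first, second):
    """Return (F0, duration, intensity) first-to-second ratios, rounded to 2 dp."""
    return tuple(round(a / b, 2) for a, b in zip(first, second))


# Print one line of ratios per stimulus.
for word, (first, second) in MALE_STIMULI.items():
    print(word, syllable_ratios(first, second))
```

A ratio above 1 indicates that the first syllable has the higher value on that cue, and a ratio below 1 indicates that the second syllable does.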
Table 5. Correlations between the F0, duration, and intensity ratios and the trial-specific mean
accuracies of musicians and non-musicians.

| Ratio | Musicians’ accuracy | Non-musicians’ accuracy |
| --- | --- | --- |
| Iambic stress stimuli | | |
| F0 ratio | .24* | .46** |
| Duration ratio | -.38** | -.42** |
| Intensity ratio | .29* | .28* |
| Trochaic stress stimuli | | |
| F0 ratio | ns | ns |
| Duration ratio | -.27* | -.29* |
| Intensity ratio | .23† | .23‡ |

Note. All values are rounded to two decimal places. ** p < .01; * p < .05; † p = .062; ‡ p = .055.
Figure 1. The hit and false alarm rates of musicians and non-musicians in the English stress
discrimination task. Error bars represent 95% confidence intervals.
Figure 2. The mean sensitivity index of musicians and non-musicians in the English stress
discrimination task. Error bars represent 95% confidence intervals.
Figure 3. The mean hit rate and response time of musicians and non-musicians for trochaic and
iambic stress. Error bars represent 95% confidence intervals.
Appendix I
Analysis of Age and Cognitive Profiles
Correlational analyses showed significant correlations between age and non-verbal
intelligence, r = -.39, p < .05, age and short-term memory, r = -.43, p < .01, and non-verbal
intelligence and short-term memory, r = .46, p < .01. Thus, a multivariate analysis of variance
(MANOVA) was conducted to examine potential group differences. The MANOVA showed a
non-significant main effect of group, p = .181, suggesting that the groups were comparable on these variables.
To be empirically stringent, independent-samples t-tests were also conducted, as MANOVA has
weak power for detecting differences. Consistent with the MANOVA, the two groups did not differ
significantly in non-verbal intelligence, t(35) = .93, p = .360, or short-term memory, t(35)
= .01, p = .993. However, the non-musicians were marginally older than the musicians,
t(35) = -2.01, p = .052, d = .66. As the perceptual differences between the musicians and non-musicians
were only meaningful if they remained evident when age and general cognitive abilities were
held constant, these three variables were controlled in the main analysis.
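As an illustration of how the age comparison relates to the summary statistics in Table 2, the following sketch computes a pooled-variance t statistic and Cohen's d from group means and standard deviations. It is not the original analysis script. The function name `pooled_t_and_d` is hypothetical, and the group sizes (19 musicians, 18 non-musicians) are an assumption inferred from Table 1 and the reported 35 degrees of freedom.

```python
import math


def pooled_t_and_d(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance independent-samples t, Cohen's d, and degrees of freedom."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    standard_error = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    t_value = (mean1 - mean2) / standard_error
    # Cohen's d is reported here as an absolute standardized mean difference.
    cohens_d = abs(mean1 - mean2) / pooled_sd
    return t_value, cohens_d, df


# Age in years from Table 2: musicians first, then non-musicians.
# Group sizes of 19 and 18 are assumed, not stated in the table.
t, d, df = pooled_t_and_d(26.63, 5.89, 19, 32.67, 11.60, 18)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

Under these assumptions the sketch yields values that round to the t and d reported for age.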
Appendix II
Analysis of Response Time Across All Trials
Figure S1 shows the mean response time (collapsed across 96 trials) of musicians and
non-musicians. To examine whether the two groups differed in mean response
time across all 96 trials, a one-way ANCOVA was conducted on mean response time with group
(musicians and non-musicians) as the independent variable. Age, non-verbal intelligence, and
short-term memory were controlled. The ANCOVA showed no significant group difference in mean
response time, p = .837.
Figure S1. The mean response time (collapsed across 96 trials) of musicians and non-musicians in
the English stress discrimination task. Error bars represent 95% confidence intervals.
Appendix III
Acoustic Analysis of Gender Differences
As f0, duration, and intensity did not correlate with each other, ps > .05, a separate
three-way ANOVA was conducted on each acoustic parameter, with stress pattern (iambic and
trochaic), syllable status (stressed and unstressed), and gender (male and female) as the
independent variables.
For f0, the three-way ANOVA revealed a significant main effect of gender, F(1, 24) = 6.98,
p < .05, η² = .23, but not of syllable status, p = .922, or stress pattern, p = .411. The interaction
between stress pattern and gender was significant, F(1, 24) = 4.27, p = .05, η² = .15, but the
interactions between stress pattern and syllable status, p = .163, and between gender and syllable
status, p = .411, were not. The three-way interaction was also non-significant, p = .687. For the interaction
between stress pattern and gender, pairwise comparisons showed that f0 differed marginally
between trochaics and iambics for the male-produced stimuli, p = .051, but not for the female-produced stimuli, p = .393.
For duration, the three-way ANOVA showed non-significant main effects of gender, p = 1.00,
stress pattern, p = 1.00, and syllable status, p = .714. The interaction between gender and
syllable status was also non-significant, p = .430. However, the interaction between stress pattern
and syllable status was significant, F(1, 24) = 43.03, p < .001, η² = .64, as was the three-way
interaction between gender, stress pattern, and syllable status, F(1, 24) = 7.15, p < .05, η² = .23.
Simple main effect analysis was conducted to unpack the three-way interaction. For male-
produced iambics and trochaics, stressed and unstressed syllables differed significantly in
duration, ps < .01. For female-produced iambics, stressed and unstressed syllables also differed
significantly in duration, p < .01. However, for female-produced trochaics, stressed and unstressed
syllables did not differ significantly in duration, p = .089.
For intensity, the three-way ANOVA revealed a significant main effect of syllable status,
F(1, 24) = 4.44, p < .05, η² = .16, in which stressed syllables had higher intensity than unstressed
syllables. All other main effects and interactions were not significant, ps > .05.
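The effect sizes reported in this appendix can be related to their F statistics with a short sketch. It assumes that the reported eta-squared values are partial eta squared, which the appendix does not state explicitly; under that assumption, the standard conversion from F and its degrees of freedom gives values that round to the reported figures. The function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Convert an F statistic to partial eta squared: SS_effect / (SS_effect + SS_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error) taken from the ANOVA results in this appendix.
REPORTED_EFFECTS = [
    ("f0: gender", 6.98, 1, 24),
    ("f0: stress pattern x gender", 4.27, 1, 24),
    ("duration: stress pattern x syllable status", 43.03, 1, 24),
    ("duration: three-way interaction", 7.15, 1, 24),
    ("intensity: syllable status", 4.44, 1, 24),
]

for label, f_value, df_effect, df_error in REPORTED_EFFECTS:
    print(label, round(partial_eta_squared(f_value, df_effect, df_error), 2))
```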