Acoustic properties of different kinds of creaky voice
Patricia Keating1, Marc Garellek2, Jody Kreiman3
1Dept. Linguistics, UCLA, Los Angeles CA USA 90095; 2Dept. Linguistics, UCSD, San Diego CA USA 92093,
3Dept. Head & Neck Surgery, UCLA, Los Angeles CA USA 90095
keating@humnet.ucla.edu; mgarellek@ucsd.edu; jkreiman@ucla.edu
ABSTRACT
There is not one kind, but instead several kinds, of
creaky voice, or creak. There is no single defining
property shared by all kinds. Instead, each kind
exhibits some properties but not others. Therefore
different acoustic measures characterize different
kinds of creak. This paper describes how various
acoustic measures should pattern for each kind of
creak.
Keywords: phonation, voice quality, creaky voice
1. INTRODUCTION
The term “creaky voice” (or “creak”, used here
interchangeably) refers to a number of different
kinds of voice production. Early linguistic
descriptions of creak (e.g. Laver [32]) enumerated
many characteristics: low subglottal pressure and
glottal flow, slack, thick, compressed vocal folds
with a short vibrating length, ventricular contact
with the folds, weak or damped pulses, low F0,
irregular F0, period-doubled vibration. Later
descriptions (e.g. [7, 20, 28]) added such properties
as irregular amplitude, low Open Quotient, skewed
glottal pulses, narrow formant bandwidths and sharp
harmonics, abrupt closure of the folds, and low
spectral tilt. Yet it seems clear that these
characteristics are not all seen in each instance of
creak, and that (when the full range of types is
considered) there is no single defining characteristic
shared by all instances.
Indeed, previous studies have argued that there
are specific sub-categories of creak, each with its
own set of characteristics. Hedelin and Huber [24]
distinguished “creak” (or “fry” or “pulse”, with low
F0 and strong damping), “creaky voice” (with
irregular pulses), and “diplophonia” (with period
doubling). Batliner et al. [5] used six acoustic
properties to distinguish five different types of
“laryngealization”, but their brief report provides
little discussion. Later, a pair of papers from
presentations at the 1999 ICPhS proposed similar
categories, based primarily on visual inspection of
acoustic displays. First, Gerratt and Kreiman [19]
described two “supraperiodic” types - one “period
doubled” (with interharmonics), and one with
“amplitude modulation” - plus a highly aperiodic
“noisy” type. They demonstrated that these three
types are perceptually distinct to ordinary listeners.
They also described “vocal fry”, with visibly
damped pulses. Second, Redi and Shattuck-
Hufnagel [35] distinguished four types of creaky
voice: irregular “aperiodicity”, damped, low-F0
“creak”, “diplophonia” (with any kind of alternating
pulse frequency, amplitude, or shape), and the rare
“squeak” (with a sudden sustained high F0). Redi
and Shattuck-Hufnagel showed that not only do
these types vary across speakers, but also across
positions-in-utterance for individual speakers.
In this paper we build on these previous studies
about different kinds of creak from the perspective
of researchers performing varied acoustic analyses
of a range of voice samples. If each acoustic
measure reflects a specific aspect of creak, and if
different kinds of creak exhibit specific
combinations of these aspects, then different kinds
of creak will be distinguished from modal voice by
distinct acoustic measures. Specifically, we attempt
to relate different kinds of creak to acoustic
measures already used by researchers.
2. PROTOTYPICAL CREAKY VOICE
We begin by describing what we take to be
prototypical creaky voice, in line with the brief
definitions given in many research papers.
Prototypical creaky voice has the following three
key properties: (1) low rate of vocal fold vibration
(F0), (2) irregular F0, and (3) constricted glottis: a
small peak glottal opening, long closed phase, and
low glottal airflow.
Figure 1: Waveform showing prototypical creak,
phrase-final by a male English speaker, vowel /e/.
Fig. 1 shows a sample waveform of creaky voice
with these properties, from a male speaker of
English. F0 is in the range of 70 Hz, but irregular.
Glottal constriction is inferred from a high Contact
Quotient in the simultaneous EGG signal.
3. OTHER KINDS OF CREAKY VOICE
While prototypical creaky voice is often encountered
in speech samples, much of what is called creaky
voice (indeed, is perceived as creaky voice) may
differ from this prototype in one or more ways. Each
of the three properties of prototypical creak can be
lacking, yielding several further kinds of creak.
3.1. Vocal fry
Although the term “vocal fry” is often used
interchangeably with “creak”, vocal fry differs from
prototypical creak in a major way: the glottis is
constricted and F0 is low, but it is not necessarily
irregular. Indeed it is often quite periodic, as in Fig.
2. Its special property is high damping of the pulses;
this property, due in part to the low F0, makes
individual pulses distinct and separately audible (the
“picket fence” effect). Thus the prototypical low-F0
property is enhanced.
Figure 2: Waveform showing phrase-final vocal
fry by a female English speaker, with regular F0.
It has been suggested that ventricular incursion,
as observed by [1], can be one contributor to vocal
fry (though cf. [11]): the ventricular folds contact
and mechanically load the vocal folds. This
increases the effective mass of the folds, so F0 is
extremely low; it can also make vibration irregular.
However, as vocal fry was the only kind of creak
examined by [1], the incidence of ventricular
involvement across kinds of creak is not known.
3.2. Multiply pulsed voice
A very common form of creak involves a special
kind of F0 irregularity: alternating longer and shorter
pulses. (See [20] for a literature review.) In the case
of double pulsing (or period doubling), there are two
simultaneous periodicities; higher multiples are also
possible. There are thus multiple F0s, usually one
quite low and another about (though not exactly) an
octave higher, but the resulting percept is usually of
an indeterminate pitch, plus roughness. Thus the
prototypical low-F0 is not necessarily present.
These pulses generally have a very long closed
phase, as shown by [41]’s imaging of glottal areas in
double- and triple-pulsed creak. See Figs. 3 and 4 for
sample waveform and spectrum, the latter showing
two sets of harmonics.
Figure 3: Waveform showing double pulsing by a
male English speaker on a steady /a/. Note the
regular alternation of strong and weak pulses.
Figure 4: Spectrum of vowel in Fig. 3. Note two
sets of harmonics.
3.3. Aperiodic voice
Another variant of F0 irregularity is when it is taken
to the extreme: vocal fold vibration is so irregular
that there is no periodicity and thus no perceived
pitch. See Fig. 5. Like multiply pulsed voice,
aperiodic voice lacks the prototypical property of
low F0; instead, the property of irregular F0 is
enhanced, and the voice is therefore noisy.
Figure 5: Waveform showing extreme
aperiodicity, phrase-finally by a female English
speaker.
3.4. Nonconstricted creak
This is a voice quality described by Slifka [38, 39].
F0 is low and irregular, as in prototypical creak; but
the glottis is spreading, not constricted, and therefore
airflow through the glottis is higher, not lower. This
kind of creak is attested utterance-finally, with the
vocal folds beginning to spread before the utterance
is over. The naturally-low subglottal pressure in this
position, combined with the spreading glottis, means
that conditions for sustaining voicing are not ideal.
The slow and irregular vibrations indicate voicing at
the edge of failing. See Fig. 6.
While this kind of creak, with its increasing
airflow, is necessarily somewhat breathy, it differs
from Laver’s [32] proposed “breathy creak”, said to
involve airflow through a posterior (arytenoid)
glottal gap, simultaneous with anterior creak.
Figure 6: Waveform showing nonconstricted
creak, phrase-final by a male English speaker. The
Contact Quotient from EGG is low in this token,
indicating little glottal constriction.
3.5. Tense/pressed voice
When the glottis is constricted, but the F0 is neither
low nor irregular, a tense or pressed voice quality is
heard. While not always considered a form of creaky
voice, it can function phonologically as such in
languages in which a creaky (or laryngealized)
phonation can co-occur with high tone. Here the
constricted glottis is criterial. See Fig. 7.
The discussion above is summarized in Table 1.
Figure 7: Waveform, spectrogram, and pitch track
of “creaky” voice with high tone in Mazatec;
phonetically, this is a tense or pressed voice quality.
Reduced amplitude may be due to constricted
glottis.
4. ACOUSTIC MEASURES OF PROPERTIES OF CREAK
These will be considered primarily with reference to
the measures provided by our program VoiceSauce
([36, 37]), freely available and often used to study
phonation types in languages. In a few cases,
exploratory re-synthesis using the UCLA Voice
Synthesizer (e.g. [29]) has been compared. The five
properties listed in Table 1 are discussed in turn.
Table 1: Properties characterizing different kinds
of creak. Check mark means a property
characterizes a type; NO means it does not; blank
means variable or unknown.
Property >        low F0 | irreg F0 | glottal constr | damped pulses | subharms
Main correlate >  low F0 | high noise | low H1-H2 | low noise; narrow BWs | high SHR
Type ˅
prototypical
vocal fry
multiply pulsed
aperiodic: NO
nonconstricted: NO
tense: NO
4.1. Low F0
Creaky voice usually has lower F0 than modal voice.
Low F0 has been shown to be a key correlate of
creaky voice in Hmong [13] and Mixtec [18]. Yet F0
can be difficult to estimate when irregular;
sometimes no F0 can be found. The STRAIGHT
pitchtracker [27] is fairly robust in the face of F0
irregularity. Another option, especially appropriate
for multiply-pulsed creak, is Sun’s method [40],
based on his Subharmonic-to-Harmonic ratio
measure (SHR, see below). This is specifically
designed to estimate a perceptual F0 in the face of
competing simultaneous harmonics. See also [24]
for additional discussion of methods for tracking
irregular F0. In the limit, if no F0 can be extracted,
the voice is aperiodic, and thus without the low-F0
property. Our re-synthesis also suggests that
lowering the F0 lowers Cepstral Peak Prominence, a
measure of noise (see 4.2).
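The limiting case described above, in which no F0 can be extracted, can be illustrated with a minimal sketch. It uses a generic normalized-autocorrelation estimator, not the STRAIGHT pitchtracker or Sun's method; the function name, the search range, the 0.5 periodicity threshold, and the synthetic test signals are all assumptions made for illustration.

```python
import math
import random


def autocorr_f0(signal, sample_rate, f0_min=40.0, f0_max=400.0, threshold=0.5):
    """Estimate F0 from the peak of the normalized autocorrelation.

    Returns (f0_hz, peak). f0_hz is None when the peak falls below
    `threshold`, i.e. when no clear periodicity is found.
    All parameter defaults are illustrative assumptions.
    """
    n = len(signal)
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 2)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        head, tail = signal[:n - lag], signal[lag:]
        num = sum(x * y for x, y in zip(head, tail))
        den = math.sqrt(sum(x * x for x in head) * sum(y * y for y in tail))
        r = num / den if den > 0 else 0.0
        # The small margin keeps the shortest lag when multiples of the
        # period correlate equally well (a guard against octave errors).
        if r > best_r + 1e-9:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < threshold:
        return None, best_r
    return sample_rate / best_lag, best_r


SR = 8000  # Hz, hypothetical sampling rate
# Hypothetical periodic signal: 100 Hz fundamental plus its second harmonic.
periodic = [math.sin(2 * math.pi * 100 * i / SR)
            + 0.5 * math.sin(2 * math.pi * 200 * i / SR)
            for i in range(800)]
# Hypothetical aperiodic signal: seeded Gaussian noise.
rng = random.Random(0)
aperiodic = [rng.gauss(0.0, 1.0) for _ in range(800)]

f0_periodic, peak_periodic = autocorr_f0(periodic, SR)
f0_aperiodic, peak_aperiodic = autocorr_f0(aperiodic, SR)
print("periodic:", f0_periodic, round(peak_periodic, 3))
print("aperiodic:", f0_aperiodic, round(peak_aperiodic, 3))
```

In this sketch a sample with no usable autocorrelation peak simply returns no F0, which corresponds to treating it as lacking the low-F0 property. Real creaky tokens are far less clean than these test signals, so the threshold would need tuning against data.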
4.2. Irregular F0
Creaky voice usually has less regular voicing
than modal voice. This variability can be measured
as pulse-to-pulse jitter, or as the standard deviation
of the F0, or by autocorrelation [2]. But such voicing
irregularity is perceived as noise, not distinct from
other kinds of noise [29]. Therefore irregular F0 can
be measured as spectral noise, by e.g. Harmonic-to-
Noise Ratios (HNR) across different frequency
bands, by [8]’s method, or normalized as in [25].
Low HNR values indicate less strong periodic
excitation relative to glottal noise due either to ill-
defined harmonics (as with irregular F0) or
prominent glottal noise (as with nonconstricted
creak). Note, however, that vocal fry will have a
relatively high HNR, since in fry the glottal pulses
are so sharply defined.
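As a rough illustration of the first two variability measures mentioned above, the sketch below computes a pulse-to-pulse jitter value and the standard deviation of F0 from a list of glottal pulse periods. The jitter definition used here (mean absolute difference between consecutive periods, divided by the mean period) is one common formulation and is an assumption, as are the function names and the period values, which are invented for the example and are not measurements.

```python
import statistics


def local_jitter(periods):
    """Mean absolute difference between consecutive periods, relative to the mean period."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


def f0_std(periods):
    """Standard deviation (Hz) of pulse-by-pulse F0, where F0 = 1 / period."""
    return statistics.stdev([1.0 / p for p in periods])


# Hypothetical pulse periods in seconds (illustrative values only).
modal_periods = [0.0100, 0.0101, 0.0099, 0.0100, 0.0101, 0.0100]
creaky_periods = [0.0140, 0.0175, 0.0120, 0.0190, 0.0135, 0.0160]

for label, periods in (("modal", modal_periods), ("creaky", creaky_periods)):
    print(label, round(local_jitter(periods), 3), round(f0_std(periods), 2))
```

Both measures presuppose that individual pulses can be located, which is itself difficult in highly irregular voicing.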
Irregular F0 via low HNR is a correlate of creaky
voice in Ju|’hoansi [33], Mazatec [16], Hmong [13],
English [13,14,15], and Taiwanese [34]. Our re-
synthesis suggests that adding jitter lowers the
Cepstral Peak Prominence (i.e., increases noise), but
also the amplitude of the higher formants (i.e.,
increases spectral tilt).
4.3. Constricted glottis
The most common measure of creak is the amplitude
difference between the first and second harmonics,
H1-H2 - see e.g. [21]. (This is best estimated by the
formant-corrected version H1*-H2*, as in [22],
[26]). This measure generally reflects glottal
constriction, with a lower value indicating greater
constriction. [30] used high-speed imaging of the
glottis to show that as long as there is no posterior
glottal gap, H1-H2 is usually closely related to the
glottal Open Quotient. And, [14] and others have
found that it is well correlated with Contact Quotient
measures from electroglottography. Creaky voice
generally has low values of H1-H2, because the
glottis is usually constricted. But in non-constricted
creak, H1-H2 will have higher, not lower, values
than modal voice.
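To make the definition of H1-H2 concrete, the following sketch computes the uncorrected measure as the dB difference between the spectral components at F0 and 2*F0. It assumes that F0 is already known and does not implement the formant correction behind H1*-H2*, nor does it reproduce VoiceSauce's procedure; the function names and the two-sinusoid test signals are hypothetical.

```python
import math


def harmonic_level_db(signal, sample_rate, freq):
    """Level (dB, arbitrary reference) of the Hann-windowed DFT component at `freq`."""
    n = len(signal)
    re = im = 0.0
    for i, x in enumerate(signal):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * i / n)  # periodic Hann window
        phase = 2 * math.pi * freq * i / sample_rate
        re += w * x * math.cos(phase)
        im -= w * x * math.sin(phase)
    return 20 * math.log10(math.hypot(re, im) + 1e-12)


def h1_minus_h2(signal, sample_rate, f0):
    """Uncorrected H1-H2 in dB, with the harmonics taken at f0 and 2 * f0."""
    return (harmonic_level_db(signal, sample_rate, f0)
            - harmonic_level_db(signal, sample_rate, 2 * f0))


def two_harmonics(a1, a2, f0=100.0, sample_rate=8000, n=800):
    """Hypothetical test signal: sinusoids at f0 and 2 * f0 with amplitudes a1 and a2."""
    return [a1 * math.sin(2 * math.pi * f0 * i / sample_rate)
            + a2 * math.sin(2 * math.pi * 2 * f0 * i / sample_rate)
            for i in range(n)]


strong_h1 = two_harmonics(1.0, 0.5)  # first harmonic dominates
weak_h1 = two_harmonics(0.5, 1.0)    # second harmonic dominates

h1h2_strong = h1_minus_h2(strong_h1, 8000, 100.0)
h1h2_weak = h1_minus_h2(weak_h1, 8000, 100.0)
print(round(h1h2_strong, 2), round(h1h2_weak, 2))
```

The second test signal, with a weaker first harmonic, yields the lower value, which is the direction associated above with greater constriction.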
Low H1-H2 has been shown to be a correlate of
creaky voice in Zapotec [4, 12], Ju|’hoansi [33],
Mazatec [6, 16], Hmong [3, 13], English [15],
Trique [10], Taiwanese [34], and of constricted tense
voice in Mpi [6], Chong [9] and Yi languages [31].
Constricted glottis may give rise to vibrations that
impart more energy to higher-frequency harmonics,
perhaps through a more abrupt closure [23]. At the
same time, low flow through the glottis means less
energy in H1. As a result, various measures of
harmonic amplitude differences generally have
lower values in creak (i.e., less spectral tilt). Such
results have been found for Mazatec [6, 16], English
[15], Zapotec [4], and Trique [10]. Our re-synthesis
suggests that a smaller H1-H2 also increases HNR
measures (i.e., lowers noise). However, none of
these are measures of constricted glottis per se.
4.4. Damping
Damping of glottal pulses plays out in two kinds of
measures. First, as noted in 4.2 above: unless the F0
is very irregular, the harmonics in damped pulses
should be well defined, such that harmonic-to-noise
ratios should be high. Second, due to the long closed
phase, formant bandwidths should be narrow (e.g.
low B1 values). We have so far been unable to
demonstrate this through re-synthesis, however.
4.5. Subharmonics in multiple pulsing
As already noted, multiply-pulsed creak has multiple
sets of harmonics. Generally one set is stronger and
dominates the harmonic spectrum, while the other
harmonics (“subharmonics” or “interharmonics”)
appear between these stronger ones. Sun’s
Subharmonic-to-Harmonic Ratio SHR [40]
measures the relative strengths of the two sets, and
has been used by Sun to characterize the strength of
period doubling. Multiply-pulsed creak will have
more subharmonics, so higher SHR values [17].
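The idea behind a subharmonic-to-harmonic ratio can be sketched in a much simplified form. The code below is not Sun's algorithm [40]; it is an illustrative stand-in that assumes the F0 of the dominant harmonic set is known, and compares summed spectral magnitudes midway between harmonics with those at the harmonics. The function names, the number of harmonics, and the synthetic signals are assumptions.

```python
import math


def magnitude(signal, sample_rate, freq):
    """Magnitude of the Hann-windowed DFT component of `signal` at `freq` (Hz)."""
    n = len(signal)
    re = im = 0.0
    for i, x in enumerate(signal):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * i / n)  # periodic Hann window
        phase = 2 * math.pi * freq * i / sample_rate
        re += w * x * math.cos(phase)
        im -= w * x * math.sin(phase)
    return math.hypot(re, im)


def simple_shr(signal, sample_rate, f0, n_harmonics=3):
    """Ratio of summed magnitudes at (k - 0.5) * f0 to those at k * f0, k = 1..n_harmonics."""
    ks = range(1, n_harmonics + 1)
    harm = sum(magnitude(signal, sample_rate, k * f0) for k in ks)
    sub = sum(magnitude(signal, sample_rate, (k - 0.5) * f0) for k in ks)
    return sub / harm


def synth(sample_rate, n, components):
    """Sum of sinusoids given as (frequency_hz, amplitude) pairs."""
    return [sum(a * math.sin(2 * math.pi * f * i / sample_rate) for f, a in components)
            for i in range(n)]


SR, N, F0 = 8000, 800, 100.0
harmonics = [(100.0, 1.0), (200.0, 1.0), (300.0, 1.0)]
subharmonics = [(50.0, 0.4), (150.0, 0.4), (250.0, 0.4)]  # weaker, interleaved set

modal_like = synth(SR, N, harmonics)
double_pulsed_like = synth(SR, N, harmonics + subharmonics)

shr_modal = simple_shr(modal_like, SR, F0)
shr_double = simple_shr(double_pulsed_like, SR, F0)
print(round(shr_modal, 3), round(shr_double, 3))
```

The signal with an interleaved second set of components gives the higher ratio, mirroring the expectation that multiply-pulsed creak shows higher SHR values.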
5. CONCLUSION
Prototypical creaky voice can be distinguished
acoustically by its lower F0, by its irregular F0
(which results in lower values of various harmonic-
to-noise measures), and by its lower H1 and H1-H2,
and other harmonic difference measures. Just one or
two of these prototypical properties apparently
suffice to make a sample creaky. Creak that is vocal
fry with a regular F0 could instead show higher
HNR together with lower formant bandwidths.
Creak that is multiply pulsed can lack a clear F0 but
instead show subharmonics (resulting in higher
values of SHR). Non-constricted creak can instead
show higher H1-H2, but still with a low and
irregular F0. Creak that is more like tense or pressed
voice can have a mid or high, and regular, F0.
We hope to convey that there is no
straightforward answer to the FAQ, “What is the
best acoustic measure for creaky voice?” It entirely
depends on what kind(s) of creak the investigator
wants to identify. It cannot be expected that
measures such as H1*-H2*, or jitter, etc., will
always characterize creaky voice, since there are
special sub-types that are not glottally constricted, or
not irregular, etc. It is crucial to keep in mind that
when different acoustic measures seem to disagree
about the creakiness of a speech sample, the set of
measures as a whole is in fact giving valuable
information about the specific voice quality in the
sample.
6. ACKNOWLEDGMENTS
We thank NSF grants BCS-0720304 and IIS-
1018863, and NIH grant DC01797, for funding.
7. REFERENCES
[1] Allen, E., Hollien, H. 1973. A laminagraphic study of
pulse (vocal fry) phonation. Folia Phon. 25, 241-250.
[2] Ashby, M., Przedlacka, J. 2014. Measuring
incompleteness: Acoustic correlates of glottal
articulation. JIPA 44, 283-296.
[3] Andruski, J. 2006. Tone clarity in mixed
pitch/phonation-type tones. J. Phonetics 34, 388-404.
[4] Avelino, H. 2010. Acoustic and electroglottographic
analyses of nonpathological, nonmodal phonation. J.
Voice 24, 270-280.
[5] Batliner, A., Berger, S., Johne, B., Kießling, A. 1993.
MÜSLI: A classification scheme for laryngealizations.
Proc. ESCA Workshop on Prosody, Lund, 176-179.
[6] Blankenship, B. 2002. The timing of nonmodal
phonation in vowels. J. Phonetics 30, 163-91.
[7] Childers, D.G., Lee, C.K. 1991. Vocal quality factors:
Analysis, synthesis, and perception. J. Acoust. Soc.
Am. 90, 2394-2410.
[8] de Krom, G. 1993. A cepstrum-based technique for
determining a harmonic-to-noise ratio in speech
signals. J. Sp. Hear. Res. 36, 254-66.
[9] DiCanio, C. 2009. The phonetics of register in
Takhian Thong Chong. JIPA 39, 162-188.
[10] DiCanio, C. 2012. Coarticulation between tone and
glottal consonants in Itunyoso Trique. J. Phonetics 40,
162-176.
[11] Edmondson, J.A., Esling, J.H. 2006. The valves of
the throat and their functioning in tone, vocal register
and stress: laryngoscopic case studies. Phonology 23,
157-191.
[12] Esposito, C. 2010. Variation in contrastive phonation
in Santa Ana Del Valle Zapotec. JIPA 40, 181-198.
[13] Garellek, M. 2012. The timing and sequencing of
coarticulated non-modal phonation in English and
White Hmong. J. Phonetics 40, 152-161.
[14] Garellek, M. 2014. Voice quality strengthening and
glottalization. J. Phonetics 45, 106-113.
[15] Garellek, M. 2015. Perception of glottalization and
phrase-final creak. J. Acoust. Soc. Am. 137, 822-831.
[16] Garellek, M., Keating, P. 2011. The acoustic
consequences of phonation and tone interactions in
Mazatec. JIPA 41, 185-205.
[17] Garellek, M., Keating, P. 2015. Phrase-final creak:
Articulation, acoustics, and distribution. Annual
Meeting of the Linguistic Society of America,
Portland, OR.
[18] Gerfen, C., Baker, K. 2005. The production and
perception of laryngealized vowels in Coatzospan
Mixtec. J. Phonetics 33, 311-334.
[19] Gerratt, B.R., Kreiman, J. 2001. Toward a taxonomy
of nonmodal phonation. J. Phonetics 29, 365-381.
[20] Gobl, C., Ní Chasaide, A. 2010. Voice source
variation and its communicative functions. In:
Hardcastle, W., Laver, J., Gibbon, F. (eds), The
Handbook of Phonetic Sciences (Second Edition).
Oxford: Blackwell, 378-423.
[21] Gordon, M., Ladefoged, P. 2001. Phonation types: A
cross-linguistic overview. J. Phonetics 29, 383-406.
[22] Hanson, H. M. 1995. Glottal characteristics of
female speakers. Ph.D. Dissertation, Harvard.
[23] Hanson, H. M., Stevens, K.N., Kuo, H.-K. J., Chen,
M.Y., Slifka, J. 2001. Towards models of phonation.
J. Phonetics 29, 451-480.
[24] Hedelin, P., Huber, D. 1990. Pitch period
determination of aperiodic speech signals. Proc.
ICASSP Albuquerque, 361-364.
[25] Hillenbrand, J., Cleveland, R., Erickson, R. 1994.
Acoustic correlates of breathy vocal quality. J. Sp.
Hear. Res. 37, 769-778.
[26] Iseli, M., Shue, Y.-L., Alwan, A. 2007. Age, sex, and
vowel dependencies of acoustic measures related to
the voice source. J. Acoust. Soc. Am. 121, 2283-2295.
[27] Kawahara, H., Katayose, H., de Cheveigné, A.,
Patterson, R. D. 1999. Fixed point analysis of
frequency to instantaneous frequency mapping for
accurate estimation of F0 and periodicity. Proc.
EUROSPEECH Budapest, 2781-2784.
[28] Klatt, D., Klatt, L. 1990. Analysis, synthesis, and
perception of voice quality variations among female
and male talkers. J. Acoust. Soc. Am. 87, 820-857.
[29] Kreiman, J., Gerratt, B.R. 2005. Perception of
aperiodicity in pathological voice. J. Acoust. Soc. Am.
117, 2201-2211.
[30] Kreiman, J., Shue, Y.-L., Chen, G., Iseli, M., Gerratt,
B. R., Neubauer, J., Alwan, A. 2012. Variability in the
relationships among voice quality, harmonic
amplitudes, open quotient, and glottal area waveform
shape in sustained phonation. J. Acoust. Soc. Am. 132,
2625-2632.
[31] Kuang, J.J. 2013. Phonation in Tonal Contrasts.
Ph.D. dissertation, UCLA.
[32] Laver, J. 1980. The phonetic description of voice
quality. Cambridge: Cambridge University Press.
[33] Miller, A.L. 2007. Guttural vowels and guttural
coarticulation in Ju|’hoansi. J. Phonetics 35, 56-84.
[34] Pan, H., Chen, M., Lyu, S. 2011. Electroglottograph
and Acoustic Cues for Phonation Contrasts in Taiwan
Min Falling Tones. Proc. 12th INTERSPEECH
Firenze, 649-652.
[35] Redi, L., Shattuck-Hufnagel, S. 2001. Variation in
the realization of glottalization in normal speakers. J.
Phonetics 29, 407-429.
[36] Shue, Y.-L. 2010. The voice source in speech
production: Data, analysis and models. Ph.D.
Dissertation, UCLA.
[37] Shue, Y.-L., Keating, P., Vicenik, C., Yu, K. 2011.
VoiceSauce: A program for voice analysis. Proc. 17th
ICPhS Hong Kong, 1846-1849.
[38] Slifka, J. 2000. Respiratory constraints on speech
production at prosodic boundaries. Ph.D.
Dissertation, MIT.
[39] Slifka, J. 2006. Some physiological correlates to
regular and irregular phonation at the end of an
utterance. J. Voice 20, 171-186.
[40] Sun, X. 2002. Pitch determination and voice quality
analysis using Subharmonic-to-Harmonic Ratio. Proc.
ICASSP Orlando, 333-336.
[41] Whitehead, R. L., Metz, D., Whitehead, B.H. 1984.
Vibratory patterns of the vocal folds during pulse
register phonation. J. Acoust. Soc. Am. 75, 1293-1297.
... Voicelessness is defined by a lack of vocal fold vibration, while creaky voice (or creak) involves a glottal constriction, a low rate of vocal fold vibration (i.e., low pitch), and/or irregular F0 (Ladefoged 1971;Gordon and Ladefoged 2001; Garellek 2019, among others). Breathy voice involves voicing in addition to noise, and concentration of acoustic energy in the F3 region (Laver 1980;Keating et al. 2015;Garellek 2014Garellek , 2019Esling et al. 2019). These phonation qualities relate to the relative degree of the vocal fold aperture, which is most open for voicelessness, then breathy voice, and most constricted for modal voice, then creaky voice (see, for example, Gordon and Ladefoged 2001). ...
... Acoustic analysis involved inspection of the waveform and spectrogram for cues of modal and non-modal voicing. As in González et al. (2022), modal voice was characterized by periodicity, while creaky voice was characterized by one or more of the following: (i) irregular F0 ('aperiodicity'), (ii) F0 lowering, (iii) changes in pulse amplitude or shape ('diplophonia'), and/or (iv) presence of silence followed by a stop burst ('glottal stop') (Dilley et al. 1996;Docherty and Foulkes 2005;Gordon and Ladefoged 2001;Huber 1988;Keating et al. 2015;Ladefoged and Maddieson 1996;Redi and Shattuck-Hufnagel 2001). All token vowels were also examined for intervals involving lack of voicing-which were coded as devoiced-and for intervals involving noise in the F3 region, coded as breathy voice (Laver 1980;Keating et al. 2015;Garellek 2014Garellek , 2019Esling et al. 2019). ...
... As in González et al. (2022), modal voice was characterized by periodicity, while creaky voice was characterized by one or more of the following: (i) irregular F0 ('aperiodicity'), (ii) F0 lowering, (iii) changes in pulse amplitude or shape ('diplophonia'), and/or (iv) presence of silence followed by a stop burst ('glottal stop') (Dilley et al. 1996;Docherty and Foulkes 2005;Gordon and Ladefoged 2001;Huber 1988;Keating et al. 2015;Ladefoged and Maddieson 1996;Redi and Shattuck-Hufnagel 2001). All token vowels were also examined for intervals involving lack of voicing-which were coded as devoiced-and for intervals involving noise in the F3 region, coded as breathy voice (Laver 1980;Keating et al. 2015;Garellek 2014Garellek , 2019Esling et al. 2019). H2-H1 measurements were also taken via FFT spectra. ...
Article
Full-text available
This article provides a detailed examination of voice quality in word-final vowels in Spanish. The experimental task involved the pronunciation of words in two prosodic contexts by native Spanish speakers from diverse dialects. A total of 400 vowels (10 participants × 10 words × 2 contexts × 2 repetitions) were analyzed acoustically in Praat. Waveforms and spectrograms were inspected visually for voice, creak, breathy voice, and devoicing cues. In addition, the relative amplitude difference between the first two harmonics (H1–H2) was obtained via FFT spectra. The findings reveal that while creaky voice is pervasive, breathy voice is also common, and devoicing occurs in 11% of tokens. We identify multiple phonation types (up to three) within the same vowel, of which modal voice followed by breathy voice was the most common combination. While creaky voice was more frequent overall for males, modal voice tended to be more common in females. In addition, creaky voice was significantly more common at the end of higher prosodic constituents. The analysis of spectral tilt shows that H1–H2 clearly distinguishes breathy voice from modal voice in both males and females, while H1–H2 values consistently discriminate creaky and modal voice in male participants only.
... In the literature, a prototypical creaky voice is defined as having three key properties, namely a lower fundamental frequency than modal voice (i.e., low rate of vocal fold vibration), an irregular f0 (random or multiply pulsed) and a constricted glottis (the vocal fold are close together) manifested by low glottal airflow due to a small peak glottal opening followed by a long closed stage denoting a low spectral-to-noise ratio (Keating et al. 2015, Keating et al. 2023). However, as Gobl and Ni Chasaide (2010: 401) eloquently summarised "although every voice quality" (i.e., modal, breathy, whispery, creaky, lax and tense voice, in line with Laver's (1980) classification) "varies dynamically in the course of an utterance, creaky voice is particularly variable". ...
... However, as Gobl and Ni Chasaide (2010: 401) eloquently summarised "although every voice quality" (i.e., modal, breathy, whispery, creaky, lax and tense voice, in line with Laver's (1980) classification) "varies dynamically in the course of an utterance, creaky voice is particularly variable". In light of this, Keating et al. (2015) identify and describe an additional five phonetic representations of creaky voice, such as: ...
... The complexity of the procedure is elevated since it takes only one feature, either a low f0, an irregular f0 or a constriction of the glottis for the voice to be perceived as creaky (feature previously observed by Davidson 2019, Keating et al. 2023. In this regard, based on the classification proposed by Keating et al. (2015) (see section 2.2), Garellek (2019) provides a visual summary of different creak sub-categories, alongside the acoustic differences between modal, breathy and creaky phonation. ...
Article
This paper represents a preliminary acoustic analysis of filler particles in terms of voice quality (i.e., modal vs. creaky phonation). The main research questions addressed in this study revolve around which particular voice parameters are indicative of (non)modal phonation of fillers used by healthy speakers of Standard Romanian, and whether the function of the filler particle varies with different voice qualities. The analysis is carried out on Romanian connected speech data extracted from the Ro-Phon corpus (non-pathological speech), an open-access linguistic resource developed during our postdoctoral research project financed by UEFISCDI (2020 – 2022).
... Previously annotated voiced portions were resynthesized with a Klatt synthesizer that tracked the pitch, formants, and intensity profiles of the original production and then concatenated the resynthesized portions with neighboring (voiceless) portions with smoothed boundaries. The only difference between the modal and creaky versions was that the creaky resynthesis inserted double pulsing points in all the voiced portions, resulting in doubly-pulsed creak throughout the creaky utterance ( [24]; see Figure 1). Thus, the two versions of a given utterance differed solely in voice quality and were otherwise acoustically identical. ...
... Modal/creaky utterances of the same talker were concatenated into one sound file, so each talker had a collection of modal utterances and a collection of creaky utterances that were maximally similar. Acoustic analysis confirmed that the creaky utterances had significantly lower H1*-H2* and HNR (harmonic-to-noise ratio in the 0-3500 Hz spectrum) values than the corresponding modal stimuli (see Table 1), consistent with previously documented acoustic differences between modal and creaky productions [24]. ...
Chapter
Full-text available
While there is a growing literature on the social meanings of nonmodal voice qualities, most of the existing studies focus on English and use either naturally produced speech stimuli (which are hard to control acoustically) or a small set of fully synthesized stimuli. This paper reports a perceptual study of the social meanings of creaky voice in Mandarin Chinese, using a large set of resynthesized stimuli featuring 38 talkers (19F) and 6-10 pairs of sentences per talker that differed only in voice quality (creaky vs. modal). Sixty listeners (33F) answered 4 questions about the talker's demographic profile (age, gender, sexuality, education) and gave 19 ratings of personality traits (e.g., confident, professional, charismatic) and interactive potential (e.g., engagingness). Using factor analysis and mixed-effects modeling, our results showed that for male listeners, creaky voice significantly decreased the perceived warmth of male talkers but increased the perceived warmth of female talkers; creaky voice also led to more gender identification errors on female talkers by female listeners and made male talkers sound older. These findings point toward multifaceted social meanings of creaky voice in Mandarin, which extend beyond talker attractiveness and are closely linked to gender, both the talker's and the listener's.
... A limitation of the present study comes in the use of the f0-based approach. As summarised by [46], not all types of creak are characterised by low f0, with aperiodicity and multiple pulsing NoDerivatives 4.0 International (https://creativecommons.org/licenses/by-nc-nd/4.0/). also characterising some creak. ...
... It is important to stress here that, while creaky voice is typically characterized by increased glottal (and laryngeal) constriction (Esling et al. 2019;Garellek 2022), very low and/or irregular f0 can alone cue creak, whereas increased glottal constriction alone is unlikely to do so (Keating et al. 2015. Additionally, we know from research by Slifka (2000Slifka ( , 2006 that utterance-final creak in English can be produced with increased glottal spreading rather than constriction. ...
Article
Full-text available
We investigate utterance-final voice quality in bilinguals of English and Spanish, two languages which differ in the type of non-modal voice usually encountered at ends of utterances: American English often has phrase-final creak, whereas in Mexican Spanish, phrase-final voiced sounds are breathy or even devoiced. Twenty-one bilinguals from the San Diego-Tijuana border region were recorded (with electroglottography and audio) reading passages in English and Spanish. Ends of utterances were coded for their visual voice quality as "modal" (having no aspiration noise or voicing irregularity), "breathy" (having aspiration noise), "creaky" (having voicing irregularity), or "breathy-creaky" (having both aspiration noise and voicing irregularity). In utterance-final position, speakers showed more frequent use of both modal and creaky voice when speaking in English, and more frequent use of breathy and breathy-creaky voice when speaking in Spanish. We find no role of language dominance on the rates of these four voice qualities. The electroglottographic and acoustic analyses show that all voice qualities, even utterance-final creak, are produced with increased glottal spreading; the combination of distinct noise measures and amplitude of voicing can distinguish breathy, creaky, and breathy-creaky voice qualities from one another, and from modal voice.
... Results vary widely across studies, due in part to the lack of theoretical motivation for the particular instrumental parameters studied; differences in the methods, voices, and qualities studied; and differences in how individuals define and assess the target voices and qualities. For example, there are multiple kinds of creaky voice that listeners can easily distinguish, yet which are all labeled creaky (Gerratt and Kreiman, 2001; Keating et al., 2015). Similarly, a voice with a steeply falling source spectrum can be perceived as breathy even in the absence of turbulent noise (e.g., Garellek et al., 2013). ...
Article
The problem of characterizing voice quality has long caused debate and frustration. The richness of the available descriptive vocabulary is overwhelming, but the density and complexity of the information voices convey lead some to conclude that language can never adequately specify what we hear. Others argue that terminology lacks an empirical basis, so that language-based scales are inadequate a priori. Efforts to provide meaningful instrumental characterizations have also had limited success. Such measures may capture sound patterns but cannot at present explain what characteristics, intentions, or identity listeners attribute to the speaker based on those patterns. However, some terms continually reappear across studies. These terms align with acoustic dimensions accounting for variance across speakers and languages and correlate with size and arousal across species. This suggests that labels for quality rest on a bedrock of biology: We have evolved to perceive voices in terms of size/arousal, and these factors structure both voice acoustics and descriptive language. Such linkages could help integrate studies of signals and their meaning, producing a truly interdisciplinary approach to the study of voice.
Article
Purpose: This study aimed to investigate the acoustic and electroglottographic (EGG) profiles of Mandarin deception, including global characteristics and the influence of gender. Method: Thirty-six Mandarin speakers participated in an interactive interview game in which they provided both deceptive and truthful answers to 14 biographical questions. Acoustic and EGG signals of the participants' responses were simultaneously recorded; 20 acoustic and 14 EGG features were analyzed using binary logistic regression models. Results: Increases in fundamental frequency (F0) mean, intensity mean, first formant (F1), fifth formant (F5), contact quotient (CQ), decontacting-time quotient (DTQ), and contact index (CI) as well as decreases in jitter, shimmer, harmonics-to-noise ratio (HNR), and fourth formant (F4) were significantly correlated with global deception. Cross-gender features included increases in intensity mean and F5 and decreases in jitter, HNR, and F4, whereas gender-specific features encompassed increases in F0 mean, shimmer, F1, third formant, and DTQ, as well as decreases in F0 maximum and CQ for female deception, and increases in CQ and CI and decreases in shimmer for male deception. Conclusions: The results suggest that Mandarin deception could be tied to underlying pragmatic functions, emotional arousal, decreased glottal contact skewness, and more pressed phonation. Disparities in gender-specific features lend support to differences in the use of pragmatics, levels of deception-induced emotional arousal, skewness of glottal contact patterns, and phonation types.
Article
This paper explores a lesser-studied topic in Romanian linguistic research related to the acoustic features (duration, pitch, F1/F2 values) and pragmatic functions of filler particles, often referred to as “filled pauses”, in spontaneous speech. From a theoretical perspective, we align our analysis with the recent definition proposed by Belz (2023), where “a phonetic exponent which is segmentally structured, semantically empty, syntactically unconstrained, and does not show an interjectional function is classified as a filler particle”. Moreover, as a way to reinforce the idea that filler particles are, from a phonetic perspective, extra-pausal phenomena, we classified fillers in terms of their positioning relative to the silent pause (i.e., pre-pausal, post-pausal, inter-pausal and concatenated). Prior to carrying out extensive quantitative analyses of filler particles on Romanian data, in this article we proposed performing an initial qualitative exploration of fillers in connected speech based on the Ro-Phon corpus.
Our results reveal that: (1) the prototypical filler particle outputs in Romanian are vocalic (/ə, ɨ/), vocalic-nasal (/əm/) and nasal (/m/), (2) the length of the pause preceding a filler particle is systematically longer than the pause following it, (3) inter-pausal filler particles display the longest average duration while concatenated fillers were the shortest in our data, (4) in terms of formant frequencies, there is a greater degree of movement along the F1 axis compared to the average F2 frequencies extracted from both vocalic and vocalic-nasal tokens, indicative of an acoustic continuum present within the central vowels /ɨ/ and /ə/, (5) all fillers display a low, flat f0 contour, with a frequency similar to that of neighbouring phones, and (6) filler particles perform various and often cumulative discursive roles, ranging from a cognitive function (indicative of planning processes), marker of a repair (self-initiated, content-based repairs), to a discourse management function used to signal upcoming new (sensitive) information within the narrative sequence. Future studies aim at extending the data and performing in-depth quantitative analyses related to duration, frequency distribution, voice and vowel quality of filler particles in non-pathological native and non-native Romanian speech data.
Article
American English has several linguistic sources of creaky voice. Two common sources are /t/-glottalization (where /t/ is produced as a glottal stop and/or with creaky voice, as in "button") and phrase-final creak. Both /t/-glottalization and phrase-final creak have similar acoustic properties, but they can co-occur in English. The goal of this study is to determine whether /t/-glottalization and phrase-final creak are perceived distinctly. Sixteen English listeners were asked to identify words in a two-alternative forced choice task. The auditory targets were (near-) minimal pairs, in which one word could have /t/-glottalization (e.g., "button") but the other could not (e.g., "bun"). Stimuli were presented with and without phrase-final creak. Listeners made few identification errors overall, even when /t/-glottalization co-occurred with phrase-final creak, suggesting that /t/-glottalization and phrase-final creak remain perceptually distinct to English listeners. This supports the view that creaky voice is not a single category, but one comprised of distinct voice qualities.
Article
The autocorrelation function, a measure of regularity in the speech signal, is applied in demarcating the seemingly diffuse intervals of glottalization which accompany or replace voiceless oral stops in elicited recordings from 22 young speakers of Southern British English. It is shown that a local minimum in autocorrelation characterizes almost all instances heard as intervocalic glottal stops; an annotation procedure is developed and used to gather data on glottalization gestures, including duration, f0, energy and autocorrelation. The same measure is used to assess regularity of vocal fold vibration in an interval just prior to the formation of the total closure for instances of syllable-final /t/, and confirms significantly lower autocorrelation in a group auditorily judged ‘pre-glottalized’. Implications are considered both for normal speech perception and for expert phonetic judgments.
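The idea of using autocorrelation as a regularity measure can be illustrated with a minimal sketch. This is not the annotation procedure of the study above, whose details are not given in the abstract; the function names, frame length, hop size, pitch-lag range, and threshold below are all illustrative assumptions. The sketch computes, for each frame, the peak of the normalized autocorrelation within a plausible pitch-lag range, then reports frames where that value dips to a local minimum.

```python
import numpy as np

def frame_autocorr_peak(signal, sr, frame_len=0.04, hop=0.01,
                        f0_min=60.0, f0_max=400.0):
    """Per-frame peak of the normalized autocorrelation within the lag
    range corresponding to f0_min..f0_max (all defaults are assumptions)."""
    n = int(round(frame_len * sr))
    h = int(round(hop * sr))
    lag_lo = int(sr / f0_max)
    lag_hi = min(int(sr / f0_min), n - 1)
    peaks = []
    for start in range(0, len(signal) - n + 1, h):
        frame = signal[start:start + n]
        frame = frame - np.mean(frame)          # remove DC offset
        energy = float(np.dot(frame, frame))
        if energy == 0.0:                       # silent frame: no regularity
            peaks.append(0.0)
            continue
        # Keep non-negative lags only and normalize by the lag-0 value.
        ac = np.correlate(frame, frame, mode="full")[n - 1:] / energy
        peaks.append(float(np.max(ac[lag_lo:lag_hi + 1])))
    return np.array(peaks)

def local_minima_below(values, threshold=0.5):
    """Indices of interior frames that are local minima and fall below
    a (hypothetical) regularity threshold."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < threshold
            and values[i] <= values[i - 1]
            and values[i] <= values[i + 1]]
```

Applied to a recording, frames returned by `local_minima_below` would be candidates for intervals of irregular vibration; the threshold would need tuning against annotated data before any such use.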
Article
San Felipe Jalapa de Díaz (Jalapa) Mazatec is unusual in possessing a three-way phonation contrast and three-way level tone contrast independent of phonation. This study investigates the acoustics of how phonation and tone interact in this language, and how such interactions are maintained across variables like speaker sex, vowel timecourse, and presence of aspiration in the onset. Using a large number of words from the recordings of Mazatec made by Paul Kirk and Peter Ladefoged in the 1980s and 1990s, the results of our acoustic and statistical analysis support the claim that spectral measures like H1-H2 and mid-range spectral measures like H1-A2 best distinguish each phonation type, though other measures like Cepstral Peak Prominence are important as well. This is true regardless of tone and speaker sex. The phonation type contrasts are strongest in the first third of the vowel and then weaken towards the end. Although the tone categories remain distinct from one another in terms of F0 throughout the vowel, for laryngealized phonation the tone contrast in F0 is partially lost in the initial third. Consistent with phonological work on languages that cross-classify tone and phonation type (i.e. ‘laryngeally complex’ languages, Silverman 1997), this study shows that the complex orthogonal three-way phonation and tone contrasts do remain acoustically distinct according to the measures studied, despite partial neutralizations in any given measure.
Article
This paper investigates the realization of contrastive tone in three non-modal phonation contexts (creaky phonation, glottal closure, and breathy phonation) in Itunyoso Trique, an Oto-Manguean language spoken in Oaxaca, Mexico. The study examines how coarticulatory glottalization (creaky phonation, glottal closure) coincides with coarticulatory pitch perturbations and spectral tilt changes on neighboring vowels. The onset of laryngeally induced F0 perturbation effects and the timing of changes in spectral tilt were examined using acoustic data from six speakers of the language. The results show that in contexts where substantial non-modal phonation spreads onto the adjacent vowel, greater pitch effects are observed. In contexts where abrupt glottal closure occurs, less coarticulatory changes in spectral tilt and pitch are observed on adjacent vowels. In addition, strong tonal effects are observed for certain spectral measures. These findings are discussed in relation to the literature on tonogenesis and coarticulatory pitch effects.
Article
Lexical tone identity is often determined by a complex of acoustic cues. In Green Mong, a Hmong‐Mien language of Southeast Asia, a small subset of tones is characterized by phonation type in addition to pitch height, pitch contour, and duration, which characterize the remaining tones of the language. In tones that incorporate multiple cues to tonal identity, what makes a tone clear, or easy to recognize? This study examines acoustic and perceptual data to address this question. Six native speakers of Green Mong were asked to produce 132 phonological CV words in sentence context, using a conversational speaking style. Seventeen native speakers of the language were then asked to categorize three tones which have similar falling contours, but are differentiated by phonation type (breathy, creaky, and modal). Tokens that were correctly identified by 100% of the listeners were compared with tokens that were relatively poorly identified. Data indicate that the breathy‐ and creaky‐voiced tones are less susceptible to identification errors than the modal‐voiced tone. However, the clearest tokens of the three tones are also differentiated by details of pitch contour shape, and by duration. Similarities and differences between acoustic cue values for the best and worst tokens will be discussed.
Article
Thesis (Ph.D.), Harvard-Massachusetts Institute of Technology Division of Health Sciences and Technology, 2000.
Article
The purpose of the present study was to investigate the vibratory patterns of the vocal folds during pulse register phonation. Glottal area–time functions were calculated from three high speed laryngeal films (4000 frames/second) obtained during phonation of the schwa vowel in pulse register by a normally hearing and speaking adult female. The results for the first film indicated that each of 35 consecutive vibratory cycles of the vocal folds consisted of a single opening/closing gesture followed by a lengthy closed phase. The analysis of the second film revealed that each of 33 consecutive vibratory cycles consisted of a double opening/closing vocal fold pattern, followed by a long closed phase. For the third film, the results indicated that each of 26 consecutive vibratory cycles of the vocal folds consisted of either a double or triple opening/closing gesture followed by a lengthy closed period. From these data, it appears that one of the physiological descriptors of pulse phonation is multiple, as well as single, vocal fold vibratory patterning.