Exploring the sound structure of novel vocalizations
Susanne Fuchs*1, Šárka Kadavá*1,2,3, Wim Pouw3, Bradley Walker4, Nicolas Fay4, Bodo Winter5, and Aleksandra Ćwiek1
*Shared first authorship, corresponding authors: fuchs|kadava@leibniz-zas.de
1Leibniz-Zentrum Allgemeine Sprachwissenschaft, Berlin, Germany
2Linguistik, Georg-August-Universität, Göttingen, Germany
3Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands
4School of Psychological Science, The University of Western Australia, Perth, Australia
5Department of English Language and Linguistics, University of Birmingham, Birmingham, United Kingdom
When humans speak or animals vocalize, they can produce sounds that are further combined
into larger sequences. The flexibility of sound combinations into larger meaningful sequences
is one of the hallmarks of human language. To some extent, this has also been found in other
species, like chimpanzees and birds. The current study investigates the structure of sounds
when speakers are asked to communicate the meaning of 20 selected concepts without using
language. Our results show that the structure of sounds between pauses is frequently limited to
1–3 sounds. This structure is less complex than when humans use their native language. The
acoustic distance between sounds depends largely on the concept apart from concepts referring
to animals, which show a higher diversity of involved sounds. This exploratory analysis might
provide evidence of how the structure of sound could have changed from simple to complex in
evolution.
1. Introduction
Human speech is composed of small units: sounds that are meaning-
distinguishing (phonemes). Several sounds combine into syllables, words, and
phrases that carry meaning(s). The sequential combination of sounds into larger sequences is a milestone in speech acquisition, and young infants already start producing sequences of vocalizations before they acquire their mother tongue (Wermke, Robb, & Schluter, 2021). Even after language is acquired, nonver-
bal vocalizations are present in adult communication and are an emerging field of
study at the boundaries between non-human and human communication (Pisanski,
Bryant, Cornec, Anikin, & Reby, 2022). That means sequences of sounds are not
a property of human communication alone but are also found in non-human ani-
mals like birds (Sainburg, Theilman, Thielk, & Gentner, 2019; Doupe & Kuhl,
1999; Favaro et al., 2020), meerkats (Rauber, Kranstauber, & Manser, 2020),
chimpanzees (Girard-Buttoz et al., 2022). Comparative approaches between hu-
man and non-human animal vocalization deserve bottom-up methodologies rather
than human-centric analyses (Hoeschele, Wagner, & Mann, 2023). What has been
called a syllable in non-human vocalization refers to sound(s) produced between
pauses. In human speech production, similar chunks, i.e., stretches of speech realized between pauses, have often been termed inter-pausal units (Bigi & Priego-Valverde, 2019; Prakash & Murthy, 2019).
In this exploratory study, we are interested in sounds realized in novel vocal-
izations during a charade game, i.e., in a situation where the use of actual words
of the participant’s language is ‘forbidden’. This paradigm has been used to inves-
tigate the origin and evolution of language (Fay et al., 2022; Ćwiek et al., 2021;
Perlman & Lupyan, 2018).
This paper aims to explore how many sounds are realized between pauses in
non-linguistic vocalizations. Furthermore, we investigate the diversity of sounds
realized within different concepts, by assessing the distance between them in a
multi-variable acoustic space.
2. Methodology
2.1. Corpus creation
The present study uses a subset of data collected in a larger study in which participants were recorded communicating a series of concepts in three conditions: (1) using only gestures, (2) using only non-linguistic vocalizations and other sounds, or (3) using a combination of gestures and vocalizations. Here, we focus on a subset of the vocalization recordings. We did not analyze the vocalizations produced in the multimodal condition because we assume that, first, they are not stand-alone carriers of meaning, and second, their forms are shaped by the coordination with body motion.
The recordings analyzed here were produced by 62 first-year psychology stu-
dents at the University of Western Australia (43 female, 17 male, 2 non-binary;
aged 17–33, M = 20.21, SD = 3.36). All were speakers of English. Of these, 28
participated in person and 34 remotely via Microsoft Teams, due to COVID-19
restrictions. Participants were allocated 60 concepts to communicate (20 in each
modality condition), sampled from a list of 200 concepts comprising the 100-item
Leipzig-Jakarta list of basic vocabulary (Tadmor, 2009) plus 100 other basic con-
cepts chosen based on their sensory and modality preferences (Lynott, Connell,
Brysbaert, Brand, & Carney, 2020). They were asked to communicate each con-
cept using the specified modality (and without using language) so that another
person would be able to view the recording and guess the concept from a list of
options. If the participants could not think of a way to communicate a concept,
they were permitted to skip it.
2.2. Concept extraction
For the exploratory analysis, we focused on a variety of concepts that might reflect different degrees of concreteness and abstraction (see Table 1). For example, the concept maybe is rather abstract or logical, whereas smoke is more concrete. We chose these different concepts to cover a wider semantic range, but did not assign categories to them, because the dichotomy between concreteness and abstraction has recently been questioned (Banks et al., 2023).
Table 1. Concepts used in this study. L-J corresponds to the Leipzig-Jakarta list.

Concept      List    No. of speakers
happy        other   6
sad          other   7
bad          other   7
scared       other   5
good         L-J     6
angry        other   7
disgusted    other   7
dog          L-J     6
cat          other   6
bird         L-J     5
fish         L-J     5
fly          L-J     8
old          L-J     4
spoon        other   5
egg          L-J     6
ash          L-J     3
stone/rock   L-J     6
smoke        L-J     4
maybe        other   8
not          L-J     7

Our analysis only included concepts for which initially at least 5 participants produced vocalizations. For three concepts, we excluded acoustic trials that contained a considerable amount of background noise, which made the analysis unreliable.
2.3. Acoustic annotation procedures
The acoustic data were labeled in Praat
6.1.51 (Boersma & Weenink, 2021) by
three annotators who are phoneticians
by training. Following Swets, Fuchs, Krivokapić, and Petrone (2021), all
silent intervals longer than 100 ms were
treated as pauses and labeled with ‘p’.
Apart from placing boundaries next to
pauses, the annotators additionally la-
beled successive sounds without pauses.
The following criteria were used in the
decision-making process for separating
the speech stream into two or more
sounds: a) two (or more) prominent amplitude peaks in the amplitude envelope
were present, b) changes in spectral characteristics (e.g., formant structures) were
present, and c) sounds were perceptually distinct. Variations in fundamental fre-
quency, e.g., a downward and then upward motion, were only considered as two
sounds when they also showed spectral differences in higher frequency ranges
and/or differences in the amplitude envelope. All sounds were labeled with an
initial ‘s’ and successive numbers when they occurred in a sequence. The first
annotator (a1) created the annotation criteria and labeled the data. Annotator 2
(a2) used the available TextGrids from a1 and changed the boundaries when she
disagreed. Both agreed on 94.6 percent of the number of sounds. Hereafter, a1
inspected all acoustic files again where disagreement was found and confirmed the
changes. Annotator 3 (a3) started labeling from scratch without having TextGrids
available. Inter-rater agreement between a2 and a3 was 96.7 percent concerning
the overall number of labeled sounds. The temporal differences between the onset
of a given sound labeled by a2 and its closest temporal neighbor labeled by a3
were calculated. The same was done for the offset of a sound. The differences
were on average 0.048s (median = 0.018s) for the onset and 0.088s (median =
0.027s) for the offset. These differences are influenced by the number of sounds
an annotator labeled for a given concept, which makes the calculation of inter-
rater agreement challenging. We think that for the current exploratory analysis,
the overall agreement is reasonable. We decided to take a2’s segmentation for
further analysis.
Figure 1. Example of the acoustic annotation of the concept smoke. All segments are labeled as ‘s’ and
pauses as ‘p’. The red line depicts the intensity curve.
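For concreteness, the snippet below is a minimal sketch of the boundary comparison described above: each onset (or offset) from one annotator is matched to its closest counterpart from the other annotator, and the absolute time differences are averaged. The list-of-tuples input format, the function name, and the example values are assumptions for illustration, not the actual annotation files.

```python
# Minimal sketch of the inter-annotator boundary comparison (cf. Section 2.3),
# assuming each annotation is a list of (onset, offset) tuples in seconds
# extracted from the Praat TextGrids. The example values are invented.
import numpy as np

a2 = [(0.10, 0.35), (0.48, 0.90), (1.20, 1.55)]
a3 = [(0.12, 0.33), (0.50, 0.95), (1.18, 1.60)]

def mean_boundary_difference(ref, other, index):
    """Mean absolute distance from each boundary in `ref` (0 = onset, 1 = offset)
    to its closest temporal neighbor among the corresponding boundaries in `other`."""
    other_bounds = np.array([seg[index] for seg in other])
    diffs = [np.abs(other_bounds - seg[index]).min() for seg in ref]
    return float(np.mean(diffs))

onset_diff = mean_boundary_difference(a2, a3, index=0)
offset_diff = mean_boundary_difference(a2, a3, index=1)
print(f"mean onset difference: {onset_diff:.3f} s, offset: {offset_diff:.3f} s")
```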
2.4. Analyses of acoustic similarity
Initially, all audio files were cut into segment-sized files using a custom Python
script. Acoustic analysis was performed on these sounds using the analyze() function of the soundgen package in R (Anikin, 2019). The output of this function
consists of more than 100 acoustic parameters as listed in the documentation (e.g.,
f0, amplitude, formant values, entropy, and their respective mean, median, and
standard deviation). Some of these acoustic parameters are present or absent in
the recorded sounds, e.g., voicing. However, the presence of voicing is redundant
with intensity because voiced sounds are louder than voiceless ones and intensity
values can always be calculated. That means some acoustic parameters are highly
correlated and redundant with others. For this reason, we excluded parameters re-
sulting in NA values in the post-processing. Moreover, we excluded voice quality
parameters (e.g., flux), because these parameters may have been very sensitive to the background noise present in some recordings. All final parameters were
averaged for the whole time series of a sound, and we used mean and standard
deviation for further explorations. We ended up with a multidimensional dataset
consisting of 45 acoustic parameters. For the analysis of acoustic similarity, we
calculated the Euclidean distance from the vector of acoustic parameters of each sound to the vectors of all other sounds. As a result, we obtained a distance matrix that allowed us to extract the average distance between sounds within a trial of a concept and compare it to other concepts.
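As an illustration of this step, the sketch below computes the average pairwise Euclidean distance between the sounds of each trial, assuming the retained acoustic parameters from soundgen's analyze() output have been exported to a table; the file name and the metadata columns (speaker, concept, trial, sound_id) are hypothetical.

```python
# Sketch of the acoustic-distance computation (cf. Section 2.4), assuming the
# 45 retained acoustic parameters per sound are stored in a CSV together with
# metadata columns; file and column names are hypothetical.
import pandas as pd
from scipy.spatial.distance import pdist

sounds = pd.read_csv("acoustic_parameters.csv")   # one row per labeled sound

meta_cols = ["speaker", "concept", "trial", "sound_id"]
# Keep only acoustic parameters and drop columns containing NA values.
features = sounds.drop(columns=meta_cols).dropna(axis=1)

# Average pairwise Euclidean distance between the sounds within each trial.
mean_distance = {}
for (concept, trial), group in sounds.groupby(["concept", "trial"]):
    vectors = features.loc[group.index].to_numpy()
    if len(vectors) < 2:
        continue  # a trial with a single sound has no within-trial distance
    mean_distance[(concept, trial)] = pdist(vectors, metric="euclidean").mean()
```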
3. Results and Discussion
3.1. Structural similarity
To explore structural similarity, we analyzed whether sounds occurring between pauses appear alone or in succession. When speakers try to communicate
concepts using novel vocalizations, they frequently realize a relatively small num-
ber of sounds between two pauses: 1 sound occurred 208 times, 2 sounds = 80
times, 3 sounds = 35 times, 4 sounds = 24 times, 5 sounds = 11 times, 6 sounds =
3 times, 8 sounds = 4 times, 9 sounds = 1 time, 10 sounds = 1 time, 16 sounds =
2 times, 18 sounds = 1 time. That means structurally most concepts (208 cases in
our dataset) are realized with only one sound <s> that is surrounded by pauses.
In 80 cases we found realizations of two successive sounds <ss>, and in 35 cases
participants produced three successive sounds <sss> without being interrupted
by a pause. If the data are split by concept, vocalizations for cat, dog, and bird
(all within a broader class of animals) also have more than three successive sound
combinations, probably mirroring onomatopoeia. For the rest of the data, no con-
clusions can be drawn, because the number of sounds between pauses is concept-
specific.
If pauses are taken into account, sounds were combined flexibly; for example, for four sounds we could get combinations such as <s|s|s|s> or <ss|ss> or <ss|s|s>, where | marks a pause.
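These counts can be derived directly from the label sequences by splitting each trial at the pauses and tallying the run lengths of successive sounds, as in the short sketch below; the example label sequence is invented.

```python
# Sketch of how the inter-pausal sound counts reported in Section 3.1 can be
# tallied, assuming each trial is a sequence of interval labels where 'p' marks
# a pause and labels starting with 's' mark sounds. The example is invented.
from collections import Counter
from itertools import groupby

trial = ["s1", "s2", "p", "s1", "p", "s1", "s2", "s3"]   # i.e., <ss|s|sss>

def interpausal_sizes(labels):
    """Number of successive sounds in each stretch between pauses."""
    return [sum(1 for _ in run)
            for is_pause, run in groupby(labels, key=lambda lab: lab == "p")
            if not is_pause]

size_counts = Counter(interpausal_sizes(trial))
print(size_counts)   # Counter({2: 1, 1: 1, 3: 1}) for this example trial
```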
3.2. Acoustic similarity
Similar sounds may be repeated, like in imitating ‘coo-coo’, or they may be of
different acoustic quality, like in imitating a cat’s ‘meow’. For this reason, we
were further interested in examining the similarity between sounds that make up
a novel vocalization.
To have a first look into the diversity of sounds, we analyzed their average
acoustic distance within each trial. We preferred this data-driven approach over labeling the data with phonemic features because it allows us to include sounds that may not occur in the English phoneme inventory, e.g., whistles or clicks. It represents continuous acoustic data instead of assigning categorical labels, which could also be biased by the native language of the annotator.
Figure 2 depicts the results. We can see that the different concepts vary in their
average acoustic distance between sounds. Some abstract concepts like not consist
of sounds that are closer to each other in distance (i.e., more similar), while dog
has a larger average acoustic distance between the sounds. Those concepts with
several successive sounds (e.g., <sss>) are also the ones with the largest average
distance.
Figure 2. Average acoustic distances between sounds within a single trial, displayed by concept. Boxplots and half-violins in purple show the data distribution; black dots correspond to single trials. Concepts are displayed on the x-axis in alphabetical order.
In summary, the structure of novel vocalizations obtained from a charade game
most often contains either one, two, or three successive sounds that are not sep-
arated by pauses. This may to some extent be similar to infants’ vocalizations (Wermke et al., 2021) and to non-human species. It differs from human speech production, where single syllables or morphemes can already consist of three sounds, and those are combined into larger chunks that are not interrupted by pauses. Our findings suggest that novel vocalizations have a rather simple sound structure that became more complex (i.e., more and probably shorter sounds realized in a sequence) over the course of language evolution.
4. Supplementary Materials
Dataset and scripts are available on https://github.com/sarkadava/
Evolang2024_SoundSimilarity.
Acknowledgements
We would like to thank the reviewers of Evolang, the participants of the study, and
Melissa Ebert for data annotation. This work has been supported by a grant from
the German Research Council (FU791/9-1).
References
Anikin, A. (2019). Soundgen: An open-source tool for synthesizing nonverbal
vocalizations. Behavior Research Methods, 51, 778–792.
Banks, B., Borghi, A. M., Fargier, R., Fini, C., Jonauskaite, D., Mazzuca, C.,
Montalti, M., Villani, C., & Woodin, G. (2023). Consensus paper: Current
perspectives on abstract concepts and future research directions. Journal of
Cognition, 6(1).
Bigi, B., & Priego-Valverde, B. (2019). Search for inter-pausal units: application
to cheese! corpus. In 9th language & technology conference: Human lan-
guage technologies as a challenge for computer science and linguistics (pp.
289–293).
Boersma, P., & Weenink, D. (2021). Praat: doing phonetics by computer [com-
puter program](2011). Version,5(3), 74.
Ćwiek, A., Fuchs, S., Draxler, C., Asu, E. L., Dediu, D., Hiovain, K., Kawahara, S., Koutalidis, S., Krifka, M., Lippus, P., et al. (2021). Novel vocalizations are understood across cultures. Scientific Reports, 11(1), 10108.
Doupe, A. J., & Kuhl, P. K. (1999). Birdsong and human speech: Common
themes and mechanisms. Annual Review of Neuroscience, 22(1), 567–631.
Favaro, L., Gamba, M., Cresta, E., Fumagalli, E., Bandoli, F., Pilenga, C., Isaja,
V., Mathevon, N., & Reby, D. (2020). Do penguins’ vocal sequences con-
form to linguistic laws? Biology Letters, 16(2), 20190589.
Fay, N., Walker, B., Ellison, T. M., Blundell, Z., De Kleine, N., Garde, M., Lister,
C. J., & Goldin-Meadow, S. (2022). Gesture is the primary modality for lan-
guage creation. Proceedings of the Royal Society B, 289(1970), 20220066.
Girard-Buttoz, C., Zaccarella, E., Bortolato, T., Friederici, A. D., Wittig, R. M., &
Crockford, C. (2022). Chimpanzees produce diverse vocal sequences with
ordered and recombinatorial properties. Communications Biology, 5(1),
410.
Hoeschele, M., Wagner, B., & Mann, D. C. (2023). Lessons learned in animal
acoustic cognition through comparisons with humans. Animal Cognition,
26(1), 97–116.
Lynott, D., Connell, L., Brysbaert, M., Brand, J., & Carney, J. (2020). The lan-
caster sensorimotor norms: multidimensional measures of perceptual and
action strength for 40,000 english words. Behavior Research Methods,52,
1271–1291.
Perlman, M., & Lupyan, G. (2018). People can create iconic vocalizations to
communicate various meanings to na¨
ıve listeners. Scientific reports,8(1),
2634.
Pisanski, K., Bryant, G. A., Cornec, C., Anikin, A., & Reby, D. (2022). Form
follows function in human nonverbal vocalisations. Ethology Ecology &
Evolution, 34(3), 303–321.
Prakash, J. J., & Murthy, H. A. (2019). Analysis of inter-pausal units in indian
languages and its application to text-to-speech synthesis. IEEE/ACM Trans-
actions on Audio, Speech, and Language Processing,27(10), 1616–1628.
Rauber, R., Kranstauber, B., & Manser, M. B. (2020). Call order within vocal
sequences of meerkats contains temporary contextual and individual infor-
mation. BMC Biology, 18, 1–11.
Sainburg, T., Theilman, B., Thielk, M., & Gentner, T. Q. (2019). Parallels in the
sequential organization of birdsong and human speech. Nature Communications, 10(1), 3636.
Swets, B., Fuchs, S., Krivokapić, J., & Petrone, C. (2021). A cross-linguistic study of individual differences in speech planning. Frontiers in Psychology, 12,
655516.
Tadmor, U. (2009). Loanwords in the world’s languages: Findings and results.
Loanwords in the world’s languages: A comparative handbook, 55–75.
Wermke, K., Robb, M. P., & Schluter, P. J. (2021). Melody complexity of infants’
cry and non-cry vocalisations increases across the first six months. Scientific
Reports, 11(1), 4137.