Exploring the sound structure of novel vocalizations

March 2024

Conference: Evolang
At: Madison WI

Authors:

Susanne Fuchs

Leibniz Zentrum Allgemeine Sprachwissenschaft

Šárka Kadavá

Leibniz-Centre General Linguistics

Wim Pouw

Radboud University

When humans speak or animals vocalize, they can produce sounds that are further combined into larger sequences. The flexibility of sound combinations into larger meaningful sequences is one of the hallmarks of human language. To some extent, this has also been found in other species, like chimpanzees and birds. The current study investigates the structure of sounds when speakers are asked to communicate the meaning of 20 selected concepts without using language. Our results show that the structure of sounds between pauses is frequently limited to 1-3 sounds. This structure is less complex than when humans use their native language. The acoustic distance between sounds depends largely on the concept apart from concepts referring to animals, which show a higher diversity of involved sounds. This exploratory analysis might provide evidence of how the structure of sound could have changed from simple to complex in evolution.

Susanne Fuchs*1, Šárka Kadavá*1,2,3, Wim Pouw3, Bradley Walker4, Nicolas Fay4, Bodo Winter5, and Aleksandra Ćwiek1

*Shared first authorship, corresponding authors: fuchs|kadava@leibniz-zas.de

1Leibniz-Zentrum Allgemeine Sprachwissenschaft, Berlin, Germany

2Linguistik, Georg-August-Universität, Göttingen, Germany

3Donders Institute for Brain, Cognition and Behaviour, Nijmegen, The Netherlands

4School of Psychological Science, The University of Western Australia, Perth, Australia

5Department of English Language and Linguistics, University of Birmingham, Birmingham, United Kingdom

1. Introduction

Human speech is composed of small units: sounds that distinguish meaning (phonemes). Several sounds combine into syllables, words, and phrases that carry meaning(s). Ordering sounds into larger sequences is a milestone in speech acquisition, and even young infants can produce sequences of vocalizations before they acquire their mother tongue (Wermke, Robb, & Schluter, 2021). Even after language is acquired, nonverbal vocalizations remain present in adult communication and are an emerging field of study at the boundary between non-human and human communication (Pisanski, Bryant, Cornec, Anikin, & Reby, 2022). Sequences of sounds are thus not a property of human communication alone; they are also found in non-human animals such as birds (Sainburg, Theilman, Thielk, & Gentner, 2019; Doupe & Kuhl, 1999; Favaro et al., 2020), meerkats (Rauber, Kranstauber, & Manser, 2020), and chimpanzees (Girard-Buttoz et al., 2022). Comparative approaches to human and non-human animal vocalization call for bottom-up methodologies rather than human-centric analyses (Hoeschele, Wagner, & Mann, 2023). What has been called a syllable in non-human vocalization refers to sound(s) produced between pauses. In human speech production, similar chunks have often been termed inter-pausal units (Bigi & Priego-Valverde, 2019; Prakash & Murthy, 2019), i.e., speech that is realized between pauses.

In this exploratory study, we are interested in sounds realized in novel vocalizations during a charade game, i.e., in a situation where the use of actual words of the participant’s language is ‘forbidden’. This paradigm has been used to investigate the origin and evolution of language (Fay et al., 2022; Ćwiek et al., 2021; Perlman & Lupyan, 2018).

This paper aims to explore how many sounds are realized between pauses in non-linguistic vocalizations. Furthermore, we investigate the diversity of sounds realized within different concepts by assessing the distance between them in a multi-variable acoustic space.

2. Methodology

2.1. Corpus creation

The present study uses a subset of data collected in a larger study in which participants were recorded performing a series of concepts in three conditions. In these conditions, participants were asked to communicate a set of concepts using either (1) only gestures, (2) only non-linguistic vocalizations and other sounds, or (3) a combination of gestures and vocalizations. Here, we focus on a subset of the vocalization recordings. We did not analyze the vocalizations produced in the multimodal condition because we assume that, first, they are not stand-alone carriers of meaning and, second, their forms are shaped by coordination with body motion.

The recordings analyzed here were produced by 62 first-year psychology students at the University of Western Australia (43 female, 17 male, 2 non-binary; aged 17–33, M = 20.21, SD = 3.36). All were speakers of English. Of these, 28 participated in person and 34 remotely via Microsoft Teams, due to COVID-19 restrictions. Participants were allocated 60 concepts to communicate (20 in each modality condition), sampled from a list of 200 concepts comprising the 100-item Leipzig-Jakarta list of basic vocabulary (Tadmor, 2009) plus 100 other basic concepts chosen based on their sensory and modality preferences (Lynott, Connell, Brysbaert, Brand, & Carney, 2020). They were asked to communicate each concept using the specified modality (and without using language) so that another person would be able to view the recording and guess the concept from a list of options. If the participants could not think of a way to communicate a concept, they were permitted to skip it.

2.2. Concept extraction

For the exploratory analysis, we focused on a variety of concepts that might reflect different degrees of concreteness and abstraction (see Table 1). For example, the concept maybe is more abstract or logical than smoke. We chose these different concepts to cover a wider semantic potential, but we did not assign categories to the concepts, because the dichotomy between concreteness and abstraction has recently been questioned (Banks et al., 2023).

Table 1. Concepts used in this study. L-J corresponds to the Leipzig-Jakarta list.

Concept       List     No. of speakers
happy         other    6
sad           other    7
bad           other    7
scared        other    5
good          L-J      6
angry         other    7
disgusted     other    7
dog           L-J      6
cat           other    6
bird          L-J      5
fish          L-J      5
fly           L-J      8
old           L-J      4
spoon         other    5
egg           L-J      6
ash           L-J      3
stone/rock    L-J      6
smoke         L-J      4
maybe         other    8
not           L-J      7

Our analysis only included concepts for which initially at least 5 participants produced vocalizations. For three concepts we excluded acoustic trials as they contained a considerable amount of background noise that made an analysis unreliable.

2.3. Acoustic annotation procedures

The acoustic data were labeled in Praat 6.1.51 (Boersma & Weenink, 2021) by three annotators who are phoneticians by training. Following Swets, Fuchs, Krivokapić, and Petrone (2021), all silent intervals longer than 100 ms were treated as pauses and labeled with ‘p’. Apart from placing boundaries next to pauses, the annotators additionally labeled successive sounds that were not separated by pauses. The following criteria were used to decide whether to separate the speech stream into two or more sounds: a) two (or more) prominent peaks were present in the amplitude envelope, b) changes in spectral characteristics (e.g., formant structures) were present, and c) the sounds were perceptually distinct. Variations in fundamental frequency, e.g., a downward and then upward movement, were only considered as two sounds when they also showed spectral differences in higher frequency ranges and/or differences in the amplitude envelope. All sounds were labeled with an initial ‘s’ and successive numbers when they occurred in a sequence. The first annotator (a1) created the annotation criteria and labeled the data. Annotator 2 (a2) used the available TextGrids from a1 and changed the boundaries where she disagreed. The two agreed on 94.6 percent of the number of sounds. Afterwards, a1 inspected all acoustic files for which disagreement was found and confirmed the changes. Annotator 3 (a3) started labeling from scratch, without access to the TextGrids. Inter-rater agreement between a2 and a3 was 96.7 percent with respect to the overall number of labeled sounds. We then calculated the temporal difference between the onset of a given sound labeled by a2 and its closest temporal neighbor labeled by a3, and did the same for the offset of each sound. The differences were on average 0.048 s (median = 0.018 s) for the onset and 0.088 s (median = 0.027 s) for the offset. These differences are influenced by the number of sounds an annotator labeled for a given concept, which makes the calculation of inter-rater agreement challenging. We consider the overall agreement reasonable for the current exploratory analysis and decided to use a2’s segmentation for further analysis.
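
The nearest-neighbor comparison of boundaries described above can be illustrated with a minimal Python sketch. It is not the authors' script: the function name nearest_neighbor_differences and the onset times are hypothetical and chosen only for illustration, and the sketch assumes that the boundaries of each annotator are available as plain lists of times in seconds.

```python
from statistics import mean, median


def nearest_neighbor_differences(reference, other):
    """For each boundary time in `reference`, return the absolute difference
    (in seconds) to the closest boundary time in `other`."""
    if not other:
        raise ValueError("`other` must contain at least one boundary")
    return [min(abs(t - u) for u in other) for t in reference]


# Hypothetical onset times (seconds) for one trial, labeled by two annotators.
# The second annotator labeled one sound more than the first.
onsets_a2 = [0.10, 0.55, 1.20]
onsets_a3 = [0.12, 0.50, 1.21, 1.80]

diffs = nearest_neighbor_differences(onsets_a2, onsets_a3)
print(round(mean(diffs), 3), round(median(diffs), 3))
```

Because every boundary is matched to its closest neighbor, an extra sound labeled by one annotator (here the onset at 1.80 s) does not enter the differences computed from the other annotator's perspective, which is one way in which the number of labeled sounds can influence such a measure.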

Figure 1. Example of the acoustic annotation of the concept smoke. All segments are labeled as ‘s’ and pauses as ‘p’. The red line depicts the intensity curve.

2.4. Analyses of acoustic similarity

Initially, all audio files were cut into segment-sized files using a custom Python script. Acoustic analysis was performed on these sounds, using the analyze() function of the soundgen package in R (Anikin, 2019). The output of this function consists of more than 100 acoustic parameters as listed in the documentation (e.g., f0, amplitude, formant values, entropy, and their respective mean, median, and standard deviation). Some of these acoustic parameters can only be measured when the corresponding property is present in a sound, e.g., voicing. However, the presence of voicing is redundant with intensity, because voiced sounds are typically louder than voiceless ones and intensity values can always be calculated. That is, some acoustic parameters are highly correlated and redundant with others. For this reason, we excluded parameters that resulted in NA values during post-processing. Moreover, we excluded voice quality parameters (e.g., flux), because these parameters may have been very sensitive to background noise, which occurred in the recordings of some speakers. All remaining parameters were summarized over the whole time series of a sound, and we used their mean and standard deviation for further exploration. We ended up with a multidimensional dataset consisting of 45 acoustic parameters. For the analysis of acoustic similarity, we calculated the Euclidean distance between the vector of acoustic parameters of each sound and those of all other sounds. As a result, we obtained a distance matrix that allowed us to extract an average distance between sounds within a trial of a concept and compare it to other concepts.
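
The within-trial average distance can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis script from the supplementary materials: the function name mean_pairwise_distance and the two-dimensional example vectors are hypothetical (the study used 45 parameters), and the sketch applies no scaling to the parameters, since the text does not state whether any was used.

```python
from itertools import combinations
from math import dist, nan
from statistics import mean


def mean_pairwise_distance(sounds):
    """Average Euclidean distance over all unordered pairs of sounds.

    `sounds` is a list of equally long acoustic parameter vectors,
    one vector per sound in a trial.
    """
    pairs = list(combinations(sounds, 2))
    if not pairs:
        return nan  # a trial with a single sound has no pair to compare
    return mean(dist(a, b) for a, b in pairs)


# Hypothetical trial with three sounds and two acoustic parameters each.
trial = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
average = mean_pairwise_distance(trial)
single = mean_pairwise_distance([[1.0, 2.0]])
print(average, single)
```

In this toy trial the pairwise distances are 5, 10, and 5, so the average is 20/3. Returning NaN for single-sound trials is a design choice of this sketch; how such trials were handled in the study is not described in the text.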

3. Results and Discussion

3.1. Structural similarity

To explore structural similarity, we analyzed whether sounds occurring between pauses appear alone or in successive order. When speakers try to communicate concepts using novel vocalizations, they frequently realize a relatively small number of sounds between two pauses: 1 sound occurred 208 times, 2 sounds 80 times, 3 sounds 35 times, 4 sounds 24 times, 5 sounds 11 times, 6 sounds 3 times, 8 sounds 4 times, 9 sounds 1 time, 10 sounds 1 time, 16 sounds 2 times, and 18 sounds 1 time. That means that, structurally, the most frequent realization (208 cases in our dataset) is a single sound <s> surrounded by pauses. In 80 cases we found realizations of two successive sounds <ss>, and in 35 cases participants produced three successive sounds <sss> without being interrupted by a pause. If the data are split by concept, vocalizations for cat, dog, and bird (all within a broader class of animals) also include combinations of more than three successive sounds, probably mirroring onomatopoeia. For the rest of the data, no conclusions can be drawn, because the number of sounds between pauses is concept-specific.
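
As a quick check of the proportions implied by the counts reported above, the following Python sketch tallies them. The dictionary simply restates the reported numbers; the variable names are illustrative and not taken from the authors' scripts.

```python
# Counts reported in Section 3.1:
# number of sounds per inter-pausal unit -> number of occurrences.
counts = {1: 208, 2: 80, 3: 35, 4: 24, 5: 11, 6: 3, 8: 4, 9: 1, 10: 1, 16: 2, 18: 1}

total_units = sum(counts.values())
short_units = sum(n for size, n in counts.items() if size <= 3)
share_short = short_units / total_units

print(total_units)           # 370
print(short_units)           # 323
print(f"{share_short:.3f}")  # 0.873
```

On these counts, units of one to three sounds account for roughly 87 percent of all inter-pausal units.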

If pauses are taken into account, sounds were combined flexibly; for example, for four sounds we could get combinations such as <s|s|s|s> or <ss|ss> or <ss|s|s>, where | marks a pause.
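
The grouping of labels into inter-pausal units can be sketched in Python as follows. The sketch assumes that a trial is available as an ordered list of labels in which pauses are ‘p’ and sounds start with ‘s’, as in the annotation scheme above; the function names sounds_per_unit and pattern are hypothetical and not taken from the authors' scripts.

```python
from collections import Counter
from itertools import groupby


def sounds_per_unit(labels):
    """Number of sounds in each inter-pausal unit of a label sequence.

    Consecutive non-pause labels form one unit; pauses ('p') separate units.
    """
    return [
        sum(1 for _ in group)
        for is_pause, group in groupby(labels, key=lambda label: label == "p")
        if not is_pause
    ]


def pattern(labels):
    """Render a label sequence in the <ss|s|s> notation used in the text."""
    return "<" + "|".join("s" * n for n in sounds_per_unit(labels)) + ">"


# Hypothetical trial: two successive sounds, a pause, and two single sounds.
trial = ["s1", "s2", "p", "s1", "p", "s1"]
units = sounds_per_unit(trial)
print(units)           # [2, 1, 1]
print(pattern(trial))  # <ss|s|s>
print(Counter(units))  # how often each unit size occurs in this trial
```

Leading, trailing, or repeated pauses do not create empty units in this sketch, because only runs of sound labels are counted.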

3.2. Acoustic similarity

Similar sounds may be repeated, as in imitating ‘coo-coo’, or they may differ in acoustic quality, as in imitating a cat’s ‘meow’. For this reason, we were further interested in examining the similarity between the sounds that make up a novel vocalization.

To take a first look at the diversity of sounds, we analyzed their average acoustic distance within each trial. We preferred this data-driven approach over labeling the data with phonemic features because it allows us to include sounds that may not occur in the English phoneme inventory, e.g., whistles or clicks. It represents the acoustic data as continuous instead of imposing categorical labels on them, which could also be biased by the native language of the annotator.

Figure 2 depicts the results. The concepts vary in their average acoustic distance between sounds. Some abstract concepts like not consist of sounds that are closer to each other in distance (i.e., more similar), while dog has a larger average acoustic distance between its sounds. Those concepts with several successive sounds (e.g., <sss>) are also the ones with the largest average distance.

Figure 2. Average acoustic distances between sounds within a single trial, displayed by concept. Boxplots and half-violins (purple) show the data distribution; black dots correspond to single trials. Concepts are shown on the x-axis in alphabetical order.

In summary, the structure of novel vocalizations obtained from a charade game most often contains one, two, or three successive sounds that are not separated by pauses. This may to some extent be similar to infants’ vocalizations (Wermke et al., 2021) and to the vocalizations of non-human species. It differs from human speech production, where a single syllable or morpheme can already consist of three sounds, and where these are combined into larger chunks that are not interrupted by pauses. Our findings suggest that novel vocalizations have a rather simple sound structure, which may have become more complex (i.e., more and probably shorter sounds realized in a sequence) during language evolution.

4. Supplementary Materials

Dataset and scripts are available at https://github.com/sarkadava/Evolang2024_SoundSimilarity.

Acknowledgements

We would like to thank the reviewers of Evolang, the participants of the study, and Melissa Ebert for data annotation. This work has been supported by a grant from the German Research Council (FU791/9-1).

References

Anikin, A. (2019). Soundgen: An open-source tool for synthesizing nonverbal vocalizations. Behavior Research Methods, 51, 778–792.

Banks, B., Borghi, A. M., Fargier, R., Fini, C., Jonauskaite, D., Mazzuca, C., Montalti, M., Villani, C., & Woodin, G. (2023). Consensus paper: Current perspectives on abstract concepts and future research directions. Journal of Cognition, 6(1).

Bigi, B., & Priego-Valverde, B. (2019). Search for inter-pausal units: Application to Cheese! corpus. In 9th Language & Technology Conference: Human language technologies as a challenge for computer science and linguistics (pp. 289–293).

Boersma, P., & Weenink, D. (2021). Praat: Doing phonetics by computer [Computer program]. Version 6.1.51.

Ćwiek, A., Fuchs, S., Draxler, C., Asu, E. L., Dediu, D., Hiovain, K., Kawahara, S., Koutalidis, S., Krifka, M., Lippus, P., et al. (2021). Novel vocalizations are understood across cultures. Scientific Reports, 11(1), 10108.

Doupe, A. J., & Kuhl, P. K. (1999). Birdsong and human speech: Common themes and mechanisms. Annual Review of Neuroscience, 22(1), 567–631.

Favaro, L., Gamba, M., Cresta, E., Fumagalli, E., Bandoli, F., Pilenga, C., Isaja, V., Mathevon, N., & Reby, D. (2020). Do penguins’ vocal sequences conform to linguistic laws? Biology Letters, 16(2), 20190589.

Fay, N., Walker, B., Ellison, T. M., Blundell, Z., De Kleine, N., Garde, M., Lister, C. J., & Goldin-Meadow, S. (2022). Gesture is the primary modality for language creation. Proceedings of the Royal Society B, 289(1970), 20220066.

Girard-Buttoz, C., Zaccarella, E., Bortolato, T., Friederici, A. D., Wittig, R. M., & Crockford, C. (2022). Chimpanzees produce diverse vocal sequences with ordered and recombinatorial properties. Communications Biology, 5(1), 410.

Hoeschele, M., Wagner, B., & Mann, D. C. (2023). Lessons learned in animal acoustic cognition through comparisons with humans. Animal Cognition, 26(1), 97–116.

Lynott, D., Connell, L., Brysbaert, M., Brand, J., & Carney, J. (2020). The Lancaster Sensorimotor Norms: Multidimensional measures of perceptual and action strength for 40,000 English words. Behavior Research Methods, 52, 1271–1291.

Perlman, M., & Lupyan, G. (2018). People can create iconic vocalizations to communicate various meanings to naïve listeners. Scientific Reports, 8(1), 2634.

Pisanski, K., Bryant, G. A., Cornec, C., Anikin, A., & Reby, D. (2022). Form follows function in human nonverbal vocalisations. Ethology Ecology & Evolution, 34(3), 303–321.

Prakash, J. J., & Murthy, H. A. (2019). Analysis of inter-pausal units in Indian languages and its application to text-to-speech synthesis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(10), 1616–1628.

Rauber, R., Kranstauber, B., & Manser, M. B. (2020). Call order within vocal sequences of meerkats contains temporary contextual and individual information. BMC Biology, 18, 1–11.

Sainburg, T., Theilman, B., Thielk, M., & Gentner, T. Q. (2019). Parallels in the sequential organization of birdsong and human speech. Nature Communications, 10(1), 3636.

Swets, B., Fuchs, S., Krivokapić, J., & Petrone, C. (2021). A cross-linguistic study of individual differences in speech planning. Frontiers in Psychology, 12, 655516.

Tadmor, U. (2009). Loanwords in the world’s languages: Findings and results. Loanwords in the world’s languages: A comparative handbook, 55, 75.

Wermke, K., Robb, M. P., & Schluter, P. J. (2021). Melody complexity of infants’ cry and non-cry vocalisations increases across the first six months. Scientific Reports, 11(1), 4137.

ResearchGate has not been able to resolve any citations for this publication.

Consensus Paper: Current Perspectives on Abstract Concepts and Future Research Directions

Article

Full-text available

Oct 2023

Abstract concepts are relevant to a wide range of disciplines, including cognitive science, linguistics, psychology, cognitive, social, and affective neuroscience, and philosophy. This consensus paper synthesizes the work and views of researchers in the field, discussing current perspectives on theoretical and methodological issues, and recommendations for future research. In this paper, we urge researchers to go beyond the traditional abstract-concrete dichotomy and consider the multiple dimensions that characterize concepts (e.g., sensorimotor experience, social interaction, conceptual metaphor), as well as the mediating influence of linguistic and cultural context on conceptual representations. We also promote the use of interactive methods to investigate both the comprehension and production of abstract concepts, while also focusing on individual differences in conceptual representations. Overall, we argue that abstract concepts should be studied in a more nuanced way that takes into account their complexity and diversity, which should permit us a fuller, more holistic understanding of abstract cognition.

Lessons learned in animal acoustic cognition through comparisons with humans

Article

Full-text available

Dec 2022
ANIM COGN

Humans are an interesting subject of study in comparative cognition. While humans have a lot of anecdotal and subjective knowledge about their own minds and behaviors, researchers tend not to study humans the way they study other species. Instead, comparisons between humans and other animals tend to be based on either assumptions about human behavior and cognition, or very different testing methods. Here we emphasize the importance of using insider knowledge about humans to form interesting research questions about animal cognition while simultaneously stepping back and treating humans like just another species as if one were an alien researcher. This perspective is extremely helpful to identify what aspects of cognitive processes may be interesting and relevant across the animal kingdom. Here we outline some examples of how this objective human-centric approach has helped us to move forward knowledge in several areas of animal acoustic cognition (rhythm, harmonicity, and vocal units). We describe how this approach works, what kind of benefits we obtain, and how it can be applied to other areas of animal cognition. While an objective human-centric approach is not useful when studying traits that do not occur in humans (e.g., magnetic spatial navigation), it can be extremely helpful when studying traits that are relevant to humans (e.g., communication). Overall, we hope to entice more people working in animal cognition to use a similar approach to maximize the benefits of being part of the animal kingdom while maintaining a detached and scientific perspective on the human species.

Form follows function in human nonverbal vocalisations

Article

Full-text available

Feb 2022

Until recently, human nonverbal vocalisations such as cries, laughs, screams, moans, and groans have received relatively little attention in the human behavioural sciences. Yet these vocal signals are ubiquitous in human social interactions across diverse cultures and may represent a missing link between relatively fixed nonhuman animal vocalisations and highly flexible human speech. Here, we review converging empirical evidence that the acoustic structure (“forms”) of these affective vocal sounds in humans reflect their evolved biological and social “functions”. Human nonverbal vocalisations thus largely parallel the form-function mapping found in the affective calls of other animals, such as play vocalisations, distress cries, and aggressive roars, pointing to a homologous nonverbal vocal communication system shared across mammals, including humans. We aim to illustrate how this form-function approach can provide a solid framework for making predictions, including about cross-species and cross-cultural universals or variations in the production and perception of nonverbal vocalisations. Despite preliminary evidence that key features of human vocalisations may indeed be universal and develop reliably across distinct cultures, including small-scale societies, we emphasise the important role of vocal control in their production among humans. Unlike most other terrestrial mammals including nonhuman primates, people can flexibly manipulate vocalisations, from conversational laughter and fake pleasure moans to exaggerated roar-like threat displays. We discuss how human vocalisations may thus represent the cradle of vocal control, a precursor of human speech articulation, providing important insight into the origins of speech. 
Finally, we describe how ground-breaking parametric synthesis technologies are now allowing researchers to create highly naturalistic, yet fully experimentally controlled vocal stimuli to directly test hypotheses about form and function in nonverbal vocalisations, opening the way for a new era of voice sciences.

Novel vocalizations are understood across cultures

Article

Full-text available

May 2021

Linguistic communication requires speakers to mutually agree on the meanings of words, but how does such a system first get off the ground? One solution is to rely on iconic gestures: visual signs whose form directly resembles or otherwise cues their meaning without any previously established correspondence. However, it is debated whether vocalizations could have played a similar role. We report the first extensive cross-cultural study investigating whether people from diverse linguistic backgrounds can understand novel vocalizations for a range of meanings. In two comprehension experiments, we tested whether vocalizations produced by English speakers could be understood by listeners from 28 languages from 12 language families. Listeners from each language were more accurate than chance at guessing the intended referent of the vocalizations for each of the meanings tested. Our findings challenge the often-cited idea that vocalizations have limited potential for iconic representation, demonstrating that in the absence of words people can use vocalizations to communicate a variety of meanings.

Melody complexity of infants’ cry and non-cry vocalisations increases across the first six months

Article

Full-text available

Feb 2021

In early infancy, melody provides the most salient prosodic element for language acquisition and there is huge evidence for infants’ precocious aptitudes for musical and speech melody perception. Yet, a lack of knowledge remains with respect to melody patterns of infants’ vocalisations. In a search for developmental regularities of cry and non-cry vocalisations and for building blocks of prosody (intonation) over the first 6 months of life, more than 67,500 melodies (fundamental frequency contours) of 277 healthy infants from monolingual German families were quantitatively analysed. Based on objective criteria, vocalisations with well-identifiable melodies were grouped into those exhibiting a simple (single-arc) or complex (multiple-arc) melody pattern. Longitudinal analysis using fractional polynomial multi-level mixed effects logistic regression models were applied to these patterns. A significant age (but not sex) dependent developmental pattern towards more complexity was demonstrated in both vocalisation types over the observation period. The theoretical concept of melody development (MD-Model) contends that melody complexification is an important building block on the path towards language. Recognition of this developmental process will considerably improve not only our understanding of early preparatory processes for language acquisition, but most importantly also allow for the creation of clinically robust risk markers for developmental language disorders.

A Cross-Linguistic Study of Individual Differences in Speech Planning

Article

Full-text available

May 2021

Although previous research has shown that there exist individual and cross-linguistic differences in planning strategies during language production, little is known about how such individual differences might vary depending on which language a speaker is planning. The present series of studies examines individual differences in planning strategies exhibited by speakers of American English, French, and German. Participants were asked to describe images on a computer monitor while their eye movements were monitored. In addition, we measured participants' working memory capacity and speed of processing. The results indicate that in the present study, English and German were planned less incrementally (further in advance) prior to speech onset compared to French, which was planned more incrementally (not as far in advance). Crucially, speed of processing predicted the scope of planning for French speakers, but not for English or German speakers. These results suggest that the different planning strategies that are invoked by syntactic choices available in different languages are associated with the tendency for speakers to rely on different cognitive support systems as they plan sentences.

Call order within vocal sequences of meerkats contains temporary contextual and individual information

Article

Full-text available

Sep 2020
BMC BIOL

Background: The ability to recombine smaller units to produce infinite structures of higher-order phrases is unique to human language, yet evidence of animals to combine multiple acoustic units into meaningful combinations increases constantly. Despite increasing evidence for meaningful call combinations across contexts, little attention has been paid to the potential role of temporal variation of call type composition in longer vocal sequences in conveying information about subtle changes in the environment or individual differences. Here, we investigated the composition and information content of sentinel call sequences in meerkats (Suricata suricatta). While being on sentinel guard, a coordinated vigilance behaviour, meerkats produce long sequences composed of six distinct sentinel call types and alarm calls. We analysed recordings of sentinels to test if the order of the call types is graded and whether they contain additional group-, individual-, age- or sex-specific vocal signatures. Results: Our results confirmed that the six distinct types of sentinel calls in addition to alarm calls were produced in a highly graded way, likely referring to changes in the perceived predation risk. Transitions between call types one step up or down the a priory assumed gradation were over-represented, while transitions over two or three steps were significantly under-represented. Analysing sequence similarity within and between groups and individuals demonstrated that sequences composed of the most commonly emitted sentinel call types showed high within-individual consistency whereby adults and females had higher consistency scores than subadults and males respectively. Conclusions: We present a novel type of combinatoriality where the order of the call types contains temporary contextual information, and also relates to the identity of the caller. 
By combining different call types in a graded way over long periods, meerkats constantly convey meaningful information about subtle changes in the external environment, while at the same time the temporal pattern of the distinct call types contains stable information about caller identity. Our study demonstrates how complex animal call sequences can be described by simple rules, in this case gradation across acoustically distinct, but functionally related call types, combined with individual-specific call patterns.

Do penguins' vocal sequences conform to linguistic laws?

Article

Full-text available

Feb 2020

Information compression is a general principle of human language: the most frequent words are shorter in length (Zipf's Law of Brevity) and the duration of constituents decreases as the size of the linguistic construct increases (Menzerath-Altmann Law). Vocal sequences of non-human primates have been shown to conform to both these laws, suggesting information compression might be a more general principle. Here, we investigated whether display songs of the African penguin, which mediate recognition, intersex-ual mate choice and territorial defence, conform with these laws. Display songs are long, loud sequences combining three types of syllables. We found that the shortest type of syllable was the most frequent (with the shortest syllable being repeated stereotypically, potentially favouring signal redundancy in crowded environments). We also found that the average duration of the song's constituents was negatively correlated with the size of the song (a consequence of increasing the relative number of the shortest syllable type, rather than reducing the duration across all syllable types, thus preserving the communication of size-related information in the duration of the longest syllable type). Our results provide the first evidence for conformity to Zipf's and Menzerath-Altmann Laws in the vocal sequences of a non-primate species, indicating that these laws can coexist with selection pressures specific to the species' ecology.

The Lancaster Sensorimotor Norms: multidimensional measures of perceptual and action strength for 40,000 English words

Article

Dec 2019
BEHAV RES METHODS

Sensorimotor information plays a fundamental role in cognition. However, the existing materials that measure the sensorimotor basis of word meanings and concepts have been restricted in terms of their sample size and breadth of sensorimotor experience. Here we present norms of sensorimotor strength for 39,707 concepts across six perceptual modalities (touch, hearing, smell, taste, vision, and interoception) and five action effectors (mouth/throat, hand/arm, foot/leg, head excluding mouth/throat, and torso), gathered from a total of 3,500 individual participants using Amazon’s Mechanical Turk platform. The Lancaster Sensorimotor Norms are unique and innovative in a number of respects: They represent the largest-ever set of semantic norms for English, at 40,000 words × 11 dimensions (plus several informative cross-dimensional variables), they extend perceptual strength norming to the new modality of interoception, and they include the first norming of action strength across separate bodily effectors. In the first study, we describe the data collection procedures, provide summary descriptives of the dataset, and interpret the relations observed between sensorimotor dimensions. We then report two further studies, in which we (1) extracted an optimal single-variable composite of the 11-dimension sensorimotor profile (Minkowski 3 strength) and (2) demonstrated the utility of both perceptual and action strength in facilitating lexical decision times and accuracy in two separate datasets. These norms provide a valuable resource to researchers in diverse areas, including psycholinguistics, grounded cognition, cognitive semantics, knowledge representation, machine learning, and big-data approaches to the analysis of language and conceptual representations. The data are accessible via the Open Science Framework (http://osf.io/7emr6/) and an interactive web application (https://www.lancaster.ac.uk/psychology/lsnorms/).

Parallels in the sequential organization of birdsong and human speech

Article

Aug 2019

Human speech possesses a rich hierarchical structure that allows for meaning to be altered by words spaced far apart in time. Conversely, the sequential structure of nonhuman communication is thought to follow non-hierarchical Markovian dynamics operating over only short distances. Here, we show that human speech and birdsong share a similar sequential structure indicative of both hierarchical and Markovian organization. We analyze the sequential dynamics of song from multiple songbird species and speech from multiple languages by modeling the information content of signals as a function of the sequential distance between vocal elements. Across short sequence-distances, an exponential decay dominates the information in speech and birdsong, consistent with underlying Markovian processes. At longer sequence-distances, the decay in information follows a power law, consistent with underlying hierarchical processes. Thus, the sequential organization of acoustic elements in two learned vocal communication signals (speech and birdsong) shows functionally equivalent dynamics, governed by similar processes.
