All content in this area was uploaded by Michael Pleyer on May 14, 2024
ISSN: 1911-4745
Toward Interdisciplinary Integration in the Study of
Comparative Cognition: Insights from Studying the
Evolution of Multimodal Communication
Elizabeth Qing Zhang
Jiangsu Normal University
Michael Pleyer
Nicolaus Copernicus University in Toruń
In this article, we highlight the importance of interdisciplinary integration in the study of comparative
cognition. Specifically, we argue that the study of comparative cognition can benefit from broadening
its focus, integrating information from diverse subfields, and including collaborations from other
fields. We take the evolution of multimodal communication as an example to illustrate that an
interdisciplinary integration of linguistics, animal behavior, cognitive neuroscience, and genetics provides a
more comprehensive picture of this phenomenon.
Keywords: multimodal communication, interdisciplinary integration, linguistics, animal behavior,
cognitive neuroscience
DOI:10.3819/CCBR.2024.190017 Volume 19, 2024
One of the most promising advancements in com-
parative cognition in recent years is that comparative
psychologists have shifted to working on a broader range
of species and cognitive abilities and have emphasized a
team-science approach (e.g., Guillette & Sturdy, 2020).
However, a remaining challenge for comparative cogni-
tion as a whole is that it has not paid enough attention to
the integration of information from different disciplines.
Recently, there have increasingly been calls for an “in-
tegrative comparative cognition,” such as by Burmeister
and Liu (2020). They argued that neurobiological and
neurogenomic studies can shed important light on the
cognitive phenotypes that are the subject of comparative
cognition. In this article, we highlight the importance of
taking a broader range of fields into consideration using
the evolution of multimodal communication as an example
to illustrate the benefits this interdisciplinary approach
would have for the field of comparative cognition.
Human communication is fundamentally multi-
modal. Vocal and visual cues are integrated in human
linguistic interaction—for example, in phenomena such as
co-speech gestures and facial expressions that accompany
vocalizations in spoken language, as well as the iconicity,
sound symbolism, and cross-modal correspondences that
motivate many aspects of language structure in both spo-
ken and signed languages (e.g., Dingemanse et al., 2015;
Vigliocco et al., 2014). Appreciation of human communi-
cation as a multimodal phenomenon supports the idea that
language itself has a multimodal origin (e.g., Fröhlich et
al., 2019; Levinson & Holler, 2014). Indeed, comparative
research on nonhuman animals shows that multimodality
is a ubiquitous property of many animal communication
COMPARATIVE COGNITION & BEHAVIOR REVIEWS
systems (Ota et al., 2015; Partan & Marler, 1999), suggest-
ing evolutionary continuity and deep evolutionary roots of
multimodal communication.
However, studies on the evolution of multimodal com-
munication have focused mainly on studying nonhuman
primates (e.g., Fröhlich & van Schaik, 2018; Liebal et al.,
2014), likely because of their close evolutionary relation-
ship to humans. Nonhuman primates show simultaneous
production of communication signals in the manual, facial,
and vocal modalities (Genty et al., 2014; Micheletta et al.,
2013). A study of chimpanzees found that about half of all
vocalizations were produced in combination with another
communicative modality (Taglialatela et al., 2015). On the
other hand, existing studies on multimodal communication
in diverse species suggest that multimodality has even
deeper evolutionary roots. Yet although a wealth of
studies examine multimodal communication in different species,
few have explicitly addressed the evolutionary
continuity of multimodality outside of primates. That is, so
far, studies showing the ubiquity of multimodality in the
animal kingdom have not been properly integrated into an
account of the evolution of multimodal communication.
Furthermore, apart from the behavioral and cognitive levels,
neuroscientific and genetic (genomic) studies can also
provide revealing insights into the evolutionary continuity
of multimodal communication.
For this reason, we argue that the study of the evo-
lution of multimodal communication is in need of inter-
disciplinary integration, which we believe is an important
future challenge for the field of comparative cognition
and behavior.
On one hand, the field requires a combination of
various research fields that explore the role of multimodality
in humans in both naturalistic and laboratory settings
(e.g., Macuch Silva et al., 2020; Rasenberg et al., 2022),
as well as research in comparative cognition on the role of
multimodality in a wide range of nonhuman species. For
example, combinations of tactile, olfactory, acoustic, and
visual cues have been reported in fruit flies (Ewing, 1983),
fish (Tavolga, 1956), and birds (Dalziell et al., 2013; Ota et
al., 2015). In songbirds, courtship displays integrate songs
with hops, head motions and beak movements (Williams,
2001). In addition, the song type repertoire is coordinated
temporally with a dance-like movement repertoire (Dalziell et al.,
2013). Overall, these data indicate that multimodal
communication has a deep phylogenetic origin dating
back to invertebrates.
On the other hand, the study of the evolution of multimodality
would also profit from interdisciplinary insight
from neuroscience and genetics. From a neuroscientific
perspective, the hippocampus and basal ganglia both
represent conserved subcortical structures that are found
in all vertebrates. Homologous neural structures have also
been proposed for invertebrates (Lin et al., 2013; Wolff &
Strausfeld, 2015). The basal ganglia are mostly involved
in action selection, motor control, and cognitive functions
such as procedural learning and memory (Graybiel, 2005).
Functions of the hippocampus include declarative learning
and memory, navigation, and episodic memory (Voss et
al., 2017). Concerning communication, studies on vocal
production learning in animals, especially songbirds, have
demonstrated a crucial role for the basal ganglia in song
learning (Jarvis, 2019). Detailed comparisons of the neural
circuitry of songbirds and humans have also shown that
certain analogous (potentially homologous) cortico-basal
ganglia-thalamo-cortical circuits are essential to vocal
learning (Pfenning et al., 2014). Regarding the hippo-
campus, studies on human patients with amnesia suggest
that it is also vital for the production of gestures (Hilliard
et al., 2016). As the hippocampus is involved in spatial
cognition, this suggests a hippocampal contribution to
the evolution of the use of gestures in humans (Levinson,
2023). Therefore, there is suggestive evidence that the
connection between the hippocampus and the basal ganglia
could underlie multimodal communication across species.
Still, more interdisciplinary work is needed in this domain,
representing an important challenge for future work on the
evolution of multimodal communication.
Last, genetic studies have the potential to serve as an
important puzzle piece in unraveling the evolution of mul-
timodal communication from an interdisciplinary perspec-
tive. As a case in point, the integration of speech and gesture
might be influenced by the human version of the FOXP2¹ gene.
1. We follow the convention that capitalized FOXP2 “refers to
the human gene, Foxp2 refers to the gene in mice and FoxP2
refers to the gene in other species” (Schatton & Scharff, 2017,
p. 26).
Author Note: Elizabeth Qing Zhang, School of Linguistic Sciences
and Arts, Jiangsu Normal University, Shanghai Road 101,
Tongshan New District, 221116, Xuzhou, Jiangsu, China.
Correspondence concerning this article should be addressed to
Elizabeth Qing Zhang at zqelizabeth@gmail.com.
Acknowledgments: Michael Pleyer was supported by project
No. 2021/43/P/HS2/02729 co-funded by the National Science
Centre and the European Union’s Horizon 2020 research and
innovation programme under the Marie Skłodowska-Curie
grant agreement no. 945339.
In evolution, FoxP2 represents a conserved transcription
factor among vertebrates; there is also indicative
data in invertebrates. Drosophila possesses a homolog of
FoxP2: FoxP, which is responsible for sex-specific walking
and flight as well as pulse-song structure (Lawton et al.,
2014). Studies in vertebrates also indicate a connection of
FoxP2 to both the basal ganglia and hippocampus, and their
interaction in multimodal communication. When Foxp2 is
knocked out in mice, infant mice will produce abnormal
ultrasonic vocalizations (Shu et al., 2005). Further studies
in mice have demonstrated that heterozygous mutations
of Foxp2 impair sensorimotor association learning (Kurt
et al., 2012). Also in mice, Zbtb20, a repressor of Foxp2,
has been found to bind to and repress cortical layer marker
genes (including Foxp2) in the developing hippocampus
(Nielsen et al., 2014). Knockdown studies in songbirds, in
which the expression of particular genes is reduced, also
show a connection of FoxP2 to vocalizations. Specifically,
if FoxP2 expression is knocked down in Area X in juvenile
zebra finches, this affects the completeness and accuracy
of song production learning (Haesler et al., 2007). More
research is needed to untangle the possible influence of
human FOXP2 on the evolution of specifically human
multimodal communication. Human FOXP2 has incorporated
two fixed amino acid changes in a broadly defined
transcription suppression domain (Zhang et al., 2002).
These two amino acid changes (N325S, T303N) occurred
at some point since the evolutionary split from the lineage
of chimpanzees and bonobos (Enard et al., 2002) and were
likely present before the split of Neanderthals and Homo
sapiens (Krause et al., 2007), which is currently estimated
to have happened between 800,000 and 400,000
years ago (cf. Endicott et al., 2010; Harvati &
Reyes-Centeno, 2022). This suggests evolutionary conti-
nuity of (humanlike) multimodal communication, possibly
dating back to Homo heidelbergensis (Dediu & Levinson,
2018). However, an ongoing debate concerns the timing of
the evolution of FOXP2 and other possible subtle changes
that might have occurred since the split from the Neander-
thal lineage (see, e.g., Fisher, 2019). Animal studies offer
important further insights here, as mice carrying a
humanized version of FOXP2 showed reduced dopamine
levels, increased dendritic length, and enhanced long-term synaptic
depression (Enard et al., 2009), suggesting a role of human
FOXP2 in altering the basal ganglia structure and function.
Moreover, mice with a humanized version of FOXP2 also
show an accelerated transition from declarative learning to
procedural learning (Schreiweis et al., 2014). As the neural
bases of declarative and procedural performance are the
hippocampus and basal ganglia, respectively, this suggests
a key role of FOXP2 in better connecting the basal ganglia
and hippocampus; this connection represents an important
aspect of human multimodal communication.
The main thrust of this commentary rests on two
aspects: On one hand, it represents a call to take multi-
modality seriously in the study of communication in
different species and to include a wider range of species
in comparative cognition studies of multimodal communi-
cation. On the other hand, it is a call for interdisciplinary
integration. Using the evolution of multimodal communi-
cation as an example, we want to make a case for the idea
that comparative cognition can benefit from broadening
its focus, integrating information from different subfields,
and including collaborators from other fields. As we have
shown, the integration of insights from fields such as the
language sciences, animal communication, neuroscience,
and genetics has the potential to make important contribu-
tions to the study of multimodality and its evolution. We
hope that, in the future, such interdisciplinary integration
will lead to further exciting discoveries and the develop-
ment of interspecies frameworks for the study of multi-
modal communication. More generally, our discussion of
the evolution of multimodal communication serves as an
example of how broader interdisciplinary collaboration
within and outside comparative cognition can potentially
greatly move the field forward.
References
Burmeister, S. S., & Liu, Y. (2020). Integrative compar-
ative cognition: Can neurobiology and neurogenomics
inform comparative analyses of cognitive phenotype?
Integrative and Comparative Biology, 60(4), 925–928.
https://doi.org/10.1093/icb/icaa113
Dalziell, A. H., Peters, R. A., Cockburn, A., Dorland, A.
D., Maisey, A. C., & Magrath, R. D. (2013). Dance
choreography is coordinated with song repertoire in
a complex avian display. Current Biology, 23(12),
1132–1135. https://doi.org/10.1016/j.cub.2013.05.018
Dediu, D., & Levinson, S. C. (2018). Neanderthal lan-
guage revisited: Not only us. Current Opinion in Be-
havioral Sciences, 21, 49–55. https://doi.org/10.1016/j.
cobeha.2018.01.001
Dingemanse, M., Blasi, D. E., Lupyan, G., Christiansen,
M. H., & Monaghan, P. (2015). Arbitrariness, iconicity,
and systematicity in language. Trends in Cognitive
Sciences, 19(10), 603–615. https://doi.org/10.1016/j.
tics.2015.07.013
Enard, W., Gehre, S., Hammerschmidt, K., Hölter, S. M.,
Blass, T., Somel, M., … Pääbo, S. (2009). A humanized
version of Foxp2 affects cortico-basal ganglia circuits in
mice. Cell, 137(5), 961–971. https://doi.org/10.1016/j.
cell.2009.03.041
Enard, W., Przeworski, M., Fisher, S. E., Lai, C. S. L.,
Wiebe, V., Kitano, T., … Pääbo, S. (2002). Molecular
evolution of FOXP2, a gene involved in speech and
language. Nature, 418(6900), 869–872. https://doi.
org/10.1038/nature01025
Endicott, P., Ho, S. Y., & Stringer, C. (2010). Using ge-
netic evidence to evaluate four palaeoanthropological
hypotheses for the timing of Neanderthal and modern
human origins. Journal of Human Evolution, 59(1),
87–95. https://doi.org/10.1016/j.jhevol.2010.04.005
Ewing, A. W. (1983). Functional aspects of Drosophila
courtship. Biological Reviews, 58(2), 275–292. https://
doi.org/10.1111/j.1469-185X.1983.tb00390.x
Fisher, S. E. (2019). Human genetics: The evolving story
of FOXP2. Current Biology, 29(2), R65–R67. https://
doi.org/10.1016/j.cub.2018.11.047
Fröhlich, M., Sievers, C., Townsend, S. W., Gruber, T., &
van Schaik, C. P. (2019). Multimodal communication
and language origins: Integrating gestures and vocal-
izations. Biological Reviews, 94(5), 1809–1829. https://
doi.org/10.1111/brv.12535
Fröhlich, M., & van Schaik, C. P. (2018). The function
of primate multimodal communication. Animal
Cognition, 21, 619–629. https://doi.org/10.1007/
s10071-018-1197-8
Genty, E., Clay, Z., Hobaiter, C., & Zuberbühler, K.
(2014). Multi-modal use of a socially directed call in
bonobos. PLOS ONE, 9(1), Article e84738. https://doi.
org/10.1371/journal.pone.0084738
Graybiel, A. M. (2005). The basal ganglia: Learning
new tricks and loving it. Current Opinion in Neuro-
biology, 15(6), 638–644. https://doi.org/10.1016/j.
conb.2005.10.006
Guillette, L. M., & Sturdy, C. B. (2020). Unifying psy-
chological and biological approaches to understanding
animal cognition. Canadian Journal of Experimental
Psychology/Revue canadienne de psychologie expéri-
mentale, 74(3). https://doi.org/10.1037/cep0000233
Haesler, S., Rochefort, C., Georgi, B., Licznerski, P.,
Osten, P., & Scharff, C. (2007). Incomplete and inac-
curate vocal imitation after knockdown of FoxP2 in
songbird basal ganglia nucleus area X. PLOS Biology,
5(12), 2885–2897. https://doi.org/10.1371/journal.
pbio.0050321
Harvati, K., & Reyes-Centeno, H. (2022). Evolution of
Homo in the middle and late Pleistocene. Journal of
Human Evolution, 173, Article 103279. https://doi.
org/10.1016/j.jhevol.2022.103279
Hilliard, C., Cook, S. W., & Duff, M. C. (2016). Hippocam-
pal declarative memory supports gesture production:
Evidence from amnesia. Cortex, 85, 25–36. https://doi.
org/10.1016/j.cortex.2016.09.015
Jarvis, E. D. (2019). Evolution of vocal learning and spo-
ken language. Science, 366(6461), 50–54. https://doi.
org/10.1126/science.aax0287
Krause, J., Lalueza-Fox, C., Orlando, L., Enard, W.,
Green, R. E., Burbano, H. A., … Pääbo, S. (2007). The
derived FOXP2 variant of modern humans was shared
with Neandertals. Current Biology, 17(21), 1908–1912.
https://doi.org/10.1016/j.cub.2007.10.008
Kurt, S., Fisher, S. E., & Ehret, G. (2012). Foxp2 mutations
impair auditory-motor association learning. PLOS ONE,
7(3), Article e33130. https://doi.org/10.1371/journal.
pone.0033130
Lawton, K. J., Wassmer, T. L., & Deitcher, D. L. (2014).
Conserved role of Drosophila melanogaster FoxP in mo-
tor coordination and courtship song. Behavioural Brain
Research, 268, 213–221. https://doi.org/10.1016/j.
bbr.2014.04.009
Levinson, S. C. (2023). Gesture, spatial cognition and
the evolution of language. Philosophical Transactions
of the Royal Society B: Biological
Sciences, 378(1875), Article 20210481. https://doi.
org/10.1098/rstb.2021.0481
Levinson, S. C., & Holler, J. (2014). The origin of human
multi-modal communication. Philosophical Trans-
actions of the Royal Society B: Biological Sciences,
369(1651), Article 20130302. https://doi.org/10.1098/
rstb.2013.0302
Liebal, K., Waller, B. M., Burrows, A. M., & Slocombe,
K. E. (2014). Primate communication: A multimodal
approach. Cambridge University Press. https://doi.
org/10.1017/CBO9781139018111
Lin, C. Y., Chuang, C. C., Hua, T. E., Chen, C. C., Dick-
son, B. J., Greenspan, R. J., & Chiang, A. S. (2013).
A comprehensive wiring diagram of the protocerebral
bridge for visual information processing in the Dro-
sophila brain. Cell Reports, 3(5), 1739–1753. https://
doi.org/10.1016/j.celrep.2013.04.022
Macuch Silva, V., Holler, J., Özyürek, A., & Roberts, S. G.
(2020). Multimodality and the origin of a novel communication
system in face-to-face interaction. Royal
Society Open Science, 7(1), Article 182056. https://doi.
org/10.1098/rsos.182056
Micheletta, J., Engelhardt, A., Matthews, L., Agil, M., &
Waller, B. M. (2013). Multicomponent and multimodal
lip smacking in crested macaques (Macaca nigra).
American Journal of Primatology, 75(7), 763–773.
https://doi.org/10.1002/ajp.22105
Nielsen, J. V., Thomassen, M., Møllgård, K., Noraberg, J.,
& Jensen, N. A. (2014). Zbtb20 defines a hippocampal
neuronal identity through direct repression of genes
that control projection neuron development in the iso-
cortex. Cerebral Cortex, 24(5), 1216–1229. https://doi.
org/10.1093/cercor/bhs400
Ota, N., Gahr, M., & Soma, M. (2015). Tap dancing birds:
The multimodal mutual courtship display of males and
females in a socially monogamous songbird. Scientific
Reports, 5(1), Article 16614. https://doi.org/10.1038/
srep16614
Partan, S., & Marler, P. (1999). Communication goes
multimodal. Science, 283(5406), 1272–1273. https://
doi.org/10.1126/science.283.5406.1272
Pfenning, A. R., Hara, E., Whitney, O., Rivas, M. V., Wang,
R., Roulhac, P. L., Howard, J. T., Wirthlin, M., Lovell,
P. V., Ganapathy, G., Mountcastle, J., Moseley, M. A.,
Thompson, J. W., Soderblom, E. J., Iriki, A., Kato, M.,
Gilbert, M. T. P., Zhang, G., Bakken, T., … Jarvis, E.
D. (2014). Convergent transcriptional specializations in
the brains of humans and song-learning birds. Science,
346(6215), Article 1256846. https://doi.org/10.1126/
science.1256846
Rasenberg, M., Pouw, W., Özyürek, A., & Dingemanse,
M. (2022). The multimodal nature of communicative
efficiency in social interaction. Scientific Reports,
12(1), Article 19111. https://doi.org/10.1038/
s41598-022-22883-w
Schatton, A., & Scharff, C. (2017). Next stop: Language.
The ‘FOXP2’ gene’s journey through time. Mètode Sci-
ence Studies Journal, 7, 25–33. https://doi.org/10.7203/
metode.7.7248
Schreiweis, C., Bornschein, U., Burguière, E., Kerimoglu,
C., Schreiter, S., Dannemann, M., … Graybiel, A. M.
(2014). Humanized Foxp2 accelerates learning by en-
hancing transitions from declarative to procedural per-
formance. Proceedings of the National Academy of Sci-
ences, 111(39), 14253–14258. https://doi.org/10.1073/
pnas.1414542111
Shu, W., Cho, J. Y., Jiang, Y., Zhang, M., Weisz, D., Elder,
G. A., … Buxbaum, J. D. (2005). Altered ultrasonic vo-
calization in mice with a disruption in the Foxp2 gene.
Proceedings of the National Academy of Sciences of the
United States of America, 102(27), 9643–9648. https://
doi.org/10.1073/pnas.0503739102
Taglialatela, J. P., Russell, J. L., Pope, S. M., Morton, T.,
Bogart, S., Reamer, L. A., … Hopkins, W. D. (2015).
Multimodal communication in chimpanzees. American
Journal of Primatology, 77(11), 1143–1148. https://doi.
org/10.1002/ajp.22449
Tavolga, W. N. (1956). Visual, chemical and sound stimuli
as cues in the sex discriminatory behavior of the gobiid
fish Bathygobius soporator. Zoologica, 41(2), 49–64.
https://doi.org/10.5962/p.203402
Vigliocco, G., Perniss, P., & Vinson, D. (2014). Language
as a multimodal phenomenon: Implications for lan-
guage learning, processing and evolution. Philosoph-
ical Transactions of the Royal Society B: Biological
Sciences, 369(1651), Article 20130292. https://doi.
org/10.1098/rstb.2013.0292
Voss, J. L., Bridge, D. J., Cohen, N. J., & Walker, J. A.
(2017). A closer look at the hippocampus and memory.
Trends in Cognitive Sciences, 21(8), 577–588. https://
doi.org/10.1016/j.tics.2017.05.008
Williams, H. (2001). Choreography of song, dance and
beak movements in the zebra finch (Taeniopygia guttata).
The Journal of Experimental Biology, 204(Pt. 20),
3497–3506. https://doi.org/10.1242/jeb.204.20.3497
Wolff, G. H., & Strausfeld, N. J. (2015). Genealogical
correspondence of mushroom bodies across inverte-
brate phyla. Current Biology, 25(1), 38–44. https://doi.
org/10.1016/j.cub.2014.10.049
Zhang, J., Webb, D. M., & Podlaha, O. (2002). Accelerated
protein evolution and origins of human-specific features:
FOXP2 as an example. Genetics, 162(4), 1825–1835.
https://doi.org/10.1093/genetics/162.4.1825