Figure 2 - available from: Scientific Reports
Spectrograms of three characteristic killer whale sounds (sampling rate = 44.1 kHz, FFT-size = 4,096 samples (≈100 ms), hop-size = 441 samples (≈10 ms)).
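For readers who want to reproduce spectrograms with these settings, the following is a minimal sketch using SciPy; the input file name and the plotting choices are illustrative assumptions, not taken from the paper.

import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram
import matplotlib.pyplot as plt

fs, audio = wavfile.read("orca_call.wav")   # hypothetical input file
if audio.ndim > 1:                          # mix stereo down to mono
    audio = audio.mean(axis=1)
audio = audio.astype(float)
assert fs == 44100, "the caption assumes a 44.1 kHz sampling rate"

n_fft, hop = 4096, 441                      # ~100 ms window, ~10 ms hop
f, t, Sxx = spectrogram(audio, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)

plt.pcolormesh(t, f / 1000, 10 * np.log10(Sxx + 1e-12))   # dB scale
plt.xlabel("Time (s)")
plt.ylabel("Frequency (kHz)")
plt.show()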

Source publication
Article
Large bioacoustic archives of wild animals are an important source for identifying reappearing communication patterns, which can then be related to recurring behavioral patterns to advance the current understanding of intra-specific communication in non-human animals. A main challenge remains that most large-scale bioacoustic archives contain only a sm...

Contexts in source publication

Context 1
... [32]. The diverse vocal repertoire of killer whales comprises clicks, whistles, and pulsed calls [33]. Like other odontocetes, killer whales produce echolocation clicks, used for navigation and localization, which are short pulses of variable duration (between 0.1 and 25 ms) with click-repetition rates from a few pulses to over 300 per second [33] (Fig. 2a). Whistles are narrowband tones with no or few harmonic components, at frequencies typically between 1.5 and 18 kHz and with durations from 50 ms up to 12 s [33] (Fig. 2b). As recently shown, whistles extend into the ultrasonic range, with observed fundamental frequencies ranging up to 75 kHz in three Northeast Atlantic populations but not in ...
Context 2
... used for navigation and localization, which are short pulses of variable duration (between 0.1 and 25 ms) with click-repetition rates from a few pulses to over 300 per second [33] (Fig. 2a). Whistles are narrowband tones with no or few harmonic components, at frequencies typically between 1.5 and 18 kHz and with durations from 50 ms up to 12 s [33] (Fig. 2b). As recently shown, whistles extend into the ultrasonic range, with observed fundamental frequencies ranging up to 75 kHz in three Northeast Atlantic populations but not in the Northeast Pacific [34]. Whistles are most commonly used during close-range social interactions. There are variable and stereotyped whistles [35–37]. Pulsed ...
Context 3
... are most commonly used during close-range social interactions. There are variable and stereotyped whistles [35–37]. Pulsed calls, the most common and intensively studied vocalizations of killer whales, typically show sudden and patterned shifts in frequency based on the pulse repetition rate, which is usually between 250 and 2000 Hz [33] (Fig. 2c). Pulsed calls are classified into discrete, variable, and aberrant calls [33]. Some highly stereotyped whistles and pulsed calls are believed to be culturally transmitted through vocal learning [36,38–41]. Mammal-hunting killer whales in the Northeast Pacific produce echolocation clicks, pulsed calls, and whistles at ...
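As a concrete illustration of these repetition-rate figures, the sketch below estimates click-repetition rate from inter-click intervals by peak-picking a signal envelope; the file name, threshold, and minimum peak spacing are assumptions for demonstration, not part of the cited methodology.

import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, find_peaks

fs, x = wavfile.read("orca_clicks.wav")      # hypothetical input file
if x.ndim > 1:                               # mix stereo down to mono
    x = x.mean(axis=1)
x = x.astype(float)
envelope = np.abs(hilbert(x))                # amplitude envelope

# Clicks are 0.1-25 ms pulses; 'distance' applies the ~300 clicks/s
# upper repetition rate mentioned in the text as a minimum peak spacing.
peaks, _ = find_peaks(envelope, height=5 * envelope.mean(),
                      distance=int(fs / 300))

ici = np.diff(peaks) / fs                    # inter-click intervals (s)
if ici.size:
    print(f"median repetition rate ~ {1 / np.median(ici):.1f} clicks/s")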

Citations

... Convolutional Neural Networks (CNNs), which use image-processing techniques to read and classify acoustic data into predetermined classes, have been shown to excel at detecting signals that vary with respect to time, frequency, and amplitude [13–16]. Using broadband acoustic data as input, CNNs have achieved high accuracy at the task of detecting dolphin vocalisations within variable marine environments [17–20]. The use of CNNs for monitoring long-term dolphin occurrence in diverse acoustic habitats, beyond proof of concept, is still in its infancy; for such a tool to become part of the standard analysis toolkit, we must demonstrate its reliability compared to established methods of detection. ...
... Within the realm of acoustic environmental research, AI methodologies have been effectively leveraged in a diverse array of studies. These include the detection of orcas [39] and dolphins [40], and the classification of fish [41], owl species [42], and bird songs [43]. Additionally, some research has integrated deep learning with traditional machine learning (ML) approaches for enhanced classification of anurans and birds [44], and for robust bird classification [45]. ...
Article
Bats play a pivotal role in maintaining ecological balance, and studying their behaviors offers vital insights into environmental health and aids in conservation efforts. Determining the presence of various bat species in an environment is essential for many bat studies. Specialized audio sensors can be used to record bat echolocation calls that can then be used to identify bat species. However, the complexity of bat calls presents a significant challenge, necessitating expert analysis and extensive time for accurate interpretation. Recent advances in neural networks can help identify bat species automatically from their echolocation calls. Such neural networks can be integrated into a complete end-to-end system that leverages recent internet of things (IoT) technologies with long-range, low-powered communication protocols to implement automated acoustical monitoring. This paper presents the design and implementation of such a system that uses a tiny neural network for interpreting sensor data derived from bat echolocation signals. A highly compact convolutional neural network (CNN) model was developed that demonstrated excellent performance in bat species identification, achieving an F1-score of 0.9578 and an accuracy rate of 97.5%. The neural network was deployed, and its performance was evaluated on various alternative edge devices, including the NVIDIA Jetson Nano and Google Coral.
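As a rough illustration of what such a highly compact CNN can look like, here is a minimal PyTorch sketch; the layer sizes, input shape, and species count are assumptions, not the published architecture.

import torch
import torch.nn as nn

class TinyBatCNN(nn.Module):
    def __init__(self, n_species=10):             # species count is an assumption
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),               # global pooling keeps the head tiny
        )
        self.classifier = nn.Linear(16, n_species)

    def forward(self, x):
        # x: batch of single-channel call spectrograms, shape (N, 1, H, W)
        return self.classifier(self.features(x).flatten(1))

model = TinyBatCNN()
logits = model(torch.randn(4, 1, 128, 64))         # dummy batch of spectrograms
print(logits.shape)                                # torch.Size([4, 10])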
... The analysis of acoustic data is usually based on the use of spectrograms featuring high temporal and spectral resolutions. Computer vision techniques have delivered promising results in recent years with regard to automated spectrogram classification, mainly implementing deep learning techniques such as convolutional neural networks (CNNs; Allen et al., 2021; Bergler et al., 2019; Dugan et al., 2014; Halkias et al., 2013; Poupard et al., 2021; Stowell et al., 2019). A main requirement for achieving good results with deep learning techniques is the availability of large annotated datasets to train, validate and test models, which contain as much of the existing variability of soundscapes as possible. ...
... Lisa Yang Center for Conservation Bioacoustics, 2023). Even more publications report on developed and tested models that can be used to analyze marine passive acoustic data to detect marine animal vocalizations (Allen et al., 2021; Belghith et al., 2018; Bergler et al., 2019; Best et al., 2020, 2022; Bohnenstiehl, 2023; Kirsebom et al., 2020; Madhusudhana et al., 2021; Miller et al., 2023; Rasmussen & Širović, 2021; Rycyk et al., 2022; Shiu et al., 2020; Vickers et al., 2021; White et al., 2022; Zhong et al., 2020, 2021). However, there is only a small number of actual applications of these models to long-term data (Allen et al., 2021; Best et al., 2022; Bohnenstiehl, 2023; Lammers et al., 2023; Rycyk et al., 2022). ...
Article
Passive acoustic monitoring (PAM) is commonly used to obtain year-round continuous data on marine soundscapes harboring valuable information on species distributions or ecosystem dynamics. This continuously increasing amount of data requires highly efficient automated analysis techniques in order to exploit the full potential of the available data. Here, we propose a benchmark, which consists of a public dataset, a well-defined task and an evaluation procedure, to develop and test automated analysis techniques. This benchmark focuses on the special case of detecting animal vocalizations in a real-world dataset from the marine realm. We believe that such a benchmark is necessary to monitor progress in the development of new detection algorithms in the field of marine bioacoustics. We ultimately use the proposed benchmark to test three detection approaches, namely ANIMAL-SPOT, Koogu and a simple custom sequential convolutional neural network (CNN), and report their performance in a blocked cross-validation fashion with 11 site-year blocks for a multi-species detection scenario in a large marine passive acoustic dataset. Performance was measured with three simple metrics (i.e., true classification rate, noise misclassification rate and call misclassification rate) and one combined fitness metric, which allocates more weight to the minimization of false positives created by noise. Overall, ANIMAL-SPOT performed best with an average fitness metric of 0.6, followed by the custom CNN with an average fitness metric of 0.57 and finally Koogu with an average fitness metric of 0.42. The presented benchmark is an important step toward automatic processing of the continuously growing amount of PAM data collected throughout the world's oceans. For developed algorithms to ultimately be usable, future work should focus on reducing the false positives created by noise.
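To make the evaluation scheme concrete, the following sketch computes plausible versions of the three simple metrics and a combined fitness score from per-segment labels; the exact definitions and the weighting are assumptions, as the abstract does not give formulas.

import numpy as np

def benchmark_metrics(y_true, y_pred, noise_weight=2.0):
    """y_true/y_pred: per-segment labels, with "noise" as the negative class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tcr = np.mean(y_true == y_pred)                      # true classification rate
    noise = y_true == "noise"
    nmr = np.mean(y_pred[noise] != "noise") if noise.any() else 0.0
    cmr = np.mean(y_pred[~noise] != y_true[~noise]) if (~noise).any() else 0.0
    # Assumed combination: noise false positives are penalized more strongly.
    fitness = tcr - noise_weight * nmr - cmr
    return tcr, nmr, cmr, fitness

print(benchmark_metrics(["call", "noise", "call", "noise"],
                        ["call", "call", "call", "noise"]))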
... CNN algorithms perform well for acoustic classification (Hershey et al., 2017), including the identification of a growing number of species' vocalizations, such as crickets and cicadas (Dong et al., 2018), birds and frogs (LeBien et al., 2020), fish (Mishachandar & Vairamuthu, 2021), and lately marine mammals (Usman et al., 2020). Recent applications of deep learning to the study of marine soundscapes include automated detectors for killer whales (Bergler et al., 2019) and humpback whales (Allen et al., 2021), the detection of North Atlantic right whales under changing environmental conditions (Vickers et al., 2021), and the detection of echolocation click trains produced by toothed whales (Roch et al., 2021). ...
Article
Passive Acoustic Monitoring (PAM) is emerging as a solution for monitoring species and environmental change over large spatial and temporal scales. However, drawing rigorous conclusions based on acoustic recordings is challenging, as there is no consensus over which approaches are best suited for characterizing marine acoustic environments. Here, we describe the application of multiple machine‐learning techniques to the analysis of two PAM datasets. We combine pre‐trained acoustic classification models (VGGish, NOAA and Google Humpback Whale Detector), dimensionality reduction (UMAP), and balanced random forest algorithms to demonstrate how machine‐learned acoustic features capture different aspects of the marine acoustic environment. The UMAP dimensions derived from VGGish acoustic features exhibited good performance in separating marine mammal vocalizations according to species and locations. RF models trained on the acoustic features performed well for labeled sounds in the 8 kHz range; however, low‐ and high‐frequency sounds could not be classified using this approach. The workflow presented here shows how acoustic feature extraction, visualization, and analysis allow establishing a link between ecologically relevant information and PAM recordings at multiple scales, ranging from large‐scale changes in the environment (i.e., changes in wind speed) to the identification of marine mammal species.
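A minimal sketch of the described embedding-and-classification workflow, assuming pre-extracted VGGish-style features (random placeholders here) and approximating the study's balanced random forest with scikit-learn's class weighting; hyperparameters are illustrative, not those used in the study.

import numpy as np
import umap                                        # pip install umap-learn
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 128))             # placeholder VGGish-style embeddings
labels = rng.integers(0, 3, size=500)              # placeholder class labels

# Project the acoustic features to two dimensions for visualization.
embedding = umap.UMAP(n_components=2).fit_transform(features)

# Balanced random forest approximated via class weighting.
rf = RandomForestClassifier(n_estimators=200, class_weight="balanced")
rf.fit(features, labels)
print(embedding.shape, rf.score(features, labels))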
... Deep learning models, particularly deep neural networks, are increasingly popular in the field of acoustic monitoring of wildlife in marine and forested ecosystems [19–26]. These models are capable of processing and analyzing large amounts of animal vocalizations amidst background noise, allowing them to detect and classify different species [20,25]. ...
... However, one of the main challenges of using these models is the collection of high-quality datasets. To overcome this, researchers use a combination of automated and manual techniques to collect and annotate data, ensuring their models are trained on accurate and representative datasets [19,22,25,26]. In marine ecosystems, CNNs are used to monitor the vocalizations of killer and humpback whales [19,20], while in forest ecosystems, CNN architectures have been used to analyze the call patterns of frogs [22,23]. ...
... To overcome this, researchers use a combination of automated and manual techniques to collect and annotate data, ensuring their models are trained on accurate and representative datasets [19,22,25,26]. In marine ecosystems, CNNs are used to monitor the vocalizations of killer and humpback whales [19,20], while in forest ecosystems, CNN architectures have been used to analyze the call patterns of frogs [22,23]. Other AI methods used in forest ecosystems include SVM classifiers [24,25], multi-label learning approaches to identify rival frog species, and GMMs for detecting frog activity through their calls [26]. ...
Article
Artificial intelligence (AI) has become a significantly growing field in the environmental sector due to its ability to solve problems, make decisions, and recognize patterns. AI is particularly significant in wildlife acoustic monitoring because of the vast amounts of data available in this field, which can be leveraged for computer vision and interpretation. Despite the increasing use of AI in wildlife ecology, its future in acoustic wildlife monitoring remains uncertain. To assess its potential and identify future needs, a scientific literature review was conducted on 54 works published between 2015 and March 2022. The review showed a significant rise in the use of AI techniques in wildlife acoustic monitoring over this period, with birds (N = 26) gaining the most attention, followed by mammals (N = 12). The most commonly used AI algorithm in this field was the convolutional neural network, which was found to be more accurate and beneficial than previous categorization methods in acoustic wildlife monitoring. This highlights the potential for AI to play a crucial role in advancing our understanding of wildlife populations and ecosystems. However, the results also show that there are still gaps in our understanding of the use of AI in wildlife acoustic monitoring. Further examination of previously used AI algorithms in bioacoustics research can help researchers better understand patterns and identify areas for improvement in autonomous wildlife monitoring. In conclusion, the use of AI in wildlife acoustic monitoring is a rapidly growing field with great potential. While significant progress has been made in recent years, much remains to be done to fully realize AI's potential in this field. Further research is needed to better understand the limitations and opportunities of AI in wildlife acoustic monitoring, and to develop new algorithms that can improve the accuracy and usefulness of this technology.
... By sensing and analysing underwater animal sounds, such as sounds produced by whales for communication purposes or caused by their movement patterns, conclusions can be drawn with respect to their population, behaviour, and habitat [35–37]. Large bioacoustic archives like the Orchive [38,39] represent a useful data foundation for a CA system [40,41]. ...
Article
Among the 17 Sustainable Development Goals (SDGs) proposed within the 2030 Agenda and adopted by all United Nations member states, the 13th SDG is a call for action to combat climate change. Moreover, SDGs 14 and 15 call for the protection and conservation of life below water and life on land, respectively. In this work, we provide a literature-based overview of application areas in which computer audition – a powerful but, in this context, so far hardly considered technology combining audio signal processing and machine intelligence – is employed to monitor our ecosystem, with the potential to identify ecologically critical processes or states. We distinguish between applications related to organisms, such as species richness analysis and plant health monitoring, and applications related to the environment, such as melting-ice monitoring or wildfire detection. This work positions computer audition in relation to alternative approaches by discussing methodological strengths and limitations, as well as ethical aspects. We conclude with an urgent call to action to the research community for greater involvement of audio intelligence methodology in future ecosystem monitoring approaches.
... CNNs can learn to discriminate spectro-temporal information directly from a labelled spectrogram used as an image input. The success of CNNs within the marine bioacoustic field has been demonstrated by studies of binary and multi-class species classification (Belghith et al., 2018; Harvey, 2018; Liu et al., 2018; Bergler et al., 2019; Bermant et al., 2019; Shiu et al., 2020; Yang et al., 2020; Zhong et al., 2020; Allen et al., 2021; White et al., 2022). ...
... More recent methods use simple Convolutional Neural Networks (ConvNets) [18,19,20,21] and Residual Neural Networks (ResNets) [22,23,24] to detect and classify single calls, and also Recurrent Neural Networks (RNNs) to separate echolocation calls from social calls [25]. This is in line with neighboring research fields, where ConvNets and RNNs are used to classify vocalizations of whales [26,27,28] or birds [29,30,31,32,33,34], for example. ...
... While the application of machine learning (ML) methods to detect wildlife calls is a well-established and active field of research (Stowell et al., 2019), the majority of existing studies have used large, labelled datasets to train classifiers using supervised learning approaches. Examples include the use of publicly available wildlife call datasets (Sankupellay and Konovalov, 2018;Stowell et al., 2019), annotation by citizen scientists (Mac Aodha et al., 2018) or existing labelled datasets (Bergler et al., 2019;Towsey et al., 2012). However, these approaches may not be suitable for the detection of a species where labelled training data are not available and are expensive to obtain due to the cost of human-generated labels. ...
... However, the collection and analysis of long-term data on variation in broadband vocalizations, such as bray calls, have been constrained both by the capacity and longevity of underwater sound recorders and the need for manual analysis [18,19]. These constraints are now being overcome by the availability of relatively low-cost archival sound recorders [20,21] and the development of automatic detectors based on deep learning techniques [22][23][24]. ...
... Instead of single acoustic devices, we deployed arrays of echolocation data loggers and broadband sound recorders to characterize both occurrence and foraging activity within two known foraging areas [25]. We used two different proxies for foraging: (i) echolocation buzzes, identified by modelling echolocation inter-click intervals (ICIs) [17] and (ii) bray calls, automatically detected using deep learning techniques, building upon the methodology of Bergler et al. [22]. We hypothesised that dolphins would remain longer within each of these foraging areas when the detection rates of foraging proxies within the encounter increased. ...
... Bray calls were identified using DOLPHIN-SPOT, a deep convolutional neural network (CNN)-based bray detector following the methodology of Bergler et al. [22]. DOLPHIN-SPOT produces an output that ...
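As an illustrative aside (not the cited method), echolocation buzzes of the kind mentioned above can be flagged as runs of unusually short inter-click intervals; the 10 ms threshold and minimum run length below are assumptions for demonstration only.

import numpy as np

def find_buzzes(click_times, max_ici=0.010, min_clicks=5):
    """click_times: sorted click onsets in seconds."""
    icis = np.diff(click_times)
    buzzes, start = [], None
    for i, ici in enumerate(icis):
        if ici <= max_ici:
            start = i if start is None else start
        else:
            # A run of clicks start..i contains i - start + 1 clicks.
            if start is not None and i - start + 1 >= min_clicks:
                buzzes.append((click_times[start], click_times[i]))
            start = None
    if start is not None and len(click_times) - start >= min_clicks:
        buzzes.append((click_times[start], click_times[-1]))
    return buzzes

clicks = np.array([0.0, 0.2, 0.205, 0.211, 0.216, 0.222, 0.5])
print(find_buzzes(clicks))   # one buzz between 0.2 s and 0.222 s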
Article
Area-restricted search (ARS) behaviour is commonly used to characterize spatio-temporal variation in foraging activity of predators, but evidence of the drivers underlying this behaviour in marine systems is sparse. Advances in underwater sound recording techniques and automated processing of acoustic data now provide opportunities to investigate these questions where species use different vocalizations when encountering prey. Here, we used passive acoustics to investigate drivers of ARS behaviour in a population of dolphins and determined if residency in key foraging areas increased following encounters with prey. Analyses were based on two independent proxies of foraging: echolocation buzzes (widely used as foraging proxies) and bray calls (vocalizations linked to salmon predation attempts). Echolocation buzzes were extracted from echolocation data loggers and bray calls from broadband recordings by a convolutional neural network. We found a strong positive relationship between the duration of encounters and the frequency of both foraging proxies, supporting the theory that bottlenose dolphins engage in ARS behaviour in response to higher prey encounter rates. This study provides empirical evidence for one driver of ARS behaviour and demonstrates the potential for applying passive acoustic monitoring in combination with deep learning-based techniques to investigate the behaviour of vocal animals.