Figure - available from: Scientific Reports
Configuration of the collar-mounted acoustic recording unit (ARU): (a) after being fitted on an adult female giant panda and (b) schematic drawing of the ARU and all its components (1. audio recorder; 2. aluminum cap; 3. aluminum box; 4. leather belting; 5. condom; 6. gauze; 7. external microphone; 8. duct tape strips).


Source publication
Article
Full-text available
For translocated animals, behavioral competence may be key to post-release survival. However, monitoring behavior is typically limited to tracking movements or inferring behavior at a gross scale via collar-mounted sensors. Animal-borne acoustic monitoring may provide a unique opportunity to monitor behavior at a finer scale. The giant panda is an...

Similar publications

Preprint
Full-text available
The Hudson Bay Lowlands contain the most extensive wetlands and thickest peat deposits in Canada. The region is home to unique concentrations of wildlife, most notably polar bears, caribou, and migratory birds. Bears rely on inland denning habitat, caribou are tied to peatland vegetation, and birds intensively graze coastal herbaceous salt marsh an...

Citations

... Pioneering efforts are directed toward developing a small, versatile, efficient deep network for acoustic recognition on resource-limited edge devices. Additionally, a key component of many intelligent Internet of Things (IoT) applications, including predictive maintenance [1,2], surveillance [3,4], and ecosystem monitoring [5,6], is audio classification. With several possible applications, including audio surveillance [7] and smart room monitoring [8], environmental sound categorization (ESC) is a significant study topic in human-computer interaction. ...
Article
Full-text available
Audio classification using deep learning models, which is essential for applications like voice assistants and music analysis, faces challenges when deployed on edge devices due to their limited computational resources and memory. Achieving a balance between performance, efficiency, and accuracy is a significant obstacle to optimizing these models for such constrained environments. In this investigation, we evaluate diverse deep learning architectures, including Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), for audio classification tasks on the ESC 50, UrbanSound8k, and Audio Set datasets. Our empirical findings indicate that Mel spectrograms outperform raw audio data, attributing this enhancement to their synergistic alignment with advanced image classification algorithms and their congruence with human auditory perception. To address the constraints of model size, we apply model-compression techniques, notably magnitude pruning, Taylor pruning, and 8-bit quantization. The research demonstrates that a hybrid pruned model achieves a commendable accuracy rate of 89 percent, which, although marginally lower than the 92 percent accuracy of the uncompressed CNN, strikingly illustrates an equilibrium between efficiency and performance. Subsequently, we deploy the optimized model on the Raspberry Pi 4 and NVIDIA Jetson Nano platforms for audio classification tasks. These findings highlight the significant potential of model-compression strategies in enabling effective deep learning applications on resource-limited devices, with minimal compromise on accuracy.
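The two compression steps named in this abstract, magnitude pruning and 8-bit quantization, can be illustrated with a minimal sketch. This is a generic textbook version in plain Python, not the authors' implementation: the function names, the per-tensor symmetric quantization scheme, and the toy weight vector are all assumptions made for illustration, and Taylor pruning is not shown.

```python
def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of weights with the smallest magnitude.

    Ties at the threshold are pruned too, so slightly more than the
    requested fraction can be removed.
    """
    k = int(sparsity * len(weights))
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [w if abs(w) > threshold else 0.0 for w in weights]


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127] (assumed scheme)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical toy weights, only to show the two steps chained together.
weights = [0.1, -0.5, 0.05, 2.0]
pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

In this sketch each restored weight differs from its pruned value by at most half a quantization step (`scale / 2`), which is the usual rounding bound for uniform quantization.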
... Pioneering efforts are directed toward developing a small, versatile, efficient deep network for acoustic recognition on resource-limited edge devices. Besides, a key component of many intelligent Internet of Things (IoT) applications, including predictive maintenance [1,2], surveillance [3,4], and ecosystem monitoring [5,6], is audio classification. With several possible applications, including audio surveillance [7] and smart room monitoring [8], environmental sound categorization (ESC) is a significant study topic in human-computer interaction. ...
Preprint
Full-text available
Audio classification using deep learning models, essential for applications like voice assistants and music analysis, faces challenges when deployed on edge devices due to their limited computational resources and memory. Achieving a balance between performance, efficiency, and accuracy is a significant obstacle in optimizing these models for such constrained environments. In this investigation, we evaluate diverse deep learning architectures, including Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), for audio classification tasks on the ESC 50, UrbanSound8k, and Audio Set datasets. Our empirical findings indicate that Mel Spectrograms outperform raw audio data, attributing this enhancement to their synergistic alignment with advanced image classification algorithms and their congruence with human auditory perception. To address the constraints of model size, we apply model compression techniques, notably magnitude pruning, Taylor Pruning, and 8-bit quantization. The research demonstrates that a hybrid pruned model achieves a commendable accuracy rate of 89 percent, which, although marginally lower than the 92 percent accuracy of the uncompressed CNN, strikingly illustrates an equilibrium between efficiency and performance. Subsequently, we deploy the optimized model on Raspberry Pi 4 and NVIDIA Jetson Nano platforms for audio classification tasks. These findings highlight the significant potential of model compression strategies in enabling effective deep learning applications on resource-limited devices, with minimal compromise on accuracy.
... A. Owen et al., 2005; M. A. Owen et al., 2016; Charlton et al., 2019), and different behavioral contexts are associated with sonic events, each offering a unique acoustic signature (Yan et al., 2019) allowing classification and also containing additional fine details about the individual emitting those sounds (sex, size, age, identity, intent, hormone level). The low caloric and nutritional value of bamboo means that pandas must spend upwards of 14 hours a day feeding. ...
Article
Full-text available
There is a strong disconnect between humans and other species in our societies. Zoos particularly expose this disconnect by displaying the asymmetry between visitors in search of entertainment, and animals often suffering from a lack of meaningful interactions and natural behaviors. In zoos, many species are unable to mate, raise young, or exhibit engagement behaviors. Enrichment is a way to enhance their quality of life, enabling them to express natural behaviors and reducing stereotypies. Prior work on sound-based enrichment and interactivity suggests that a better understanding of animals’ sensory needs and giving them options to shape their surroundings can yield substantial benefits. However, current zoo management and conservation practices lack tools and frameworks to leverage innovative technology to improve animal well-being and zookeepers’ ability to care for them. Ethical considerations are called for in developing such interventions as human understanding of animals’ worlds is still limited, and assumptions can have detrimental consequences. Based on several interventions, four principles are proposed to guide a more systematic implementation of sonic enrichment in zoos. The goal is to lay the groundwork for the design of the zoos of the future, with a focus on sounds, for the benefit of the animals.
... Intelligent sound recognition is receiving strong interest in a growing number of application areas, from technical safety [1][2][3], surveillance and urban monitoring [4][5][6] to environmental applications [7,8]. ...
... $z_{\mathrm{rank}} = \operatorname{sort}((z_l))$ (8), $z_{\mathrm{prune}} = \operatorname{argmin}(z_{\mathrm{rank}})$ ...
Article
Significant efforts are being invested to bring state-of-the-art classification and recognition to edge devices with extreme resource constraints (memory, speed, and lack of GPU support). Here, we demonstrate the first deep network for acoustic recognition that is small, flexible and compression-friendly yet achieves state-of-the-art performance for raw audio classification. Rather than handcrafting a once-off solution, we present a generic pipeline that automatically converts a large deep convolutional network via compression and quantization into a network for resource-impoverished edge devices. After introducing ACDNet, which produces above state-of-the-art accuracy on ESC-10 (96.65%), ESC-50 (87.10%), UrbanSound8K (84.45%) and AudioEvent (92.57%), we describe the compression pipeline and show that it allows us to achieve 97.22% size reduction and 97.28% FLOP reduction while maintaining close to state-of-the-art accuracy 96.25%, 83.65%, 78.27% and 89.69% on these datasets. We describe a successful implementation on a standard off-the-shelf microcontroller and, beyond laboratory benchmarks, report successful tests on real-world datasets.
... Such influences could result in enormous inaccuracies in abundance and occupancy estimates if not considered. Estimates of species-specific rates of vocalizations could benefit from collars with microphones, which have been successfully applied in several taxa (Buil et al., 2019; Couchoux et al., 2015; Yan et al., 2019). Second, for genetic studies and many other biology fields, it is obligatory to publish data associated with publications in public databases. The same should be standard for acoustic data as acoustic reference data will be critical to train efficient algorithms for fast analyses of the data. ...
Article
Full-text available
Developing new cost-effective methods for monitoring the distribution and abundance of species is essential for conservation biology. Passive acoustic monitoring (PAM) has long been used in marine mammals and has recently been postulated to be a promising method to improve monitoring of terrestrial wildlife as well. Because Madagascar’s lemurs are among the globally most threatened taxa, this study was designed to assess the applicability of an affordable and open-source PAM device to estimate the density of pale fork-marked lemurs (Phaner pallescens). Using 12 playback experiments and one fixed transect of four automated acoustic recorders during one night of the dry season in Kirindy Forest, we experimentally estimated the detection space for Phaner and other lemur vocalizations. Furthermore, we manually annotated more than 10,000 vocalizations of Phaner from a single location and used bout rates from previous studies to estimate density within the detection space. To truncate detections beyond 150 m, we applied a sound pressure level (SPL) threshold filtering out vocalizations below SPL 50 (dB re 20 μPa). During the dry season, vocalizations of Phaner can be detected with confidence beyond 150 m by a human listener. Within our fixed truncated detection area corresponding to an area of 0.07 km² (detection radius of 150 m), we estimated 10.5 bouts per hour corresponding to a density of Phaner of 38.6 individuals/km². Our density estimates are in line with previous estimates based on individually marked animals conducted in the same area. Our findings suggest that PAM also could be combined with distance sampling methods to estimate densities. We conclude that PAM is a promising method to improve the monitoring and conservation of Phaner and many other vocally active primates.
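The area figure quoted in this abstract can be checked with a short calculation. The sketch below assumes the truncated detection space is a full circle of radius 150 m; the variable names are illustrative, and the "implied individuals" value is only a back-of-the-envelope product of the reported density and area, not a number taken from the paper.

```python
import math

# Reported in the abstract: detection radius of 150 m.
radius_km = 0.150

# Assumption: the detection space is a full circle around the recorder.
area_km2 = math.pi * radius_km ** 2

# Reported in the abstract: 38.6 individuals per square kilometre.
density_per_km2 = 38.6

# Back-of-the-envelope: individuals expected inside the detection area.
implied_individuals = density_per_km2 * area_km2

print(f"detection area: {area_km2:.3f} km^2")
print(f"implied individuals in area: {implied_individuals:.1f}")
```

The computed area rounds to the 0.07 km² stated in the abstract, and the reported density corresponds to roughly two to three animals within earshot of the recorder.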
... Audio classification is a fundamental building block of many smart Internet of Things (IoT) applications such as predictive maintenance [1]-[3], surveillance [4], [5], and ecosystem monitoring [6], [7]. Smart sensors driven by microcontroller units (MCUs) are at the core of these applications. ...
... The data is recorded at 16 kHz with a bit depth of 16 bits. The subsets of the AudioEvent dataset are as follows: $S_{10} = \{0, 2, 5, 9, 11, 12, 17, 21, 25, 26\}$, $S_{20} = S_{10} \cup \{1, 4, 7, 8, 14, 15, 19, 20, 23, 27\}$, and $S_{28}$ = the whole AudioEvent dataset. ...
Article
Full-text available
Deep learning has celebrated resounding successes in many application areas of relevance to the Internet of Things (IoT), such as computer vision and machine listening. These technologies must ultimately be brought directly to the edge to fully harness the power of deep learning for the IoT. The obvious challenge is that deep learning techniques can only be implemented on strictly resource-constrained edge devices if the models are radically downsized. This task relies on different model compression techniques, such as network pruning, quantization, and the recent advancement of XNOR-Net. This study examines the suitability of these techniques for audio classification on microcontrollers. We present an application of XNOR-Net for end-to-end raw audio classification and a comprehensive empirical study comparing this approach with pruning-and-quantization methods. We show that raw audio classification with XNOR yields comparable performance to regular full precision networks for small numbers of classes while reducing memory requirements 32-fold and computation requirements 58-fold. However, as the number of classes increases significantly, performance degrades, and pruning-and-quantization based compression techniques take over as the preferred technique being able to satisfy the same space constraints but requiring approximately 8x more computation. We show that these insights are consistent between raw audio classification and image classification using standard benchmark sets. To the best of our knowledge, this is the first study to apply XNOR to end-to-end audio classification and evaluate it in the context of alternative techniques. All code is publicly available on GitHub.
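The 32-fold memory reduction mentioned here follows from storing each weight as a single sign bit instead of a 32-bit float. A minimal sketch of the core binary operation is given below, assuming values in {-1, +1} packed into a Python integer; the helper names are hypothetical, and the real-valued scaling factors that XNOR-Net applies alongside the binary product are deliberately omitted.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an integer, one bit per value (1 means +1)."""
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    return bits


def xnor_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    XNOR marks the positions where the signs agree; each agreement
    contributes +1 and each disagreement -1, so dot = 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask
    return 2 * bin(agree).count("1") - n


# Hypothetical toy vectors to compare against the ordinary dot product.
a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
binary_result = xnor_dot(pack_signs(a), pack_signs(b), len(a))
float_result = sum(x * y for x, y in zip(a, b))
```

On hardware, the XNOR and bit count replace a multiply-accumulate over many weights at once, which is the source of the computational savings this kind of network aims for.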
... These systems are often invasive techniques, such as collar-mounted acoustic sensors (e.g. Lynch et al., 2013; Yan et al., 2019; Wijers et al., 2021); we focus instead here on PAM, given its historic and increasing use in conservation. Next, we review how PAM has been used to address related topics across taxonomic groups under these themes (behaviour, ecology, and conservation). ...
Chapter
Animals share acoustic space to communicate vocally. The employment of passive acoustic monitoring to establish a better understanding of acoustic communities has emerged as an important tool in assessing overall diversity and habitat integrity as well as informing species conservation strategies. This chapter aims to review how traditional and more emerging bioacoustic techniques can address conservation issues. Acoustic data can be used to estimate species occupancy, population abundance, and animal density. More broadly, biodiversity can be assessed via acoustic diversity indices, using the number of acoustically conspicuous species. Finally, changes to the local soundscape provide an early warning of habitat disturbance, including habitat loss and fragmentation. Like other emerging technologies, passive acoustic monitoring (PAM) benefits from an interdisciplinary collaboration between biologists, engineers, and bioinformaticians to develop detection algorithms for specific species that reduce time-consuming manual data mining. The chapter also describes different methods to process, visualize, and analyse acoustic data, from open source to commercial software. The technological advances in bioacoustics turning heavy, non-portable, and expensive hardware and labour and time-intensive methods for analysis into new small, movable, affordable, and automated systems, make acoustic sensors increasingly popular among conservation biologists for all taxa.
... Audio classification is a fundamental building block of many smart IoT applications such as predictive maintenance [1]-[3], surveillance [4], and ecosystem monitoring [5], [6]. Smart sensors driven by microcontroller units (MCUs) are at the core of these applications. ...
Preprint
Full-text available
Deep Learning has celebrated resounding successes in many application areas of relevance to the Internet-of-Things, for example, computer vision and machine listening. To fully harness the power of deep learning for the IoT, these technologies must ultimately be brought directly to the edge. The obvious challenge is that deep learning techniques can only be implemented on strictly resource-constrained edge devices if the models are radically downsized. This task relies on different model compression techniques, such as network pruning, quantization and the recent advancement of XNOR-Net. This paper examines the suitability of these techniques for audio classification in microcontrollers. We present an XNOR-Net for end-to-end raw audio classification and a comprehensive empirical study comparing this approach with pruning-and-quantization methods. We show that raw audio classification with XNOR yields comparable performance to regular full precision networks for small numbers of classes while reducing memory requirements 32-fold and computation requirements 58-fold. However, as the number of classes increases significantly, performance degrades and pruning-and-quantization based compression techniques take over as the preferred technique being able to satisfy the same space constraints but requiring about 8x more computation. We show that these insights are consistent between raw audio classification and image classification using standard benchmark sets. To the best of our knowledge, this is the first study applying XNOR to end-to-end audio classification and evaluating it in the context of alternative techniques. All code is publicly available on GitHub.
... Drones (Hodgson et al., 2018), GPS transmitters (Fischer et al., 2018), acoustic-monitoring (Yan et al., 2019), and environmental DNA (eDNA) sequencing (Hunter, Hoban, Bruford, Segelbacher, & Bernatchez, 2018), are among the many new technologies augmenting remote, non-invasive monitoring. Genetic insights from eDNA, paleogenomics (ancient DNA), population genomics, and functional genomics, offer robust data to enhance translocation planning and implementation ranging from identifying suitable source populations to opportunities to facilitate adaptation to threats such as introduced disease (Box 3). ...
Article
Full-text available
Conservation translocations (reintroductions, reinforcements, ecological replacements, and assisted colonization) have played a vital and necessary role in conserving endangered species and ecosystems. Yet concerns over potential unintended ecological consequences frequently hinder the progress of translocation activities. We reviewed the history of U.S. translocations to ask: how often were intended benefits the result versus harmful unintended consequences? We found that translocations played a key role in recovery for 30% (14 of 47) of U.S. delisted taxa. Translocations have been performed, are planned, or are part of continuing recovery actions for 70% (1,112 of 1,580) of listed threatened and endangered taxa. Of the 1,014 total taxa we found with recorded conservation translocations spanning 125 years, we found only one restricted instance that caused a loss of biodiversity. All other reports of negative consequences were caused by translocations performed for economic and cultural interests in the absence of conservation‐based governance. Examples included fish stocking for sport and biological control programs for agricultural pests. We included biological control programs in this analysis because they can be and often are used as conservation tools, to directly benefit ecosystems. In addition, they are often raised as examples of harmful unintended results during the conservation planning process. However, only 1.4% (42) of 3,014 biological control agents released globally have caused ecosystem‐level deleterious impacts. All of these were initially released before the 1980s and conservation‐based practice and governance in recent decades have reduced off‐target impacts from biological control practice. Two themes emerged from our review: (a) conservation translocations routinely yielded their intended benefits without producing unintended harm, and (b) when ecological damage did occur, it was in the absence of conservation practice and regulation. 
This evidence shows that well‐planned translocation efforts produce ecosystem benefits, which should be weighed against the costs of inaction when deliberating conservation strategies.
... Intelligent sound recognition is receiving strong interest in a growing number of application areas, from technical safety [10,19,53], surveillance and urban monitoring [40] to environmental applications [43,51]. ...
Preprint
Full-text available
Significant efforts are being invested to bring the classification and recognition powers of desktop and cloud systems directly to edge devices. The main challenge for deep learning on the edge is to handle extreme resource constraints (memory, CPU speed and lack of GPU support). We present an edge solution for audio classification that achieves close to state-of-the-art performance on ESC-50, the same benchmark used to assess large, non-resource-constrained networks. Importantly, we do not specifically engineer the network for edge devices. Rather, we present a universal pipeline that converts a large deep convolutional neural network (CNN) automatically via compression and quantization into a network suitable for resource-impoverished edge devices. We first introduce a new sound classification architecture, ACDNet, that produces above state-of-the-art accuracy on both ESC-10 and ESC-50, which are 96.75% and 87.05% respectively. We then compress ACDNet using a novel network-independent approach to obtain an extremely small model. Despite 97.22% size reduction and 97.28% reduction in FLOPs, the compressed network still achieves 82.90% accuracy on ESC-50, staying close to the state-of-the-art. Using 8-bit quantization, we deploy ACDNet on standard microcontroller units (MCUs). To the best of our knowledge, this is the first time that a deep network for sound classification of 50 classes has successfully been deployed on an edge device. While this should be of interest in its own right, we believe it to be of particular importance that this has been achieved with a universal conversion pipeline rather than hand-crafting a network for minimal size.