Figure - available from: Mathematical Problems in Engineering
The schematic diagram of MhaNN-SVM.

Source publication
Article
Sound classification is a broad area of research that has gained much attention in recent years. The sound classification systems based on recurrent neural networks (RNNs) and convolutional neural networks (CNNs) have undergone significant enhancements in the recognition capability of models. However, their computational complexity and inadequate exploration of global dependencies for long sequences limit further gains in classification accuracy. The work therefore replaces the recurrent architecture with a parallel structure, using multi-head attention as the feature extractor over mel-spectrogram input and an SVM as the classifier (MhaNN-SVM).
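As a rough sketch of the approach the abstract describes (attention in place of recurrence, an SVM in place of a softmax head), the snippet below uses PyTorch's nn.MultiheadAttention over mel-spectrogram frames and scikit-learn's SVC; the layer sizes, pooling choice, and dummy data are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of an attention-based feature extractor feeding an SVM,
# in the spirit of MhaNN-SVM; dimensions and pooling are illustrative only.
import torch
import torch.nn as nn
from sklearn.svm import SVC

class AttentionFeatureExtractor(nn.Module):
    def __init__(self, n_mels=128, num_heads=4):
        super().__init__()
        # Multi-head self-attention processes all time frames in parallel,
        # capturing global dependencies without recurrence.
        self.attn = nn.MultiheadAttention(embed_dim=n_mels, num_heads=num_heads,
                                          batch_first=True)

    def forward(self, mel):                    # mel: (batch, time, n_mels)
        out, _ = self.attn(mel, mel, mel)      # self-attention over time frames
        return out.mean(dim=1)                 # average-pool to one vector per clip

extractor = AttentionFeatureExtractor()
mel_batch = torch.randn(8, 431, 128)           # 8 clips of mel-spectrogram frames
with torch.no_grad():
    feats = extractor(mel_batch).numpy()

# The pooled attention features become inputs to a conventional SVM classifier.
labels = [0, 1, 0, 1, 2, 2, 0, 1]              # dummy class labels
svm = SVC(kernel="rbf").fit(feats, labels)
```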

Citations

... Returning to the classification of sound/noise, which can classically serve as the basis for detecting danger from the noise around an individual or population, [15] demonstrates that sound categorization performance can still be improved by swapping out the recurrent architecture for a parallel processing structure during feature extraction. The research processes large volumes of data and uses them to develop the model with deep learning algorithms, namely CNN (Convolutional Neural Networks) and LSTM (Long Short-Term Memory). ...
... In this research, the audio analysis techniques adopted are the Fourier transform and the mel-spectrogram (similar to [31]), and the audio was sampled at 44.1 kHz (as in [32]) for further processing. Post-cleaning, the sound data is fed to three different deep learning models (1D-CNN, 2D-CNN, and LSTM) to classify sounds from a person's surroundings (as in several pieces of research cited above, for example [15]) and to detect a threat from them. If a threat is detected, an automatic alert message is sent to the registered contacts or the emergency services. ...
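The mel-spectrogram front end described in this excerpt can be sketched in a few lines with librosa; the file name and frame parameters below are placeholder assumptions.

```python
# Sketch of the mel-spectrogram preprocessing step described above (librosa);
# "clip.wav" and the frame parameters are placeholders.
import librosa
import numpy as np

y, sr = librosa.load("clip.wav", sr=44100)     # resample to 44.1 kHz
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                     hop_length=512, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)  # log-scaled, as CNN/LSTM input
print(mel_db.shape)                            # (n_mels, n_frames)
```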
... • Starting with [1], it provides a solution to the problem of detecting threats around an individual, but it comes with bulky hardware, which makes it difficult to carry around for regular use.
• On the other hand, in [2], [15], and [32], the researchers use several techniques to analyze audio signals, but they do not take the work further to provide a practical solution to the problem of detecting danger around an individual.
• The research done in [3] is closest to what has been achieved in this research: the researchers built a system to provide real-time feedback from a person's surroundings; however, it too requires an additional hardware component beyond a smartphone. ...
Article
Today, humans pose the greatest threat to society by getting involved in robbery, assault, or homicide. Such circumstances endanger people working alone at night in remote areas, especially women. Any such threat in real time is always associated with a sound or noise, which may be used for early detection. Numerous existing measures are available, but none of them is efficient, owing to lack of accuracy and delays in exact prediction of the threat. Hence, a novel software-based prototype is developed to detect threats from the sound/noise in a person's surroundings and automatically alert the victim's registered contacts by sending email, SMS, and WhatsApp messages through their smartphones, without any additional hardware components. Audio signals from a Kaggle dataset are visualized and analyzed using Exploratory Data Analysis (EDA) techniques. Feeding the EDA outcomes into deep learning models, Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN), yields an accuracy of 96.6% in classifying the audio events.
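The alerting step described in this abstract lends itself to a short sketch using Python's standard smtplib; the SMTP host, credentials, and threat-class list are placeholder assumptions, and the upstream classifier is assumed to exist.

```python
# Hypothetical alert dispatch once the classifier flags a threat class;
# SMTP host, credentials, and the THREAT_CLASSES set are placeholders.
import smtplib
from email.message import EmailMessage

THREAT_CLASSES = {"gun_shot", "scream", "glass_breaking"}

def send_alert(predicted_class, contact_email):
    if predicted_class not in THREAT_CLASSES:
        return
    msg = EmailMessage()
    msg["Subject"] = f"Threat detected: {predicted_class}"
    msg["From"] = "alert@example.com"
    msg["To"] = contact_email
    msg.set_content("A threatening sound was detected near the registered user.")
    with smtplib.SMTP("smtp.example.com", 587) as server:
        server.starttls()                                   # encrypt the session
        server.login("alert@example.com", "app-password")   # placeholder creds
        server.send_message(msg)
```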
... Recently, the attention mechanism has been applied to MGC. Yang [44] continues to study the global dependencies of long audio sequences, employing parallel structures instead of a recurrent architecture, multi-head attention as the feature extractor, and an SVM as the classifier. The model shows considerable generalization ability. ...
... $y_{\mathrm{label}} = \max(y_1, \ldots, y_j, \ldots, y_k)$

Model | Input features | Accuracy
[37] | Convnet features | 89.80%
Hybrid model [50] | MFCC, SSD, etc. | 90.00%
net1 [35] | Mel-spectrogram | 90.70%
BRNN + PCNNA [14] | STFT | 90.00%
CRNN with GLR [1] | Mel-spectrogram | 87.79%
MhaNN-SVM [44] | Mel-spectrogram | 88.40%
MS-SincResNet [36] | Raw waveform | 91.49%
CNN-5 ...
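The quoted decision rule aggregates per-segment predictions by taking their maximum; a minimal numpy sketch follows, assuming each of k segments yields a class-score vector (the probabilities are dummy values).

```python
# Sketch of the quoted rule y_label = max(y_1, ..., y_k): each of k segments
# yields a class-probability vector, and the clip label comes from the
# maximum of the segment scores.
import numpy as np

segment_probs = np.array([                     # k = 3 segments, 4 classes
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.20, 0.70, 0.05],
    [0.15, 0.55, 0.20, 0.10],
])
clip_scores = segment_probs.max(axis=0)        # max over segments, per class
clip_label = int(np.argmax(clip_scores))       # winning class index
```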
Article
Music genre classification (MGC) is an indispensable branch of music information retrieval. With the prevalence of end-to-end learning, research on MGC has made some breakthroughs. However, the limited receptive field of a convolutional neural network (CNN) cannot capture the correlation between the temporal frames and the sound frequencies of all vibrations in a song. Meanwhile, the time–frequency information of different channels is not equally important. To deal with these problems, we apply dual parallel attention (DPA) in CNN-5 to focus on global dependencies. First, we propose parallel channel attention (PCA) to build global time–frequency dependencies in the song and study the influence of different weighting methods for PCA. Next, we design dual parallel attention, which focuses on global time–frequency dependencies in the song and adaptively calibrates the contribution of different channels to the feature map. Then, we analyze the effect of applying different numbers and positions of DPA in CNN-5 on performance and compare DPA with multiple attention mechanisms. The results on the GTZAN dataset demonstrate that the proposed method achieves a classification accuracy of 91.4% and that DPA has the highest performance.
... Previously, researchers used mathematical techniques such as standard statistical pattern recognition (SPR), the Gaussian classifier (GS), and the Gaussian Mixture Model (GMM) to classify music genres. In the age of AI, researchers have been using various machine learning techniques such as the Multi-Class Support Vector Machine (SVM) [6], K-Nearest Neighbors (KNN) [7], Linear Kernel SVM, Polynomial Kernel SVM, Decision Tree, Random Forest, AdaBoost, Naïve Bayes, the Linear Discriminant Analysis (LDA) classifier, Logistic Regression, and Sigmoid Kernel SVM [6], [8]-[10]. In modern days, deep learning is not only solving computer vision problems but also dealing with sequence and time series problems [11], [12]. ...
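For reference, a minimal scikit-learn sketch of two of the classical classifiers listed in this excerpt (a multi-class SVM and KNN); the feature matrix and labels are synthetic stand-ins for precomputed audio features.

```python
# Minimal sketch of the classical classifiers named above (scikit-learn);
# the feature matrix and labels are synthetic stand-ins.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X = np.random.rand(300, 40)                    # e.g. 40 MFCC-derived features
y = np.random.randint(0, 10, size=300)         # 10 music genres
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

svm = SVC(kernel="linear").fit(X_tr, y_tr)     # multi-class SVM (one-vs-one)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print(svm.score(X_te, y_te), knn.score(X_te, y_te))
```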
Preprint
Music has been considered an inseparable part of our culture and tradition. In this work, we created a dataset with six Hindustani music genres: Abhang, Bhajan, Thumri, Tappa, Ghazal, and Kajri, each of which contains 100 songs in wave (.wav) format. To classify the Hindustani music genres, we employ mel-frequency cepstral coefficient (MFCC) features, which contain timbral information, and the Recurrent Neural Network-Long Short Term Memory (RNN-LSTM). Our best three models achieved an average accuracy of 86% when trained on various feature sets with MFCC values of 18, 26, and 39. Furthermore, we use uniform manifold approximation and projection to transform and visualise higher-dimensional feature-set data in two-dimensional space. Based on the results, we can infer that Hindustani music has more intricate melodies than western music, and feeding 18 MFCC features to the deep neural network is the optimum strategy for obtaining better accuracy. Increasing the hop length from 512 to 1024 reduces the input dimension size, which facilitates the RNN-LSTM model. As a result, the performance of the RNN-LSTM models is slightly improved. Our RNN-LSTM models' test set accuracy decreased by 5% when we used 5 segments. Additionally, we evaluated and compared our model on six genres of the GTZAN dataset and achieved 90% accuracy.
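The abstract's MFCC settings map directly onto a librosa call; a sketch with 18 coefficients and a hop length of 1024 follows, where the file name and sampling rate are placeholders.

```python
# Sketch of the MFCC front end above: 18 coefficients, hop length 1024
# (halving the frame count relative to hop 512); "song.wav" is a placeholder.
import librosa

y, sr = librosa.load("song.wav", sr=22050)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=18, hop_length=1024)
print(mfcc.shape)                              # (18, n_frames)
# Transpose to (n_frames, 18) so each time step feeds one LSTM input vector.
sequence = mfcc.T
```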
... By using a set number of environmental sound signals as training samples, the classifier training module determines the values of the parameters in the classifier [2][3][4][5][6][7][8]. The primary classifiers utilized in the algorithms include k-nearest neighbor (KNN)-based classifiers [9][10][11][12][13], multilayer perceptron (MLP)-based classifiers [14,15], convolutional recurrent neural network (CRNN)-based classifiers [16][17][18][19][20], convolutional neural network (CNN)-based classifiers [21][22][23][24][25][26][27], support vector machine (SVM)-based classifiers [28][29][30][31][32][33], and Gaussian mixture model (GMM)-based classifiers [34][35][36]. The information that a model can provide generally comes from two aspects: the information contained in the training data and the prior information that people supply. ...
Article
Environmental sound classification is an important branch of acoustic signal processing. In this work, a set of sound classification features based on audio signal perception and statistical analysis is proposed to describe the signal from multiple aspects of the time and frequency domains. Energy features, spectral entropy features, the zero crossing rate (ZCR), and mel-frequency cepstral coefficients (MFCC) are combined to form joint signal analysis (JSA) features that improve the signal expression of the features. Then, based on JSA, a novel region joint signal analysis (RJSA) feature for environmental sound classification is also proposed. It can reduce feature extraction computation and improve feature stability, robustness, and classification accuracy. Finally, a sound classification framework based on the boosting ensemble learning method is provided to improve classification accuracy and model generalization. The experimental results show that, compared with the highest classification accuracy of the baseline algorithms, the environmental sound classification algorithm based on the proposed RJSA features and ensemble learning methods improves classification accuracy, and the LightGBM-based sound classification algorithm improves accuracy by 14.6%.
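A hedged sketch of joint features in the spirit of JSA follows, combining the four listed feature families and feeding a LightGBM classifier; it mirrors the feature families named in the abstract, not the paper's exact JSA/RJSA definitions, and the training data is synthetic.

```python
# Hedged sketch of JSA-style joint features (energy, spectral entropy, ZCR,
# MFCC) feeding a LightGBM classifier; not the paper's exact definitions.
import librosa
import numpy as np
from lightgbm import LGBMClassifier

def joint_features(y, sr):
    energy = np.array([np.sum(np.abs(y) ** 2)])          # clip energy
    S = np.abs(librosa.stft(y, n_fft=2048)) ** 2
    p = S / (S.sum(axis=0, keepdims=True) + 1e-12)       # per-frame spectrum
    entropy = np.array([np.mean(-np.sum(p * np.log2(p + 1e-12), axis=0))])
    zcr = np.array([librosa.feature.zero_crossing_rate(y).mean()])
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    return np.concatenate([energy, entropy, zcr, mfcc])  # joint feature vector

# Synthetic training set: one feature vector per clip, dummy labels.
X = np.stack([joint_features(np.random.randn(22050), 22050) for _ in range(50)])
y_labels = np.random.randint(0, 5, size=50)
clf = LGBMClassifier(n_estimators=100).fit(X, y_labels)
```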
... Environmental sound classification (ESC) has been used for different purposes in recent years and has therefore attracted the attention of researchers. In these studies, vehicle interior and exterior sounds (Hashimoto and Takao 1990;Västfjäll et al. 2002;Wang et al. 2013), internal organ sounds (Mishra, Menon, and Mukherjee 2018;Patidar and Pachori 2014;Randhawa and Singh 2015;Wang et al. 2013), fault detection from device sound (Gramatikov et al. 2016;Scanlon, Kavanagh, and Boland 2012), air conditioner and refrigerator noise (Jeon et al. 2011), aircraft and helicopter sounds (Akhtar, Elshafei-Abmed, and Ahmed 2001;More and Davies 2010), ambient sounds (Huang et al. 2019;Lurz et al. 2017), animal sounds (Ko et al. 2019), underwater monitoring (Mayer, Magno, and Benini 2019), gender recognition and classification using music (Chang, Chen, and Lee 2021;Liu et al. 2021), and sound classification using acoustic properties (Nanni et al. 2017;Yang and Zhao 2021) have been studied. ESC has limited information in terms of time and frequency properties. ...
Article
Nowadays, environmental sound classification (ESC) has become one of the most studied research areas. Sound signals that are indistinguishable to the human auditory system have been classified by computer-aided systems and machine learning methods. Therefore, ESC has been used in signal processing and sound forensics applications. A novel ESC type is presented in this paper, named vehicle interior sound classification (VISC). VISC is defined as one of the sub-branches of ESC, and it is utilized as a sound-based biometric for vehicles. A hand-crafted feature-based VISC method is presented. The proposed method has multileveled feature generation using maximum pooling and the proposed local quintet magnitude pattern (LQMP), feature selection with iterative neighborhood component analysis (INCA), and classification phases. A novel VISC dataset was collected from YouTube, and the proposed LQMP- and INCA-based method was applied to the collected sounds. The results denote the following: the accuracy, F1-score, and geometric mean of the proposed LQMP- and INCA-based VISC method were calculated as 98.38%, 98.23%, and 98.21%, respectively, using a support vector machine classifier. The contribution of the proposed VISC method is to show that vehicles can be classified by their sound.
... In the aspect of audio recognition, Yang and Zhao [17] proposed an acoustic scene classification method based on the support vector machine (SVM), which enhanced the sound texture to improve the classification accuracy. Greco et al. [18] proposed a voice recognition system based on the heuristic deep learning method. ...
Article
Solfeggio is an important basic course for music majors, and audio recognition training is one of its important components. With the improvement of computer performance, audio recognition has been widely used in smart wearable devices. In recent years, the development of deep learning has accelerated research on audio recognition. However, there is considerable sound interference in music teaching environments, so the performance of existing audio classifiers cannot meet actual demand. To solve this problem, an improved audio recognition system based on YOLO-v4 is proposed, which mainly improves the network structure. First, mel-frequency cepstral coefficients are used to process the original audio and extract the corresponding features. Second, the YOLO-v4 model from the field of deep learning is applied to audio recognition and improved by combining it with a spatial pyramid pooling module to strengthen generalization over data in different audio formats. Third, the stacking method from ensemble learning is used to fuse the independent submodels of two different channels. Experimental results show that, compared with other deep learning techniques, the improved YOLO-v4 model improves audio recognition performance and handles data in different audio formats better, showing stronger generalization ability.
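The spatial pyramid pooling module mentioned in this abstract can be sketched in a few lines of PyTorch; this follows the classic SPP formulation with illustrative pyramid levels, and the paper's YOLO-v4 variant may differ in detail.

```python
# Sketch of a spatial pyramid pooling (SPP) block: fixed-size pooled maps from
# variable-size inputs, concatenated. Pyramid levels (1, 2, 4) are illustrative.
import torch
import torch.nn as nn

class SpatialPyramidPooling(nn.Module):
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList(nn.AdaptiveMaxPool2d(k) for k in levels)

    def forward(self, x):                      # x: (batch, channels, H, W)
        # Each level pools to a k x k grid, so the output length is fixed
        # regardless of the input spectrogram's time-frequency size.
        return torch.cat([p(x).flatten(1) for p in self.pools], dim=1)

spp = SpatialPyramidPooling()
feat = spp(torch.randn(2, 32, 37, 51))         # odd input size on purpose
print(feat.shape)                              # (2, 32 * (1 + 4 + 16)) = (2, 672)
```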
... Time, frequency, and coefficient are the three domains in which features can be extracted. Traditional feature extraction procedures [11,14], on the other hand, become more computationally complex as the number of audio signals in the dataset grows. Audio classification is a popular method for indexing the audio signals in a dataset. ...
... Deep learning techniques [18] have been utilized for recognizing emotions [20,21]; an Inception net has been applied to tackle the emotion recognition problem [19], several databases have been studied, and the IEMOCAP database is used as the dataset for training the model with TensorFlow. The recognition capability of sound categorization systems based on recurrent neural networks (RNNs) and convolutional neural networks (CNNs) [11,12] has improved significantly. Sound classification involves extracting physical and perceptual characteristics from a sound and using these characteristics to determine which of a set of classes the sound is most likely to belong to [9,13]. ...
... Many scientific problems and fields have witnessed great developments through the use of deep learning, which has improved their achievement rates, for example, computer vision, natural language processing, and also the sound domain, including music recommendation and speech recognition [7]-[9]. Sound classification systems based on deep neural networks such as CNNs have undergone important improvements in the recognition and classification capability of models. Nonetheless, their computational complexity and inadequate exploration of global dependencies for long sequences restrict improvements in their classification results [10]. In recent years, much research has been carried out on automatic sound classification and detection in outdoor environments. ...
Article
In this research, different audio feature extraction techniques are implemented and classification approaches are presented to classify seven types of wind. We applied feature extraction techniques such as the Zero Crossing Rate (ZCR), the Fast Fourier Transform (FFT), Linear Predictive Coding (LPC), and Perceptual Linear Prediction (PLP). Some of these methods are known to work well on human voices, but we apply them here to characterize the wind audio content. The CNN classification method is implemented to determine the class of the input wind sound signal. Experimental results show that each of these feature extraction methods gives different results, and the classification accuracy obtained using PLP features is the best.
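Three of the four listed feature extractors have direct librosa or numpy routines; a sketch follows, with PLP omitted since librosa has no standard PLP implementation, and with the file name and LPC order as placeholder assumptions.

```python
# Sketch of three of the listed feature extractors on a wind-audio clip;
# "wind.wav" and the LPC order are placeholders, PLP is omitted.
import librosa
import numpy as np

y, sr = librosa.load("wind.wav", sr=22050)

zcr = librosa.feature.zero_crossing_rate(y).mean()     # ZCR, averaged over frames
spectrum = np.abs(np.fft.rfft(y))                      # FFT magnitude spectrum
lpc_coeffs = librosa.lpc(y, order=16)                  # LPC, order 16

print(zcr, spectrum.shape, lpc_coeffs.shape)
```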