Figure - available from: Mathematical Problems in Engineering
The schematic diagram of MhaNN-SVM.

Source publication
Article
Sound classification is a broad area of research that has gained much attention in recent years. The sound classification systems based on recurrent neural networks (RNNs) and convolutional neural networks (CNNs) have undergone significant enhancements in the recognition capability of models. However, their computational complexity and inadequate exploration of global dependencies for long sequences limit further gains in classification accuracy. The work therefore replaces the recurrent architecture with a parallel structure, using multi-head attention as the feature extractor over mel-spectrogram input and an SVM as the classifier (MhaNN-SVM).
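As a rough sketch of the approach the abstract describes (attention in place of recurrence, an SVM in place of a softmax head), the snippet below uses PyTorch's nn.MultiheadAttention over mel-spectrogram frames and scikit-learn's SVC; the layer sizes, pooling choice, and dummy data are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of an attention-based feature extractor feeding an SVM,
# in the spirit of MhaNN-SVM; dimensions and pooling are illustrative only.
import torch
import torch.nn as nn
from sklearn.svm import SVC

class AttentionFeatureExtractor(nn.Module):
    def __init__(self, n_mels=128, num_heads=4):
        super().__init__()
        # Multi-head self-attention processes all time frames in parallel,
        # capturing global dependencies without recurrence.
        self.attn = nn.MultiheadAttention(embed_dim=n_mels, num_heads=num_heads,
                                          batch_first=True)

    def forward(self, mel):                    # mel: (batch, time, n_mels)
        out, _ = self.attn(mel, mel, mel)      # self-attention over time frames
        return out.mean(dim=1)                 # average-pool to one vector per clip

extractor = AttentionFeatureExtractor()
mel_batch = torch.randn(8, 431, 128)           # 8 clips of mel-spectrogram frames
with torch.no_grad():
    feats = extractor(mel_batch).numpy()

# The pooled attention features become inputs to a conventional SVM classifier.
labels = [0, 1, 0, 1, 2, 2, 0, 1]              # dummy class labels
svm = SVC(kernel="rbf").fit(feats, labels)
```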

Citations

... Returning to the classification of sound/noise, which can classically serve as the basis for detecting danger from the noise around an individual or population, [15] demonstrates that sound categorization performance can still be improved by swapping out the recurrent architecture for a parallel processing structure during feature extraction. The research processes large volumes of data and uses them to develop the model with deep learning algorithms, namely CNN (Convolutional Neural Networks) and LSTM (Long Short-Term Memory). ...
... In this research, the audio analysis techniques adopted are the Fourier transform and the mel-spectrogram (similar to [31]), and the audio was sampled at 44.1 kHz (as in [32]) for further processing. Post-cleaning, the sound data is fed to three different deep learning models (1D-CNN, 2D-CNN, and LSTM) to classify sounds from a person's surroundings (as in several pieces of research cited above, for example [15]) and to detect a threat from them. If a threat is detected, an automatic alert message is sent to the registered contacts or the emergency services. ...
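The mel-spectrogram front end described in this excerpt can be sketched in a few lines with librosa; the file name and frame parameters below are placeholder assumptions.

```python
# Sketch of the mel-spectrogram preprocessing step described above (librosa);
# "clip.wav" and the frame parameters are placeholders.
import librosa
import numpy as np

y, sr = librosa.load("clip.wav", sr=44100)     # resample to 44.1 kHz
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                     hop_length=512, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)  # log-scaled, as CNN/LSTM input
print(mel_db.shape)                            # (n_mels, n_frames)
```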
... • Starting with [1], it provides a solution to the problem of detecting threats around an individual, but it comes with bulky hardware, which makes it difficult to carry around for regular use.
• On the other hand, in [2], [15], and [32], the researchers use several techniques to analyze audio signals, but they do not take the work further to provide a practical solution to the problem of detecting danger around an individual.
• The research done in [3] is closest to what has been achieved in this research: the researchers built a system to provide real-time feedback from a person's surroundings; however, it too requires an additional hardware component beyond a smartphone. ...
Article
Today, humans pose the greatest threat to society by getting involved in robbery, assault, or homicide. Such circumstances endanger people working alone at night in remote areas, especially women. Any such threat in real time is always associated with a sound or noise, which may be used for early detection. Numerous existing measures are available, but none of them is efficient, owing to lack of accuracy and delays in exact prediction of the threat. Hence, a novel software-based prototype is developed to detect threats from the sound/noise in a person's surroundings and automatically alert the victim's registered contacts by sending email, SMS, and WhatsApp messages through their smartphones, without any additional hardware components. Audio signals from a Kaggle dataset are visualized and analyzed using Exploratory Data Analysis (EDA) techniques. Feeding the EDA outcomes into deep learning models, Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNN), yields an accuracy of 96.6% in classifying the audio events.
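The alerting step described in this abstract lends itself to a short sketch using Python's standard smtplib; the SMTP host, credentials, and threat-class list are placeholder assumptions, and the upstream classifier is assumed to exist.

```python
# Hypothetical alert dispatch once the classifier flags a threat class;
# SMTP host, credentials, and the THREAT_CLASSES set are placeholders.
import smtplib
from email.message import EmailMessage

THREAT_CLASSES = {"gun_shot", "scream", "glass_breaking"}

def send_alert(predicted_class, contact_email):
    if predicted_class not in THREAT_CLASSES:
        return
    msg = EmailMessage()
    msg["Subject"] = f"Threat detected: {predicted_class}"
    msg["From"] = "alert@example.com"
    msg["To"] = contact_email
    msg.set_content("A threatening sound was detected near the registered user.")
    with smtplib.SMTP("smtp.example.com", 587) as server:
        server.starttls()                                   # encrypt the session
        server.login("alert@example.com", "app-password")   # placeholder creds
        server.send_message(msg)
```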
... Recently, the attention mechanism has been applied to MGC. Yang [44] continues to study the global dependencies of long audio sequences, employing parallel structures instead of a recurrent architecture, multi-head attention as the feature extractor, and an SVM as the classifier. The model shows considerable generalization ability. ...
... $y_{\mathrm{label}} = \max(y_1, \ldots, y_j, \ldots, y_k)$

Model | Input features | Accuracy
[37] | Convnet features | 89.80%
Hybrid model [50] | MFCC, SSD, etc. | 90.00%
net1 [35] | Mel-spectrogram | 90.70%
BRNN + PCNNA [14] | STFT | 90.00%
CRNN with GLR [1] | Mel-spectrogram | 87.79%
MhaNN-SVM [44] | Mel-spectrogram | 88.40%
MS-SincResNet [36] | Raw waveform | 91.49%
CNN-5 ...
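The quoted decision rule aggregates per-segment predictions by taking their maximum; a minimal numpy sketch follows, assuming each of k segments yields a class-score vector (the probabilities are dummy values).

```python
# Sketch of the quoted rule y_label = max(y_1, ..., y_k): each of k segments
# yields a class-probability vector, and the clip label comes from the
# maximum of the segment scores.
import numpy as np

segment_probs = np.array([                     # k = 3 segments, 4 classes
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.20, 0.70, 0.05],
    [0.15, 0.55, 0.20, 0.10],
])
clip_scores = segment_probs.max(axis=0)        # max over segments, per class
clip_label = int(np.argmax(clip_scores))       # winning class index
```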
Article
Music genre classification (MGC) is an indispensable branch of music information retrieval. With the prevalence of end-to-end learning, research on MGC has made some breakthroughs. However, the limited receptive field of a convolutional neural network (CNN) cannot capture the correlation between the temporal frames and the sound frequencies of all vibrations in a song. Meanwhile, the time–frequency information of different channels is not equally important. To deal with these problems, we apply dual parallel attention (DPA) in CNN-5 to focus on global dependencies. First, we propose parallel channel attention (PCA) to build global time–frequency dependencies in the song and study the influence of different weighting methods for PCA. Next, we design dual parallel attention, which focuses on global time–frequency dependencies in the song and adaptively calibrates the contribution of different channels to the feature map. Then, we analyze the effect of applying different numbers and positions of DPA in CNN-5 on performance and compare DPA with multiple attention mechanisms. The results on the GTZAN dataset demonstrate that the proposed method achieves a classification accuracy of 91.4% and that DPA has the highest performance.
... Previously, researchers used mathematical techniques such as standard statistical pattern recognition (SPR), the Gaussian classifier (GS), and the Gaussian Mixture Model (GMM) to classify music genres. In the age of AI, researchers have been using various machine learning techniques such as the Multi-Class Support Vector Machine (SVM) [6], K-Nearest Neighbors (KNN) [7], Linear Kernel SVM, Polynomial Kernel SVM, Decision Tree, Random Forest, AdaBoost, Naïve Bayes, the Linear Discriminant Analysis (LDA) classifier, Logistic Regression, and Sigmoid Kernel SVM [6], [8]-[10]. In modern days, deep learning is not only solving computer vision problems but also dealing with sequence and time series problems [11], [12]. ...
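For reference, a minimal scikit-learn sketch of two of the classical classifiers listed in this excerpt (a multi-class SVM and KNN); the feature matrix and labels are synthetic stand-ins for precomputed audio features.

```python
# Minimal sketch of the classical classifiers named above (scikit-learn);
# the feature matrix and labels are synthetic stand-ins.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X = np.random.rand(300, 40)                    # e.g. 40 MFCC-derived features
y = np.random.randint(0, 10, size=300)         # 10 music genres
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

svm = SVC(kernel="linear").fit(X_tr, y_tr)     # multi-class SVM (one-vs-one)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print(svm.score(X_te, y_te), knn.score(X_te, y_te))
```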
Preprint
Music has been considered an inseparable part of our culture and tradition. In this work, we created a dataset with six Hindustani music genres: Abhang, Bhajan, Thumri, Tappa, Ghazal, and Kajri, each of which contains 100 songs in wave (.wav) format. To classify the Hindustani music genres, we employ mel-frequency cepstral coefficient (MFCC) features, which contain timbral information, and the Recurrent Neural Network-Long Short Term Memory (RNN-LSTM). Our best three models achieved an average accuracy of 86% when trained on various feature sets with MFCC values of 18, 26, and 39. Furthermore, we use uniform manifold approximation and projection to transform and visualise higher-dimensional feature-set data in two-dimensional space. Based on the results, we can infer that Hindustani music has more intricate melodies than western music, and feeding 18 MFCC features to the deep neural network is the optimum strategy for obtaining better accuracy. Increasing the hop length from 512 to 1024 reduces the input dimension size, which facilitates the RNN-LSTM model. As a result, the performance of the RNN-LSTM models is slightly improved. Our RNN-LSTM models' test set accuracy decreased by 5% when we used 5 segments. Additionally, we evaluated and compared our model on six genres of the GTZAN dataset and achieved 90% accuracy.
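The abstract's MFCC settings map directly onto a librosa call; a sketch with 18 coefficients and a hop length of 1024 follows, where the file name and sampling rate are placeholders.

```python
# Sketch of the MFCC front end above: 18 coefficients, hop length 1024
# (halving the frame count relative to hop 512); "song.wav" is a placeholder.
import librosa

y, sr = librosa.load("song.wav", sr=22050)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=18, hop_length=1024)
print(mfcc.shape)                              # (18, n_frames)
# Transpose to (n_frames, 18) so each time step feeds one LSTM input vector.
sequence = mfcc.T
```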
... By using a set number of environmental sound signals as training samples, the classifier training module determines the values of the parameters in the classifier [2][3][4][5][6][7][8]. The primary classifiers utilized in the algorithms include k-nearest neighbor (KNN)-based classifiers [9][10][11][12][13], multilayer perceptron (MLP)-based classifiers [14,15], convolutional recurrent neural network (CRNN)-based classifiers [16][17][18][19][20], convolutional neural network (CNN)-based classifiers [21][22][23][24][25][26][27], support vector machine (SVM)-based classifiers [28][29][30][31][32][33], and Gaussian mixture model (GMM)-based classifiers [34][35][36]. The information that a model can provide generally comes from two aspects: the information contained in the training data and the prior information that people supply. ...
Article
Environmental sound classification is an important branch of acoustic signal processing. In this work, a set of sound classification features based on audio signal perception and statistical analysis is proposed to describe the signal from multiple aspects of the time and frequency domains. Energy features, spectral entropy features, the zero crossing rate (ZCR), and mel-frequency cepstral coefficients (MFCC) are combined to form joint signal analysis (JSA) features that improve the signal expression of the features. Then, based on JSA, a novel region joint signal analysis (RJSA) feature for environmental sound classification is also proposed. It can reduce feature extraction computation and improve feature stability, robustness, and classification accuracy. Finally, a sound classification framework based on the boosting ensemble learning method is provided to improve classification accuracy and model generalization. The experimental results show that, compared with the highest classification accuracy of the baseline algorithms, the environmental sound classification algorithm based on the proposed RJSA features and ensemble learning methods improves classification accuracy, and the LightGBM-based sound classification algorithm improves accuracy by 14.6%.
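A hedged sketch of joint features in the spirit of JSA follows, combining the four listed feature families and feeding a LightGBM classifier; it mirrors the feature families named in the abstract, not the paper's exact JSA/RJSA definitions, and the training data is synthetic.

```python
# Hedged sketch of JSA-style joint features (energy, spectral entropy, ZCR,
# MFCC) feeding a LightGBM classifier; not the paper's exact definitions.
import librosa
import numpy as np
from lightgbm import LGBMClassifier

def joint_features(y, sr):
    energy = np.array([np.sum(np.abs(y) ** 2)])          # clip energy
    S = np.abs(librosa.stft(y, n_fft=2048)) ** 2
    p = S / (S.sum(axis=0, keepdims=True) + 1e-12)       # per-frame spectrum
    entropy = np.array([np.mean(-np.sum(p * np.log2(p + 1e-12), axis=0))])
    zcr = np.array([librosa.feature.zero_crossing_rate(y).mean()])
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    return np.concatenate([energy, entropy, zcr, mfcc])  # joint feature vector

# Synthetic training set: one feature vector per clip, dummy labels.
X = np.stack([joint_features(np.random.randn(22050), 22050) for _ in range(50)])
y_labels = np.random.randint(0, 5, size=50)
clf = LGBMClassifier(n_estimators=100).fit(X, y_labels)
```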
... Environmental sound classification (ESC) has been used for different purposes in recent years and has therefore attracted the attention of researchers. In these studies, vehicle interior and exterior sounds (Hashimoto and Takao 1990;Västfjäll et al. 2002;Wang et al. 2013), internal organ sounds (Mishra, Menon, and Mukherjee 2018;Patidar and Pachori 2014;Randhawa and Singh 2015;Wang et al. 2013), fault detection from device sound (Gramatikov et al. 2016;Scanlon, Kavanagh, and Boland 2012), air conditioner and refrigerator noise (Jeon et al. 2011), aircraft and helicopter sounds (Akhtar, Elshafei-Abmed, and Ahmed 2001;More and Davies 2010), ambient sounds (Huang et al. 2019;Lurz et al. 2017), animal sounds (Ko et al. 2019), underwater monitoring (Mayer, Magno, and Benini 2019), gender recognition and classification using music (Chang, Chen, and Lee 2021;Liu et al. 2021), and sound classification using acoustic properties (Nanni et al. 2017;Yang and Zhao 2021) have been studied. ESC has limited information in terms of time and frequency properties. ...
Article
Nowadays, environmental sound classification (ESC) has become one of the most studied research areas. Sound signals that are indistinguishable to the human auditory system have been classified by computer-aided systems and machine learning methods. Therefore, ESC has been used in signal processing and sound forensics applications. A novel ESC type is presented in this paper, named vehicle interior sound classification (VISC). VISC is defined as one of the sub-branches of ESC, and it is utilized as a sound-based biometric for vehicles. A hand-crafted feature-based VISC method is presented. The proposed method has multileveled feature generation using maximum pooling and the proposed local quintet magnitude pattern (LQMP), feature selection with iterative neighborhood component analysis (INCA), and classification phases. A novel VISC dataset was collected from YouTube, and the proposed LQMP- and INCA-based method was applied to the collected sounds. The results denote the following: the accuracy, F1-score, and geometric mean of the proposed LQMP- and INCA-based VISC method were calculated as 98.38%, 98.23%, and 98.21%, respectively, using a support vector machine classifier. The contribution of the proposed VISC method is to show that vehicles can be classified by their sound.
... In the aspect of audio recognition, Yang and Zhao [17] proposed an acoustic scene classification method based on the support vector machine (SVM), which enhanced the sound texture to improve the classification accuracy. Greco et al. [18] proposed a voice recognition system based on the heuristic deep learning method. ...
Article
Solfeggio is an important basic course for music majors, and audio recognition training is one of its important components. With the improvement of computer performance, audio recognition has been widely used in smart wearable devices. In recent years, the development of deep learning has accelerated research on audio recognition. However, there is considerable sound interference in music teaching environments, so the performance of existing audio classifiers cannot meet actual demand. To solve this problem, an improved audio recognition system based on YOLO-v4 is proposed, which mainly improves the network structure. First, mel-frequency cepstral coefficients are used to process the original audio and extract the corresponding features. Second, the YOLO-v4 model from the field of deep learning is applied to audio recognition and improved by combining it with a spatial pyramid pooling module to strengthen generalization over data in different audio formats. Third, the stacking method from ensemble learning is used to fuse the independent submodels of two different channels. Experimental results show that, compared with other deep learning techniques, the improved YOLO-v4 model improves audio recognition performance and handles data in different audio formats better, showing stronger generalization ability.
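The spatial pyramid pooling module mentioned in this abstract can be sketched in a few lines of PyTorch; this follows the classic SPP formulation with illustrative pyramid levels, and the paper's YOLO-v4 variant may differ in detail.

```python
# Sketch of a spatial pyramid pooling (SPP) block: fixed-size pooled maps from
# variable-size inputs, concatenated. Pyramid levels (1, 2, 4) are illustrative.
import torch
import torch.nn as nn

class SpatialPyramidPooling(nn.Module):
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.pools = nn.ModuleList(nn.AdaptiveMaxPool2d(k) for k in levels)

    def forward(self, x):                      # x: (batch, channels, H, W)
        # Each level pools to a k x k grid, so the output length is fixed
        # regardless of the input spectrogram's time-frequency size.
        return torch.cat([p(x).flatten(1) for p in self.pools], dim=1)

spp = SpatialPyramidPooling()
feat = spp(torch.randn(2, 32, 37, 51))         # odd input size on purpose
print(feat.shape)                              # (2, 32 * (1 + 4 + 16)) = (2, 672)
```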
... Time, frequency, and coefficient are the three domains in which features can be extracted. Traditional feature extraction procedures [11,14], on the other hand, become more computationally complex as the number of audio signals in the dataset grows. Audio classification is a popular method for indexing the audio signals in a dataset. ...
... Deep learning techniques [18] have been utilized for recognizing emotions [20,21]; an Inception net has been applied to tackle the emotion recognition problem [19], several databases have been studied, and the IEMOCAP database is used as the dataset for training the model with TensorFlow. The recognition capability of sound categorization systems based on recurrent neural networks (RNNs) and convolutional neural networks (CNNs) [11,12] has improved significantly. Sound classification involves extracting physical and perceptual characteristics from a sound and using these characteristics to determine which of a set of classes the sound is most likely to belong to [9,13]. ...
... Many scientific problems and fields have witnessed great developments through the use of deep learning, which has improved their achievement rates, for example, computer vision, natural language processing, and also the sound domain, including music recommendation and speech recognition [7]-[9]. Sound classification systems based on deep neural networks such as CNNs have undergone important improvements in the recognition and classification capability of models. Nonetheless, their computational complexity and inadequate exploration of global dependencies for long sequences restrict improvements in their classification results [10]. In recent years, much research has been carried out on automatic sound classification and detection in outdoor environments. ...
Article
In this research, different audio feature extraction techniques are implemented and classification approaches are presented to classify seven types of wind. We applied feature extraction techniques such as the Zero Crossing Rate (ZCR), the Fast Fourier Transform (FFT), Linear Predictive Coding (LPC), and Perceptual Linear Prediction (PLP). Some of these methods are known to work well on human voices, but we apply them here to characterize the wind audio content. The CNN classification method is implemented to determine the class of the input wind sound signal. Experimental results show that each of these feature extraction methods gives different results, and the classification accuracy obtained using PLP features is the best.
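Three of the four listed feature extractors have direct librosa or numpy routines; a sketch follows, with PLP omitted since librosa has no standard PLP implementation, and with the file name and LPC order as placeholder assumptions.

```python
# Sketch of three of the listed feature extractors on a wind-audio clip;
# "wind.wav" and the LPC order are placeholders, PLP is omitted.
import librosa
import numpy as np

y, sr = librosa.load("wind.wav", sr=22050)

zcr = librosa.feature.zero_crossing_rate(y).mean()     # ZCR, averaged over frames
spectrum = np.abs(np.fft.rfft(y))                      # FFT magnitude spectrum
lpc_coeffs = librosa.lpc(y, order=16)                  # LPC, order 16

print(zcr, spectrum.shape, lpc_coeffs.shape)
```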