Comparison of different window sizes and frame shifts

Source publication
Article
In recent years, convolutional neural networks have been applied successfully to cough recognition. This approach converts audio clips into spectrograms and then classifies them with a convolutional neural network, which prompts us to seek a better input representation for more effective training to achieve better perf...
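The window size and frame shift named in the figure title determine the shape of the spectrogram that the network receives. The sketch below illustrates that relationship only. It assumes a 16 kHz sample rate, a Hann window, and no padding, none of which are stated in the source, and the function names are hypothetical.

```python
import numpy as np


def frame_count(num_samples, win_length, hop_length):
    """Number of full frames that fit in the signal when no padding is used."""
    if num_samples < win_length:
        return 0
    return 1 + (num_samples - win_length) // hop_length


def magnitude_spectrogram(signal, sample_rate, window_ms, shift_ms):
    """Short-time magnitude spectrogram with shape (freq_bins, frames).

    window_ms and shift_ms are the window size and frame shift in
    milliseconds. The Hann window is an assumption, not the source's choice.
    """
    win_length = int(round(sample_rate * window_ms / 1000))
    hop_length = int(round(sample_rate * shift_ms / 1000))
    n_frames = frame_count(len(signal), win_length, hop_length)
    if n_frames == 0:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hanning(win_length)
    frames = np.stack([
        signal[i * hop_length : i * hop_length + win_length] * window
        for i in range(n_frames)
    ])
    # One FFT per frame, then transpose so that time runs along the last axis.
    return np.abs(np.fft.rfft(frames, axis=1)).T


# One second of a 440 Hz tone at an assumed 16 kHz sample rate.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

short = magnitude_spectrogram(tone, sample_rate, window_ms=25, shift_ms=10)
long = magnitude_spectrogram(tone, sample_rate, window_ms=50, shift_ms=25)
print(short.shape)  # (201, 98)
print(long.shape)   # (401, 39)
```

A longer window gives more frequency bins per frame but fewer frames per second, so the same clip yields a taller, narrower input image.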

Citations

... A special type of feature is the custom feature, sorted into the "Deep learning raw data" category. These features correspond to the rising field of deep learning ([29,45-50,52,54-56,78,80,84-87,110,126,129,137,143,144,148-150,154,155,157,171,172,183,185-187]). Deep neural networks take large raw data, such as spectrogram images, as input and provide, as output or intermediate results, a limited set of machine-interpretable values containing the pertinent information: the features. ...
... The Hidden Markov Models and derivatives category encompasses the Hidden Markov Models ([69,76,127]), inherited from speech recognition research and particularly well suited to modeling time series [176], as well as their variants in which the observation probabilities are generated by Gaussian Mixtures ([14,68,133,145,175-177,180]) and by deep neural networks ([143,180]). The Neural Networks category encompasses all neural networks, from the simplest ones with limited size and few hidden layers, or presumed so ([22,27,69,111,132,156,162,174,182]), to the state-of-the-art deep networks ([29,55,85,87,110,148,149,183]), including the Convolutional Neural Networks ([45-47,52,54,78,80,86,126,129,137,144,154,155,171,185-187]), the Time Delay Neural Networks ([84,150]), the Recurrent Neural Networks [49,50,56,155,172], the Generative Adversarial Networks [80] and the Octonion Neural Networks [157]. The other classifiers are the Logistic Regression ([48,58,75,83,90,91,93-95,100,147,163]), the k-Nearest Neighbors (kNN) ([49,81,104,119,120,124,125,138,141,142,146,164]), the Principal Component Analysis [33], the Fuzzy C-means [99], the Gaussian Mixture Models ([78,98,102,103,128,168,173]), the Discriminant Analysis [30,116] and the Naïve Bayes [153]. ...
Article
Cough is a very common symptom and the most frequent reason for seeking medical advice. Optimized care inevitably requires suitable recording of this symptom and automatic processing. This study provides an updated, exhaustive quantitative review of the field of cough sound acquisition, automatic detection in longer audio sequences, and automatic classification of the nature of the cough or the underlying disease. Related studies were analyzed, and metrics were extracted and processed to create a quantitative characterization of the state of the art and its trends. A list of objective criteria was established to select a subset of the most complete detection studies with a view to deployment in clinical practice. One hundred and forty-four studies were short-listed, and a picture of the state-of-the-art technology is drawn. The trend shows an increasing number of classification studies, an increase in dataset size, in part from crowdsourcing, a rapid increase in COVID-19 studies, the prevalence of smartphones and wearable sensors for acquisition, and a rapid expansion of deep learning. Finally, a subset of 12 detection studies is identified as the most complete. An unequaled quantitative overview is presented. The field shows remarkable dynamism, boosted by research on COVID-19 diagnosis, and is well suited to mobile health.
Article
The goal of music genre classification is to identify the genre of given feature vectors representing certain characteristics of music clips. In addition, to improve the accuracy of music genre classification, considerable research has been conducted on extracting spectral features, which contain critical information for genre classification, from music clips and feeding these features into training models. In particular, recent studies argue that classification accuracy can be enhanced by employing multiple spectral features simultaneously. Consequently, fusing information from multiple spectral features is a critical consideration in designing music genre classification models. Hence, this paper provides a short survey of recent studies on music genre classification and compares the performance of the most recent CNN-based models with a newly devised model that employs a late fusion strategy for the multiple spectral features. Our empirical study of 12 public datasets, including Ballroom, ISMIR04, and GTZAN, showed that the late fusion CNN model outperforms other compared methods. Additionally, we performed an in-depth analysis to validate the effectiveness of the late fusion strategy in music genre classification.
Chapter
According to the World Health Organization (WHO), since the discovery of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), there have been 530,896,347 confirmed cases of COVID-19, including 6,301,020 reported deaths, as of June 2022. The virus (SARS-CoV-2) primarily affects the respiratory system, and dry cough is prominent among its symptoms. The sound and type of a cough are said to contain useful features that can contribute to the diagnosis of a disease. Artificial Intelligence (AI) and signal processing show promising potential in the prediction of pulmonary diseases. This provides a platform for a non-invasive method of screening for COVID-19. In turn, this allows for a faster way to screen for the presence of the virus, promoting early detection. This paper studies the usage of AI and signal processing methods for the detection of COVID-19 using cough sounds collected from crowdsourced applications. This paper focuses on three different models: a Support Vector Machine (SVM), a ResNet, and a traditional Convolutional Neural Network (CNN). The datasets used in this study were obtained from the University of Cambridge and the Coswara project. Preliminary investigations show that, for models trained and tested using the Cambridge dataset, the CNN performed the best with an AUC of 0.816, followed by the ResNet model and the SVM model with AUCs of 0.738 and 0.671, respectively. Further investigation shows that the models perform poorly in classifying COVID-19-positive and -negative classes when tested on the Coswara dataset. Testing results show that the SVM model performed the best, with an AUC of 0.564, followed by the ResNet and the CNN models (AUCs = 0.522 and 0.443, respectively). Our future work will focus on improving the models’ performance by applying various data pre-processing and feature extraction methods to the datasets.
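The AUC values quoted above can be read as the probability that a randomly chosen positive recording receives a higher score than a randomly chosen negative one, so 0.5 corresponds to chance. The abstract does not say how the authors computed AUC. The sketch below uses the standard pairwise definition, with a hypothetical function name and made-up scores for illustration only.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0 (negative) or 1 (positive).
    scores: classifier scores, where higher means "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative data only, not taken from the study.
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(example_labels, example_scores))  # 0.75
```

Under this reading, a value below 0.5, such as the 0.443 reported for the CNN on the Coswara dataset, means the model ranks negatives above positives more often than not.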