Figure - available from: International Journal of Speech Technology
NN-SVM ML classifier (Wiering et al., 2013)

Source publication
Article
In speaker recognition, identifying the speaker from customized features of the speech signal is crucial. A single type of speech characteristic does not fully capture the speaker's identity. This work uses a hybrid neural network support vector machine (NN-SVM) scheme to identify the speaker with greater pr...

Similar publications

Preprint
In this paper, we present our solutions for the 5th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW), which includes four sub-challenges of Valence-Arousal (VA) Estimation, Expression (Expr) Classification, Action Unit (AU) Detection and Emotional Reaction Intensity (ERI) Estimation. The 5th ABAW competition focuses on fac...

Citations

... Some researchers have used CNN only as a feature extractor and used other classifiers for classification [43]. These classifiers can be used alone or as a hybrid [44][45][46]. Apart from the classifiers mentioned above, the use of subspace classifiers for speaker identification is quite rare in the literature [47]. However, new methodologies can be developed to produce subspace classifier algorithms that perform well in the speaker identification field. ...
Preprint
Speaker identification is crucial in many application areas, such as automation, security, and user experience. This study examines the use of traditional classification algorithms and hybrid algorithms, as well as newly developed subspace classifiers, in the field of speaker identification. Six different feature structures were tested with the various classifier algorithms. The Stacked Features-Common Vector Approach (SF-CVA) and Hybrid CVA-FLDA (HCF) subspace classifiers are used for the first time in the literature for speaker identification. In addition, CVA is evaluated for the first time for speaker recognition using hybrid deep learning algorithms. This paper also aims to increase accuracy rates with different hybrid algorithms. The study includes Recurrent Neural Network-Long Short-Term Memory (RNN-LSTM), i-vector + PLDA, Time Delay Neural Network (TDNN), AutoEncoder + Softmax (AE + Softmax), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Common Vector Approach (CVA), SF-CVA, HCF, and AlexNet classifiers for speaker identification. The six feature extraction approaches consist of Mel Frequency Cepstral Coefficients (MFCC) + Pitch, Gammatone Cepstral Coefficients (GTCC) + Pitch, MFCC + GTCC + Pitch + eight spectral features, spectrograms, i-vectors, and AlexNet feature vectors. For SF-CVA, 100% accuracy was achieved in most tests by combining the training and test feature vectors of the speakers separately. The RNN-LSTM, i-vector + KNN, AE + Softmax, TDNN, and i-vector + HCF classifiers gave the highest accuracy rates in the tests performed without combining training and test feature vectors.
... The classification accuracy and F1 value of this model are higher than those of other traditional models, and its stability is not affected even if the balance of the speaker dataset decreases [15]. Network models are used to optimize subspace loss, extract the best feature vectors, and achieve speaker identity classification and recognition. This feature extractor is feasible. ...
... In Equation (14), r_p is the correction constant, with a value of 1. The value of n_m is determined by Equation (15). ...
Article
Digital tools for teaching behavior analysis in the microgrid mode still offer little intelligence, so speaker identity classification still relies on manual observation and recording, and the analysis itself is inefficient. To address these issues, the research builds on the teacher‐student analysis method and proposes a dual clustering algorithm based on the general background model Gaussian mixture model for speaker identity classification, which underpins the design and development of intelligent behavior analysis software. The results indicate that the software achieves an average recall rate of 89.03% for behavior transition points in the classroom teaching discourse corpus, which is better than traditional analysis methods. The intelligent behavior analysis software built on the dual clustering algorithm is therefore effective and practical. The research proposes a method model and implements intelligent visualization for classroom teaching behavior analysis, improving the efficiency of analyzing current microgrid teaching behavior.
Chapter
The most natural method of interaction is speech. To recognize the speaker, we employ a method known as “automatic speaker recognition,” which takes into account several characteristics of the speaker's voice, including pitch, timbre, tension, and frequency. The voice also conveys details about the manner in which speech is produced and transmitted. The method is applicable to forensics, monitoring, and authentication applications. The problem with the current approach is the vast number of features employed for voice recognition. In this work, we plan to use a support vector machine (SVM) as a classifier and reinforcement learning as a linear regression to recognize the speaker's voice purely from a limited set of attributes selected from the convolutional neural network. The main goal of this work is to build a clear, unambiguous, and dedicated automatic speaker identification system. We will only test our algorithm on a small voice dataset due to space restrictions. To increase accuracy, the system is trained and evaluated using a large number of numerical speech data samples. Here, 125 examples will be used to help identify each phrase. The NIST and ELSDSR speaker datasets were used to analyze the hybrid network models suggested earlier. Our experiments showed that the proposed CNN-SVM model achieved a high average recognition accuracy of 95%. The findings also show that the proposed deep network model is superior to other models in its ability to identify speakers correctly.
Keywords: SVM, Deep learning, MFCC, LPC, DFT, CNN
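The hybrid idea that recurs in the abstracts above, a network that produces features and an SVM that classifies them, can be illustrated with a minimal sketch. It is not the method of any of the listed papers. Everything in it is an assumption made for illustration: the two-dimensional toy "speaker" data, the fixed tanh layer standing in for a trained network, the hypothetical function names, and the hyperparameters `lam`, `eta`, and `epochs`.

```python
import math
import random


def extract_features(x):
    """Stand-in for a trained network: one fixed tanh layer.

    The weights are illustrative; a real system would learn them.
    """
    weights = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    return [math.tanh(w0 * x[0] + w1 * x[1]) for w0, w1 in weights]


def train_linear_svm(features, labels, lam=0.01, eta=0.1, epochs=20):
    """Fit a linear SVM (no bias term) by subgradient descent
    on the L2-regularized hinge loss."""
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        for h, y in zip(features, labels):
            margin = y * sum(wi * hi for wi, hi in zip(w, h))
            # Shrink step from the regularizer.
            w = [wi * (1.0 - eta * lam) for wi in w]
            # Hinge-loss step only when the margin is violated.
            if margin < 1.0:
                w = [wi + eta * y * hi for wi, hi in zip(w, h)]
    return w


def predict(w, h):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * hi for wi, hi in zip(w, h)) >= 0.0 else -1


def make_toy_data(n_per_speaker=50, seed=0):
    """Two synthetic 'speakers' as well-separated 2-D clusters (assumed data)."""
    rng = random.Random(seed)
    data, labels = [], []
    for label, centre in ((1, 2.0), (-1, -2.0)):
        for _ in range(n_per_speaker):
            data.append((rng.gauss(centre, 0.3), rng.gauss(centre, 0.3)))
            labels.append(label)
    return data, labels


def run_demo():
    """Train on the toy data and return the training accuracy."""
    data, labels = make_toy_data()
    feats = [extract_features(x) for x in data]
    w = train_linear_svm(feats, labels)
    correct = sum(predict(w, h) == y for h, y in zip(feats, labels))
    return correct / len(labels)


if __name__ == "__main__":
    print("training accuracy:", run_demo())
```

The sketch reports training accuracy only, on data built to be easy to separate, so it says nothing about the accuracies quoted in the abstracts. In practice the feature extractor would be a trained network fed with features such as MFCCs, and the SVM would be evaluated on held-out speakers or utterances.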