Intelligent Sound Processing Lab (ISP-Lab)
Institution: Shahid Beheshti University
About the lab
Research Topics (2018-current):
– Spoken Language Identification (LID)
– Speaker Identification (SID)
– Speaker Diarization (Speaker Segmentation)
– Spoken Keyword Spotting (KWS) & Spoken Term Detection (STD)
– Spoken Emotion Recognition (SER)
– Voice Activity Detection (VAD) & Speech Activity Detection (SAD)
– Automatic Speech Recognition (ASR)
– Voice Pathology Detection From Speech
– Automatic Audio Scene Recognition
– Audio Source Separation & Speech Enhancement
– Anomalous Sound Detection (ASD)
– English-to-Persian Voice Actor Recommender System
– Diagnosis of Depression from Speech Signals of Conversations
– Alzheimer’s Dementia Recognition From Speech
– Imagined Speech Detection by EEG signals
– Heart Sound Signal Classification
Featured research
Spoken language identification (LID) is the automatic identification of the language spoken in an audio file. In this paper, we propose a genetic-based fusion method that combines the score probabilities of an x-vector-based acoustic LID (ALID) system and a phonetic LID (PLID) system. The ALID system uses an LDA classifier to identify languages from x-vectors, while the PLID system uses an SVM classifier whose feature vector consists of perplexities derived from phone language models built on the output of a universal phone recognizer named Allosaurus. Since our database contains 27 languages and we combine two LID systems, the genetic algorithm optimizes 54 fusion weights, which are then applied to combine the individual scores of the acoustic and phonetic systems. In experiments on 27 languages from the NIST-LRE09 database, the fusion of the acoustic and phonetic systems achieves 93.30% accuracy, approximately a 21% relative reduction in identification error compared with our best baseline system (91.50% accuracy).
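The weighted score-level fusion described above can be sketched as follows. This is a hypothetical illustration, not the authors' code: the array names, the uniform weight initialization, and the toy scores are assumptions; in the paper the 54 weights (27 languages × 2 systems) are found by a genetic algorithm rather than set by hand.

```python
import numpy as np

N_LANGS = 27  # number of languages in the NIST-LRE09 subset used in the paper

def fuse_scores(acoustic, phonetic, weights):
    """Score-level fusion of two LID systems.

    acoustic, phonetic: per-language score vectors of shape (N_LANGS,),
        e.g. posterior probabilities from the ALID and PLID systems.
    weights: array of shape (2, N_LANGS) -- one weight per (system, language)
        pair, i.e. the 54 values the genetic algorithm would optimize.
    Returns the index of the identified language.
    """
    fused = weights[0] * acoustic + weights[1] * phonetic
    return int(np.argmax(fused))

# Toy usage with uniform weights (a stand-in for GA-optimized values).
rng = np.random.default_rng(0)
alid_scores = rng.random(N_LANGS)
plid_scores = rng.random(N_LANGS)
uniform_w = np.full((2, N_LANGS), 0.5)
predicted_lang = fuse_scores(alid_scores, plid_scores, uniform_w)
```

A genetic algorithm would then search over `weights` to maximize identification accuracy on a development set, which is what yields the reported gain over either system alone.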
Lab head
![Yasser Shekofteh](https://i1.rgstatic.net/ii/profile.image/535773510619136-1504749767671_Q64/Yasser-Shekofteh.jpg)
Department
- Faculty of Computer Science and Engineering
About Yasser Shekofteh
Research Fields:
– Digital Signal Processing (DSP) and Machine Learning (ML)
– Automatic Speech Recognition (ASR), Spoken Term Detection (STD), and Keyword Spotting (KWS)
– Voice Commands and Speech Assistants
– Speaker Recognition (SRE) and Spoken Language Identification (LID)
– Voice Pathology and Heart Sound Detection
– Speech Enhancement and Noise Reduction
– Dynamical Systems and Chaos, System Identification, and Parameter Estimation
– Robotics (Speech Processing)