Figure 2 - uploaded by Marelie Davel
Visual representation of the PPR-LM architecture.

Source publication
Article
Full-text available
Copyright: 2009 South African Institute of Electrical Engineers. This article introduces the first Spoken Language Identification system developed to distinguish among all eleven of South Africa’s official languages. The PPR-LM (Parallel Phoneme Recognition followed by Language Modeling) architecture is implemented, and techniques such as phoneme fr...

Context in source publication

Context 1
... are then represented as groups of vectors. Figure 2 provides a visual representation of the PPR-LM architecture. An utterance given as input to the system is passed to three ASR systems (English, French and Portuguese phoneme recognizers in the image) that together form the front-end of the system. ...
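As a rough illustration of the back-end that the parallel recognizers in Figure 2 feed into, the sketch below scores a decoded phoneme sequence against per-language bigram language models and fuses the scores, in the spirit of PPR-LM. All names and data are invented for illustration; this is not the authors' implementation.

```python
# Minimal PPR-LM-style back-end sketch (illustrative, not the cited system).
# Each phoneme recognizer is assumed to have already tokenized the utterance
# into a phoneme sequence; per-language bigram LMs score each stream, and the
# log-likelihoods are summed into a single language decision.
import math
from collections import Counter

def train_bigram_lm(phoneme_sequences):
    """Count bigrams from training phoneme sequences (add-one smoothed)."""
    bigrams, unigrams = Counter(), Counter()
    for seq in phoneme_sequences:
        for a, b in zip(seq, seq[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    vocab = {p for seq in phoneme_sequences for p in seq}
    return bigrams, unigrams, len(vocab)

def log_likelihood(seq, lm):
    """Add-one-smoothed bigram log-likelihood of a phoneme sequence."""
    bigrams, unigrams, v = lm
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
        for a, b in zip(seq, seq[1:])
    )

def identify(streams, lms):
    """streams: recognizer name -> decoded phoneme sequence.
    lms: (recognizer, language) -> bigram LM.  Fuse by summing scores."""
    totals = Counter()
    for rec, seq in streams.items():
        for (r, lang), lm in lms.items():
            if r == rec:
                totals[lang] += log_likelihood(seq, lm)
    return totals.most_common(1)[0][0]
```

In a full PPR-LM system each of the parallel recognizers would contribute one score per target language, and the fusion step (here a plain sum) is often a trained classifier instead.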

Similar publications

Article
Full-text available
Research in the field of spoken language identification (spoken LID) on local languages helps to extend the outreach of technology to local language speakers. This research also contributes to the preservation of local languages. In this paper, we report our work on identifying spoken data in three local Indonesian languages: Minangkabau, Sundanese...
Conference Paper
Full-text available
This paper describes the statistical machine translation system developed at RWTH Aachen University for the English→Romanian translation task of the ACL 2016 First Conference on Machine Translation (WMT 2016). We combined three different state-of-the-art systems in a system combination: A phrase-based system, a hierarchical phrase-based system and a...
Article
Full-text available
Since real-time systems have special characteristics, the development of such systems requires the observation of quantitative system aspects. Quantitative predictions are needed already during the modeling phase of the system development process. Recently the Unified Modeling Language (UML) including its Profile for Schedulability, Performance,...
Conference Paper
Full-text available
In recent years the Unified Modeling Language (UML) including its profiles gained increasing acceptance as a specification language for modeling real-time systems. It is crucial to enable early quantitative predictions during the modeling phase of real-time systems development processes. UML itself is not directly analyzable. Performance evaluati...
Conference Paper
Full-text available
We describe the recent progress in SRI's Mandarin speech-to-text system developed for the 2008 evaluation in the DARPA GALE program. A data-driven lexicon expansion technique and language model adaptation methods contribute to the improvement in recognition performance. Our system yields 8.3% character error rate on the GALE dev08 test set, and 7.5% a...

Citations

... There are numerous features of an audio speech segment that can vary from language to language. These can be included in various LID system designs, each with a unique level of complexity and outcome [29]. Several feature extraction techniques exist for language classification [30]. ...
Article
Full-text available
In recent times, there is an increasing interest in employing technology to process natural language with the aim of providing information that can benefit society. Language identification refers to the process of detecting which speech a speaker appears to be using. This paper presents an audio-based Ethio-semitic language identification system using Recurrent Neural Network. Identifying the features that can accurately differentiate between various languages is a difficult task because of the very high similarity between characters of each language. A Recurrent Neural Network (RNN) was used in this paper with Mel-frequency cepstral coefficient (MFCC) features to bring out the key features which help provide good results. The primary goal of this research is to find the best model for the identification of Ethio-semitic languages such as Amharic, Geez, Guragigna, and Tigrigna. The models were tested using an 8-h collection of audio recordings. Experiments were carried out using our unique dataset with an extended version of RNN, Long Short Term Memory (LSTM) and Bidirectional Long Short Term Memory (BLSTM), for 5 and 10 s, respectively. According to the results, Bidirectional Long Short Term Memory (BLSTM) with a 5 s delay outperformed Long Short Term Memory (LSTM). The BLSTM model achieved average results of 98.1, 92.9, and 89.9% for training, validation, and testing accuracy, respectively. As a result, we can infer that the best performing method for the selected Ethio-Semitic language dataset was the BLSTM algorithm with MFCC features running for 5 s.
... The multilingual bottleneck feature significantly improved LID performance. A PPRLM-based LID system was trained for 11 South African languages in [162]. 13-dimensional MFCCs with Δ and Δ² were used as acoustic features and fed to a context-dependent HMM model for phoneme recognition. ...
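To make the feature description above concrete, here is a minimal NumPy sketch of appending Δ and Δ² (delta and acceleration) coefficients to a 13-dimensional MFCC matrix. The MFCC matrix is random stand-in data, and the regression window is the common HTK-style definition, which may differ in detail from the cited system.

```python
# Illustrative computation of delta (Δ) and acceleration (Δ²) coefficients
# from a 13-dimensional MFCC matrix, as typically appended before HMM training.
import numpy as np

def delta(feat, n=2):
    """feat: (frames, coeffs).  Standard regression formula over +-n frames,
    with edge frames replicated at the boundaries."""
    denom = 2 * sum(i * i for i in range(1, n + 1))
    padded = np.pad(feat, ((n, n), (0, 0)), mode="edge")
    out = np.zeros_like(feat, dtype=float)
    for t in range(feat.shape[0]):
        out[t] = sum(
            i * (padded[t + n + i] - padded[t + n - i]) for i in range(1, n + 1)
        ) / denom
    return out

mfcc = np.random.randn(100, 13)       # 100 frames of 13 MFCCs (dummy data)
d1 = delta(mfcc)                       # Δ: first-order dynamics
d2 = delta(d1)                         # Δ²: delta of the deltas
features = np.hstack([mfcc, d1, d2])   # 39-dimensional observation vectors
```

The resulting 39-dimensional vectors (13 static + 13 Δ + 13 Δ²) are the conventional observation format for HMM-based phoneme recognizers.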
Preprint
Full-text available
Automatic spoken language identification (LID) is a very important research field in the era of multilingual voice-command-based human-computer interaction (HCI). A front-end LID module helps to improve the performance of many speech-based applications in the multilingual scenario. India is a populous country with diverse cultures and languages. The majority of the Indian population needs to use their respective native languages for verbal interaction with machines. Therefore, the development of efficient Indian spoken language recognition systems is useful for adapting smart technologies in every section of Indian society. The field of Indian LID has started gaining momentum in the last two decades, mainly due to the development of several standard multilingual speech corpora for the Indian languages. Even though significant research progress has already been made in this field, to the best of our knowledge, there are not many attempts to analytically review them collectively. In this work, we have conducted one of the very first attempts to present a comprehensive review of the Indian spoken language recognition research field. In-depth analysis has been presented to emphasize the unique challenges of low-resource and mutual influences for developing LID systems in the Indian contexts. Several essential aspects of the Indian LID research, such as the detailed description of the available speech corpora, the major research contributions, including the earlier attempts based on statistical modeling to the recent approaches based on different neural network architectures, and the future research trends are discussed. This review work will help assess the state of the present Indian LID research by any active researcher or any research enthusiasts from related fields.
... • Requires more computation cost. 6. Peché et al. [55]: HMMs; NIST-LRE database. • Obtains high-quality linguistic resources, which improves identification performance. ...
... In 2016, Ferrer et al. [19] compared diverse techniques for spoken language recognition using trained deep neural networks (DNNs) for predicting the senone posteriors, with diverse characteristics concerning available training data, task definition and acoustic conditions. In 2009, Peché et al. [55] suggested a PPRLM architecture for distinguishing South Africa's official languages using phoneme frequency filtering through HMMs. In 2021, Garain et al. [24] implemented automatic speech recognition using an ensemble learning-based deep learning approach called FuzzyGCP, combining Semi-supervised Generative Adversarial Networks (SSGAN), DCNN and Deep Dumb Multi-Layer Perceptron (DDMLP) to obtain the language classes and extract the languages from speech signals. ...
Article
Full-text available
Information Technology has touched new vistas for a couple of decades, mostly to simplify the day-to-day life of humans. One of the key contributions of Information Technology is the application of Artificial Intelligence to achieve better results. The advent of artificial intelligence has given rise to a new branch of Natural Language Processing (NLP) called Computational Linguistics, which generates frameworks for intelligently manipulating spoken language knowledge and has brought human-machine interaction onto a new stage. In this context, speech has arisen to be one of the imperative forms of interfaces, which is the basic mode of communication for us, and generally the most preferred one. Language identification, being the front-end for various natural language processing tasks, plays an important role in language translation. Owing to this, focus has been given to the field of speech recognition involving the identification & recognition of languages by a machine. Spoken language identification is the identification of the language present in a speech segment irrespective of its size (duration & speed), ambiance (topic & emotion), and moderator (gender, age, demographic region). This paper has investigated various existing spoken language identification models implemented using different deep learning approaches, datasets, and performance measures utilized for their analysis. It also highlights the main features and challenges faced by these models. A comprehensive comparative study of deep learning techniques has been carried out for spoken language identification. Moreover, this review analyzes the efficiency of the spoken language models, which can help researchers to propose new language identification models for speech signals.
... In addition to general work on text normalization, there has been work on African languages in particular. This includes work on individual languages, such as Yoruba (Asahiah et al., 2017), Swahili (Hurskainen, 2004), and Zulu (Pretorius and Bosch, 2003), as well as some crosslinguistic work (Peche et al., 2009). However, much of the crosslinguistic work focuses on languages of South Africa, which ignores many of the languages we are interested in. ...
Preprint
Full-text available
Training data for machine learning models can come from many different sources, which can be of dubious quality. For resource-rich languages like English, there is a lot of data available, so we can afford to throw out the dubious data. For low-resource languages where there is much less data available, we can't necessarily afford to throw out the dubious data, in case we end up with a training set which is too small to train a model. In this study, we examine the effects of text normalization and data set quality for a set of low-resource languages of Africa -- Afrikaans, Amharic, Hausa, Igbo, Malagasy, Somali, Swahili, and Zulu. We describe our text normalizer which we built in the Pynini framework, a Python library for finite state transducers, and our experiments in training language models for African languages using the Natural Language Toolkit (NLTK), an open-source Python library for NLP.
... Then a trained word-based lexical model is applied to identify languages via recognised word sequences. Approaches such as Parallel Phone Recognition and Language Modelling (P-PRLM) [4,18] and parallel phoneme recognition vector space modelling (PPR-VSM) [17] are among the most popular for LID systems. The P-PRLM approach employs multiple phoneme recognisers that tokenise the speech waveform into sequences of phonemes. ...
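The vector-space variant mentioned above (PPR-VSM) can be caricatured in a few lines: each decoded phoneme sequence becomes a bigram-count vector, and the language whose centroid vector is most similar in cosine distance wins. A toy sketch with invented data, not the cited systems' code:

```python
# Toy PPR-VSM-style classifier: phoneme sequences as sparse bigram-count
# vectors, languages as centroid vectors, decision by cosine similarity.
import math
from collections import Counter

def bigram_vector(seq):
    """Map a phoneme sequence to a sparse bigram-count vector."""
    return Counter(zip(seq, seq[1:]))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(seqs):
    """Sum the bigram vectors of all training sequences for one language."""
    total = Counter()
    for s in seqs:
        total.update(bigram_vector(s))
    return total

def classify(seq, centroids):
    """Pick the language whose centroid is most similar to the utterance."""
    v = bigram_vector(seq)
    return max(centroids, key=lambda lang: cosine(v, centroids[lang]))
```

Real PPR-VSM systems typically use TF-IDF-weighted n-gram vectors and a discriminative classifier such as an SVM rather than raw counts and cosine similarity.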
... The benefit of scaling datasets is to speed up the training and classification process in order to obtain the best model performance, and to avoid numerical differences that could lead to overfitting if the training data attributes span a large range [16]. A grid search was used to estimate the SVM parameters, such as C and gamma (the margin-error trade-off parameter and kernel width), before training the classifier [5], [17]. The Radial Basis Function (RBF) kernel was used for training the classifier. ...
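As a sketch of that tuning step, the snippet below runs a scikit-learn grid search over C and gamma for an RBF-kernel SVM on synthetic stand-in data. The data, grid values, and fold count are illustrative assumptions, not those of the cited work.

```python
# Grid search over the SVM error/margin trade-off C and RBF kernel width gamma,
# after feature scaling, using scikit-learn.  Toy 2-class data stands in for
# the phonotactic feature vectors.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)),    # class 0 cluster
               rng.normal(3, 1, (50, 4))])   # class 1 cluster
y = np.array([0] * 50 + [1] * 50)

# Scale first so no attribute dominates the RBF distance computation.
Xs = StandardScaler().fit_transform(X)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(Xs, y)
best = search.best_params_   # best (C, gamma) pair by cross-validation
```

`search.best_estimator_` is then the classifier retrained on the full training set with the selected parameters.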
Conference Paper
Full-text available
This paper proposes phoneme clustering methods for multilingual language identification (LID) on a mixed-language corpus. A one-pass multilingual automated speech recognition (ASR) system converts spoken utterances into occurrences of phone sequences. Hidden Markov models were employed to train multilingual acoustic models that handle multiple languages within an utterance. Two phoneme clustering methods were explored to derive the most appropriate phoneme similarities between the target languages. Ultimately a supervised machine learning technique was employed to learn the language transition of the phonotactic information and engage the support vector machine (SVM) models to classify phoneme occurrences. The system performance was evaluated on a mixed-language speech corpus for two South African languages (Sepedi and English) using the phone error rate (PER) and LID classification accuracy separately. We show that feeding the multilingual ASR output directly to the LID system has a direct impact on LID accuracy. Our proposed system has achieved acceptable phone recognition and classification accuracy in mixed-language speech and monolingual speech (i.e. either Sepedi or English). Data-driven and knowledge-driven phoneme clustering methods improve ASR and LID for code-switched speech. The data-driven method obtained a PER of 5.1% and LID classification accuracy of 94.5% when the acoustic models are trained with 64 Gaussian mixtures per state.
... The benefit of scaling data sets is to speed up the training and classification process in order to obtain the best model performance, and to avoid numerical differences that could lead to over-fitting if the training data attributes span a large range [19]. A grid search is a simple search technique which was used to estimate the SVM parameters, such as C and gamma (the margin-error trade-off parameter and kernel width), before training the classifier [4], [20]. The Radial Basis Function (RBF) kernel was used for training the classifier. ...
... Then a trained word-based lexical model is applied to identify languages via recognized word sequences. Approaches such as P-PRLM, phoneme recognition followed by language modelling (PRLM) [4] and parallel phoneme recognition vector space modelling (PPR-VSM) [17] are among the most popular for LID systems. The P-PRLM approach employs multiple phoneme recognizers that tokenize the speech waveform into sequences of phonemes. ...
Conference Paper
Full-text available
This paper presents the incorporation of phoneme sequences as language information to perform language identification (LID) in code-switched speech. The one-pass recognition system converts the spoken utterances into occurrences of phone sequences. We employed the hidden Markov model (HMM) to build robust context-dependent acoustic models that can handle multiple languages within an utterance. We report two phoneme mapping methods to determine the phoneme similarities among our target languages. A statistical phoneme-based bigram language model is incorporated for speech decoding to obviate possible phone mismatches. A supervised support vector machine (SVM) learned the language transitions of the phonotactic information in the mixed-language speech given the recognized phone sequences. The back-end decision is taken by an SVM which classifies language identity given the likelihood scores based on the monolingual phone occurrence segments. The experiments were performed with commonly mixed Northern Sotho and English speech corpora. We evaluated the system by measuring the performance of the phone recognition and LID portions separately. We were able to obtain a phone recognition accuracy of 84.4% when using the data-driven phoneme mapping approach modeled with 16 Gaussian mixtures per state. The proposed system achieved an acceptable LID accuracy of 89.6% and an average of 81.4% on code-switched and monolingual speech segments, respectively.
... Peché et al. showed the versatility of the PPR-SVM SLID architecture by utilizing the system in a limited data environment [6], porting the system to operate in a new low-bandwidth environment [7] and successfully applying the system in a real-world resource-scarce environment, to identify South African languages [8]. The SLID task was more challenging since the in-domain audio had no transcriptions, which added further complications. ...
... From the various SLID systems surveyed, the SLID system presented by Peché et al. [6], [7], [8] was the best candidate and served as our baseline SLID system. Briefly, the entire SLID framework consists of data filtering, phone recogniser training, classifier training and evaluation. ...
... One of the most widely used SLID architectures is the Parallel Phone Recogniser front-end [2] and classifier back-end scheme [3], [8]. In this set-up, a bank of phone recognisers is used to generate phonetic information streams from the audio, which are then fused together to form an input to a classifier which makes a final decision about the spoken language. ...
Conference Paper
Full-text available
Speech technologies have matured over the past few decades and have made significant impacts in a variety of fields, from assistive technologies to personal assistants. However, speech system development is a resource-intensive activity and requires language resources such as text-annotated audio recordings and pronunciation dictionaries. Unfortunately, many languages found in the developing world fall into the resource-scarce category, and due to this resource scarcity the deployment of Automatic Speech Recognition (ASR) systems in the developing world is severely inhibited. Given that few task-specific corpora exist and speech technology systems perform poorly when deployed in a new environment, we investigate the use of acoustic model adaptation. We propose a new blind deconvolution technique which rapidly adapts acoustic models to a new environment and increases their overall robustness. This new technique is utilized in a Spoken Language Identification (SLID) system and significantly improves the system's accuracy, by 6% relative to the baseline system, while achieving performance comparable to relatively more computationally intensive standard adaptation techniques.
... ASR technology has reached a point where carefully designed speech-enabled systems for suitably constrained applications are a reality. An LID system is also an enabling technology for a wide range of multilingual speech processing applications, such as spoken document retrieval, spoken language translation [4], multilingual ASR [2], interactive voice response and telephone call routing systems. For voice surveillance purposes over the telephone network, an LID system can also make massive online language monitoring possible. ...
... For voice surveillance purposes over the telephone network, an LID system can also make massive online language monitoring possible. Each spoken language has several linguistic features that can distinguish one language from another, namely acoustic, phonetic, phonotactic, prosodic, lexical and syntactic features [4]. A robust LID system that utilizes such linguistic information can be achieved and is typically very simple to construct [5]. ...
... Over the past decades, several LID approaches for multilingual speech have been applied, including parallel phone recognition followed by language modelling (PPRLM) [5], SVMs using phoneme frequency filtering [4], and vector space and n-gram language models [5]. Though these approaches have successfully identified several spoken languages, testing was based on monolingual speech data. ...
Preprint
Full-text available
Multilingual speakers have the ability and tendency to engage in code-switching, a mixed-language phenomenon in which more than one language is used within an utterance, which presents great challenges to speech-enabled systems. This paper presents a comprehensive scheme for automatic language identification (LID) integrated with an automatic speech recognition (ASR) system to identify languages used in a code-switched speech context. The front-end ASR system feeds the decoded phonemes into the LID system. We used robust hidden Markov models to build acoustic models derived from the hybrid phoneme set that handles multiple languages within an utterance. We incorporated a statistical phone-based bigram language model to obviate a domain-limited vocabulary recognition system. A spoken utterance is converted into feature vectors with attributes that represent the statistical occurrences of each acoustic unit. At the back-end, a supervised support vector machine classifier based on n-gram structures is used to identify the acoustic unit feature vectors. It was observed from experiments that the use of similar phonetic features in the hybrid phoneme set resulted in a 3% reduction in word error rate as the ASR accuracy increased to 67.6%. Furthermore, the proposed scheme achieved an encouraging performance with an LID accuracy of 85% on code-switched speech and an average of 81% on monolingual speech.