Fig. 2. Standard text-to-speech system architecture

Source publication
Article
Full-text available
Arabic text-to-speech synthesis from non-diacritized text is still a major challenge, because of the unique rules and characteristics of the Arabic language. Indeed, the diacritic and gemination signs, which are special characters representing short vowels and consonant doubling respectively, have a major effect on the accurate pronunciation of Arabic. However thes...

Contexts in source publication

Context 1
... TTS system is based on two main modules (cf. Fig. 2): the Text-to-Phoneme module, also called the Natural Language Processing (NLP) module, and the Phoneme-to-Speech module, or the Digital Signal Processing (DSP) ...
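A minimal sketch of this two-stage pipeline may help fix the terminology; the function names and the toy lookup table below are illustrative placeholders, not the authors' implementation.

def nlp_module(text: str) -> list[str]:
    # Text-to-Phoneme (NLP) stage: in a real system this covers text
    # normalization, diacritization and grapheme-to-phoneme conversion;
    # a toy lookup table stands in here.
    toy_g2p = {"b": "b", "a": "a", "t": "t"}
    return [toy_g2p.get(ch, ch) for ch in text.lower() if not ch.isspace()]

def dsp_module(phonemes: list[str]) -> bytes:
    # Phoneme-to-Speech (DSP) stage: acoustic-parameter prediction and
    # waveform generation; a placeholder byte string is returned here.
    return "|".join(phonemes).encode("utf-8")

def synthesize(text: str) -> bytes:
    return dsp_module(nlp_module(text))

print(synthesize("bat"))  # b'b|a|t'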
Context 2
... this cannot be compatible with all languages: each language has its own characteristics, which lead to it having its own system. The focus of this work is to implement all the linguistic module components for Arabic text, as shown in Figure 2, using a deep learning approach. In this research, no linguistic features or tools are employed, as has been the case in previous works such as [16]. ...

Similar publications

Preprint
Full-text available
Converting written texts into their spoken forms is an essential problem in any text-to-speech (TTS) system. However, building an effective text normalization solution for a real-world TTS system faces two main challenges: (1) the semantic ambiguity of non-standard words (NSWs), e.g., numbers, dates, ranges, scores, abbreviations, and (2) transform...

Citations

... In order to improve the TTS performance of the GRU model, the AOA technique is used as a hyperparameter optimizer [21]. Like other birds, the Aquila has a dark brown colour. ...
... Similar to other population-based algorithms, the AOA methodology starts with a population of candidate solutions, initialized stochastically between an upper and a lower limit [21]. In each iteration, the best-obtained solution is taken as the near-optimal solution, as given below. ...
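As a rough illustration of that initialization step (not the cited paper's code; the array shapes, bounds, and toy objective are assumptions), a population-based optimizer such as AOA typically draws its candidates uniformly between the lower and upper limits:

import numpy as np

rng = np.random.default_rng(0)
pop_size, dim = 20, 5        # number of candidate solutions and their dimension
lower, upper = -10.0, 10.0   # lower and upper limits of the search space

# X[i, j] = lower + U(0, 1) * (upper - lower)
X = lower + rng.random((pop_size, dim)) * (upper - lower)

def fitness(x):
    # Toy objective; in the cited work this would score GRU hyperparameters.
    return np.sum(x ** 2)

best = min(X, key=fitness)   # near-optimal candidate of the current iteration
print(best)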
... Diacritical marks are used in Arabic to differentiate words in terms of sound and meaning. However, Arabic writers rarely use diacritics in their non-academic writing [28], compounding the complexity of text processing in MSA. • Most Arabic words can be reduced to roots consisting of three letters [29]. ...
Article
Full-text available
Although there are several speech synthesis models available for different languages, tailored to specific domain requirements and applications, there is currently no readily available information on the latest trends in Arabic speech synthesis. This can make it challenging for beginners to research and develop text-to-speech (TTS) systems for Arabic. To address this issue, this article provides a comprehensive overview of several scholars’ contributions to the field of Arabic TTS, along with an examination of the unique features of the Arabic language and the corresponding challenges in creating TTS systems. Reporting only on papers discussing Arabic TTS, this systematic review evaluated the available literature published between 2000 and 2022. We conducted a systematic review of six databases following the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines to identify studies that addressed Arabic text-to-speech systems. Of a total of 3719 articles identified, only 36 (0.96%) met our search criteria. Bibliometric analyses of these studies were conducted and reported. The results highlight the main types of speech synthesis techniques used in TTS systems: concatenative, formant, deep neural network (DNN), hybrid, and multiagent models. The corpora used to develop these systems are reported, together with the diacritization techniques incorporated, the evaluation techniques, and the systems’ performance results. Subjective evaluation using the mean opinion score is the most commonly applied method for measuring system accuracy. This study also identifies gaps in the literature and makes recommendations for future research directions.
... Tacotron 2 [69] has been tested for a voweled MSA corpus by [64]. Hadj Ali et al. [71] have tested DNN for the task of grapheme-to-phoneme conversion using diacritized texts. Abdelali et al., [72] have also tested Tacotron [67], Tacotron 2 [69] and Model ESPnet Transformer TTS [73] in the Arabic language. ...
Article
Full-text available
Intelligent systems powered by artificial intelligence techniques have been widely proposed to help humans in various tasks. The intelligent personal assistant (IPA) is one of these smart systems. In this paper, we present an attempt to create an IPA that interacts with users in Tunisian Arabic (TA), the colloquial form used in Tunisia. We propose and explore a simple-to-implement method for building the principal components of a TA IPA. We apply deep learning techniques, namely CNN [1], RNN encoder-decoder [2], and end-to-end approaches, to create the IPA speech components (speech recognition and speech synthesis). In addition, we explore available free dialog platforms for understanding a request and generating a suitable response in TA. For this purpose, we create and use TA transcripts to generate the corresponding models. Evaluation results are acceptable for a first attempt.
... We use two widely used datasets for Arabic NLP [43], [44], [45], [46], [47], [48], [49] to train, validate, and test our proposed BiLSTM models. The first is a processed subset of the Tashkeela corpus as extracted in [50]. ...
... With the success of deep learning, the paradigm of speech synthesis has shifted away from hidden Markov model-based speech synthesis toward neural speech synthesis. The DNN-based technique has the potential to solve the drawbacks of the HMM-based approach, such as inefficiency in expressing complicated context dependencies, fragmentation of training data, and full disregard for language input characteristics [2]. ...
... The contextual features are translated to the vector output in DNN-based speech synthesis [2]. The size and quality of the training data have a significant impact on the quality of synthesized speech of a neural text-to-speech system [3]. ...
... The aim of this work is to realize an Arabic DNN-based speech synthesis system using deep learning techniques. Deep Neural Networks of various forms, such as feed forward, recurrent (LSTM and BLSTM), and hybrid DNN models [2], have been employed for a number of applications in the field of speech synthesis systems. In terms of our objective, the most relevant studies employ DNN to predict phone durations [19,20], or syllable durations [7]. ...
Conference Paper
This article discusses a Deep Neural Network-based text-to-speech synthesis system for the Arabic language. Subjective and objective tests were used to evaluate the system. We used the Mean Opinion Score (MOS) for subjective evaluation, the Diagnostic Rhyme Test (DRT) to test the intelligibility of some consonants and vowels, and the Perceptual Evaluation of Speech Quality (PESQ) for objective evaluation. The results have mean scores of 3.92/5 and 3.88/5 for the MOS and DRT tests, respectively, and 3.17/5 for the PESQ test; the majority of words and sentences were recognized, and the system's overall quality was judged satisfactory. Furthermore, the results show a significant improvement in the quality of synthesized speech for DNN-based TTS when compared to its HMM-based counterpart.
... A feedforward DNN is designed as a one-way process: the inputs are fed into the network through the first layer, and each layer's output is provided as input to the following layer. The whole process is governed by supervised ML, and the final result can be a classification or a regression [17]. Fig. 2 is a diagram of a feedforward DNN. ...
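A minimal feedforward network along these lines might look as follows; the layer sizes and the single regression output are assumptions for illustration, not the configuration used in [17].

import torch
import torch.nn as nn

# One-way (feedforward) stack: each layer's output feeds the next layer.
model = nn.Sequential(
    nn.Linear(10, 64),   # first layer receives the input features
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),    # final layer: a regression (or classification) output
)

x = torch.randn(8, 10)   # a batch of 8 input feature vectors
y = model(x)             # forward pass through the network
print(y.shape)           # torch.Size([8, 1])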
... It suffers from low accuracy [32]. Indeed, Farasa's diacritic restoration performance was compared to some examples of DNN performance [17]. The results show that the DNN system outperforms Farasa, with an accuracy rate 4.2% higher than Farasa's. ...
... While the study in [17] falls within the field of Arabic text-to-speech synthesis, restoring diacritics in non-diacritized texts is an essential step for that task. The study tackles restoring gemination in a first step and then restoring the other diacritics in a second step. ...
Article
Full-text available
Arabic diacritics are signs used in Arabic orthography to represent essential morphophonological and syntactic information. It is common practice to leave out those diacritics in written Arabic, and most Arabic electronic texts lack them. Processing such texts for various Natural Language Processing purposes is a complicated task, and diacritized words are necessary for applications such as machine translation, sentiment analysis, and speech synthesis. To address this problem, several studies have proposed automatic systems to restore diacritics in Arabic texts. The present paper presents an in-depth survey of the 56 most recent Arabic diacritization studies. The studies encompassed in this survey were selected from the following databases: IEEE Xplore, Clarivate Analytics, Google Scholar, and Science Direct. Based on the diacritization approach, the studies are grouped into four categories by method: rule-based, simple statistical, hybrid, and neural network. While rule-based methods such as morphological analyzers and lexicon retrieval were the earliest approaches, results indicate that they are still valuable tools that can aid the process of diacritization. Effective statistical methods that produce diacritics with acceptable accuracy include Hidden Markov Models, n-grams, and Support Vector Machines; they are often combined with either rule-based methods or neural networks in hybrid systems. Neural networks, specifically Bidirectional Long Short-Term Memory, reach very high diacritization accuracy levels. Studies employing neural networks focus on evaluating and comparing the efficacy of several types of neural networks or hybrids of them, testing alternative input units, or suggesting schemes for partial diacritization. The study synthesizes the results of these studies, identifies research gaps, and offers recommendations for future research.
... sion (Ali et al., 2020), a crucial component in Text-to-Speech (TTS). With the rise of personal digital assistants with TTS capabilities, there is a clear need for improved automatic diacritization methods. ...
Preprint
We propose a novel multitask learning method for diacritization which trains a model to both diacritize and translate. Our method addresses data sparsity by exploiting large, readily available bitext corpora. Furthermore, translation requires implicit linguistic and semantic knowledge, which is helpful for resolving ambiguities in the diacritization task. We apply our method to the Penn Arabic Treebank and report a new state-of-the-art word error rate of 4.79%. We also conduct manual and automatic analysis to better understand our method and highlight some of the remaining challenges in diacritization.
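The multitask idea can be illustrated with a rough sketch of a shared encoder feeding two heads; all names, sizes, and the simplified translation head below are assumptions, not the authors' architecture.

import torch
import torch.nn as nn

class MultitaskDiacritizer(nn.Module):
    def __init__(self, vocab=100, diac_classes=15, trans_vocab=8000, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.diac_head = nn.Linear(2 * hidden, diac_classes)  # per-character diacritic
        self.trans_head = nn.Linear(2 * hidden, trans_vocab)  # stand-in for a translation decoder

    def forward(self, chars):
        h, _ = self.encoder(self.embed(chars))
        return self.diac_head(h), self.trans_head(h)

model = MultitaskDiacritizer()
chars = torch.randint(0, 100, (4, 30))     # a batch of 4 sequences of 30 characters
diac_logits, trans_logits = model(chars)
# Joint training would sum both objectives, e.g.
# loss = ce(diac_logits, diac_targets) + lam * ce(trans_logits, trans_targets)
print(diac_logits.shape, trans_logits.shape)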
... We use two widely used datasets for Arabic NLP [42,43,44,45,46,47,48] to train, validate, and test our proposed BiLSTM models. The first is a processed subset of the Tashkeela corpus as extracted in [49]. ...
Preprint
Full-text available
Soft spelling errors are a class of spelling mistakes that is widespread among native Arabic speakers and foreign learners alike. Some of these errors are typographical in nature: they occur because of the orthographic variations of some Arabic letters and the complex rules that dictate their correct usage. Many people forgo these rules and, given the identical phonetic sounds, often confuse such letters. In this paper, we propose a bidirectional long short-term memory network that corrects this class of errors. We develop, train, evaluate, and compare a set of BiLSTM networks. We approach the spelling correction problem at the character level, handle Arabic texts from both classical and modern standard Arabic, and treat the problem as a one-to-one sequence transcription problem. Since the soft Arabic error class encompasses omission and addition mistakes, we propose a simple, low-resource yet effective technique that preserves the one-to-one sequencing and avoids using a costly encoder-decoder architecture. We train the BiLSTM models to correct the spelling mistakes using transformed-input and stochastic error-injection approaches. We recommend a configuration that has two BiLSTM layers, uses dropout regularization, and is trained with the latter approach and an error injection rate of 40%. The best model corrects 96.4% of the injected errors and achieves a low character error rate of 1.28% on a real test set of soft spelling mistakes.
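A character-level BiLSTM of this kind can be sketched roughly as below; the vocabulary size, layer widths, and dropout value are illustrative assumptions rather than the configuration recommended in the paper.

import torch
import torch.nn as nn

class CharBiLSTM(nn.Module):
    # One output character per input character, preserving the
    # one-to-one sequence transcription described above.
    def __init__(self, n_chars=60, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb)
        self.bilstm = nn.LSTM(emb, hidden, num_layers=2, batch_first=True,
                              bidirectional=True, dropout=0.3)
        self.out = nn.Linear(2 * hidden, n_chars)

    def forward(self, x):
        h, _ = self.bilstm(self.embed(x))
        return self.out(h)

model = CharBiLSTM()
noisy = torch.randint(0, 60, (4, 40))  # a batch of 4 sequences with injected errors
logits = model(noisy)                  # (4, 40, 60): per-position character scores
print(logits.shape)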
... For instance, the phoneme /b/ in the word 'cab' distinguishes that word from 'can', 'cap', and 'cat'. Phonemicization plays important roles in automatically recognizing speech [1], synthesizing speech [2,3], developing phonemic syllabification models [4,5], and many other applications in the speech and linguistics areas [6]. A G2P can be developed using a rule-based approach, a conventional ML-based approach, or a DL-based approach. The performance of these approaches commonly depends on the complexity of a language's phonotactic rules, which reflect how strong the relation between graphemes and phonemes is. ...
Article
Full-text available
A phonemicization, or grapheme-to-phoneme conversion (G2P), is the process of converting a word into its pronunciation. It is one of the essential components in speech synthesis, speech recognition, and natural language processing. Deep learning (DL)-based state-of-the-art G2P models generally give a low phoneme error rate (PER) as well as a low word error rate (WER) for high-resource languages, such as English and European languages, but not for low-resource languages. Therefore, some conventional machine learning (ML)-based G2P models incorporating specific linguistic knowledge are preferable for low-resource languages. However, these models perform poorly for several low-resource languages because of various issues. For instance, an Indonesian G2P model works well for roots but gives a high PER for derivatives. Most errors come from the ambiguities of some roots and derivative words containing four prefixes: 〈ber〉, 〈meng〉, 〈peng〉, and 〈ter〉. In this research, an Indonesian G2P model based on n-grams combined with a stemmer and phonotactic rules (NGTSP) is proposed to solve those problems. An investigation based on 5-fold cross-validation, using 50k Indonesian words, shows that the proposed NGTSP gives a much lower PER (0.78%) than the state-of-the-art Transformer-based G2P model (1.14%). It also provides a much faster processing time.