Figure 2 - uploaded by Abeer Alwan
Figure 2: Example of bit allocation and bit prioritization for the subband coder operating at 32 kbps. Each block represents the allocation of one bit to each subband sample (1 kbps). The first three blocks (3 kbps) are reserved for the transmission of the side information (bit allocation and the different gains).
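The bit budget implied by the caption can be worked through in a few lines. This is only a sketch of the arithmetic stated in the caption (32 kbps total, 1 kbps per block, 3 blocks of side information); the function and field names are hypothetical, and it does not reproduce how the coder actually distributes or prioritizes blocks across subbands.

```python
# Bit-budget arithmetic taken from the figure caption.
# All names below are illustrative assumptions, not from the source paper.

TOTAL_RATE_KBPS = 32   # coder rate stated in the caption
BLOCK_RATE_KBPS = 1    # one block = one bit per subband sample = 1 kbps
SIDE_INFO_BLOCKS = 3   # blocks reserved for bit allocation and gains


def bit_budget(total_kbps=TOTAL_RATE_KBPS,
               block_kbps=BLOCK_RATE_KBPS,
               side_info_blocks=SIDE_INFO_BLOCKS):
    """Split the total rate into side-information and sample blocks."""
    total_blocks = total_kbps // block_kbps
    sample_blocks = total_blocks - side_info_blocks
    return {
        "total_blocks": total_blocks,
        "side_info_kbps": side_info_blocks * block_kbps,
        "sample_blocks": sample_blocks,
        "sample_kbps": sample_blocks * block_kbps,
    }


budget = bit_budget()
print(budget)
```

Under these numbers, 29 of the 32 blocks (29 kbps) remain for subband sample bits once the 3 kbps of side information is set aside.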

Source publication
Article
This paper presents source and channel coding techniques for remote automatic speech recognition (ASR) systems. As a case study, Line Spectral Pairs (LSP) extracted from the 6th order allpole Perceptual Linear Prediction (PLP) spectrum are transmitted and speech recognition features are then obtained. The LSPs, quantized using first-order predictiv...

Citations

... Another two cepstral coefficients, derived from Linear prediction coefficients, are linear prediction cepstral coefficients (LPCC) and perceptual linear prediction coefficients (PLPC). LPCC is mainly used in noise elimination [30] or music genre classification [31], or speech recognition [32]. PLPC involves critical band spectral resolution, equal-loudness curve, and intensity loudness power law [20]. ...
Article
This paper analyses the efficiency of various frequency cepstral coefficients (FCC) in a non-speech application, specifically in classifying acoustic impulse events, namely gunshots. Various methods for such event identification are available. The majority of these methods are based on time or frequency domain algorithms. However, both of these domains have their limitations and disadvantages. In this article, an FCC, combining the advantages of both frequency and time domains, is presented and analyzed. These features, originally developed for speech, showed potential not only in speech-related applications but also in other acoustic applications. A comparison of the classification efficiency based on features obtained using four different FCCs, namely mel-frequency cepstral coefficients (MFCC), inverse mel-frequency cepstral coefficients (IMFCC), linear-frequency cepstral coefficients (LFCC), and gammatone-frequency cepstral coefficients (GTCC), is presented. An optimal frame length for an FCC calculation is also explored. Various gunshots from short guns and rifle guns of different calibers, along with multiple acoustic impulse events similar to gunshots that represent false alarms, are used. More than 600 acoustic event records have been acquired and used for training and validation of two designed classifiers, a support vector machine and a neural network. Accuracy, recall, and Matthews correlation coefficient measure the classification success rate. The results reveal the superiority of GFCC to other analyzed methods.
... In [19] and [29], it is shown that an efficient way to represent the PLP spectrum for quantization is to use the line spectral frequencies (LSF) of the linear prediction system, exploiting their high inter- and intra-frame correlation. Quantizing LSFs also yields a better representation of the low-order cepstral coefficients, which are more important for speech recognition. ...
Article
We present a framework for developing source coding, channel coding and decoding, as well as erasure concealment techniques adapted for distributed (wireless or packet-based) speech recognition. It is shown that speech recognition, as opposed to speech coding, is more sensitive to channel errors than to channel erasures, and appropriate channel coding design criteria are determined. For channel decoding, we introduce a novel technique for combining soft decision decoding with error detection at the receiver. Frame erasure concealment techniques are used at the decoder to deal with unreliable frames. At the recognition stage, we present a technique to modify the recognition engine itself to take into account the time-varying reliability of the decoded features after channel transmission. The resulting engine, referred to as weighted Viterbi recognition, further improves the recognition accuracy. Together, source coding, channel coding and the modified recognition engine are shown to provide good recognition accuracy over a wide range of communication channels with bit rates of 1.2 kbps or less.
Chapter
Distributed Speech Recognition (DSR) systems rely on efficient transmission of speech information from distributed clients to a centralized server. Wireless or network communication channels within DSR systems are typically noisy and bursty. Thus, DSR systems must utilize efficient Error Recovery (ER) schemes during transmission of speech information. Some ER strategies, referred to as forward error control (FEC), aim to create redundancy in the source coded bitstream to overcome the effect of channel errors, while others are designed to create spread or delay in the feature stream in order to overcome the effect of bursty channel errors. Furthermore, ER strategies may be designed as a combination of the previously described techniques. This chapter presents an array of error recovery techniques for remote speech recognition applications. This chapter is organized as follows. First, channel characterization and modeling are discussed. Next, media-specific FEC is presented for packet erasure applications, followed by a discussion on media-independent FEC techniques for bit error applications, including general linear block codes, cyclic codes, and convolutional codes. The application of unequal error protection (UEP) strategies utilizing combinations of the aforementioned FEC methods is also presented. Finally, frame-based interleaving is discussed as an alternative to overcoming the effect of bursty channel erasures. The chapter concludes with examples of modern standards for channel coding strategies for distributed speech recognition (DSR).