Fig 4 - uploaded by Maoshen Jia
High-level block diagram of the decoder  

Source publication
Article
A multi-layer embedded speech and audio coding algorithm based on bit-plane coding and Scalar Quantized Vector Huffman Coding (SQVH) is proposed in this paper. In this codec the signal sampled at 32 kHz can be coded in terms of scalable bit rates. The core codec is International Telecommunication Union Telecommunication Standardization Sector (ITU-...

Citations

... Hence, T(f) should be discretized and converted into the MDCT domain. The whole processing procedure includes two steps: inverse time-frequency transform and MDCT [26]. After these operations, the absolute auditory masking threshold in the MDCT domain is denoted as T_mdct(l) (in dB), where l = 1, 2, …, L. ...
... SQVH is an efficient transform coding method used in fixed-bitrate codecs [26][27][28]. In this section, SQVH with variable bitrate for encoding the downmix signal is designed and described as follows. ...
Article
Rendering spatial sound scenes via audio objects has become popular in recent years, since it can provide more flexibility for different auditory scenarios, such as 3D movies, spatial audio communication and virtual classrooms. To facilitate high-quality bitrate-efficient distribution for spatial audio objects, an encoding scheme based on intra-object sparsity (approximate k-sparsity of the audio object itself) is proposed in this paper. The statistical analysis is presented to validate the notion that the audio object has a stronger sparseness in the Modified Discrete Cosine Transform (MDCT) domain than in the Short Time Fourier Transform (STFT) domain. By exploiting intra-object sparsity in the MDCT domain, multiple simultaneously occurring audio objects are compressed into a mono downmix signal with side information. To ensure a balanced perception quality of audio objects, a Psychoacoustic-based time-frequency instants sorting algorithm and an energy equalized Number of Preserved Time-Frequency Bins (NPTF) allocation strategy are proposed, which are employed in the underlying compression framework. The downmix signal can be further encoded via Scalar Quantized Vector Huffman Coding (SQVH) technique at a desirable bitrate, and the side information is transmitted in a lossless manner. Both objective and subjective evaluations show that the proposed encoding scheme outperforms the Sparsity Analysis (SPA) approach and Spatial Audio Object Coding (SAOC) in cases where eight objects were jointly encoded.
... The secondary significant TF instants are contained within the 2nd stage, etc. Thereafter, the preserved TF instants along with their original information are multiplied by a sensing matrix to form the observation signals, which can be further encoded via the Scalar Quantized Vector Huffman Coding (SQVH) [11] for transmission. In the decoding phase, the preserved TF instants can be attained via the Compressed Sensing (CS) techniques [12,13], i.e., solving the ℓ1-norm minimization problems with respect to the received observation signals. ...
... Specifically, the observation matrix Y_s corresponding to s is attained through Eq. (11), where the sensing matrix has size D×L. The problem of choosing the type of sensing matrix and determining the number of sensing measurements D will be discussed in the next subsection. ...
... The observation signal y_s can be further encoded by SQVH [11], described as follows. The vector y_s is decomposed into two parts, i.e., the sign and the magnitude. These two parts are processed separately. To quantize the magnitude, the vector is divided into W subvectors, where each subvector contains B coefficients, i.e., ... (15); Q_step and Q_offset represent the quantization step size and the offset, respectively; Q_max represents the upper bound of the quantization index. We group the R quantization indices together to form a vector q_s^ind. After that, q_s^ind is quantized via vector Huffman coding. The W root-mean-square values are expanded to form an R-dimensional vector ...
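The sign/magnitude split and magnitude quantization described in the snippet above can be sketched as follows. This is only an illustration under stated assumptions: the quantization equation itself is not recoverable from the snippet, so the rule used here (normalize each subvector by its root-mean-square value, then take floor(x / Q_step + Q_offset) clipped to Q_max) is a guess at a common form, not the cited paper's formula. The function names and the example parameter values are hypothetical, and the vector Huffman coding stage is not shown.

```python
import math

def split_sign_magnitude(y):
    """Split a vector into sign bits (0 = non-negative, 1 = negative) and magnitudes."""
    signs = [0 if v >= 0 else 1 for v in y]
    mags = [abs(v) for v in y]
    return signs, mags

def quantize_magnitudes(mags, B, q_step, q_offset, q_max):
    """Scalar-quantize magnitudes in subvectors of B coefficients.

    ASSUMPTION: each subvector is normalized by its RMS value before
    quantization, and the index is floor(x / q_step + q_offset) clipped
    to q_max. The source snippet does not give the exact rule.
    Returns (rms_values, indices) with W = len(mags) // B RMS values
    and R = len(mags) quantization indices.
    """
    if B <= 0 or len(mags) % B != 0:
        raise ValueError("vector length must be a positive multiple of B")
    rms_values, indices = [], []
    for start in range(0, len(mags), B):
        sub = mags[start:start + B]
        rms = math.sqrt(sum(m * m for m in sub) / B)
        rms_values.append(rms)
        for m in sub:
            normalized = m / rms if rms > 0 else 0.0
            idx = int(math.floor(normalized / q_step + q_offset))
            indices.append(min(idx, q_max))
    return rms_values, indices

# Example with hypothetical values: W = 2 subvectors of B = 2 coefficients.
y_s = [0.5, -1.5, 2.0, -0.1]
signs, mags = split_sign_magnitude(y_s)
rms_values, q_ind = quantize_magnitudes(mags, B=2, q_step=0.5, q_offset=0.5, q_max=3)
```

In a complete coder the R indices in `q_ind` would then be grouped and passed to a vector Huffman coder, while the W RMS values and the sign bits would be transmitted alongside them.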
Article
Object-based audio techniques have become common since they provide the flexibility for personalized rendering. In this paper a multi-stage encoding scheme for multiple audio objects is proposed. The scheme is based on intra-object sparsity. In the encoding phase the dominant Time Frequency (TF) instants of all active object signals are extracted and divided into several stages to form the multi-stage observation signals for transmission. In the decoding phase the preserved TF instants are recovered via Compressed Sensing (CS) technique, and further used for reconstructing the audio objects. The evaluations validated that the proposed encoding scheme can achieve scalable transmission while maintaining perceptual quality of each audio object.
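The CS-based recovery mentioned in the decoding phase can be illustrated with a small solver. The sketch below uses ISTA (iterative soft thresholding) on the ℓ1-regularized least-squares problem min 0.5·||A·x − y||² + lam·||x||₁, a common relaxation of ℓ1-norm minimization. It is swapped in for illustration only: the source does not say which solver the authors used, and the names `ista`, `soft_threshold`, `lam`, and all numeric values are assumptions.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista(A, y, lam, iters=200):
    """Approximately minimize 0.5 * ||A x - y||^2 + lam * ||x||_1.

    A is a D x L sensing matrix given as a list of rows, y has length D.
    The step size 1 / ||A||_F^2 is a conservative (always safe) choice,
    since the squared Frobenius norm upper-bounds the Lipschitz constant
    of the gradient of the quadratic term.
    """
    D, L = len(A), len(A[0])
    frob_sq = sum(a * a for row in A for a in row)
    if frob_sq == 0:
        raise ValueError("sensing matrix must be non-zero")
    step = 1.0 / frob_sq
    x = [0.0] * L
    for _ in range(iters):
        # Residual r = A x - y and gradient g = A^T r of the quadratic term.
        r = [sum(A[i][j] * x[j] for j in range(L)) - y[i] for i in range(D)]
        g = [sum(A[i][j] * r[i] for i in range(D)) for j in range(L)]
        # Gradient step followed by soft thresholding.
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(L)]
    return x

# Hypothetical under-determined example: D = 2 observations, L = 3 unknowns.
A = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0]]
y = [1.0, 0.0]
x_hat = ista(A, y, lam=0.01)
```

A smaller `lam` brings the solution closer to exact ℓ1-norm minimization subject to A·x = y, at the cost of slower convergence; a production decoder would more likely rely on a dedicated basis-pursuit or greedy solver.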
... Define the matrix X: ...
... For the super-wideband codec, the input signal is sampled at 32 kHz, so the super-wideband embedded codec proposed in [11] is selected for left-channel signal processing instead of G.729. [12] which is requested by ITU-T for the candidate stereo codec is listed in Table 2: The objective performance of the proposed codec and the reference codec is compared using Objective Difference Grade (ODG) scores computed with Perceptual Evaluation of Audio Quality (PEAQ) according to ITU-R BS.1387 [13]. The PEAQ test compares the perceptual difference between the processed signal and the original one. ...
Conference Paper
In this paper, a compressive sampling method for MLT coefficients, used for extracting stereo information, is adopted based on principal component analysis (PCA) and the Modulated Lapped Transform (MLT). With this method, an embedded variable-bit-rate stereo speech and audio coding algorithm is proposed. In this codec, stereo signals sampled at 32 kHz and 16 kHz can be coded at scalable bit rates; the bit-stream structure is embedded, and the bit-stream can be divided into several layers. The core codec is ITU-T G.729.1, which can process mono signals with 7 kHz bandwidth. In addition, four extra bit rates are added: 40, 48, 56, and 64 kb/s. The maximum bit rates of the wideband stereo signal and the super-wideband stereo signal are 48 kb/s and 64 kb/s, respectively. The objective and subjective test results show that the quality of the proposed codec is no worse than that of the reference codec requested by ITU-T.