Fig 1 - uploaded by A. Rinotti
MPEG-4 AAC decoder block diagram.  

Source publication
Conference Paper
We present a fast software implementation of the MPEG-4 AAC (advanced audio coding) main and low complexity (LC) decoder. The reference implementation is analyzed and selected algorithms are presented to improve the performance of most of its building blocks, i.e., bitstream de-formatter, noiseless decoding, prediction, and filterbank. The code is...

Context in source publication

Context 1
... noise is carefully distributed over frequency bands so that it is masked by the signal energy. AAC supports up to 48 main audio channels with sample rates that range from 8 kHz to 96 kHz. Different trade-offs between quality and complexity are provided by three profiles: Main profile, Low Complexity profile, and Scalable Sample Rate profile. Fig. 1 shows the arrangement of the building blocks of an MPEG-4 Main decoder [9]. They are briefly described here to help the reader follow the optimization process described in Section ...

Similar publications

Article
At Forschungszentrum Karlsruhe, an Ultrasound Computer Tomography (USCT) system is under development for early breast cancer detection. To detect morphological indicators at sub-millimeter resolution, the visualization is based on a SAFT (synthetic aperture focusing technique) algorithm. The current 3D demonstrator system consists of approx. 2000 tra...
Conference Paper
In this paper we present a wideband (44.1 kHz sampling rate) audio and speech coder that combines two different strategies, namely, parametric and waveform coding. It is shown how this approach can be used to design a layered bit stream scalable coder offering a wide variety of decoding bit rates with little scalability loss. Moreover, the bit rate...
Chapter
The design of a SIMD machine is usually complex because it requires developing an efficient Processing Element and writing all the software needed by the chip and for the control of the machine. We propose a different approach by using an efficient 32-bit off-the-shelf processor with its software environment (compiler and assembler) and a program...

Citations

Chapter
The one-dimensional or multidimensional range query is one of the most important queries in the physical implementation of a DBMS. The number of compared items (of a data structure) can be enormous, especially for range queries with lower selectivity. The number of compare operations increases for longer, more complex items (or tuples), e.g. words stored in a B-tree. Because a potentially high number of compare operations is executed during range query processing, hardware that provides parallel computation, such as CPU SIMD units or a GPU, can be taken into account. In this paper, we show the performance and scalability of sequential, index, CPU SIMD, and GPU variants of the range query algorithm. These results make possible a future integration of these computation devices into a DBMS kernel.
Article
An Electronic Digital Image Stabilization (EDIS) system handles a large amount of data and performs intensive computation, so its real-time implementation requires rapid data processing. Video and image-sequence processing is characterized by high data parallelism and repetitive computation. To match these requirements and characteristics, efficient C++ with inline SIMD (Single Instruction Multiple Data) code and multi-threaded programming is used to achieve real-time performance on a conventional PC. Block-matching motion estimation with the SAD (Sum of Absolute Differences) criterion, using a modified three-step search strategy combined with diamond search, reduces computation and accelerates execution. The stabilization system uses a Kalman filter to remove high-frequency image jitter while retaining smooth global movements. Test results show that an efficient and robust real-time stabilization system can be implemented on a conventional PC.
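The SAD criterion named in this abstract can be illustrated with a minimal sketch. It assumes frames are plain 2-D lists of integer pixel values and, for brevity, swaps in an exhaustive search over a small window in place of the modified three-step and diamond searches the abstract mentions. The function names (`sad`, `crop`, `best_motion_vector`) are hypothetical and not taken from the paper.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def crop(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_motion_vector(ref, cur, top, left, size, radius):
    """Find the displacement (dy, dx) into `ref` that best matches a block of `cur`.

    Exhaustive search within +/-radius pixels (a simplification; the abstract
    describes faster search patterns). Returns (sad, dy, dx) or None.
    """
    target = crop(cur, top, left, size)
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidate blocks that would fall outside the reference frame.
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = sad(target, crop(ref, y, x, size))
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best
```

The inner `sad` loop touches every pixel of every candidate block, which is the repetitive, data-parallel work that SIMD instructions and multiple threads are suited to.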
Conference Paper
Current mainstream CPUs provide SIMD (Single Instruction Multiple Data) computational capabilities. Although hardware vendors also provide other computational capabilities, such as multi-core CPUs, GPUs, or APUs, an important feature of SIMD is that it provides parallel operations within a single CPU core. In previous works, the authors introduced the use of SIMD instructions in some indexing data structures such as the B-tree. Since multidimensional data structures manage n-dimensional tuples or rectangles, the use of these instructions seems to be straightforward in operations that manipulate these n-dimensional objects. In this article, we show the use of SIMD in the R-tree data structure. Since the range query is one of the most important operations of multidimensional data structures, we consider the use of SIMD in range query processing. Moreover, we show the properties and scalability of this solution. We show that the SIMD range query algorithm is up to 2× faster than the conventional algorithm.
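As a rough illustration of why n-dimensional range queries map well onto SIMD, the scalar sketch below shows the per-dimension comparisons involved in testing whether two rectangles overlap. It is not the authors' algorithm: the flat list of entries stands in for the contents of an R-tree node, and the names `rectangles_overlap` and `range_query` are hypothetical.

```python
def rectangles_overlap(lo_a, hi_a, lo_b, hi_b):
    """True if two axis-aligned n-dimensional rectangles intersect.

    Each dimension is checked independently with the same two comparisons,
    which is the pattern a SIMD unit could evaluate for several dimensions
    at once.
    """
    return all(
        la <= hb and lb <= ha
        for la, ha, lb, hb in zip(lo_a, hi_a, lo_b, hi_b)
    )


def range_query(entries, q_lo, q_hi):
    """Return the keys of all (key, lo, hi) entries that overlap the query box."""
    return [
        key
        for key, lo, hi in entries
        if rectangles_overlap(lo, hi, q_lo, q_hi)
    ]


# Usage with made-up 2-D data.
entries = [
    ("a", (0, 0), (1, 1)),
    ("b", (2, 2), (3, 3)),
    ("c", (0.5, 0.5), (2.5, 2.5)),
]
print(range_query(entries, (0, 0), (1, 1)))  # prints ['a', 'c']
```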
Article
The psychoacoustic model in an MPEG audio layer-III (MP3) encoder is optimized for fixed-point processing. The optimization process consists of determining the data word length of the arithmetic unit and the algorithm for the transcendental functions that are often used in the psychoacoustic model. To determine the data word length, we defined a statistical model expressing the relation between the fixed-point operation errors of the psychoacoustic model and the probability that the allocated bits are altered due to these errors. Based on simulations using this model, we chose a 24-bit data path and constructed a 24-bit fixed-point MP3 encoder. Sound quality tests using the constructed fixed-point encoder showed a mean degradation of -0.2 on the ITU-R 5-point audio impairment scale.
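To make the word-length trade-off concrete, here is a minimal sketch of 24-bit fixed-point quantization and multiplication. The abstract states only that a 24-bit data path was chosen; the Q1.23 layout (one sign bit, 23 fraction bits), the saturation behavior, and the function names are assumptions made for illustration.

```python
WORD_BITS = 24             # data-path width stated in the abstract
FRAC_BITS = WORD_BITS - 1  # assumption: Q1.23 layout, values in [-1, 1)

MIN_Q = -(1 << FRAC_BITS)
MAX_Q = (1 << FRAC_BITS) - 1


def saturate(q: int) -> int:
    """Clamp an integer to the signed 24-bit range."""
    return max(MIN_Q, min(MAX_Q, q))


def to_fixed(x: float) -> int:
    """Round x to the nearest Q1.23 value, saturating at the word limits."""
    return saturate(round(x * (1 << FRAC_BITS)))


def to_float(q: int) -> float:
    """Convert a Q1.23 integer back to a float."""
    return q / (1 << FRAC_BITS)


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q1.23 values and round the product back to Q1.23."""
    product = a * b  # 46 fraction bits before rescaling
    rounded = (product + (1 << (FRAC_BITS - 1))) >> FRAC_BITS
    return saturate(rounded)


# Quantization error for a value that is not exactly representable.
error = abs(to_float(to_fixed(0.1)) - 0.1)
print(error <= 2 ** -(FRAC_BITS + 1))  # prints True
```

Under these assumptions, each stored value carries a rounding error of at most half a least-significant bit. Errors of this kind are what the abstract's statistical model relates to changes in bit allocation.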
Article
This paper presents an efficient way to implement a software Reed-Solomon (RS) decoder. We use lookup tables, Single Instruction Multiple Data (SIMD) parallel processing instruction sets, loop expansion, and other techniques to implement a software RS decoder on the Intel Central Processing Unit (CPU) platform. Our software RS decoder achieves a decoding speed of 68 MB/sec (350 k RS packets/sec). The Digital Video Broadcasting Terrestrial (DVB-T) mode used in Taiwan needs 6617 RS packets/sec to achieve real-time reception; thus, our software RS decoder requires only 1.89 percent CPU loading.
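The CPU loading figure follows from dividing the required packet rate by the achieved packet rate. A small sketch of that arithmetic, using only the numbers quoted in the abstract (the helper name `cpu_load_percent` is hypothetical):

```python
def cpu_load_percent(required_rate: float, achieved_rate: float) -> float:
    """Percentage of CPU time needed to sustain required_rate.

    achieved_rate is the throughput measured when the decoder runs flat out.
    """
    if achieved_rate <= 0:
        raise ValueError("achieved_rate must be positive")
    return 100.0 * required_rate / achieved_rate


# Figures from the abstract, in RS packets per second.
rs_load = cpu_load_percent(required_rate=6617, achieved_rate=350_000)
print(f"{rs_load:.2f}%")  # prints "1.89%"
```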
Article
Multi-core computing has become the new trend in the computer industry. An electronic digital image stabilization (EDIS) system involves a large amount of data and intensive computation. Its real-time implementation requires rapid processing of data with high parallelism and repetitive computation, which makes image sequences well suited to parallel data processing. An algorithmic solution is proposed for high-speed image stabilization that meets the requirements of real-time applications. Following the data processing characteristics, efficient C++ with inline SIMD (single instruction multiple data) code and multi-threaded programming is used on a dual-core PC so that threads run truly in parallel. After the motion vectors are acquired by a fast search strategy, adaptive multi-local motion estimation removes the interference of moving objects. The Kalman filter then removes high-frequency image jitter while retaining smooth global movements. The test results show that robust, highly efficient, real-time processing can be achieved for a stabilization system on a conventional PC.
Article
This paper presents a fast fixed-point implementation of the MPEG-2 AAC-LC (Advanced Audio Coding, Low Complexity profile, defined in ISO/IEC 13818-7) audio decoder. By analyzing the decoding algorithm, we propose tuned optimization strategies for improving the decoding speed of the main modules of the decoder. After these optimizations, the designed fixed-point AAC-LC decoder is shown to be 20% faster than the reference decoder, FAAD2, an open-source AAC decoder. The decoding results also show quality as high as that of the reference.
Conference Paper
In this paper, we discuss how to make a Viterbi decoder faster. An implementation on an Intel CPU using the SSE4 parallel processing instruction set and some other methods achieves a decoding speed of 47.05 Mbps (0.64 Mbps originally). The DVB-T mode used in Taiwan needs 13.27 Mbps to achieve real-time reception, so our software Viterbi decoder takes only 28% CPU loading.
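The quoted figures imply both a speedup over the original decoder and the stated CPU loading. The sketch below reproduces that arithmetic from the numbers in the abstract; the variable names are hypothetical.

```python
# Throughput figures from the abstract, in Mbps.
original_mbps = 0.64
optimized_mbps = 47.05
required_mbps = 13.27  # real-time requirement for the DVB-T mode cited

# Speedup of the optimized decoder relative to the original one.
speedup = optimized_mbps / original_mbps

# Share of CPU time needed to keep up with the real-time bit rate.
load_percent = 100.0 * required_mbps / optimized_mbps

print(f"speedup: {speedup:.1f}x")         # prints "speedup: 73.5x"
print(f"CPU load: {load_percent:.0f}%")   # prints "CPU load: 28%"
```

At the original 0.64 Mbps the decoder could not have met the 13.27 Mbps real-time requirement, which is why the optimization matters for this use case.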
Article
Most modern microprocessors provide multiple identical functional units to increase performance. This paper presents dual-mode floating-point adder architectures that support one higher precision addition and two parallel lower precision additions. A double precision floating-point adder implemented with the improved single-path algorithm is modified to design a dual-mode double precision floating-point adder that supports both one double precision addition and two parallel single precision additions. A similar technique is used to design a dual-mode quadruple precision floating-point adder that implements the two-path algorithm. The dual-mode quadruple precision floating-point adder supports one quadruple precision and two parallel double precision additions. To estimate area and worst-case delay, double, quadruple, dual-mode double, and dual-mode quadruple precision floating-point adders are implemented in VHDL using the improved single-path and the two-path floating-point addition algorithms. The correctness of all the designs is tested and verified through extensive simulation. Synthesis results show that dual-mode double and dual-mode quadruple precision adders designed with the improved single-path algorithm require roughly 26% more area and 10% more delay than double and quadruple precision adders designed with the same algorithm. Synthesis results obtained for adders designed with the two-path algorithm show that dual-mode double and dual-mode quadruple precision adders require 33% and 35% more area and 13% and 18% more delay than double and quadruple precision adders, respectively.