MPEG-4 AAC decoder block diagram.

Source publication

Fast implementation of the MPEG-4 AAC main and low complexity decoder

Conference Paper

Jun 2004

We present a fast software implementation of the MPEG-4 AAC (advanced audio coding) main and low complexity (LC) decoder. The reference implementation is analyzed and selected algorithms are presented to improve the performance of most of its building blocks, i.e., bitstream de-formatter, noiseless decoding, prediction, and filterbank. The code is...

Context 1

... noise is carefully distributed over frequency bands so that it is masked by the signal energy. AAC supports up to 48 main audio channels with sample rates that range from 8 kHz to 96 kHz. Different trade-offs between quality and complexity are provided by three profiles: Main profile, Low Complexity profile, and Scalable Sample Rate profile. Fig. 1 shows the arrangement of the building blocks of an MPEG-4 Main decoder [9]. They are here briefly illustrated to understand the optimization process described in Section ...

High throughput SAFT for an experimental USCT system as MATLAB implementation with use of SIMD CPU instructions - art. no. 692010

Article

Feb 2008

At Forschungszentrum Karlsruhe an Ultrasound Computer Tomography system (USCT) is under development for early breast cancer detection. To detect morphological indicators in sub-millimeter resolution, the visualization is based on a SAFT algorithm (synthetic aperture focusing technique). The current 3D demonstrator system consists of approx. 2000 tra...

A hybrid parametric-waveform approach to bit stream scalable audio coding

Conference Paper

Dec 2004

In this paper we present a wideband (44.1 kHz sampling rate) audio and speech coder that combines two different strategies, namely, parametric and waveform coding. It is shown how this approach can be used to design a layered bit stream scalable coder offering a wide variety of decoding bit rates with little scalability loss. Moreover, the bit rate...

POMP or How to Design a Massively Parallel Machine with Small Developments

Chapter

Oct 2006

The design of a SIMD machine is usually complex because it requires developing an efficient Processing Element and writing all the software required by the chip and the control of the machine. We propose a different approach by using an efficient 32-bit off-the-shelf processor with its software environment (compiler and assembler) and a program...

Simulink based low power Mpeg-4 AAC Audio encoder and decoder

Conference Paper

Apr 2015

Processing of Range Query Using SIMD and GPU

Chapter

Jan 2013

One-dimensional or multidimensional range query is one of the most important queries in the physical implementation of a DBMS. The number of compared items (of a data structure) can be enormous, especially for lower selectivity of the range query. The number of compare operations increases for more complex items (or tuples) of greater length, e.g. words stored in a B-tree. Due to the possibly high number of compare operations executed during range query processing, we can take into account hardware devices providing parallel task computation, like a CPU’s SIMD unit or a GPU. In this paper, we show the performance and scalability of sequential, index, CPU SIMD, and GPU variants of the range query algorithm. These results make possible a future integration of these computation devices into a DBMS kernel.

Fast parallel implementation of real-time electronic digital image stabilization system

Article

Jul 2006

An Electronic Digital Image Stabilization (EDIS) system handles a large amount of data and intensive computation. The real-time implementation of an EDIS system entails rapid processing of data. Video and image sequence processing is characterized by high data parallelism and repetitive computation. According to the requirements and characteristics of the system, efficient C++ with inline SIMD (Single Instruction Multiple Data) and multi-thread programming is used to achieve real-time performance of the system on a conventional PC. Block-matching motion estimation with the SAD (Sum of Absolute Differences) criterion and a modified three-step search strategy in conjunction with diamond search technology is used to reduce computation and accelerate execution. The stabilization system uses a Kalman filter to remove high-frequency image jitter while retaining smooth global movements. The test results show that it is possible to implement an efficient and robust real-time stabilization system on a conventional PC.
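The SAD criterion named in this abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it does not reproduce the modified three-step or diamond search, nor the SIMD code, and simply scores a caller-supplied list of candidate positions. The function names `sad` and `best_match` and the toy frame are assumptions made for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def best_match(ref_block, frame, candidates):
    """Return the (row, col) candidate whose block in `frame` has the lowest SAD.

    `candidates` is a plain list of top-left positions; a real search
    strategy would generate these positions step by step.
    """
    height, width = len(ref_block), len(ref_block[0])

    def block_at(row, col):
        # Cut a height x width block out of the frame at (row, col).
        return [line[col:col + width] for line in frame[row:row + height]]

    return min(candidates, key=lambda rc: sad(ref_block, block_at(*rc)))


# Toy data (hypothetical): the reference block sits at position (1, 1).
frame = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
ref_block = [[5, 6], [7, 8]]
match = best_match(ref_block, frame, [(0, 0), (1, 1), (2, 2)])
```

A lower SAD means a closer match, so the candidate with zero difference is selected here.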

Processing of Multidimensional Range Query Using SIMD Instructions

Conference Paper

Nov 2011

Current mainstream CPUs provide SIMD (Single Instruction Multiple Data) computational capabilities. Although producers of current hardware provide other computational capabilities like multi-core CPUs, GPUs or APUs, an important feature of SIMD is that it provides parallel operations for one CPU core. In previous works, authors introduced a utilization of the SIMD instructions in some indexing data structures like the B-tree. Since multidimensional data structures manage n-dimensional tuples or rectangles, the utilization of these instructions seems to be straightforward in operations manipulating these n-dimensional objects. In this article, we show the utilization of SIMD in the R-tree data structure. Since the range query is one of the most important operations of multidimensional data structures, we suppose the utilization of SIMD in range query processing. Moreover, we show properties and scalability of this solution. We show that the SIMD range query algorithm is up to 2× faster than the conventional algorithm.

Fixed-point Processing Optimization of MPEG Psychoacoustic Model-II Algorithm for ASIC Implementation

Article

Jan 2004

The psychoacoustic model in the MPEG audio layer-III (MP3) encoder is optimized for fixed-point processing. The optimization process consists of determining the data word length of the arithmetic unit and the algorithm for transcendental functions that are often used in the psychoacoustic model. In order to determine the data word length, we defined a statistical model expressing the relation between the fixed-point operation errors of the psychoacoustic model and the probability of alteration of the allocated bits due to these errors. Based on the simulations using this model, we chose a 24-bit data path and constructed a 24-bit fixed-point MP3 encoder. Sound quality tests using the constructed fixed-point encoder showed a mean degradation of -0.2 on the ITU-R 5-point audio impairment scale.

Reed-Solomon Decoder Optimization for PC-Based DVB-T Software Radio Receiver

Article

Jan 2011

This paper presents an efficient way to implement a software Reed-Solomon (RS) decoder. We use lookup tables, Single Instruction Multiple Data (SIMD) parallel processing instruction sets, and loop expansion, etc. to implement a software RS decoder in the Intel Central Processing Unit (CPU) platform. Our software RS decoder achieves the decoding speed of 68 MB/sec (350 k RS packets/sec). The Digital Video Broadcasting Terrestrial (DVB-T) used in Taiwan needs 6617 RS packets/sec to achieve real-time reception; thus, our implementation of a software RS decoder requires only 1.89 percent CPU loading.
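The CPU-loading figure quoted in this abstract follows from the ratio of the required packet rate to the achieved packet rate. A small sketch of that arithmetic, using only the two rates stated above (the function name `cpu_load_percent` is an assumption for illustration):

```python
def cpu_load_percent(required_rate, achieved_rate):
    """Share of CPU time needed when the decoder runs faster than required."""
    return 100.0 * required_rate / achieved_rate


# Rates taken from the abstract: 6617 RS packets/sec needed for real-time
# DVB-T reception, 350 k RS packets/sec achieved by the software decoder.
load = cpu_load_percent(6617, 350_000)
print(f"{load:.2f}%")  # prints "1.89%"
```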

Real-time electronic digital image stabilization system based on multi-core computation

Article

Aug 2009

Yubin Zhou

Multi-core computing has become the new trend of the computer industry. An electronic digital image stabilization (EDIS) system involves a large amount of data and intensive computation. The real-time implementation of an EDIS system entails rapid processing of data with high parallelism and repetitive computation, which is suitable for parallel data processing of image sequences. An algorithmic solution is proposed to deal with high-speed image stabilization and meet the requirements of real-time applications. According to the data processing characteristics, efficient C++ with inline SIMD (single instruction multiple data) and multi-thread programming is used on a dual-core PC to run threads truly in parallel. After the motion vectors are acquired by a fast searching strategy, adaptive multi-local motions remove the interference of moving objects, and the Kalman filter removes high-frequency image jitter motion while smooth global movements are retained. The test results show that it is possible to achieve robust, highly efficient performance and real-time processing for a stabilization system on a conventional PC.

Optimization of fixed-point operation of MPEG-2 AAC decoder

Article

Jul 2007

This paper presents a fast fixed-point implementation of an MPEG-2 AAC-LC (Advanced Audio Coding, Low Complexity profile defined in ISO/IEC 13818-7) audio decoder. By analyzing the decoding algorithm, we propose tuned optimization strategies for improving the decoding speed of the main modules of the decoder. After these optimizations, the designed fixed-point AAC-LC decoder is shown to be 20% faster than the reference decoder, FAAD2, an open source AAC decoder. The decoding results also show the same high quality as the reference.

Software Viterbi Decoder with SSE4 Parallel Processing Instructions for Software DVB-T Receiver

Conference Paper

Sep 2009

In this paper, we discuss how to make a Viterbi decoder faster. The implementation on an Intel CPU with the SSE4 parallel processing instruction set and some other methods achieves a decoding speed of 47.05 Mbps (0.64 Mbps originally). The DVB-T mode used in Taiwan needs 13.27 Mbps to achieve real-time reception, so our implementation of a software Viterbi decoder takes only 28% CPU loading.

Dual-mode floating-point adder architectures

Article

Dec 2008
J SYST ARCHITECT

Ahmet Akkaş

Most modern microprocessors provide multiple identical functional units to increase performance. This paper presents dual-mode floating-point adder architectures that support one higher precision addition and two parallel lower precision additions. A double precision floating-point adder implemented with the improved single-path algorithm is modified to design a dual-mode double precision floating-point adder that supports both one double precision addition and two parallel single precision additions. A similar technique is used to design a dual-mode quadruple precision floating-point adder that implements the two-path algorithm. The dual-mode quadruple precision floating-point adder supports one quadruple precision and two parallel double precision additions. To estimate area and worst-case delay, double, quadruple, dual-mode double, and dual-mode quadruple precision floating-point adders are implemented in VHDL using the improved single-path and the two-path floating-point addition algorithms. The correctness of all the designs is tested and verified through extensive simulation. Synthesis results show that dual-mode double and dual-mode quadruple precision adders designed with the improved single-path algorithm require roughly 26% more area and 10% more delay than double and quadruple precision adders designed with the same algorithm. Synthesis results obtained for adders designed with the two-path algorithm show that dual-mode double and dual-mode quadruple precision adders require 33% and 35% more area and 13% and 18% more delay than double and quadruple precision adders, respectively.
