Figure 2 - uploaded by Youn-Long Lin
Bypass Decoding Process

Source publication
Conference Paper
We propose a hardware accelerator for context-based adaptive binary arithmetic decoding (CABAC) in H.264/AVC. We also propose an efficient memory system for easy integration with other components such as motion compensation and IDCT. We develop an efficient finite state machine so that our design can generate one bit every 2 to 3 clock cycles. Expe...
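The figure titled "Bypass Decoding Process" is not reproduced in this text. As a rough illustration, the sketch below follows the bypass decoding step as the H.264/AVC standard defines it: shift one bitstream bit into the offset, compare it against the range, and emit a bin without touching any context model. It is a software sketch, not the accelerator's hardware design; `BitReader`, `decode_bypass`, and the sample values are hypothetical names and numbers chosen for illustration.

```python
class BitReader:
    """Hypothetical helper: reads bits MSB-first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position

    def read_bit(self) -> int:
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit


def decode_bypass(cod_i_range: int, cod_i_offset: int, reader: BitReader):
    """Decode one bypass bin; returns (bin_value, updated_offset).

    The range is left unchanged and no context model is read or updated.
    """
    cod_i_offset = (cod_i_offset << 1) | reader.read_bit()
    if cod_i_offset >= cod_i_range:
        return 1, cod_i_offset - cod_i_range
    return 0, cod_i_offset


# Illustrative values only: range 300, offset 200, bitstream starting 0, 1, ...
reader = BitReader(bytes([0x40]))
first_bin, offset_after_first = decode_bypass(300, 200, reader)
second_bin, offset_after_second = decode_bypass(300, offset_after_first, reader)
```

Because a bypass bin needs only a shift, a compare, and a subtract, it is the cheapest bin type to decode, which is why several of the citing designs below treat it separately from context-coded bins.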

Similar publications

Conference Paper
Thanks to their flexibility, FPGAs are nowadays widely used to implement digital systems' prototypes and, more frequently, their final releases. Reconfiguration traditionally required an external controller to upload contents in the FPGA. Dynamic Partial Reconfiguration (DPR) opens new horizons in FPGAs' applications, providing many new utilization...
Conference Paper
In this paper, a novel human action/gesture recognition algorithm based on spatiotemporal gradients of moving points dealt with in 2D in the transform domain is presented. 2DPCA is used to obtain a compact feature descriptor representing each action/gesture, and canonical correlation analysis is used to distinguish between testing and training descriptor. T...
Article
This paper presents the implementation and evaluation of a computer vision task on a Field Programmable Gate Array (FPGA). As an experimental approach for an application-specific image-processing problem, it provides results about gained performance and precision compared with similar solutions on General Purpose Processor (GPP) architectures. The...
Article
The Hough Transform (HT) is a method for extracting straight lines from an edge image. The main limitations of the HT for usage in actual applications are computation time and storage requirements. This paper reports a hardware architecture for HT implementation on a Field Programmable Gate Array (FPGA) with parallelized voting procedure. The 2-dim...

Citations

... In the literature, there have been some designs proposed for realizing CABAC decoder in hardware architectures [4], [5], [7]. However, these designs do not support the important tools of macroblock adaptive frame field (MBAFF) coding and 8 × 8 transform coding, which are inevitable coding tools for HDTV applications using H.264 HP video. ...
... Table I shows the maximum, average, and minimum processing cycles per MB of the proposed CABAC decoder on decoding HD1080@30 frames/s video sequences. Table II shows the comparison results of the proposed design with the existing ones [4], [5], [7]. According to system verification, it takes 396 cycles to decode one MB on average for the HD1080 video with 60 Mbits/s, which is fast enough to support the H.264 HP@L4.1 video decoding. ...
... According to system verification, it takes 396 cycles to decode one MB on average for the HD1080 video with 60 Mbits/s, which is fast enough to support the H.264 HP@L4.1 video decoding. Compared to the designs [4], [5], [7], the proposed design improves data throughput by about 1.43, 2.82, and 2.09 times, respectively. In addition, the proposed design reduces the hardware cost by about 44% compared to the design [4]. ...
Article
In this letter we propose a high-throughput VLSI architecture design for H.264 high-profile context-based adaptive binary arithmetic coding (HP CABAC) decoding for HDTV applications. To speed up the inherently sequential CABAC decoding, we eliminate the bottleneck by proposing a look-ahead decision parsing technique on the grouped context table with cache registers, which reduces the cycle count by 62% on average compared with the original CABAC decoding. In addition, the proposed design supports the macroblock adaptive frame field coding tools in H.264 main profile coding and the 8 × 8 transform in H.264 high-profile coding. It achieves real-time processing for H.264 CABAC decoding up to L4.1@30 frames/s with a maximum of 60 Mbits/s when operating at 105 MHz.
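The cycle count and clock figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes a coded frame size of 1920 × 1088 (so the frame divides evenly into 16 × 16 macroblocks) and 30 frames/s; the function names are hypothetical, and the calculation ignores any overhead outside CABAC decoding.

```python
def macroblocks_per_frame(width: int, height: int, mb_size: int = 16) -> int:
    """Number of mb_size x mb_size macroblocks in one frame."""
    return (width // mb_size) * (height // mb_size)


def required_clock_hz(cycles_per_mb: int, width: int, height: int, fps: int) -> int:
    """Clock needed to sustain an average cycles-per-macroblock budget in real time."""
    return cycles_per_mb * macroblocks_per_frame(width, height) * fps


mbs = macroblocks_per_frame(1920, 1088)          # 120 * 68 = 8160
clock = required_clock_hz(396, 1920, 1088, 30)   # 96,940,800 cycles/s
print(f"{mbs} MBs/frame, about {clock / 1e6:.1f} MHz needed on average")
```

Under these assumptions, 396 cycles per MB corresponds to roughly 96.9 MHz on average, which sits below the stated 105 MHz operating frequency.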
... This makes it more difficult to design high performance CABAC decoding engines. Jian-Wen Chen [11] implemented a full hard-wired CABAC decoder, which could decode the CIF video in real time. Wei Yu [12] improved the decoding engine with several techniques, such as context model register grouping and multiple-bin post processing. ...
... All header information decoding is processed in header decoding engine (HDE) with context model RAM. Traditional constant-bin-rate decoding scheme is adopted in HDE as in [11], [12] and Fig. 3. The values of range and offset in HDE and MBAD are swapped when the SE switches between the two engines. ...
Article
This paper presents an efficient VLSI architecture for H.264/AVC Context-based Adaptive Binary Arithmetic Coding (CABAC) decoding. We introduce several new techniques to maximize the parallelism of the decoding process, including a variable-bin-rate strategy, multiple-bin arithmetic decoding, and an efficient probability propagation scheme. The CABAC engine can ensure real-time decoding for H.264/AVC main profile HD level 4.0. Synthesis results show that the multi-bin decoder can be operated at up to 45 MHz, and the total logic area is only 42K gates when targeted at TSMC's 0.18 µm process.
... We first analyze the number of cycles needed for our previous decoder [6] to decode each type of syntax element in a macroblock. We use the test sequence "Mobile" in CIF resolution under QP = 28. ...
... [Figure 2. Proposed CABAC decoder architecture; block labels: Context Memory, Build table, CABAC controller, Initial table, Parameter Mem, CABAC decoder, Slice layer, Macroblock layer] We implement the initial table using a 1484x16-bit ROM and the context table using a 399x7-bit two-port SRAM. Additionally, we use two single-port Coefficient memories (533x9-bit) and one two-port Mb_info memory [6]. The Coefficient memory is read by the IDCT module in a ping-pong fashion. ...
Conference Paper
We present a high-performance hardwired context-based adaptive binary arithmetic decoder (CABAD) for H.264/AVC. Based on an analysis of decoding time for different types of syntax elements, we propose three parallel processing techniques. Our decoder takes 309 clock cycles to decode a typical I-type macroblock. It needs to run at only 45 MHz for 1080HD applications. Therefore, our architecture is suitable for low-power mobile applications.
... This approach is also found in [13], which speeds up the decoding by using pointer chains to retrieve the context models for the subsequent bins and storing the intermediate results of context selection instead of the original reference macroblock data. Chen et al. [14] have proposed a hardware accelerator for CABAC decoding. However, there is no specific acceleration scheme except that the decoding is controlled by an optimized finite state machine (FSM). ...
Article
The decoding of context-based adaptive binary arithmetic coding (CABAC) imposes a heavy performance requirement on H.264/AVC decoding systems, particularly for large-scale video sequences. Because simply raising the operating frequency is not sufficient to meet the performance requirement, this paper proposes an efficient approach to accelerate the decoding that is effective at a relatively low operating frequency. Since the CABAC decoding procedure is highly sequential and has strong data dependencies, it is difficult to exploit parallelism and pipeline schemes. The proposed approach resolves the difficulties by modifying the operation chain based on a thorough analysis, eventually enabling both parallel operations and pipelining. More specifically, 1) several context models are simultaneously loaded from memory while context selection is performed in parallel and 2) bin-level pipelining is enabled by employing a small storage to remove structural hazards and data dependencies. Experimental results show that the proposed approach leads to the real-time decoding of HD sequences.
... The high data dependency among the three steps limits the opportunities for exploiting parallelism. The first implemented CABAC decoder, introduced by Jian-Wen Chen [3], can only process CIF resolution. Wei proposed post-processing and a register-based context model [2], which can process the D1 stream in real time. ...
... II. PROPOSED SCHEME Traditional CABAC engines [2]-[3] are bin-rate limited: they process arithmetic decoding at a fixed bin rate. H.264/AVC defines the maximum bin rate of 268 Mbins/s for high definition level 4.0 (1080i@30Hz). ...
... In addition, the bypass decoding process is efficiently embedded in the MBAD; only one MUX and one AND gate are added for one bin. The line-bit-rate strategy can also replace the complex renormalization process of traditional bin-by-bin schemes [2]-[3] with only a bin valid signal t0. ...
Conference Paper
This paper presents an efficient VLSI architecture for H.264/AVC CABAC decoding. We introduce several new techniques to exploit the parallelism of the decoding process to the largest extent possible, including line-bit-rate decoding, multiple-bin arithmetic decoding, and an efficient probability propagation scheme. The CABAC engine can ensure real-time decoding for H.264/AVC main profile HD level 4.0. Synthesis results show that the multi-bin decoder can run at up to 45 MHz, and the total area is only 42K gates.
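To see why a multi-bin engine matters at such a low clock, one can divide the peak bin rate quoted in the citing excerpt (268 Mbins/s) by the stated 45 MHz clock. The sketch below is plain arithmetic with a hypothetical function name; it gives the average number of bins per cycle that would be needed to sustain the peak rate, and it does not claim that the cited design sustains that peak.

```python
def bins_per_cycle_needed(bin_rate_bins_per_s: float, clock_hz: float) -> float:
    """Average bins per clock cycle needed to sustain a given bin rate."""
    return bin_rate_bins_per_s / clock_hz


peak_bin_rate = 268e6   # bins/s, as quoted in the excerpt
clock = 45e6            # Hz, as stated in the abstract
needed = bins_per_cycle_needed(peak_bin_rate, clock)
print(f"about {needed:.2f} bins per cycle at {clock / 1e6:.0f} MHz")
```

The result is just under six bins per cycle, which a one-bin-per-cycle engine could only match by running at about 268 MHz.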
... Combining all the techniques mentioned above for optimizing the CABAC decoding flow yields a cycle count reduction of about 57% in realizing the CABAD. Compared to the design [15], the proposed CABAD achieves a 35% reduction in hardware cost and a threefold speed-up in data throughput rate. The residual data decoded by the proposed CAVLC and CABAC decoder will be reordered from zigzag scan order to raster scan order. ...
Article
In this paper, a low-cost H.264/AVC video decoder design is presented for high definition television (HDTV) applications. Through optimization from algorithmic and architectural perspectives, the proposed design can achieve real-time H.264 video decoding on HD1080 video (1920 × 1088 @ 30 Hz) when operating at 120 MHz with 320 mW power dissipation. Fabricated using the TSMC one-poly six-metal 0.18 µm CMOS technology, the proposed design occupies 2.9 × 2.9 mm² of silicon area with a hardware complexity of 160K gates and 4.5K bytes of local memory.
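The citing excerpt for this paper mentions reordering residual data from zigzag scan order to raster scan order. A minimal sketch of that reordering for one 4 × 4 block is shown below, using the standard 4 × 4 zigzag (frame) scan; the function name is hypothetical, and field scan and 8 × 8 blocks are not covered. How the cited decoder implements the step in hardware is not described in this text.

```python
# Raster index (row * 4 + col) visited at each zigzag scan position, 4x4 frame scan.
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def zigzag_to_raster(coeffs):
    """Place 16 coefficients given in zigzag order into raster (row-major) order."""
    if len(coeffs) != 16:
        raise ValueError("expected exactly 16 coefficients for a 4x4 block")
    block = [0] * 16
    for scan_pos, raster_idx in enumerate(ZIGZAG_4X4):
        block[raster_idx] = coeffs[scan_pos]
    return block


# Feeding the scan positions themselves shows where each one lands in the block.
raster = zigzag_to_raster(list(range(16)))
for row in range(4):
    print(raster[row * 4: row * 4 + 4])
```

A fixed lookup table like this maps naturally to hardware, since the permutation never changes for a given block size and scan type.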
... The table also shows the required operating frequency. It takes three cycles to process one bit in the CABAC decoder [7] if a dedicated hardware is designed. In a case of building a CABAC decoder onto an MB pipeline, 2.35 GHz is required, while in a case of synchronizing a CABAC coder with a frame, 585 MHz is needed but this frequency is still infeasible (Fig. 6(a)). ...
Article
We propose an elastic pipeline that can apply dynamic voltage scaling (DVS) to hardwired logic circuits. In order to demonstrate its feasibility, a hardwired H.264/AVC HDTV decoder is designed as a real-time application. An entropy decoding process is divided into context-based adaptive binary arithmetic coding (CABAC) and syntax element decoding (SED), which has advantages of smoothing workload for CABAC and keeping efficiency of the elastic pipeline. An operating frequency and supply voltage are dynamically modulated every slot depending on workload of H.264 decoding to minimize power. We optimize the number of slots per frame to enhance power reduction. The proposed decoder achieves a power reduction of 50% in a 90-nm process technology,
... System-on-Chip (SoC) implementation was mentioned as a forward-looking statement but not implemented. Chen et al. [3] proposed a hardware CABAC decoder with buffer for storing the syntax element contents of 24 neighboring MBs, and intended to implement this design as an IP block. ...
Conference Paper
In this paper, we propose a system-on-chip software/hardware co-design methodology for a statistical coder. We use the Context Adaptive Binary Arithmetic Coder (CABAC) used in the Main profile of the H.264/AVC video coding standard as a design example. The design methodology first involves performance and complexity analyses of the existing CABAC reference software, so that the top-level CABAC software/hardware architecture can be conceptualized. The design aims to strike a balance between software modules and hardware modules based on design constraints. Verification is performed by comparing the compressed bit stream generated by the reference CABAC SW (without any HW-assisted circuitry) with that output by the top-level CABAC architecture (with HW-assisted circuitry). Standard video test sequences have been used for verification purposes. The CABAC architecture is then put within the system-on-chip framework, into which the system bus and its signals, input/output FIFO buffers, debug structures, reset circuit, etc. are designed. Compared to existing statistical coders, this design aims for significant coding time savings by balancing timing between software modules and hardware modules, is well verified with standard video test sequences, and is reusable as an IP in a SoC environment.
... The inherent sequential data dependency in the CABAD severely limits the data throughput rate, which makes it difficult to achieve real-time H.264 video decoding on HD format videos like HD1080@30fps. It normally takes 3 clock cycles to decode one bin of a CABAD codeword, which has imposed a severe processing bottleneck in the existing CABAD design [3]. According to our observation, the data dependency mentioned above can first be relaxed by adopting pipelining in decoding a CABAD codeword, which improves the data throughput rate to one bin of a CABAD codeword every two cycles. ...
... This performance can meet the real-time processing requirement in H.264 video decoding on HD1080 video. Compared to the existing design [3], the proposed design achieves both a 40% reduction in hardware cost and over 1.6 times data throughput improvement. The rest of this paper is organized as follows. ...
... The proposed design operates at 120 MHz with a cost of 83,157 gates in total, including IDS and all the context memories. We have integrated the proposed design in an H.264 BP/MP video decoder system for system and FPGA verification, which passes over a hundred test sequences including the conformance sequences from JVT [6] and those generated by H.264 reference software encoder JM93 [7]. Table 1 shows the comparison results of the proposed design with the existing one [3]. According to our system verification, it takes about 463 cycles, 308 cycles, and 254 cycles to decode one MB in I slices (with qp=36), P slices, and B slices (with qp=26), respectively, which achieves real-time CABAD on HD1080i videos. ...
Conference Paper
In this paper we present a high-throughput VLSI architecture design for context-based adaptive binary arithmetic decoding (CABAD) in MPEG-4 AVC/H.264. To speed up the inherently sequential operations in CABAD, we break down the processing bottleneck by proposing a look-ahead codeword parsing technique on the segmented context tables with cache registers, which reduces the cycle count by up to 53% on average. Based on a 0.18 µm CMOS technology, the proposed design outperforms the existing design by reducing hardware cost by 40% while achieving about 1.6 times the data throughput.
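The citing excerpts for this paper contrast a baseline of three clock cycles per bin with a pipelined rate of two cycles per bin. The sketch below turns those cycle counts into bin throughput at the 120 MHz clock mentioned in the excerpts; the function name is hypothetical, and the calculation covers only the cycles-per-bin effect, not the look-ahead parsing technique.

```python
def bin_throughput(clock_hz: float, cycles_per_bin: float) -> float:
    """Bins decoded per second for a fixed cycles-per-bin cost."""
    return clock_hz / cycles_per_bin


clock = 120e6                             # Hz, operating frequency from the excerpt
baseline = bin_throughput(clock, 3)       # three cycles per bin
pipelined = bin_throughput(clock, 2)      # two cycles per bin with pipelining
speedup = pipelined / baseline
print(f"{baseline / 1e6:.0f} -> {pipelined / 1e6:.0f} Mbins/s, speed-up {speedup:.2f}x")
```

Going from three cycles per bin to two raises throughput from 40 to 60 Mbins/s at this clock, a 1.5 times gain from pipelining alone.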