Figure 1: H.264 Decoder Block Diagram

Source publication
Conference Paper
Full-text available
H.264, a state-of-the-art video compression standard, is used across a range of products from cellphones to HDTV. These products have vastly different performance, power and cost requirements, necessitating different hardware-software solutions for H.264 decoding. We show that a design methodology and associated tools which support synthesis from h...

Contexts in source publication

Context 1
... a coded frame, slices, or groups of macroblocks, may be intrapredicted, interpredicted from the previous frame, or interpredicted from multiple reference frames. Figure 1 shows a block diagram of our H.264 decoder. ...
Context 2
... implementation closely models the block diagram for the CODEC shown in Figure 1. To keep the design as flexible as possible, each block was organized to support latency-insensitive communications. ...
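To illustrate the latency-insensitive style described in this excerpt, here is a minimal behavioural sketch in C++: blocks communicate only through guarded bounded FIFOs, so a stage fires only when its input has data and its output has room, and no block assumes how many cycles its neighbour takes. The FIFO class and stage names are illustrative, not taken from the decoder itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Bounded FIFO with explicit "can enqueue / can dequeue" guards.
// A producer enqueues only when notFull(), a consumer dequeues only
// when notEmpty(); neither side relies on fixed cycle-by-cycle timing.
template <typename T>
class LatencyInsensitiveFifo {
public:
    explicit LatencyInsensitiveFifo(std::size_t depth) : depth_(depth) {}
    bool notFull()  const { return q_.size() < depth_; }
    bool notEmpty() const { return !q_.empty(); }
    void enq(const T& v) { q_.push_back(v); }          // caller checks notFull()
    T    deq()           { T v = q_.front(); q_.pop_front(); return v; }
private:
    std::size_t depth_;
    std::deque<T> q_;
};

// Hypothetical decoder stage: each "step" it fires only when its guards
// hold, mirroring a latency-insensitive block in the diagram.
struct InverseTransformStage {
    LatencyInsensitiveFifo<int16_t>& in;   // coefficients from entropy decoding
    LatencyInsensitiveFifo<int16_t>& out;  // residuals toward reconstruction
    void step() {
        if (in.notEmpty() && out.notFull()) {
            int16_t coeff = in.deq();
            out.enq(coeff);                // placeholder for the real transform
        }
    }
};
```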

Citations

... However, the choice of an HLS tool depends on the target platform and other criteria, such as source language, available resources, latency, and tool complexity. Prominent examples of HLS tools used in current research include BlueSpec (Ref. 3), Altera OpenCL (Ref. 4), and Vivado HLS (Ref. 5). Various studies adopt HLS as a design method, as in Refs. 6 and 7. Other works focus on testing their methods by implementing them on an FPGA platform in order to improve design productivity in terms of throughput, power consumption, and resource usage. ...
... In addition, an ALLOCATION directive is applied to multiplication operations to further improve FPGA resource usage. The synthesis results reveal a slight reduction in FFs and LUTs compared to solution 1, while the total number of clock cycles is kept the same. ...
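For readers unfamiliar with the directive mentioned above, the sketch below shows the general shape of a resource-sharing constraint in Vivado HLS: an ALLOCATION pragma bounding how many multiplier instances the scheduler may create. The function and loop are illustrative assumptions, and the exact pragma spelling varies between Vivado HLS and Vitis HLS releases, so it should be checked against the tool's documentation.

```cpp
// Illustrative HLS kernel: cap the number of hardware multipliers the
// scheduler may instantiate, trading schedule length for area.
void weighted_sum(const int a[16], const int b[16], int *result) {
    // Classic Vivado HLS form; newer Vitis HLS releases use
    // "#pragma HLS allocation operation instances=mul limit=2".
#pragma HLS ALLOCATION instances=mul limit=2 operation
    int acc = 0;
    for (int i = 0; i < 16; ++i) {
        // Multiplications are time-shared over at most 2 multipliers;
        // the tool may lengthen the schedule to honor the limit.
        acc += a[i] * b[i];
    }
    *result = acc;
}
```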
Article
Full-text available
Due to the increasing need for testing solutions for complex hardware designs, several efforts have been made to improve high-level synthesis (HLS) techniques. These solutions are conceived in such a way that they have to provide reasonable trade-offs in terms of design time, resources involved, and performance. Generally speaking, two main constraints should be satisfied for HLS applications. The first constraint is the ability to process complex systems at a reasonable cost, whereas the second revolves around considering test constraints in the first tasks of the HLS flow. To fulfill these two constraints, we treated a case study using HLS for the intra-prediction, dequantization, and inverse transform decoding blocks of a high efficiency video coding (HEVC) decoder. For this experiment, version 10 of the HEVC test model (HM) reference software was used, containing more than 200 functions and over 8000 lines of code. In addition, the suggested algorithm was implemented in a software/hardware (SW/HW) environment using a Xilinx ZC702-based platform. Finally, taking advantage of HLS optimization methods, the hardware design can process 6, 13, 71, and 285 video frames per second for 1600p, 1080p, 480p, and 240p video resolutions, respectively. In contrast, the SW/HW designs can only decode 0.5, 1.5, 4, and 15.2 frames per second for the same video resolutions, i.e., with a gain of 3% in frame rate and 60% in power consumption compared to the SW implementation.
... The prediction block is added to the previously decoded block to create a reconstructed block. The reconstructed reference picture is created from a series of blocks [5][6]. ...
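A short sketch of the reconstruction step this excerpt describes: the decoded residual is added to the prediction sample by sample and clipped to the valid sample range. The 4x4 block size and 8-bit depth are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Reconstruct a 4x4 block: predicted samples plus decoded residual,
// clipped to the 8-bit sample range, as in standard hybrid video decoding.
void reconstruct4x4(const uint8_t pred[4][4], const int16_t resid[4][4],
                    uint8_t recon[4][4]) {
    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x)
            recon[y][x] = static_cast<uint8_t>(
                std::clamp(pred[y][x] + resid[y][x], 0, 255));
}
```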
Article
Full-text available
As the world around us has expanded and the popularity of the Internet has grown through sending, receiving, uploading, and downloading high-definition videos, it has become necessary to use a good technology to reduce the size of high-quality video. If videos are sent or received, they need a wide bandwidth to carry the amount of information in the video. Based on the above, H.264/AVC is a good technology that gives great results for encoding and decoding videos. This technology was developed jointly by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) and the International Organization for Standardization (ISO). Our work involves applying the encoding and decoding process of the standard using MATLAB (R2013a). The work focuses on inter-frame prediction using the IBBB frame pattern. The video subjected to encoding and decoding was the Xylophone sequence, with a frame size of 240x320 and a frame rate of 30 frames/sec.
... Previous HLS studies [8,32,33] have mapped H.264 algorithms to hardware. Cadence's C-to-Silicon [33] was used to design individual functional blocks based on a SystemC model, but the individual blocks were later integrated into a system-level decoder using RTL. ...
... The study presented in [32] also used a similar methodology, albeit with different block-level partitioning of the H.264 decoder. Bluespec SystemVerilog (BSV) was used to synthesize a complete H.264 decoder in [8]. Though BSV provides a comparatively higher level of abstraction than traditional RTL, the designer still focuses on hardware-specific details such as data buffering and internal module connection. ...
... A Bluespec implementation [8] for ASIC achieved 30 fps at 720p resolution, but required substantial manual redesign using the Bluespec programming model. Another HLS-based design that quotes performance for full H.264 decoding is a block-based design in SystemC [32], but it only achieves 33 fps at QCIF resolution on a Virtex 4 platform. ...
Conference Paper
Full-text available
High-level synthesis (HLS) is gaining wider acceptance for hardware design due to its higher productivity and better design space exploration features. In recent years, HLS techniques and design flows have also advanced significantly, and as a result, many new FPGA designs are developed with HLS. However, despite many studies using HLS, the size and complexity of such applications remain generally small, and it is not well understood how to design and optimize for HLS with large, complex reference code. Typical HLS benchmark applications contain somewhere between 100 and 1400 lines of code and about 20 sub-functions, but typical input applications may contain many times more code and functions. To study such complex applications, we present a case study using HLS for a full H.264 decoder: an application with over 6000 lines of code and over 100 functions. We share our experience on code conversion for synthesizability, various HLS optimizations, HLS limitations while dealing with complex input code, and general design insights. Through our optimization process, we achieve 34 frames/s at 640x480 resolution (480p). To enable future study and benefit the research community, we open-source our synthesizable H.264 implementation.
... If the I/O command requires hardware acceleration, the main controller sends data to the hardware accelerator so that the data are processed by the accelerator. As depicted in Figure 5, hardware acceleration modules are connected to the main controller through FIFOs in a latency-insensitive style [10], [11]. This approach allows various hardware accelerators to be easily inserted into or removed from the main controller. ...
Conference Paper
Full-text available
As the cell size of NAND flash memory shrinks, its physical characteristics such as performance and lifetime are significantly degraded. As effective solutions for overcoming such poor physical characteristics, more cross-layer system-level approaches (such as compression and deduplication techniques) are expected to be developed. These system-level techniques typically employ intelligent software algorithms supported by specialized hardware accelerators. Using hardware accelerators combined with sophisticated software algorithms greatly increases the design complexity of flash-based storage devices. However, existing storage design environments are not adequate for handling this increased design complexity in a timely and efficient manner. To address this new challenge, we propose a novel storage development environment, called FlashBench, that helps developers build high-complexity storage solutions quickly. FlashBench is designed to provide a generic framework for the rapid development and validation of storage software/hardware algorithms by supporting multi-level design environments, specifically optimized for seamless hardware/software cross-layer integration. Our case study demonstrates that FlashBench enables developers to implement high-complexity flash devices with specialized optimization functions in a shorter development time than with traditional design environments.
... Recently, a number of highly modular research prototypes [15], [6] have been developed using latency-insensitive design [5]. In latency-insensitive design, the goal is to maintain the functional correctness of the design in response to variations in data availability. ...
Conference Paper
Full-text available
Traditionally, hardware designs partitioned across multiple FPGAs have had low performance due to the inefficiency of maintaining cycle-by-cycle timing among discrete FPGAs. In this paper, we present a mechanism by which complex designs may be efficiently and automatically partitioned among multiple FPGAs using explicitly programmed latency-insensitive links. We describe the automatic synthesis of an area efficient, high performance network for routing these inter-FPGA links. By mapping a diverse set of large research prototypes onto a multiple FPGA platform, we demonstrate that our tool obtains significant gains in design feasibility, compilation time, and even wall-clock performance.
... Another name for H.264 is the MPEG-4 Advanced Video Coding (AVC) standard. Since the standard is the result of a collaborative effort of the VCEG and MPEG standards committees, it is also informally referred to as the Joint Video Team (JVT) standard [8]. Applications such as internet multimedia, wireless video, personal video recorders, video-on-demand, and videoconferencing have an inexhaustible demand for much higher compression to enable the best video quality possible [27]. ...
... Second, the data flow in the computation of MVmedian is irregular and requires a large amount of on-chip memory to store the required past MVs. As a result of a microarchitectural change, the deblocking filter implementation in [8] decreases dramatically in area, from 2.74 mm² to 0.69 mm². The optimized deblocking filter yields a 12% increase in throughput of the entire design, thereby reducing the design critical path by 35%. ...
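As background for the MVmedian discussion above, the sketch below shows the component-wise median of three neighbouring motion vectors, which is how H.264 forms its motion-vector predictor; the neighbours are assumed to be the usual left/top/top-right blocks, and keeping those past MVs available is what makes the data flow irregular.

```cpp
#include <algorithm>
#include <cstdint>

struct MotionVector { int16_t x; int16_t y; };

// Median of three values without sorting.
static int16_t median3(int16_t a, int16_t b, int16_t c) {
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Component-wise median of the left (A), top (B), and top-right (C)
// neighbouring MVs: the usual H.264 motion-vector predictor.
MotionVector mvMedian(MotionVector a, MotionVector b, MotionVector c) {
    return { median3(a.x, b.x, c.x), median3(a.y, b.y, c.y) };
}
```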
Conference Paper
Full-text available
The progress of science and technology demands that multimedia applications be realized on embedded systems, as they involve the transfer of large amounts of data. Compared with standards such as MPEG-2 and MPEG-4 Visual, H.264 can deliver better image quality at the same compressed bit rate or at a lower bit rate. The increase in compression efficiency and flexibility comes at the expense of an increase in complexity, which must be overcome. Therefore, an efficient co-design methodology is required, where the encoder software application is highly optimized and structured in a very modular and efficient manner, so as to allow its most complex and time-consuming operations to be offloaded to dedicated hardware accelerators. This paper provides an overview of the features of H.264 and surveys the emerging studies related to new coding features of the standard. © 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering.
... Recently, languages like Bluespec [4], which describe designs not as gates and wires but as a set of guarded atomic actions (or rules) on state elements, have been proposed. Over the last six years, it has been established not only that Bluespec programs can produce no-compromise hardware [1], but also that keeping programs at the rule level allows more flexibility in design and refinement [8], [9]. For instance, the addition of a pipeline stage can be implemented in a natural way by splitting the rule corresponding to the appropriate stage into multiple rules and introducing state to hold the intermediate results. ...
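To make the rule-splitting idea concrete, here is a behavioural C++ analogy (not Bluespec itself): a single guarded rule that computes a result in one atomic step is split into two smaller rules communicating through an added register, which corresponds to the pipeline-stage insertion the excerpt describes. All names and the toy computation are illustrative.

```cpp
#include <cstdint>
#include <optional>

// Before refinement: one "rule" that, when its guard holds, multiplies
// and accumulates in a single atomic step.
struct SingleStage {
    std::optional<int32_t> input;
    int64_t acc = 0;
    void rule_mac() {
        if (input) {                       // guard
            acc += int64_t(*input) * 3;    // body: multiply and accumulate
            input.reset();
        }
    }
};

// After refinement: the rule is split into two rules, and a new register
// (stage) holds the intermediate product, mimicking an inserted pipeline
// stage. The verification task is to show that interleavings of the two
// smaller rules introduce no behaviours the original rule could not produce.
struct TwoStage {
    std::optional<int32_t> input;
    std::optional<int64_t> stage;          // added intermediate state
    int64_t acc = 0;
    void rule_multiply() {
        if (input && !stage) {             // guard
            stage = int64_t(*input) * 3;
            input.reset();
        }
    }
    void rule_accumulate() {
        if (stage) {                       // guard
            acc += *stage;
            stage.reset();
        }
    }
};
```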
Article
Full-text available
Microarchitectural refinements are often required to meet performance, area, or timing constraints when designing complex digital systems. While refinements are often straightforward to implement, it is difficult to formally specify the conditions of correctness for those which change cycle-level timing. As a result, in the later stages of design only those changes are considered that do not affect timing and whose verification can be automated using tools for checking FSM equivalence. This excludes an essential class of microarchitectural changes, such as the insertion of a register in a long combinational path to meet timing. A design methodology based on guarded atomic actions, or rules, offers an opportunity to raise the notion of correctness to a more abstract level. In rule-based systems, many useful refinements can be expressed simply by breaking a single rule into smaller rules which execute the original operation in multiple steps. Since the smaller rule executions can be interleaved with other rules, the verification task is to determine that no new behaviors have been introduced. We formalize this notion of correctness and present a tool based on SMT solvers that can automatically prove that a refinement is correct, or provide concrete information as to why it is not correct. With this tool, a larger class of refinements at all stages of the design process can be verified easily. We demonstrate the use of our tool in proving the correctness of the refinement of a processor pipeline from four stages to five.
... To get indications of the applicability of using bit-level statistics, the model of an application was investigated. The H.264 decoder [18] was simulated at register transfer level to extract signal dumps of the global connections of functional blocks such as memories, the entropy decoder, the prediction unit, etc. Those trace dumps were analyzed to extract the bit-level signal statistics. ...
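A small sketch of the kind of bit-level analysis this excerpt describes: given a trace of sampled bus values, count per-bit toggles to estimate activity factors that feed the power model. The 32-bit bus width and trace format are assumptions for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-bit toggle counts for a 32-bit bus trace; the activity factor of
// bit i is toggles[i] divided by (trace.size() - 1).
std::array<uint64_t, 32> bitToggleCounts(const std::vector<uint32_t>& trace) {
    std::array<uint64_t, 32> toggles{};
    for (std::size_t t = 1; t < trace.size(); ++t) {
        uint32_t changed = trace[t] ^ trace[t - 1];   // bits that flipped
        for (int b = 0; b < 32; ++b)
            if ((changed >> b) & 1u) ++toggles[b];
    }
    return toggles;
}
```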
Conference Paper
As technology reaches nanoscale order, interconnection systems account for the largest part of power consumption in Systems-on-Chip. Hence, an early and sufficiently accurate power estimation technique is needed for making the right design decisions. In this paper we present a method for system-level power estimation of interconnection fabrics in Systems-on-Chip. Estimations with simple average assumptions regarding the data stream are compared against estimations considering bit level statistics in order to include low level effects like activity factors and crosstalk capacitances. By examining different data patterns and traces of a video decoding system as a realistic example, we found that the data dependent effects are not negligible influences on power consumption in the interconnection system of nanoscale chips. Due to the use of statistical data there is no degradation of simulation speed in our approach.
... For our memory hierarchy explorations, we use an existing H.264 implementation [7]. This codec originally targeted an ASIC implementation, but performance increases in FPGAs permit us to reuse the codec without significant modifications. ...
Article
Full-text available
Developers accelerating applications on FPGAs or other reconfigurable logic have nothing but raw memory devices in their standard toolkits. Each project typically includes tedious development of single-use memory management. Software developers expect a programming environment to include automatic memory management. Virtual memory provides the illusion of very large arrays, and processor caches reduce access latency without explicit programmer instructions. LEAP scratchpads for reconfigurable logic dynamically allocate and manage multiple, independent memory arrays in a large backing store. Scratchpad accesses are cached automatically in multiple levels, ranging from shared on-board, RAM-based, set-associative caches to private caches stored in FPGA RAM blocks. In the LEAP framework, scratchpads share the same interface as on-die RAM blocks and are plug-in replacements. Additional libraries support heap management within a storage set. Like software developers, accelerator authors using scratchpads may focus more on core algorithms and less on memory management. Two uses of FPGA scratchpads are analyzed: buffer management in an H.264 decoder and memory management within a processor microarchitecture timing model.
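To illustrate the "same interface as on-die RAM blocks" point, here is a hypothetical request/response memory interface in C++; it is not the actual LEAP API, only a sketch of the plug-in-replacement idea: clients are written against one small interface regardless of whether it is backed by on-chip RAM or a cached scratchpad in board memory.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical interface shared by an on-chip RAM model and a scratchpad
// model; client code depends only on this interface, so the backing store
// can be swapped without changing the client (the plug-in-replacement idea).
struct MemoryIfc {
    virtual void     write(uint32_t addr, uint64_t data) = 0;
    virtual uint64_t read(uint32_t addr)                 = 0;
    virtual ~MemoryIfc() = default;
};

// Simplified stand-in for a scratchpad: a map here, standing in for a
// large backing store fronted by multiple levels of caching.
struct ScratchpadModel : MemoryIfc {
    void     write(uint32_t addr, uint64_t data) override { store_[addr] = data; }
    uint64_t read(uint32_t addr)                 override { return store_[addr]; }
private:
    std::unordered_map<uint32_t, uint64_t> store_;
};
```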
... We further simplify the interfaces in BlueSSD by adopting a latency-insensitive design style, which has been used to facilitate modular refinement in several large systems [6], [7]. Our modules are not permitted to make timing assumptions about when their inputs will be ready. ...
Article
Full-text available
In this paper we describe BlueSSD, an open platform for exploring hardware and software for NAND flash-based SSD architectures. We introduce the overall architecture of BlueSSD from a hardware and software perspective and briefly explain our design methodology. Preliminary evaluation shows that BlueSSD delivers performance comparable to commercially available SSDs.