Block diagram of video codec.

Source publication

High-Throughput Deblocking Filter Architecture Using Quad Parallel Edge Filter for H.264 Video Coding Systems

Article

Jul 2019

With the increasing demand for better video quality in multimedia applications on electronic gadgets, various coding standards have evolved over the past two decades, and optimizing the architectures of the modules used in the video codec has become a popular research topic. In this work, an efficient architecture for the deblocking filter used to smoothen th...

Context 1

... are highly correlated spatially within an image frame, and in a video, pixels are also highly correlated in time, since a video is a collection of frames captured over a period of time. Hence video data is both spatially and temporally correlated, and compression is achieved by exploiting these spatial and temporal redundancies. Fig. 1 shows the block diagram of a video codec. Its blocks are the transformation unit, quantization unit, inverse quantization unit, entropy coder, inverse transform unit, deblocking filter, and motion estimation and motion compensation unit. The deblocking filter is one of the most critical units among the various ...
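To make "exploiting temporal redundancy" concrete, the following minimal Python sketch performs full-search block matching with the sum of absolute differences (SAD), the basic operation of a motion estimation unit, and forms the motion-compensated residual that would then go to the transform and quantization units. It is a generic textbook technique, not the architecture of the source paper; the function names, the 4×4 block size, and the ±2 search range are illustrative assumptions.

```python
# Full-search block-matching motion estimation (illustrative sketch).
# Frames are lists of rows of integer luma samples. Block size and search
# range are hypothetical parameters, not values from the source paper.

def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in `cur` and the block
    displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def motion_search(cur, ref, bx, by, n=4, search_range=2):
    """Return ((dx, dy), cost) with minimum SAD within +/- search_range,
    skipping candidate blocks that fall outside the reference frame."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= by + dy and by + dy + n <= h
                      and 0 <= bx + dx and bx + dx + n <= w)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


def residual(cur, ref, bx, by, mv, n=4):
    """Motion-compensated residual: current block minus predicted block."""
    dx, dy = mv
    return [[cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x]
             for x in range(n)] for y in range(n)]


if __name__ == "__main__":
    # Reference frame with a non-repeating pattern; the current frame is the
    # same content shifted right by one sample (simple object motion).
    ref = [[(7 * x + 13 * y) % 50 for x in range(12)] for y in range(12)]
    cur = [[ref[y][x - 1] if x > 0 else 0 for x in range(12)] for y in range(12)]
    mv, cost = motion_search(cur, ref, 4, 4)
    print(mv, cost)  # -> (-1, 0) 0
```

Full search is used only for clarity; practical encoders usually replace it with faster search patterns. A zero-cost match gives an all-zero residual, which is what makes temporally redundant content cheap to code.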

Context 2

... $\Delta_0$, $\Delta_{p1}$ and $\Delta_{q1}$ are given in (12), (13) and (14): $\Delta_0 = \min(\max(-c_0, \Delta_{0i}), c_0)$ (12), $\Delta_{p1} = \min(\max(-c_1, \Delta_{p1i}), c_1)$ (13) ...
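The Min/Max clipping in (12) and (13) can be illustrated with a minimal Python sketch of the standard H.264 luma weak filter (bS < 4). Assumptions not stated in the excerpt: c1 corresponds to the H.264 spec's tC0 and c0 to tC (tC0 plus one for each smooth side); the unclipped deltas Δ0i and Δp1i use the H.264 formulas; the α/β edge-activity test and the table lookup of c1 are done by the caller; the function names are hypothetical.

```python
# Sketch of the H.264 luma weak (bS < 4) edge filter, written with the
# Min(Max(-c, x), c) clipping of (12) and (13). Mapping c0 -> tC and
# c1 -> tC0 is an assumption; the alpha/beta edge test is done by the caller.

def clip3(lo, hi, x):
    """Clamp x into [lo, hi], i.e. Min(Max(lo, x), hi)."""
    return min(max(lo, x), hi)


def weak_filter(p, q, c1, beta, bit_depth=8):
    """Filter one line of samples across a block edge.

    p = [p0, p1, p2] (p0 nearest the edge), q = [q0, q1, q2].
    Returns (new_p, new_q); only p0, p1, q0, q1 can change.
    """
    p0, p1, p2 = p
    q0, q1, q2 = q
    max_val = (1 << bit_depth) - 1
    ap = abs(p2 - p0) < beta            # p side smooth enough to modify p1
    aq = abs(q2 - q0) < beta            # q side smooth enough to modify q1
    c0 = c1 + int(ap) + int(aq)         # wider clipping bound for p0/q0

    delta0i = (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3
    delta0 = clip3(-c0, c0, delta0i)                  # eq. (12)
    new_p0 = clip3(0, max_val, p0 + delta0)
    new_q0 = clip3(0, max_val, q0 - delta0)

    new_p1, new_q1 = p1, q1
    avg = (p0 + q0 + 1) >> 1
    if ap:
        delta_p1i = (p2 + avg - (p1 << 1)) >> 1
        new_p1 = p1 + clip3(-c1, c1, delta_p1i)       # eq. (13)
    if aq:
        delta_q1i = (q2 + avg - (q1 << 1)) >> 1
        new_q1 = q1 + clip3(-c1, c1, delta_q1i)       # q1 counterpart of (13)
    return [new_p0, new_p1, p2], [new_q0, new_q1, q2]


if __name__ == "__main__":
    print(weak_filter([60, 60, 60], [70, 70, 70], c1=2, beta=4))
    # -> ([64, 62, 60], [66, 68, 70])
```

With a step edge of 10 between flat blocks, the filter moves p0/q0 by at most c0 and p1/q1 by at most c1, which is exactly the role of the clipping bounds.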

Context 3

... the 4×4 sub-block. Four internal dual-port memories are used to store the vertically edge-filtered data of an MB. Once all the vertical edges have been filtered, the data are fetched from the internal memory and filtered across the horizontal edges. After horizontal edge filtering, the filtered data are rearranged again to transpose the 4×4 block. Fig. 10 and Fig. 11 show the operation of the filter module within the filter unit. During the filtering process, the weak filter modifies one or two pixels on either side of a 4×4 block edge, and the strong filter modifies up to three pixels on either side, as given in the equations below. The filtered data are then written to the external ...
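The sketch below illustrates both ideas in that paragraph: a strong filter that changes up to three samples per side, and reuse of a single vertical-edge datapath for horizontal edges by transposing 4×4 blocks. It assumes the standard H.264 bS = 4 luma equations (the paper's own equations are not reproduced in this excerpt), assumes the caller has already passed the α/β edge test, and uses hypothetical function names; it is not the QPEDBF RTL.

```python
# Sketch: H.264-style strong (bS = 4) luma filter plus transpose-based reuse
# of the vertical-edge filter for horizontal edges. Assumed, not the paper's RTL.

def strong_filter(p, q, alpha, beta):
    """p = [p0, p1, p2, p3], q = [q0, q1, q2, q3], index 0 nearest the edge.
    Returns (new_p, new_q); at most p0..p2 and q0..q2 change."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    small_gap = abs(p0 - q0) < ((alpha >> 2) + 2)
    if abs(p2 - p0) < beta and small_gap:          # 3 samples modified
        new_p = [(p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3,
                 (p2 + p1 + p0 + q0 + 2) >> 2,
                 (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3,
                 p3]
    else:                                          # only p0 modified
        new_p = [(2 * p1 + p0 + q1 + 2) >> 2, p1, p2, p3]
    if abs(q2 - q0) < beta and small_gap:
        new_q = [(p1 + 2 * p0 + 2 * q0 + 2 * q1 + q2 + 4) >> 3,
                 (p0 + q0 + q1 + q2 + 2) >> 2,
                 (2 * q3 + 3 * q2 + q1 + q0 + p0 + 4) >> 3,
                 q3]
    else:
        new_q = [(2 * q1 + q0 + p1 + 2) >> 2, q1, q2, q3]
    return new_p, new_q


def transpose4(block):
    """Transpose a 4x4 block given as a list of 4 rows."""
    return [list(col) for col in zip(*block)]


def filter_vertical_edge(left, right, alpha, beta):
    """Filter the vertical edge between two horizontally adjacent 4x4 blocks.
    Each row of `left` holds p3..p0 (p0 rightmost); each row of `right`
    holds q0..q3."""
    new_left, new_right = [], []
    for lrow, rrow in zip(left, right):
        fp, fq = strong_filter(lrow[::-1], rrow, alpha, beta)
        new_left.append(fp[::-1])
        new_right.append(fq)
    return new_left, new_right


def filter_horizontal_edge(top, bottom, alpha, beta):
    """Reuse the vertical-edge datapath: transpose, filter, transpose back."""
    t, b = filter_vertical_edge(transpose4(top), transpose4(bottom), alpha, beta)
    return transpose4(t), transpose4(b)


if __name__ == "__main__":
    left = [[60] * 4 for _ in range(4)]
    right = [[70] * 4 for _ in range(4)]
    new_left, new_right = filter_vertical_edge(left, right, alpha=40, beta=4)
    print(new_left[0], new_right[0])  # -> [60, 61, 63, 64] [66, 68, 69, 70]
```

Transposing lets one filter datapath serve both edge directions, at the cost of the data rearrangement step the excerpt describes.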

Context 5

... QPEDBF architecture is implemented in Verilog HDL, and functional verification is done by simulating the RTL in ModelSim-Altera. The functional simulation shows that this architecture can filter an MB in 58 clock cycles. The data from the external memory are read in the order shown in Fig. 14, where each number indicates a 4×4 sub-block. The operation performed in each clock cycle is shown in Fig. 12 and Fig. 13. Initially, during the first four clock cycles, four 4×4 sub-blocks of reconstructed pixel data are read from the external memory and stored in the internal buffer. BS for each sub-edge is computed while the data ...
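The 58-cycles-per-MB figure translates directly into a frame-rate bound. The sketch below does that arithmetic; the 1920×1080 frame size and the 100 MHz clock are hypothetical inputs chosen only for illustration (the excerpt states neither), and the bound ignores pipeline stalls and external-memory latency.

```python
# Back-of-the-envelope throughput for a deblocking filter that spends a fixed
# number of clock cycles per 16x16 macroblock (58 per the text above).
# Frame size and clock frequency below are hypothetical inputs.

CYCLES_PER_MB = 58


def macroblocks(width, height):
    """Number of 16x16 MBs, rounding partial MBs up (1080 lines -> 68 MB rows)."""
    return ((width + 15) // 16) * ((height + 15) // 16)


def cycles_per_frame(width, height, cycles_per_mb=CYCLES_PER_MB):
    return macroblocks(width, height) * cycles_per_mb


def max_fps(width, height, clock_hz, cycles_per_mb=CYCLES_PER_MB):
    """Upper bound on frames per second, ignoring stalls and memory latency."""
    return clock_hz / cycles_per_frame(width, height, cycles_per_mb)


if __name__ == "__main__":
    w, h, f = 1920, 1080, 100e6          # hypothetical 1080p stream, 100 MHz clock
    print(macroblocks(w, h))             # 8160
    print(cycles_per_frame(w, h))        # 473280
    print(round(max_fps(w, h, f), 1))    # 211.3
```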

Context 7

... BS for each sub-edge is computed while the data are being read from the external memory; the computed BS value is fetched in the fourth clock cycle, and the filter is enabled ...
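As a rough illustration of what "BS for each sub-edge" involves, the following Python sketch applies the usual H.264 boundary-strength rules to the 16 vertical luma sub-edges of one MB. Simplifying assumptions: frame-coded MBs only, a single reference picture and motion vector per 4×4 block, no SP/SI slices; the `BlockInfo` type and function names are hypothetical, and this is not the paper's BS hardware unit.

```python
# Simplified H.264 boundary-strength (BS) derivation for luma sub-edges.
# Assumes frame-coded MBs, one MV and one reference picture per 4x4 block,
# and no SP/SI slices. All names are hypothetical.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class BlockInfo:
    intra: bool             # block belongs to an intra-coded MB
    nonzero_coeffs: bool    # its transform block has non-zero coefficients
    ref_pic: int            # index of the (single) reference picture
    mv: Tuple[int, int]     # motion vector in quarter-sample units


def boundary_strength(p: BlockInfo, q: BlockInfo, mb_edge: bool) -> int:
    """BS for the sub-edge between blocks p and q."""
    if p.intra or q.intra:
        return 4 if mb_edge else 3
    if p.nonzero_coeffs or q.nonzero_coeffs:
        return 2
    if p.ref_pic != q.ref_pic:
        return 1
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1                      # MV difference of one luma sample or more
    return 0


def vertical_sub_edge_bs(left_col: List[BlockInfo],
                         mb: List[List[BlockInfo]]) -> List[List[int]]:
    """BS for the 4 vertical edges x 4 rows of one MB.
    mb[row][col] are the MB's 4x4 blocks; left_col holds the rightmost
    column of the left neighbour MB. Edge 0 is the MB boundary."""
    result = []
    for row in range(4):
        row_bs = []
        for edge in range(4):
            p = left_col[row] if edge == 0 else mb[row][edge - 1]
            row_bs.append(boundary_strength(p, mb[row][edge], mb_edge=(edge == 0)))
        result.append(row_bs)
    return result


if __name__ == "__main__":
    inter = BlockInfo(False, False, 0, (0, 0))
    coded = BlockInfo(False, True, 0, (0, 0))
    intra = BlockInfo(True, False, 0, (0, 0))
    mb = [[inter] * 4 for _ in range(4)]
    mb[0][2] = coded
    for row in vertical_sub_edge_bs([intra] * 4, mb):
        print(row)
```

Because BS depends only on side information (prediction mode, coefficients, motion), it can be computed while pixel data are still being fetched, which matches the overlap described in the excerpt.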

Steerable-Discrete-Cosine-Transform (SDCT): Hardware Implementation and Performance Analysis

Article

Mar 2020

In recent years, the need for new, efficient video compression methods has grown rapidly as frame resolutions have increased dramatically. In 2013, the Joint Collaborative Team on Video Coding (JCT-VC) produced the H.265/High Efficiency Video Coding (HEVC) standard, which represents the state of the art in video coding standards. Nevertheless, in...

A Low-Complex Frame Rate Up-Conversion with Edge-Preserved Filtering

Article

Jan 2020

Increasing the resolution of digital video requires a continuous increase in the computation invested in Frame Rate Up-Conversion (FRUC). In this paper, we combine the advantages of Edge-Preserved Filtering (EPF) and Bidirectional Motion Estimation (BME) in an attempt to reduce the computational complexity. The inaccuracy of BME results from similar structures in texture regions, which can be avoided by using EPF to remove the texture details of video frames. Because EPF filters out the high-frequency components, each video frame can be subsampled before BME with the least accuracy degradation. EPF also preserves edges, which prevents the deformation of objects during subsampling. In addition, we use predictive search to reduce redundant search points according to the local smoothness of the Motion Vector Field (MVF), speeding up BME. The experimental results show that the proposed FRUC algorithm yields good objective and subjective quality of the interpolated frames at low computational complexity.
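The core BME step mentioned in this abstract can be illustrated with a generic symmetric bidirectional block-matching sketch: for a block of the frame to be interpolated, a candidate vector v pairs the block at −v in the previous frame with the block at +v in the next frame, and the best-matching pair is averaged. This omits the paper's EPF, subsampling, and predictive search; the function names, 4×4 block size, and ±2 search range are assumptions.

```python
# Generic symmetric bidirectional motion estimation (BME) for FRUC.
# Not the paper's method: no EPF, subsampling, or predictive search.

def block_sad(a, ax, ay, b, bx, by, n):
    """SAD between n x n blocks at (ax, ay) in frame a and (bx, by) in frame b."""
    return sum(abs(a[ay + y][ax + x] - b[by + y][bx + x])
               for y in range(n) for x in range(n))


def bidirectional_me(prev, nxt, bx, by, n=4, search_range=2):
    """For the block at (bx, by) of the frame midway between prev and nxt,
    find v minimising SAD(prev at b - v, nxt at b + v). Returns (v, cost)."""
    h, w = len(prev), len(prev[0])
    best = None
    for vy in range(-search_range, search_range + 1):
        for vx in range(-search_range, search_range + 1):
            px, py, qx, qy = bx - vx, by - vy, bx + vx, by + vy
            if min(px, py, qx, qy) < 0 or max(px, qx) + n > w or max(py, qy) + n > h:
                continue                      # candidate leaves the frame
            cost = block_sad(prev, px, py, nxt, qx, qy, n)
            if best is None or cost < best[1]:
                best = ((vx, vy), cost)
    return best


def interpolate_block(prev, nxt, bx, by, v, n=4):
    """Average the two motion-aligned blocks (rounded) to build the new block."""
    vx, vy = v
    return [[(prev[by - vy + y][bx - vx + x] + nxt[by + vy + y][bx + vx + x] + 1) >> 1
             for x in range(n)] for y in range(n)]


if __name__ == "__main__":
    # Content moves 2 samples to the right between prev and nxt, so the
    # midway frame sees a displacement of 1 sample in each direction.
    prev = [[(7 * x + 13 * y) % 50 for x in range(12)] for y in range(12)]
    nxt = [[(7 * (x - 2) + 13 * y) % 50 for x in range(12)] for y in range(12)]
    print(bidirectional_me(prev, nxt, 4, 4))  # -> ((1, 0), 0)
```

Texture regions with repeating structure can produce several low-SAD candidates in such a search, which is the source of BME inaccuracy that the abstract's EPF step targets.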

Methods to develop high throughput hardware architectures for HEVC Deblocking Filter using mixed pipelined-block processing techniques

Article

Mar 2022
MICROELECTRON J

This paper presents four highly efficient hardware architectures for the Deblocking Filter (DBF) that can be used in a High-Efficiency Video Coding (HEVC) encoder or decoder. Mixed pipelined and block processing techniques are used in these architectures to achieve high throughput while consuming the minimum possible area. In the proposed architectures, each 64 × 64 coding tree unit of a frame is processed as 32 × 32 blocks of pixels. Once the deblocking filtering process completes, these pixels are stored in block memory via an output buffer. Experimental results on Application Specific Integrated Circuit (ASIC) and Field Programmable Gate Array (FPGA) platforms demonstrate that the proposed hardware architectures achieve a throughput improvement of 69% to 83% at the expense of a slightly increased gate count compared with previously known DBF architectures. With a 180 nm technology library, the fastest of the four architectures achieves a throughput of 162 FPS (frames per second) at an operating frequency of 250 MHz.
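The reported figures imply a per-block cycle budget that can be checked with simple arithmetic. The 250 MHz clock and 162 FPS come from the abstract; the 1920×1080 frame size is a hypothetical assumption, because the abstract does not state the resolution behind the 162 FPS figure.

```python
# Cycle budget implied by a clock frequency and a frame rate.
# 250 MHz and 162 FPS are from the abstract; the 1920x1080 frame is hypothetical.
import math


def cycle_budget(clock_hz, fps):
    """Clock cycles available per frame at the given frame rate."""
    return clock_hz / fps


def ctus_per_frame(width, height, ctu=64):
    """Number of CTUs, counting partial CTUs at the right/bottom borders."""
    return math.ceil(width / ctu) * math.ceil(height / ctu)


if __name__ == "__main__":
    budget = cycle_budget(250e6, 162)
    n_ctu = ctus_per_frame(1920, 1080)     # hypothetical resolution
    per_ctu = budget / n_ctu
    per_32x32 = per_ctu / 4                # each 64x64 CTU handled as four 32x32 blocks
    print(round(budget), n_ctu, round(per_ctu, 1), round(per_32x32, 1))
```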

A fast integrated deblocking filter and sample-adaptive-offset parameter estimation architecture for HEVC

Article

Sep 2021
MICROPROCESS MICROSY

Low-power hardware acceleration cores for integration into real-time High Efficiency Video Coding (HEVC) codecs for smartphones, tablets, camcorders, and televisions are in great demand. This demand motivates efficient approximation of important power-consuming HEVC modules, including the in-loop filters. This paper presents a hardware-efficient implementation of an integrated deblocking filter (DBF) and sample adaptive offset (SAO) parameter estimation architecture for 16×16, 32×32, and 64×64 coding tree units (CTUs) in HEVC. When the architecture is extended to the HEVC Test Model (HM) software, the luminance peak signal-to-noise ratio increases by 0.02 dB, and the execution times of DBF and SAO decrease by up to 35% and 38%, respectively, compared with the reference algorithm. Moreover, it delivers rate–distortion performance comparable to the HEVC standard and reports mean squared error, structural similarity (SSIM) index, and multi-scale SSIM (MS-SSIM) index values of 0.15, 0.9984, and 1, respectively, for 4K video sequences. The architecture consumes a minimum power, area, and energy of 9.83 mW, 162 kilo-gate-equivalents, and 44 pJ, respectively, while supporting up to forty-six 8K frames per second. Additionally, it has a 78% smaller area and requires 75% fewer clock cycles per largest coding unit than a separate implementation of DBF and SAO in HEVC. Such low-power, low-area, low-energy designs can be integrated into real-time HEVC codecs for portable HEVC-compliant consumer electronic devices.
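For readers unfamiliar with SAO parameter estimation, the following Python sketch shows a simplified, distortion-only estimate of HEVC band offset (BO) parameters: reconstructed samples are classified into 32 equal bands by their five most significant bits, each band gets the clipped mean of (original − reconstructed), and the four consecutive bands (wrapping modulo 32) with the largest squared-error reduction are selected. It ignores the rate term and edge offset and is not the approximation proposed in this paper; the function name is hypothetical.

```python
# Simplified, distortion-only SAO band-offset estimation (illustrative).
# Ignores rate and edge offset; not the paper's approximation.

def sao_band_offset(orig, recon, bit_depth=8):
    """Return (band_position, [4 offsets]) for one CTU's samples."""
    shift = bit_depth - 5                              # 32 equal bands
    max_off = (1 << (min(bit_depth, 10) - 5)) - 1      # 7 for 8-bit video
    count = [0] * 32
    diff_sum = [0] * 32
    for o, r in zip(orig, recon):
        band = r >> shift
        count[band] += 1
        diff_sum[band] += o - r
    offsets, gains = [], []
    for band in range(32):
        off = 0
        if count[band]:
            off = round(diff_sum[band] / count[band])
            off = max(-max_off, min(max_off, off))
        offsets.append(off)
        # Squared-error reduction from adding `off` to every sample in the band:
        # sum(e^2) - sum((e - off)^2) = 2*off*sum(e) - n*off^2
        gains.append(2 * off * diff_sum[band] - count[band] * off * off)
    # Four consecutive bands, wrapping around modulo 32; first best wins ties.
    best = max(range(32), key=lambda s: sum(gains[(s + k) % 32] for k in range(4)))
    return best, [offsets[(best + k) % 32] for k in range(4)]


if __name__ == "__main__":
    recon = [80] * 8 + [90] * 8          # bands 10 and 11 for 8-bit samples
    orig = [83] * 8 + [92] * 8           # reconstruction is too dark by 3 and 2
    print(sao_band_offset(orig, recon))  # -> (8, [0, 0, 3, 2])
```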

Hardware Accelerator for Dual Standard Deblocking Filter

Conference Paper

Feb 2021
