The architecture of a Ryzen processor.

Source publication

Complexity Analysis of a Versatile Video Coding Decoder over Embedded Systems and General Purpose Processors

Article

May 2021

The increase in high-quality video consumption requires increasingly efficient video coding algorithms. Versatile video coding (VVC) is the current state-of-the-art video coding standard. Compared to the previous video standard, high efficiency video coding (HEVC), VVC achieves approximately 50% higher video compression while maintaining the same qu...

Context 1

... this study, two platforms were used: an AMD Ryzen Threadripper high-performance processor [34] and an embedded NXSoC [22]. The architecture of the Ryzen processor is shown in Figure 3. This processor is based on the Zen microarchitecture [35], whose primary building block is the core complex (CCX). ...

DCT Approximations Based on Chen's Factorization

Preprint

Jul 2022

In this paper, two 8-point multiplication-free DCT approximations based on Chen's factorization are proposed, and their fast algorithms are derived. Both transforms are assessed in terms of computational cost, error energy, and coding gain. Experiments with a JPEG-like image compression scheme are performed and results are compared wit...
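The two Chen-based approximations themselves are not given in this abstract, so the sketch below does not reproduce them. To illustrate what "multiplication-free" means, it uses a different, well-known approximation, the signed DCT, which keeps only the sign of each DCT-II entry so the transform needs only additions and subtractions. All function names are hypothetical.

```python
import math

N = 8  # transform length considered in the paper


def dct_ii_matrix(n: int = N) -> list[list[float]]:
    """Orthonormal n-point DCT-II matrix; row k is the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def signed_dct_matrix(n: int = N) -> list[list[int]]:
    """Signed DCT: keep only the sign of each DCT-II entry (no entry is zero for n = 8)."""
    return [[1 if v > 0 else -1 for v in row] for row in dct_ii_matrix(n)]


def apply_sign_matrix(t: list[list[int]], x: list[int]) -> list[int]:
    """Apply a {-1, +1} matrix to x using only additions and subtractions."""
    out = []
    for row in t:
        acc = 0
        for sign, sample in zip(row, x):
            acc = acc + sample if sign > 0 else acc - sample
        out.append(acc)
    return out


T = signed_dct_matrix()
print(apply_sign_matrix(T, [10] * 8))  # -> [80, 0, 0, 0, 0, 0, 0, 0]
```

A constant input lands entirely in the DC coefficient, as with the exact DCT, but the AC basis vectors are much coarser; this loss of accuracy is what metrics such as error energy and coding gain quantify.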

Encoding flow of the intra-predicted luminance samples

The Low-Frequency Non-Separable Transform (LFNST) process at the...

Signal flow graph to compute LFNST-4x4 Matrix8 Y0 output

FPGA-based implementation of the VVC low-frequency non-separable transform

Article

May 2024

The Versatile Video Coding (VVC) standard, released in July 2020, brings better coding performance than High-Efficiency Video Coding (HEVC) thanks to the introduction of new coding tools. The transform module in the VVC standard incorporates the Multiple Transform Selection (MTS) concept, which relies on separable Discrete Cosine Transform (DCT...
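The difference between the separable MTS-style transforms and a non-separable transform such as the LFNST can be shown in a few lines. A separable 2D transform applies a 1D kernel to columns and then to rows; this is equivalent to one large matrix (the Kronecker product of the kernel with itself) acting on the vectorised block, whereas a non-separable transform may use any matrix. The sketch below checks that equivalence. It uses the HEVC 4-point integer DCT-II kernel only as a familiar example; the trained LFNST kernels are not reproduced, and all names are hypothetical.

```python
Matrix = list[list[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: entry ((i, j), (k, l)) = a[i][k] * b[j][l]."""
    return [[a[i][k] * b[j][l] for k in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for j in range(len(b))]


def separable_2d(a: Matrix, x: Matrix) -> Matrix:
    """Separable transform: kernel on columns, then on rows (Y = A X A^T)."""
    return matmul(matmul(a, x), transpose(a))


def non_separable(t: Matrix, x: Matrix) -> list[int]:
    """Non-separable transform: one matrix applied to the row-major vectorised block.
    t may have fewer rows than inputs (a reduced transform keeping only some outputs)."""
    v = [s for row in x for s in row]
    return [sum(c * s for c, s in zip(row, v)) for row in t]


# HEVC 4-point integer DCT-II kernel, used here only as an example.
A = [[64, 64, 64, 64], [83, 36, -36, -83], [64, -64, -64, 64], [36, -83, 83, -36]]
X = [[3, 1, 4, 1], [5, 9, 2, 6], [5, 3, 5, 8], [9, 7, 9, 3]]

flat_sep = [s for row in separable_2d(A, X) for s in row]
print(flat_sep == non_separable(kron(A, A), X))  # separable is a special case
```

The last line prints `True`. A 16 x 16 non-separable kernel costs 256 multiplications per 4 x 4 block, against 2 x 4 x 16 = 128 for the two separable passes, which is one reason why such transforms are restricted to a small low-frequency region.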

The inside of the Frame_Decoding...

Speed-up of the proposed model and of the OpenVVC decoder while using...

OpenVVC Decoder Parameterized and Interfaced Synchronous Dataflow (PiSDF) Model: Tile Based Parallelism

Article

Oct 2022

The emergence of the new video coding standard, Versatile Video Coding (VVC), has resulted in a 40-50% coding gain over its predecessor HEVC for the same visual quality. However, this is accompanied by a sharp increase in computational complexity. The emergence of the VVC standard and the increase in video resolution have exceeded the capacity of s...
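Tile-based parallelism works because tiles in the same picture can be parsed and reconstructed independently, so a frame can be split into a grid and the tiles handed to a pool of workers. The sketch below shows only this scheduling structure. `decode_tile` is a hypothetical stand-in that returns the number of samples it covers. Tile boundaries are given in pixels rather than the CTU units the standard uses, and in-loop filters that cross tile boundaries, which would need extra synchronisation, are omitted. With CPython threads this shows the structure, not a real speedup for CPU-bound Python code.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    index: int
    x0: int
    y0: int
    width: int
    height: int


def tile_grid(frame_w: int, frame_h: int, cols: int, rows: int) -> list[Tile]:
    """Split a frame into a cols x rows grid; the last column/row takes the remainder."""
    col_w, row_h = frame_w // cols, frame_h // rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            x0, y0 = c * col_w, r * row_h
            w = frame_w - x0 if c == cols - 1 else col_w
            h = frame_h - y0 if r == rows - 1 else row_h
            tiles.append(Tile(len(tiles), x0, y0, w, h))
    return tiles


def decode_tile(tile: Tile) -> tuple[int, int]:
    """Hypothetical placeholder for decoding one tile independently of the others."""
    return tile.index, tile.width * tile.height


def decode_frame_tiles(tiles: list[Tile], workers: int) -> dict[int, int]:
    """Decode all tiles of one frame concurrently and collect results by tile index."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(decode_tile, tiles))


results = decode_frame_tiles(tile_grid(1920, 1080, 2, 2), workers=4)
print(results)
```

With four equal tiles in a 2 x 2 grid, the attainable speedup is bounded by the slowest tile, so the tile layout chosen by the encoder directly limits decoder parallelism.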

8-Point 1D transform using MCM algorithm

8-Point 1D approximate transform architecture

Hardware implementation of PSO-based approximate DST transform for VVC standard

Article

Feb 2022

The H.266/Versatile Video Coding (VVC) standard, released in July 2020, has improved encoder performance over the previous High Efficiency Video Coding (HEVC) standard, at the cost of a significant increase in coding complexity. Enhancements to the transform module mainly involve the introduction of the Adaptive Multiple Transform (AMT), which has led to an additio...
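The "8-point 1D transform using MCM algorithm" figure refers to multiple constant multiplication: when one input must be multiplied by several fixed constants, the products are built from shifts and additions that share intermediate terms. The PSO-derived DST coefficients of the paper are not given in the abstract, so this sketch applies a hand-picked decomposition to the standard 8-point integer DCT-II constants shared by HEVC and VVC. A real MCM tool searches for a minimal adder graph; names here are hypothetical.

```python
def mcm_products(x: int) -> dict[int, int]:
    """Products of x with the 8-point DCT-II constants using shifts and adds only."""
    x2 = x << 1
    x9 = (x << 3) + x          # 9x, shared by several constants
    x18 = x9 << 1              # 18x
    x36 = x9 << 2              # 36x
    x64 = x << 6               # 64x
    return {
        18: x18,
        36: x36,
        50: (x << 5) + x18,        # 32x + 18x
        64: x64,
        75: x64 + x9 + x2,         # 64x + 9x + 2x
        83: x64 + x18 + x,         # 64x + 18x + x
        89: x64 + (x << 4) + x9,   # 64x + 16x + 9x
    }


def dct8_row1(samples: list[int]) -> int:
    """Row 1 of the 8-point integer DCT-II via the odd butterfly and MCM products."""
    odd = [samples[i] - samples[7 - i] for i in range(4)]
    return (mcm_products(odd[0])[89] + mcm_products(odd[1])[75]
            + mcm_products(odd[2])[50] + mcm_products(odd[3])[18])


print(dct8_row1([12, -7, 33, 0, 5, 90, -41, 8]))
```

Each odd-part input is multiplied by the same four constants across rows 1, 3, 5 and 7, which is exactly the situation where sharing terms such as `x9` saves adders in hardware.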

Fig. 2: Illustration of tile partitioning: grid of 4 tiles labeled from...

Fig. 3: Block diagram of the OpenVVC decoder architecture.

Fig. 8: Decoding time-line in RA configuration with 2 decoding threads.

OpenVVC: a Lightweight Software Decoder for the Versatile Video Coding Standard

Preprint

May 2022

Performance analysis of optimized versatile video coding software decoders on embedded platforms

Article

Oct 2023

In recent years, the global demand for high-resolution videos and the emergence of new multimedia applications have created the need for a new video coding standard. Therefore, in July 2020, the versatile video coding (VVC) standard was released, providing up to 50% bit-rate savings for the same video quality compared to its predecessor, high-efficiency video coding (HEVC). However, these bit-rate savings come at the cost of high computational complexity, particularly for live applications and on resource-constrained embedded devices. This paper evaluates two optimized VVC software decoders, named OpenVVC and Versatile Video deCoder (VVdeC), designed for low-resource platforms. These decoders exploit optimization techniques such as data-level parallelism using single instruction multiple data (SIMD) instructions and functional-level parallelism using frame-, tile-, and slice-based parallelism. Furthermore, a comparison of decoding runtime, energy, and memory consumption between the two decoders is presented on two different resource-constrained embedded devices. The results showed that both decoders achieve real-time decoding of full high-definition (FHD) resolution on the first platform using 8 cores and real-time decoding of high-definition (HD) resolution on the second platform using only 4 cores, with comparable average energy consumption: around 26 J and 15 J for the 8-core and 4-core platforms, respectively. Furthermore, OpenVVC showed better memory usage, with a lower average peak memory consumption during runtime than VVdeC.

GPU-based parallelisation of a versatile video coding adaptive loop filter in resource-constrained heterogeneous embedded platform

Article

Apr 2023

This paper presents a GPU-based parallelisation of the adaptive loop filter (ALF) of an optimised versatile video coding (VVC) decoder on a resource-constrained heterogeneous platform. The GPU has been comprehensively utilised to maximise the degree of parallelism, enabling the programme to exploit the GPU's capabilities. The proposed approach accelerates the ALF computation by a factor of two on average compared to an already fully optimised version of the software decoder running on an embedded platform. Finally, this work presents an analysis of energy consumption, showing that the proposed methodology has a negligible impact on this key parameter.
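ALF suits a GPU because every output sample is computed only from unfiltered input samples, so samples have no dependency on one another and each can map to one GPU thread. The CPU sketch below roughly follows the form of VVC's non-linear ALF: a 5 x 5 diamond with point-symmetric coefficient pairs, clipped neighbour differences, and 7-bit coefficient precision. It omits block classification, filter selection and the standard's boundary handling (edge samples are simply replicated). The coefficients and clipping values are hypothetical.

```python
# (dy, dx) offsets of one half of a 5x5 diamond; each has a point-symmetric partner.
DIAMOND_5X5_HALF = [(-2, 0), (-1, -1), (-1, 0), (-1, 1), (0, -2), (0, -1)]


def clip3(lo: int, hi: int, v: int) -> int:
    return max(lo, min(hi, v))


def alf_sample(img, y, x, coeffs, clips, bit_depth=10):
    """Filter one sample using only the unfiltered input (no cross-sample dependency)."""
    h, w = len(img), len(img[0])
    cur = img[y][x]
    acc = 0
    for (dy, dx), c, k in zip(DIAMOND_5X5_HALF, coeffs, clips):
        for sy, sx in ((dy, dx), (-dy, -dx)):
            # Simplification: replicate edge samples at picture borders.
            ny, nx = clip3(0, h - 1, y + sy), clip3(0, w - 1, x + sx)
            acc += c * clip3(-k, k, img[ny][nx] - cur)
    # Coefficients have 7 fractional bits: round, shift, clip to the sample range.
    return clip3(0, (1 << bit_depth) - 1, cur + ((acc + 64) >> 7))


def alf_frame(img, coeffs, clips, bit_depth=10):
    """Each (y, x) is independent, so on a GPU it could be one thread."""
    return [[alf_sample(img, y, x, coeffs, clips, bit_depth) for x in range(len(img[0]))]
            for y in range(len(img))]


COEFFS = [4, 6, 10, 6, 4, 12]  # hypothetical filter coefficients
CLIPS = [32] * 6               # hypothetical clipping values
flat = [[512] * 6 for _ in range(4)]
print(alf_frame(flat, COEFFS, CLIPS) == flat)  # a flat area is left unchanged
```

On a GPU, the two nested list comprehensions in `alf_frame` would become the launch grid, one thread per output sample.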

GPU-based Parallelisation of a Versatile Video Coding Adaptive Loop Filter in Resource-Constrained Heterogeneous Embedded Platform

Preprint

Dec 2022

The computational load of the algorithms in current video encoders and decoders makes it necessary to exploit the full capabilities of the hardware to achieve real-time performance, especially in embedded systems. Today, versatile video coding (VVC) is the state-of-the-art reference standard. VVC achieves a bit-rate reduction of up to 50% compared to its predecessor, the high efficiency video coding (HEVC) standard. However, this improvement comes with a significant increase in the complexity of the involved algorithms. Embedded computing systems, such as those integrated in portable multimedia devices, have also increased in computational power. This is due not only to improvements in general-purpose processors but also to the integration, in the same chip, of other types of processors that serve as accelerators and offload the CPU. In particular, GPU-type processors dominate commercial solutions in this sector. In this context, this paper presents a methodology to migrate the adaptive loop filtering (ALF) processing block of a VVC decoder to a GPU. The experimental results show an average ALF speedup of 2 compared to an already fully optimised version of the software decoder implementation. In addition, the results show that the proposed parallelisation has a negligible impact on the power consumption of the decoder.

Implementation of a Real-Time Versatile Video Coding Decoder Based on VVdeC Over an Embedded Multicore Platform

Article

Aug 2022
IEEE T CONSUM ELECTR

The current state-of-the-art video coding standard, versatile video coding (VVC), offers 50% bit-rate savings over its predecessor, high efficiency video coding (HEVC), while maintaining the same quality, at the cost of a significant increase in computational complexity. In this work, an Embedded General Purpose Processor (EGPP) together with a suite of Single Instruction Multiple Data (SIMD) optimisations was used to accelerate an open-source software decoder and achieve real-time decoding on an embedded multi-core platform with 8 cores. The decoding speedup has been analyzed with different numbers of threads/cores, showing that performance increases linearly. This suggests that multi-core performance for this solution did not reach the saturation point with 8 cores, and that more parallelism is possible with a higher number of cores. Thanks to the optimisation process, ×2 and ×2.4 speedups have been obtained in decoding time for All Intra and Random Access sequences, respectively. With these results, real-time decoding is achieved for High Definition (HD) sequences. The main contribution of this research is the implementation of a real-time VVC decoder for a low-resource embedded platform.
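"Performance increases linearly" can be made precise with two standard ratios: speedup S(n) = T(1) / T(n) and parallel efficiency E(n) = S(n) / n. Linear scaling means E(n) stays close to 1, and saturation shows up as E(n) dropping as n grows. The helper below computes both and checks a simple real-time criterion. The timings and frame counts are hypothetical illustrations, not results from the paper.

```python
def scaling_report(times_s: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Map thread count -> (speedup vs. 1 thread, parallel efficiency)."""
    base = times_s[1]
    report = {}
    for n in sorted(times_s):
        s = base / times_s[n]
        report[n] = (s, s / n)
    return report


def is_real_time(frames: int, seconds: float, target_fps: float) -> bool:
    """Real time here means the decoder sustains at least the content frame rate."""
    return frames / seconds >= target_fps


# Hypothetical decoding times (seconds) for 1, 2, 4 and 8 threads.
report = scaling_report({1: 80.0, 2: 40.0, 4: 20.0, 8: 10.0})
for n, (speedup, eff) in report.items():
    print(f"{n} threads: speedup {speedup:.2f}, efficiency {eff:.2f}")
```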

Fusion-Based Versatile Video Coding Intra Prediction Algorithm with Template Matching and Linear Prediction

Article

Aug 2022
SENSORS-BASEL

The new-generation video coding standard Versatile Video Coding (VVC) has adopted many novel technologies to improve compression performance, and remarkable results have consequently been achieved. In practical applications, less data, in terms of bitrate, would reduce the burden on the sensors and improve their performance. Hence, to further enhance the intra compression performance of VVC, we propose a fusion-based intra prediction algorithm in this paper. Specifically, to better predict areas with similar texture information, we propose a fusion-based adaptive template matching method, which directly takes the error between reference and objective templates into account. Furthermore, to better utilize the correlation between reference pixels and the pixels to be predicted, we propose a fusion-based linear prediction method, which can compensate for the deficiency of single linear prediction. We implemented our algorithm on top of the VVC Test Model (VTM) 9.1. Compared with VVC, our proposed fusion-based algorithm saves 0.89%, 0.84%, and 0.90% bitrate on average for the Y, Cb, and Cr components, respectively. In addition, when compared with some other existing works, our algorithm showed superior performance in bitrate savings.
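The template matching idea underlying the paper's method can be sketched generically. The already-decoded L-shaped template above and to the left of the current block is compared, by sum of absolute differences (SAD), with the templates of candidate blocks in the reconstructed area, and the best candidate's block is used as the prediction. This is plain template matching, not the paper's fusion-based adaptive variant. The candidate list is assumed to be supplied by the caller and to lie entirely in reconstructed samples; all names are hypothetical.

```python
def l_template(img, y, x, size, t):
    """Samples in the t rows above (with corner) and t columns left of a size x size block."""
    above = [img[y - t + r][x - t + c] for r in range(t) for c in range(size + t)]
    left = [img[y + r][x - t + c] for r in range(size) for c in range(t)]
    return above + left


def sad(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))


def template_match(img, y, x, size, t, candidates):
    """Return the candidate whose template best matches the target's, plus its block."""
    target = l_template(img, y, x, size, t)
    best = min(candidates, key=lambda p: sad(target, l_template(img, p[0], p[1], size, t)))
    by, bx = best
    return best, [row[bx:bx + size] for row in img[by:by + size]]


# Periodic test picture: the pattern repeats every 4 samples in both directions.
img = [[(r % 4) * 10 + (c % 4) for c in range(8)] for r in range(8)]
best, pred = template_match(img, 5, 5, size=2, t=1, candidates=[(1, 1), (1, 2), (2, 3)])
print(best, pred)  # -> (1, 1) [[11, 12], [21, 22]]
```

A fusion-based scheme would typically combine several low-SAD candidates instead of keeping only the single best one.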

Performance Analysis of Optimized Versatile Video Coding Software Decoders on Embedded Platforms

Preprint

Jun 2022

In recent years, the global demand for high-resolution videos and the emergence of new multimedia applications have created the need for a new video coding standard. Hence, in July 2020, the Versatile Video Coding (VVC) standard was released, providing up to 50% bit-rate savings for the same video quality compared to its predecessor, High Efficiency Video Coding (HEVC). However, this bit-rate saving comes at the cost of high computational complexity, particularly for live applications and on resource-constrained embedded devices. This paper presents two optimized VVC software decoders, named OpenVVC and Versatile Video deCoder (VVdeC), designed for low-resource platforms. They exploit optimization techniques such as data-level parallelism using Single Instruction Multiple Data (SIMD) instructions and functional-level parallelism using frame-, tile- and slice-based parallelism. Furthermore, a comparison in terms of decoding run time, energy and memory consumption between the two decoders is presented while targeting two different resource-constrained embedded devices. The results showed that both decoders achieve real-time decoding of Full High Definition (FHD) resolution on the first platform using 8 cores and real-time High Definition (HD) decoding on the second platform using only 4 cores, with comparable average energy consumption: around 26 J and 15 J for the 8-core and 4-core embedded platforms, respectively. Regarding memory usage, OpenVVC showed better results, with a lower average peak memory consumption during run time than VVdeC.

OpenVVC: a Lightweight Software Decoder for the Versatile Video Coding Standard

Preprint

May 2022

In recent years, users' requirements for higher resolution, coupled with the emergence of new multimedia applications, have created the need for a new video coding standard. The new-generation video coding standard, called Versatile Video Coding (VVC), has been developed by the Joint Video Experts Team and offers coding capability beyond the previous-generation High Efficiency Video Coding (HEVC) standard. Due to the incorporation of more advanced and complex tools, the decoding complexity of the VVC standard has approximately doubled compared to HEVC. This complexity increase raises new research challenges for achieving live software decoding. In this context, we developed OpenVVC, an open-source software decoder that supports a broad range of VVC functionalities. This paper presents the OpenVVC software architecture, its parallelism strategy, and a detailed set of experimental results. By combining extensive data-level parallelism with frame-level parallelism, OpenVVC achieves real-time decoding of UHD video content. Moreover, the memory required by OpenVVC is remarkably low, which is a great advantage for its integration on embedded platforms with limited memory resources. The code of the OpenVVC decoder is publicly available at https://github.com/OpenVVC/OpenVVC
