Figure 3 - uploaded by Anup Saha
The architecture of a Ryzen processor.

Source publication
Article
Full-text available
The increase in high-quality video consumption requires increasingly efficient video coding algorithms. Versatile video coding (VVC) is the current state-of-the-art video coding standard. Compared to the previous video standard, high efficiency video coding (HEVC), VVC demands approximately 50% higher video compression while maintaining the same qu...

Context in source publication

Context 1
... this study, two platforms were used: an AMD Ryzen Threadripper high-performance processor [34] and an embedded NXSoC [22]. The architecture of the Ryzen processor is shown in Figure 3. This processor is based on the Zen microarchitecture [35], whose primary building block is the core complex (CCX). ...

Similar publications

Preprint
Full-text available
In this paper, two 8-point multiplication-free DCT approximations based on Chen's factorization are proposed, and their fast algorithms are derived. Both transformations are assessed in terms of computational cost, error energy, and coding gain. Experiments with a JPEG-like image compression scheme are performed and results are compared wit...
Article
Full-text available
The Versatile Video Coding (VVC) standard, released in July 2020, brings better coding performance than the High-Efficiency Video Coding (HEVC) thanks to the introduction of new coding tools. The transform module in the VVC standard incorporates the Multiple Transform Selection (MTS) concept, which relies on separable Discrete Cosine Transform (DCT...
Article
Full-text available
The emergence of the new video coding standard, Versatile Video Coding (VVC), has resulted in a 40-50% coding gain over its predecessor HEVC for the same visual quality. However, this is accompanied by a sharp increase in computational complexity. The emergence of the VVC standard and the increase in video resolution have exceeded the capacity of s...
Article
Full-text available
The H.266/Versatile Video Coding (VVC) standard, released in July 2020, has improved the encoder performance over the previous High Efficiency Video Coding (HEVC) with a significant increase in coding complexity. Enhancements on the transform module mainly involve the introduction of the Adaptive Multiple Transform (AMT) which has led to an additio...
Preprint
Full-text available
In recent years, users' requirements for higher resolution, coupled with the emergence of new multimedia applications, have created the need for a new video coding standard. The new generation video coding standard, called Versatile Video Coding (VVC), has been developed by the Joint Video Experts Team, and offers coding capability beyond the p...

Citations

... First, ILMCS is a new tool in VVC, which enhances decoding performance by inversely mapping the luma code to the reconstructed block. DBF and SAO in VVC are very similar to HEVC [19]. DBF is used to detect and filter the artifacts of pixels at the boundary of the block, and SAO is used to minimize the distortion of the sample over the pixels filtered by DBF. ...
... An optimized VVC software decoder for mobile platforms is presented in [26]. This decoder is based on the VTM-11.0 reference software, and achieves real-time decoding for HD video sequences using SIMD and multi-threading on an ARM-based [19] platform. Finally, in [40] the authors present a heterogeneous CPU + GPU implementation of a VVC decoder where the ALF filtering was migrated to the GPU cores. ...
Article
Full-text available
In recent years, the global demand for high-resolution videos and the emergence of new multimedia applications have created the need for a new video coding standard. Therefore, in July 2020, the versatile video coding (VVC) standard was released, providing up to 50% bit-rate savings for the same video quality compared to its predecessor, high-efficiency video coding (HEVC). However, these bit-rate savings come at the cost of high computational complexity, particularly for live applications and on resource-constrained embedded devices. This paper evaluates two optimized VVC software decoders, named OpenVVC and Versatile Video deCoder (VVdeC), designed for low-resource platforms. These decoders exploit optimization techniques such as data-level parallelism using single instruction multiple data (SIMD) instructions and functional-level parallelism using frame, tile, and slice-based parallelisms. Furthermore, a comparison of decoding runtime, energy, and memory consumption between the two decoders is presented while targeting two different resource-constrained embedded devices. The results showed that both decoders achieve real-time decoding of full high-definition (FHD) resolution on the first platform using 8 cores and high-definition (HD) real-time decoding on the second platform using only 4 cores, with comparable results in terms of the average energy consumed: around 26 J and 15 J for the 8-core and 4-core platforms, respectively. Furthermore, OpenVVC showed better results regarding memory usage, with a lower average maximum memory consumed during runtime than VVdeC.
... For instance, the inclusion of an Adaptive Loop Filter (ALF) in the filtering loop is responsible for reducing the coding artefacts and minimising the mean square error between the original and reconstructed samples. However, these in-loop filters significantly increase the computational complexity requirements of a decoder, on average 30% and 40% when decoding on a high-performance general purpose processor (HGPP) and an embedded general purpose processor (EGPP), respectively [4,5]. According to [5], ALF alone represents an average computational complexity of 5-12% and 12-24% of the total decoding time share on an HGPP and an EGPP, respectively. ...
... However, these in-loop filters significantly increase the computational complexity requirements of a decoder, on average 30% and 40% when decoding on a high-performance general purpose processor (HGPP) and an embedded general purpose processor (EGPP), respectively [4,5]. According to [5], ALF alone represents an average computational complexity of 5-12% and 12-24% of the total decoding time share on an HGPP and an EGPP, respectively. As a result, it has become a rather challenging research goal to reduce the in-loop filtering time, and more particularly to reduce the ALF filtering time through parallelisation to achieve real-time decoding. ...
Article
Full-text available
This paper presents a GPU-based parallelisation of the adaptive loop filter (ALF) of an optimised versatile video coding (VVC) decoder on a resource-constrained heterogeneous platform. The GPU has been comprehensively utilised to maximise the degree of parallelism, making the programme capable of exploiting the GPU capabilities. The proposed approach accelerates the ALF computation by an average factor of two compared to an already fully optimised version of the software decoder implementation on an embedded platform. Finally, this work presents an analysis of energy consumption, showing that the proposed methodology has a negligible impact on this key parameter.
... For instance, the inclusion of an Adaptive Loop Filter (ALF) in the filtering loop is responsible for reducing the coding artifacts and for minimising the mean square error between the original and reconstructed samples. Nonetheless, these in-loop filters significantly increase the computational complexity requirements of a decoder, on average 30% and 40% when decoding on a high-performance general purpose processor (HGPP) and embedded general purpose processor (EGPP), respectively [4], [5]. According to [5], ALF alone represents an average computational complexity of 5%-12% and 12%-24% of the total decoding time share on an HGPP and on an EGPP, respectively. ...
... Nonetheless, these in-loop filters significantly increase the computational complexity requirements of a decoder, on average 30% and 40% when decoding on a high-performance general purpose processor (HGPP) and embedded general purpose processor (EGPP), respectively [4], [5]. According to [5], ALF alone represents an average computational complexity of 5%-12% and 12%-24% of the total decoding time share on an HGPP and on an EGPP, respectively. As a result, it has become a rather challenging research goal to reduce the in-loop filtering time, and more particularly to reduce the ALF filtering time through parallelisation in order to achieve real-time decoding. ...
Preprint
Full-text available
The computational load requirements of the algorithms integrated in current video encoders and decoders make it necessary to exploit the full capabilities of the hardware to achieve real-time performance, especially in embedded systems. Today, versatile video coding (VVC) is the state-of-the-art reference standard. VVC achieves a compression rate of up to 50% compared to its predecessor, the high efficiency video coding (HEVC) standard. However, this improvement comes with a significant increase in the complexity of the involved algorithms. Embedded computing systems, such as those integrated in portable multimedia devices, have also increased their computational power. This is due not only to improvements in general-purpose processors but also to the integration in the same chip architecture of other types of processors that serve as accelerators and offload the CPU. In particular, GPU-type processors are dominating commercial solutions in this sector. In this context, this paper presents a methodology to migrate the adaptive loop filtering (ALF) processing block of a VVC decoder to a GPU. The obtained experimental results show an average speedup of 2 for ALF when compared to an already fully optimised version of the software decoder implementation. In addition, the results show that the proposed parallelisation has a negligible impact on the power consumption of the decoder.
... This version is useful given its proximity to the development process of the standard, but the performance offered falls far short of real time [22]. Alternatively, among the open-source solutions, the versatile video deCoder in its version 0.2.0.0 (VVdeC2) [23] has been developed. ...
Article
The current state-of-the-art video coding standard, versatile video coding (VVC), offers 50% bit-rate savings over its predecessor, high efficiency video coding (HEVC), while maintaining the same quality, at the cost of a significant increase in computational complexity. In this work, a suite of Single Instruction Multiple Data (SIMD) optimisations on an Embedded General Purpose Processor (EGPP) was used to accelerate an open-source software decoder and achieve real-time decoding on an embedded multi-core platform with 8 cores. The decoding speedup has been analyzed with different numbers of threads/cores, showing that performance increases linearly. This suggests that multi-core performance for this solution did not reach the saturation point with 8 cores and that more parallelism is possible using a higher number of cores. Thanks to the optimization process, ×2 and ×2.4 speedups have been obtained in the decoding time for All Intra and Random Access sequences, respectively. With these results, real-time decoding is achieved for High Definition (HD) sequences. The implementation of a real-time VVC decoder for a low-resource embedded platform is the main contribution of this research.
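As a rough illustration of the metrics mentioned in this abstract, the sketch below computes a speedup factor and a parallel efficiency from decoding times. The function names and all timing values are hypothetical and chosen only for illustration; they are not measurements from the publication.

```python
def speedup(baseline_seconds, optimised_seconds):
    """Speedup factor: how many times faster the optimised run is."""
    return baseline_seconds / optimised_seconds


def parallel_efficiency(single_core_seconds, multi_core_seconds, cores):
    """Fraction of ideal linear scaling reached on `cores` cores.

    A value close to 1.0 means performance grows linearly with the
    number of cores, i.e. the saturation point has not been reached.
    """
    return single_core_seconds / (multi_core_seconds * cores)


# Hypothetical timings (seconds), not taken from the paper.
baseline = 24.0       # decoder without SIMD optimisations
optimised = 10.0      # decoder with SIMD optimisations
one_core = 8.0        # optimised decoder on 1 core
eight_cores = 1.0     # optimised decoder on 8 cores

print(f"speedup: x{speedup(baseline, optimised):.1f}")
print(f"efficiency on 8 cores: {parallel_efficiency(one_core, eight_cores, 8):.2f}")
```

An efficiency that stays near 1.0 as cores are added is what "performance increases linearly" means in practice.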
... Wang et al. [33] designed a Sample Adaptive Offset (SAO) acceleration method to reduce the complexity of VVC. Saha et al. [34] analyzed the decoder complexity of VVC on two different platforms. ...
Article
Full-text available
The new generation video coding standard Versatile Video Coding (VVC) has adopted many novel technologies to improve compression performance, and consequently, remarkable results have been achieved. In practical applications, less data, in terms of bitrate, would reduce the burden of the sensors and improve their performance. Hence, to further enhance the intra compression performance of VVC, we propose a fusion-based intra prediction algorithm in this paper. Specifically, to better predict areas with similar texture information, we propose a fusion-based adaptive template matching method, which directly takes the error between reference and objective templates into account. Furthermore, to better utilize the correlation between reference pixels and the pixels to be predicted, we propose a fusion-based linear prediction method, which can compensate for the deficiency of single linear prediction. We implemented our algorithm on top of the VVC Test Model (VTM) 9.1. When compared with the VVC, our proposed fusion-based algorithm saves a bitrate of 0.89%, 0.84%, and 0.90% on average for the Y, Cb, and Cr components, respectively. In addition, when compared with some other existing works, our algorithm showed superior performance in bitrate savings.
... First, ILMCS is a new addition to VVC which enhances the decoding performance by inverse mapping the luma code to the reconstructed block. DBF and SAO in VVC are very similar to HEVC [24]. DBF is used to detect and filter the artifacts of the pixels at the block boundaries and SAO is used to minimize sample distortion over the pixels filtered by DBF. ...
... The presented decoder was generated from the VTM-11.0 reference software. Here, the decoder achieved real-time decoding for HD video sequences using SIMD and multi-threading on an ARM-based [24] platform. ...
Preprint
In recent years, the global demand for high-resolution videos and the emergence of new multimedia applications have created the need for a new video coding standard. Hence, in July 2020 the Versatile Video Coding (VVC) standard was released, providing up to 50% bit-rate saving for the same video quality compared to its predecessor, High Efficiency Video Coding (HEVC). However, this bit-rate saving comes at the cost of high computational complexity, particularly for live applications and on resource-constrained embedded devices. This paper presents two optimized VVC software decoders, named OpenVVC and Versatile Video deCoder (VVdeC), designed for low-resource platforms. They exploit optimization techniques such as data-level parallelism using Single Instruction Multiple Data (SIMD) instructions and functional-level parallelism using frame, tile and slice-based parallelisms. Furthermore, a comparison in terms of decoding run time, energy and memory consumption between the two decoders is presented while targeting two different resource-constrained embedded devices. The results showed that both decoders achieve real-time decoding of Full High Definition (FHD) resolution on the first platform using 8 cores and High Definition (HD) real-time decoding on the second platform using only 4 cores, with comparable results in terms of average consumed energy: around 26 J and 15 J for the 8-core and 4-core embedded platforms, respectively. Regarding memory usage, OpenVVC showed better results, with less average maximum memory consumed during run time compared to VVdeC.
... The ALF performs Wiener filtering to minimize the Mean Squared Error (MSE) between original and reconstructed samples. It is responsible for an important share of VVC decoding complexity [21], mostly due to the classification of every 4×4 block of samples and to the application of diamond-shaped filters on both luma and chroma samples. Applied in parallel with ALF, the CC-ALF relies on the luma samples to adjust the chroma sample values. ...
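To make the "diamond-shaped filter" in the excerpt above concrete, here is a minimal sketch of a diamond kernel footprint and a plain weighted filter over it. The sizes used (a 7×7 diamond for luma and a 5×5 diamond for chroma) come from the VVC standard rather than from the excerpt. Everything else is an assumption for illustration: the helper names are hypothetical, the weights are uniform placeholders instead of signalled ALF coefficients, borders are handled by coordinate clamping, and the 4×4 block classification and the clipping used by the real ALF are omitted.

```python
def diamond_offsets(radius):
    """Return (dy, dx) offsets with |dy| + |dx| <= radius (a diamond)."""
    return [(dy, dx)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if abs(dy) + abs(dx) <= radius]


def apply_diamond_filter(samples, radius, weights=None):
    """Filter a 2-D list of samples with a diamond-shaped kernel.

    Assumptions: uniform weights unless given, and border samples are
    read by clamping coordinates to the picture area.
    """
    offsets = diamond_offsets(radius)
    if weights is None:
        weights = [1.0 / len(offsets)] * len(offsets)
    height, width = len(samples), len(samples[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for (dy, dx), w in zip(offsets, weights):
                yy = min(max(y + dy, 0), height - 1)
                xx = min(max(x + dx, 0), width - 1)
                acc += w * samples[yy][xx]
            out[y][x] = acc
    return out


luma_taps = len(diamond_offsets(3))    # 7x7 diamond footprint
chroma_taps = len(diamond_offsets(2))  # 5x5 diamond footprint
print(luma_taps, chroma_taps)
```

The per-sample cost grows with the number of taps in the footprint, which is one reason this filter weighs on decoding time.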
... Since 70% of the world population will have mobile connectivity by 2023 according to Cisco [43], the optimization of the decoding process on low-performance GPP platforms is a crucial issue. To tackle this concern, Saha et al. [21] optimize the VVdeC decoder for a system-on-chip heterogeneous platform composed of a low-performance GPP and a Graphical Processing Unit (GPU). The SSE and AVX instructions included in VVdeC are converted to the Neon instruction set available on low-performance GPPs. ...
Preprint
Full-text available
In recent years, users' requirements for higher resolution, coupled with the emergence of new multimedia applications, have created the need for a new video coding standard. The new generation video coding standard, called Versatile Video Coding (VVC), has been developed by the Joint Video Experts Team, and offers coding capability beyond the previous generation High Efficiency Video Coding (HEVC) standard. Due to the incorporation of more advanced and complex tools, the decoding complexity of the VVC standard has approximately doubled compared to HEVC. This complexity increase raises new research challenges to achieve live software decoding. In this context, we developed OpenVVC, an open-source software decoder that supports a broad range of VVC functionalities. This paper presents the OpenVVC software architecture, its parallelism strategy, and a detailed set of experimental results. By combining extensive data-level parallelism with frame-level parallelism, OpenVVC achieves real-time decoding of UHD video content. Moreover, the memory required by OpenVVC is remarkably low, which is a great advantage for its integration on embedded platforms with low memory resources. The code of the OpenVVC decoder is publicly available at https://github.com/OpenVVC/OpenVVC