Article

An Algorithm for Learning Orthonormal Matrix Codebooks for Adaptive Transform Coding


Abstract

This paper proposes a novel data-driven approach to designing orthonormal transform matrix codebooks for adaptive transform coding of any non-stationary vector process that can be considered locally stationary. Our algorithm, which belongs to the class of block-coordinate descent algorithms, relies on simple probability models, such as Gaussian or Laplacian, for the transform coefficients to directly minimize the mean square error (MSE) of scalar quantization and entropy coding of the transform coefficients with respect to the orthonormal transform matrix. A difficulty commonly encountered in such minimization problems is imposing the orthonormality constraint on the matrix solution. We get around this difficulty by mapping the constrained problem in Euclidean space to an unconstrained problem on the Stiefel manifold and leveraging known algorithms for unconstrained optimization on manifolds. While the basic design algorithm applies directly to non-separable transforms, an extension to separable transforms is also proposed. We present experimental results for adaptive transform coding of still images and video inter-frame prediction residuals, comparing the transforms designed using the proposed method with a number of other content-adaptive transforms recently reported in the literature.
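
As a purely illustrative sketch of the kind of manifold optimization the abstract refers to, the snippet below runs a Riemannian gradient descent on the Stiefel manifold with a QR retraction, minimizing a stand-in high-rate cost (the sum of log coefficient variances under a Gaussian model). The objective, step size, and helper names (qr_retraction, design_transform) are assumptions made for this example, not the paper's actual cost or algorithm.

```python
import numpy as np

def qr_retraction(X):
    """Map a perturbed point back onto the Stiefel manifold via thin QR."""
    Q, R = np.linalg.qr(X)
    return Q * np.sign(np.diag(R))        # fix column signs for continuity

def design_transform(C, steps=2000, lr=0.05):
    """Riemannian gradient descent over orthonormal K (illustrative only).

    Minimizes sum_i log(k_i^T C k_i), a high-rate proxy for the rate-distortion
    cost of scalar-quantized, entropy-coded Gaussian transform coefficients.
    """
    n = C.shape[0]
    K = np.linalg.qr(np.random.randn(n, n))[0]          # random orthonormal start
    for _ in range(steps):
        var = np.einsum('ij,jk,ki->i', K.T, C, K)       # coefficient variances
        G = 2.0 * (C @ K) / var                         # Euclidean gradient
        G_t = G - K @ (0.5 * (K.T @ G + G.T @ K))       # project to tangent space
        K = qr_retraction(K - lr * G_t)                 # step and retract
    return K

if __name__ == "__main__":
    idx = np.arange(8)
    C = 0.9 ** np.abs(np.subtract.outer(idx, idx))      # toy AR(1) covariance
    K = design_transform(C)
    print(np.allclose(K.T @ K, np.eye(8)))              # orthonormality preserved
    print(np.sort(np.einsum('ij,jk,ki->i', K.T, C, K))) # close to eigenvalues of C
```

For a Gaussian source this proxy cost is minimized by the KLT, so the recovered coefficient variances approach the eigenvalues of C, which gives a convenient sanity check for the sketch.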


Article
Full-text available
The paper provides an overview of the quantization and entropy coding methods in the Versatile Video Coding (VVC) standard. Special focus is placed on techniques that improve coding efficiency relative to the methods included in the High Efficiency Video Coding (HEVC) standard: the inclusion of trellis-coded quantization, the advanced context modeling for entropy coding of transform coefficient levels, the arithmetic coding engine with multi-hypothesis probability estimation, and the joint coding of chroma residuals. Besides a description of the design concepts, the paper also discusses motivations and implementation aspects. The effectiveness of the quantization and entropy coding methods specified in VVC is validated by experimental results.
Article
Full-text available
High Efficiency Video Coding (HEVC) is the most recent jointly developed video coding standard of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). Although its basic architecture is built along the conventional hybrid block-based approach of combining prediction with transform coding, HEVC includes a number of coding tools with greatly enhanced coding-efficiency capabilities relative to those of prior video coding standards. Among these tools are new transform coding techniques that include support for dyadically increasing transform block sizes ranging from 4 × 4 to 32 × 32, the partitioning of residual blocks into variable-block-size transforms by using a quadtree-based partitioning dubbed the residual quadtree (RQT), as well as properly designed entropy coding techniques for quantized transform coefficients of variable transform block sizes. In this paper, we describe these HEVC techniques for transform coding, with a particular focus on the RQT structure and the entropy coding stage, and demonstrate their benefit in terms of improved coding efficiency by experimental results.
Article
Full-text available
This paper provides a systematic rate-distortion (R-D) analysis of dead-zone plus uniform threshold scalar quantization (DZ+UTSQ) with nearly uniform reconstruction quantization (NURQ) for the generalized Gaussian distribution (GGD), which consists of two aspects: R-D performance analysis and R-D modeling. In the R-D performance analysis, we first derive the preliminary constraint of optimum entropy-constrained DZ+UTSQ/NURQ for GGD, under which the property of the GGD distortion-rate (D-R) function is elucidated. Then, for GGD sources of actual transform coefficients, the refined constraint and precise conditions of optimum DZ+UTSQ/NURQ are rigorously deduced in the real coding bit rate range, and efficient DZ+UTSQ/NURQ design criteria are proposed to reasonably simplify the utilization of effective quantizers in practice. In R-D modeling, inspired by the R-D performance analysis, the D-R function is first developed, followed by novel rate-quantization (R-Q) and distortion-quantization (D-Q) models derived using analytical and heuristic methods. The D-R, R-Q and D-Q models together form the source model describing the relationship between rate, distortion and quantization step. One application of the proposed source model, as shown in the second part of this paper, is the design of an effective two-pass VBR coding algorithm on the encoder of the H.264/AVC reference software, which achieves constant video quality and desirable rate control accuracy.
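
As a rough illustration of the quantizer family analyzed above, the sketch below implements a generic dead-zone plus uniform threshold quantizer with a nearly uniform reconstruction rule and measures its empirical MSE and entropy on Laplacian samples (a GGD with shape parameter 1). The rounding offset f, reconstruction offset delta, and function names are illustrative placeholders, not the optimized DZ+UTSQ/NURQ design derived in the paper.

```python
import numpy as np

def dz_utsq(c, step, f=1/6):
    """Dead-zone plus uniform-threshold scalar quantization (illustrative).

    f < 0.5 widens the zero bin; f = 0.5 recovers a plain uniform quantizer.
    """
    return np.sign(c) * np.floor(np.abs(c) / step + f)

def nurq_reconstruct(level, step, delta=0.0):
    """Nearly uniform reconstruction: outputs at sign(level)*(|level| + delta)*step."""
    return np.sign(level) * (np.abs(level) + delta) * step

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = rng.laplace(scale=4.0, size=100_000)      # toy transform coefficients
    for step in (2.0, 4.0, 8.0):
        lv = dz_utsq(c, step)
        rec = nurq_reconstruct(lv, step)
        mse = np.mean((c - rec) ** 2)
        probs = np.unique(lv, return_counts=True)[1] / lv.size
        rate = -sum(p * np.log2(p) for p in probs)   # empirical entropy in bits
        print(f"step={step:4.1f}  MSE={mse:7.3f}  entropy={rate:5.3f} bits")
```
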
Article
Full-text available
In today's hybrid video coding, Rate-Distortion Optimization (RDO) plays a critical role. It aims at minimizing the distortion under a constraint on the rate. Currently, the most popular RDO algorithm for one-pass coding is the one recommended in the H.264/AVC reference software. This algorithm, referred to here as HR-lambda, is essentially a universal method that performs the optimization only according to the quantization process while ignoring the properties of the input sequences. Intuitively, it is not efficient all the time, and an adaptive scheme should be better. Therefore, a new algorithm, Lap-lambda, is presented in this paper. Based on the Laplace distribution of transformed residuals, the proposed Lap-lambda adapts the optimization to the input sequences so that the overall coding efficiency is improved. Cases which cannot be well captured by the proposed models are handled via escape methods. Comprehensive simulations verify that, compared with HR-lambda, Lap-lambda shows a much better or similar performance in all scenarios. In particular, significant gains of 1.79 dB and 1.60 dB in PSNR are obtained for slow sequences and B-frames, respectively.
Conference Paper
Full-text available
Transform-based image coding has been the mainstream for many years, as witnessed from the early effort of JPEG to the recent advances in HD Photo. Traditionally, a 2-D transform used in image coding is implemented separately along the vertical and horizontal directions. However, it is usually true that many image blocks contain oriented structures (e.g., edges) and/or textures that do not follow either the vertical or horizontal direction. The traditional 2-D transform thus may not be the most appropriate one for these image blocks. This well-known fact has recently triggered several attempts towards the development of directional transforms so as to better preserve the directional information in an image block. Some of these directional transforms have been applied in image coding, demonstrating a significant coding gain. This paper presents an overview of these directional transforms, as well as a discussion of some existing problems and their potential solutions.
Article
Full-text available
Block-based discrete cosine transform (DCT) has been successfully adopted into several international image/video coding standards, e.g., MPEG-2 and H.264/AVC, as it can achieve a good tradeoff between performance and complexity. Although the DCT theoretically approximates the optimum Karhunen–Loève transform under first-order Markov conditions, one fixed set of transform basis functions (TBF) cannot handle all cases efficiently due to the non-stationary nature of video content. To further improve the performance of block-based transform coding, in this paper we present the design of the rate-distortion optimized transform (RDOT), which contributes to both intraframe and interframe coding. The most important property distinguishing RDOT from the conventional DCT is that, in the proposed method, the transform is implemented with multiple TBF candidates obtained from off-line training. With this feature, for coding each residual block, the encoder is able to select the optimal set of TBF in terms of rate-distortion performance, and better energy compaction is achieved in the transform domain. To obtain an optimum group of candidate TBF, we have developed a two-step iterative optimization technique for the off-line training, with which the TBF candidates are refined at each iteration until the training process converges. Moreover, an analysis of the optimal group of candidate TBF is also presented in this paper, with a detailed description of a practical implementation of the proposed algorithm on the latest VCEG key technical area software platform. Extensive experimental results show that, compared with the conventional DCT-based transform scheme adopted in the state-of-the-art H.264/AVC video coding standard, significant improvement in coding performance is achieved for both intraframe and interframe coding with our proposed method.
Article
Full-text available
We describe a general coding strategy leading to a family of universal image compression systems designed to give good performance in applications where the statistics of the source to be compressed are not available at design time or vary over time or space. The basic approach considered uses a two-stage structure in which the single source code of traditional image compression systems is replaced with a family of codes designed to cover a large class of possible sources. To illustrate this approach, we consider the optimal design and use of two-stage codes containing collections of vector quantizers (weighted universal vector quantization), bit allocations for JPEG-style coding (weighted universal bit allocation), and transform codes (weighted universal transform coding). Further, we demonstrate the benefits to be gained from the inclusion of perceptual distortion measures and optimal parsing. The strategy yields two-stage codes that significantly outperform their single-stage predecessors. On a sequence of medical images, weighted universal vector quantization outperforms entropy-coded vector quantization by over 9 dB. On the same data sequence, weighted universal bit allocation outperforms a JPEG-style code by over 2.5 dB. On a collection of mixed text and image data, weighted universal transform coding outperforms a single, data-optimized transform code (which gives performance almost identical to that of JPEG) by over 6 dB.
Article
Full-text available
Two new design techniques for adaptive orthogonal block transforms based on vector quantization (VQ) codebooks are presented. Both techniques start from reference vectors that are adapted to the characteristics of the signal to be coded, while using different methods to create orthogonal bases. The resulting transforms represent a signal coding tool that stands between a pure VQ scheme at one extreme and a signal-independent, fixed block transform such as the discrete cosine transform (DCT) at the other. The proposed technique has superior compaction performance as compared to the DCT, both in the rendition of details of the image and in the peak signal-to-noise ratio (PSNR) figures.
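
A minimal sketch of one way such a construction could work, assuming Gram-Schmidt is used to turn trained VQ reference vectors into an orthonormal block-transform basis; the paper proposes two specific orthogonalization techniques, which may differ from this simple variant.

```python
import numpy as np

def orthonormal_basis_from_codebook(codebook):
    """Build an orthonormal block-transform basis from VQ reference vectors.

    Gram-Schmidt is only one possible orthogonalization choice; vectors earlier
    in the codebook dominate the resulting basis.
    """
    basis = []
    for v in codebook.astype(float):
        for b in basis:
            v = v - (b @ v) * b              # remove components already spanned
        norm = np.linalg.norm(v)
        if norm > 1e-10:                     # skip (near-)dependent vectors
            basis.append(v / norm)
    return np.array(basis)                   # rows form the transform

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    codebook = rng.standard_normal((8, 8))   # stand-in for trained VQ vectors
    T = orthonormal_basis_from_codebook(codebook)
    print(np.allclose(T @ T.T, np.eye(T.shape[0])))   # True
```
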
Conference Paper
Full-text available
In this work, we study a multiple-input multiple-output (MIMO) wireless system, where the channel state information is partially available at the transmitter through a feedback link. Based on singular value decomposition, the MIMO channel is split into independent subchannels, which allows separate, and therefore efficient, decoding of the transmitted data signal. Effective feedback of the required spatial channel information entails efficient quantization/encoding of a Haar unitary matrix. The parameter reduction of an n × n unitary matrix to its n^2 - n basic parameters is performed through Givens decomposition. We prove that the Givens matrices of a Haar unitary matrix are statistically independent. Subsequently, we derive the probability distribution function (PDF) of the corresponding matrix elements. Based on these analyses, an efficient quantization scheme is proposed. The performance evaluation is provided for a scenario where the rates allocated to each independent channel are selected according to its corresponding gain. The results indicate a significant performance improvement compared to the performance of MIMO systems without feedback, at the cost of a very low-rate feedback link.
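
Below is a hedged sketch of the Givens decomposition idea for the simpler real orthogonal case: an n x n orthogonal matrix is factored into n(n-1)/2 rotation angles (plus residual signs) and rebuilt from them. The complex Haar-unitary case treated in the paper carries additional phase parameters, which this toy omits.

```python
import numpy as np

def givens_angles(Q):
    """Factor a real orthogonal matrix into n(n-1)/2 Givens rotation angles."""
    Q = Q.copy().astype(float)
    n = Q.shape[0]
    angles = []
    for j in range(n - 1):
        for i in range(j + 1, n):
            theta = np.arctan2(Q[i, j], Q[j, j])
            c, s = np.cos(theta), np.sin(theta)
            G = np.eye(n)
            G[[j, i], [j, i]] = c
            G[j, i], G[i, j] = s, -s
            Q = G @ Q                       # zeroes entry Q[i, j]
            angles.append((i, j, theta))
    return angles, np.sign(np.diag(Q))      # remaining diagonal is +/- 1

def rebuild(angles, signs):
    """Reassemble the orthogonal matrix from its angles and sign diagonal."""
    n = signs.size
    Q = np.diag(signs).astype(float)
    for i, j, theta in reversed(angles):
        c, s = np.cos(theta), np.sin(theta)
        G = np.eye(n)
        G[[j, i], [j, i]] = c
        G[j, i], G[i, j] = s, -s
        Q = G.T @ Q                         # apply inverse rotations in reverse
    return Q

if __name__ == "__main__":
    Q0 = np.linalg.qr(np.random.randn(4, 4))[0]
    ang, sg = givens_angles(Q0)
    print(np.allclose(rebuild(ang, sg), Q0))  # True
```
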
Conference Paper
Full-text available
We investigate the application of local principal component analysis (PCA) to transform coding for fixed-rate image compression. Local PCA transform coding adapts to differences in correlations between signal components by partitioning the signal space into regions and compressing signal vectors in each region with a separate local transform coder. Previous researchers optimize the signal space partition and transform coders independently and consequently underestimate the potential advantage of using adaptive transform coding methods. We propose a new algorithm that concurrently optimizes the signal space partition and local transform coders. This algorithm is simply a constrained version of the LBG algorithm for vector quantizer design. Image compression experiments show that adaptive transform coders designed with our integrated algorithm compress an image with less distortion than previous related methods. We saw improvements in compressed image signal-to-noise ratio of 0.5 to 2.0 dB compared to other tested adaptive methods and 2.5 to 3.0 dB compared to global PCA transform coding.
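
The following sketch illustrates the concurrent optimization idea under simplifying assumptions: blocks are reassigned to the class whose truncated local PCA reconstructs them best, and each class's transform is then re-estimated in an LBG-like loop. Quantization and bit allocation, which the paper's fixed-rate design includes, are omitted; all names are illustrative.

```python
import numpy as np

def local_pca_coder(X, n_classes=4, n_keep=8, iters=20, seed=0):
    """Jointly refine a signal-space partition and per-class PCA transforms."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_classes, size=X.shape[0])
    for _ in range(iters):
        means, bases = [], []
        for c in range(n_classes):
            Xc = X[labels == c]
            if Xc.shape[0] < n_keep:                  # guard against tiny classes
                Xc = X[rng.choice(X.shape[0], n_keep)]
            mu = Xc.mean(axis=0)
            _, _, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
            means.append(mu)
            bases.append(Vt[:n_keep])                 # top principal directions
        # Reassign each block to the class with the lowest reconstruction error.
        errs = np.stack([
            np.sum((X - mu - (X - mu) @ V.T @ V) ** 2, axis=1)
            for mu, V in zip(means, bases)
        ])
        labels = errs.argmin(axis=0)
    return labels, means, bases

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.standard_normal((2000, 16))               # stand-in for 4x4 image blocks
    labels, means, bases = local_pca_coder(X)
    print(np.bincount(labels, minlength=4))           # class populations
```
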
Article
Full-text available
Over the past two decades, there have been various studies on the distributions of the DCT coefficients for images. However, they have concentrated only on fitting the empirical data from some standard pictures with a variety of well-known statistical distributions, and then comparing their goodness of fit. The Laplacian distribution is the dominant choice balancing simplicity of the model and fidelity to the empirical data. Yet, to the best of our knowledge, there has been no mathematical justification as to what gives rise to this distribution. We offer a rigorous mathematical analysis using a doubly stochastic model of the images, which not only provides the theoretical explanations necessary, but also leads to insights about various other observations from the literature. This model also allows us to investigate how certain changes in the image statistics could affect the DCT coefficient distributions.
Article
Full-text available
Nearly all block-based transform schemes for image and video coding developed so far choose the 2-D discrete cosine transform (DCT) of a square block shape. With almost no exception, this conventional DCT is implemented separately through two 1-D transforms, one along the vertical direction and another along the horizontal direction. In this paper, we develop a new block-based DCT framework in which the first transform may choose to follow a direction other than the vertical or horizontal one. The coefficients produced by all directional transforms in the first step are arranged appropriately so that the second transform can be applied to the coefficients that are best aligned with each other. Compared with the conventional DCT, the resulting directional DCT framework is able to provide a better coding performance for image blocks that contain directional edges, a common scenario in many image signals. By choosing the best from all directional DCTs (including the conventional DCT as a special case) for each image block, we demonstrate that the rate-distortion coding performance can be improved remarkably. Finally, a brief theoretical analysis is presented to justify why a certain coding gain (over the conventional DCT) results from this directional framework.
Article
Full-text available
A source model describing the relationship between bits, distortion, and quantization step sizes of a large class of block-transform video coders is proposed. This model is initially derived from rate-distortion theory and then modified to match practical coders and real image data. Realistic constraints such as the quantizer dead-zone and threshold coefficient selection are included in our formulation. The most attractive feature of this model is the simplicity of its final form. It enables us to predict the bits needed to encode a picture at a given distortion or to predict the quantization step size at a given bit rate. There are two aspects of our contribution: one, we extend the existing results of rate-distortion theory to practical video coders, and two, the nonideal factors in real signals and systems are identified, and their mathematical expressions are derived from empirical data. One application of this model, as shown in the second part of this paper, is the buffer/quantizer control of a CCITT P×64 kbit/s coder, with the advantage that the picture quality is nearly constant over the entire picture sequence.
Article
Aimed at casual spectators and active participants of optimization on manifolds alike, this introductory article presents a wide range of information I would have liked to have been told when I first entered the field. Several arguments are put forth: 1) it is not true that nonconvex implies difficult, 2) many optimization problems in signal processing are approached from the wrong perspective (once-off versus real-time optimization), and 3) the geometry of a manifold should not be used simply for the sake of it. This article also predicts that there is considerable potential for future work on optimization on compact manifolds; large classes of such problems are no harder than convex problems.
Article
Image compression has always been an important topic in the last decades due to the explosive growth of image data. The popular image compression formats are based on different transforms which convert images from the spatial domain into a compact frequency domain to remove the spatial correlation. In this paper, we focus on the exploration of a data-driven transform, the Karhunen-Loève transform (KLT), whose kernels are derived from specific images via Principal Component Analysis (PCA), and design a highly efficient KLT-based image compression algorithm with variable transform sizes. To explore the optimal compression performance, multiple transform sizes and categories are utilized and determined adaptively according to their rate-distortion (RD) costs. Moreover, comprehensive analyses of the transform coefficients are provided and a band-adaptive quantization scheme is proposed based on the coefficient RD performance. Extensive experiments are performed on several class-specific images as well as general images, and the proposed method achieves significant coding gain over the popular image compression standards including JPEG and JPEG 2000, and over state-of-the-art dictionary-learning-based methods.
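
A minimal sketch of deriving a data-driven KLT kernel from the blocks of a single image via PCA of the empirical block covariance; the variable transform sizes, categories, RD-based selection, and band-adaptive quantization described above are not modeled here.

```python
import numpy as np

def klt_kernel(image, block=8):
    """Derive a KLT (PCA) kernel from the blocks of a single image.

    Returns an orthonormal matrix whose rows are the eigenvectors of the
    empirical block covariance, ordered by decreasing variance.
    """
    h, w = image.shape
    h, w = h - h % block, w - w % block
    blocks = (image[:h, :w]
              .reshape(h // block, block, w // block, block)
              .transpose(0, 2, 1, 3)
              .reshape(-1, block * block)
              .astype(float))
    blocks -= blocks.mean(axis=0)
    cov = blocks.T @ blocks / blocks.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)
    order = np.argsort(eigval)[::-1]
    return eigvec[:, order].T                 # rows = basis vectors

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    img = rng.integers(0, 256, size=(256, 256)).astype(float)  # stand-in image
    T = klt_kernel(img, block=8)
    print(T.shape, np.allclose(T @ T.T, np.eye(64)))
```
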
Article
Throughout the past few decades, the separable Discrete Cosine Transform (DCT), particularly the DCT type II, has been widely used in image and video compression. It is well known that, under first-order stationary Markov conditions, DCT is an efficient approximation of the optimal Karhunen–Loève transform. However, for natural image and video sources, the adaptivity of a single separable transform with fixed core is rather limited for the highly dynamic image statistics, e.g., textures and arbitrarily directed edges. It is also known that non-separable transforms can achieve better compression efficiency for images with directional texture patterns, yet they are computationally complex, especially when the transform size is large. In order to achieve higher transform coding gains with relatively low-complexity implementations, we propose a joint separable and non-separable transform. The proposed separable primary transform, named Enhanced Multiple Transform (EMT), applies multiple transform cores from a pre-defined subset of sinusoidal transforms, and the transform selection is signaled in a joint block level manner. Moreover, a Non-Separable Secondary Transform (NSST) method is proposed to operate in conjunction with EMT. Unlike the existing non-separable transform schemes which require excessive amounts of memory and computation, the proposed NSST efficiently improves coding gain with much lower complexity. Extensive experimental results show that the proposed methods, in a state-of-the-art video codec, such as HEVC, can provide significant coding gains (average 6.9% and 4.5% bitrate reductions for intra and random-access coding, respectively).
Article
Rate Distortion Optimized Quantization (RDOQ) is an efficient encoder optimization method that plays an important role in improving the rate-distortion (RD) performance of High Efficiency Video Coding (HEVC) codecs. However, the superior performance of RDOQ comes at the expense of high computational complexity in a two-stage RD minimization that softly optimizes the quantized coefficients: determining the optimal quantized level among the available candidates for each transform coefficient, and then determining the best quantized coefficients for each transform unit with the minimum total cost. To reduce the computational cost of the RDOQ algorithm in HEVC, we propose a low-complexity RDOQ scheme that models the statistics of the transform coefficients with a hybrid Laplace distribution. In this manner, specifically designed block-level rate and distortion models are established based on the coefficient distribution. Therefore, the optimal quantization levels can be directly determined by optimizing the RD performance of the whole block, and the complicated RD cost calculations can be avoided. Extensive experimental results show that, with about 0.3%-0.4% RD performance degradation, the proposed low-complexity RDOQ algorithm reduces quantization time by around 70%, with up to a 17% reduction in total encoding time, compared to the original RDOQ implementation in HEVC on average.
Article
Rate control typically involves two steps: bit allocation and bitrate control. The bit allocation step can be implemented in various fashions depending on how many levels of allocation are desired and whether or not an optimal rate-distortion (R-D) performance is pursued. The bitrate control step has a simple aim in achieving the target bitrate as precisely as possible. In our recent research, we have developed a λ-domain rate control algorithm capable of controlling the bitrate precisely for High Efficiency Video Coding (HEVC). The initial research [1] showed that the bitrate control in the λ-domain can be more precise than the conventional schemes. However, the simple bit allocation scheme adopted in this initial research is unable to achieve an optimal R-D performance reflecting the inherent R-D characteristics governed by the video content. In order to achieve an optimal R-D performance, the bit allocation algorithms need to be developed taking into account the video content of a given sequence. The key issue in deriving the video content-guided optimal bit allocation algorithm is to build a suitable R-D model to characterize the R-D behavior of the video content. In this research, to complement the R-λ model developed in our initial work [1], a D-λ model is properly constructed to complete a comprehensive framework of λ-domain R-D analysis. Based on this comprehensive λ-domain R-D analysis framework, a suite of optimal bit allocation algorithms is developed. In particular, we design both picture-level and Basic Unit-level bit allocation algorithms based on the fundamental rate-distortion optimization (RDO) theory to take full advantage of content-guided principles. The proposed algorithms are implemented in the HEVC reference software, and the experimental results demonstrate that they achieve clear R-D performance improvement with smaller bitrate control error. The proposed bit allocation algorithms have already been adopted by the Joint Collaborative Team on Video Coding (JCT-VC) and integrated into the HEVC reference software.
Article
The High Efficiency Video Coding (HEVC) standard (ITU-T H.265 and ISO/IEC 23008-2) has been developed with the main goal of providing significantly improved video compression compared with its predecessors. In order to evaluate this goal, verification tests were conducted by the Joint Collaborative Team on Video Coding of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29. This paper presents the subjective and objective results of a verification test in which the performance of the new standard is compared with its highly successful predecessor, the Advanced Video Coding (AVC) video compression standard (ITU-T H.264 and ISO/IEC 14496-10). The test used video sequences with resolutions ranging from 480p up to ultra-high definition, encoded at various quality levels using the HEVC Main profile and the AVC High profile. In order to provide a clear evaluation, this paper also discusses various aspects for the analysis of the test results. The tests showed that bit rate savings of 59% on average can be achieved by HEVC for the same perceived video quality, which is higher than a bit rate saving of 44% demonstrated with the PSNR objective quality metric. However, it has been shown that the bit rates required to achieve good quality of compressed content, as well as the bit rate savings relative to AVC, are highly dependent on the characteristics of the tested content.
Chapter
There is no better way to quantize a single vector than to use VQ with a codebook that is optimal for the probability distribution describing the random vector. However, direct use of VQ suffers from a serious complexity barrier that greatly limits its practical use as a complete and self-contained coding technique.
Article
Four representations and parameterizations of orthogonal matrices Q∈R^(m×n) in terms of the minimal number of essential parameters {ϕ} are discussed: the exponential representation, the Householder reflector representation, the Givens rotation representation, and the rational Cayley transform representation. Both square n=m and rectangular n<m situations are considered. Two separate kinds of parameterizations are considered, one in which the individual columns of Q are distinct, the Stiefel manifold, and the other in which only Span(Q) is significant, the Grassmann manifold. The practical issues of numerical stability, continuity, and uniqueness are discussed. The computation of Q in terms of the essential parameters {ϕ}, and also the extraction of {ϕ} for a given Q are considered for all of the parameterizations. The transformation of gradient arrays between the Q and {ϕ} variables is discussed for all representations. It is our hope that developers of new methods will benefit from this comparative presentation of an important but rarely analyzed subject.
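
As a concrete example of one of the parameterizations surveyed (the rational Cayley transform), the sketch below maps n(n-1)/2 essential parameters of a skew-symmetric matrix to an orthogonal Q and back; the Householder, Givens, and exponential representations discussed in the paper are analogous but not shown here.

```python
import numpy as np

def cayley(phi, n):
    """Rational Cayley parameterization of (a neighborhood of) SO(n).

    phi holds the n(n-1)/2 free parameters of a skew-symmetric matrix S;
    Q = (I - S)(I + S)^{-1} is orthogonal whenever I + S is invertible.
    """
    S = np.zeros((n, n))
    S[np.triu_indices(n, 1)] = phi
    S -= S.T                                        # make S skew-symmetric
    I = np.eye(n)
    return (I - S) @ np.linalg.inv(I + S)

def cayley_inverse(Q):
    """Recover the essential parameters from Q (valid when I + Q is invertible)."""
    n = Q.shape[0]
    I = np.eye(n)
    S = np.linalg.solve(I + Q, I - Q)               # S = (I + Q)^{-1}(I - Q)
    return S[np.triu_indices(n, 1)]

if __name__ == "__main__":
    phi = np.random.randn(6)                        # n = 4 -> 6 parameters
    Q = cayley(phi, 4)
    print(np.allclose(Q @ Q.T, np.eye(4)))          # True
    print(np.allclose(cayley_inverse(Q), phi))      # True
```
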
Article
We propose a new transform design method that targets the generation of compression-optimized transforms for next-generation multimedia applications. The fundamental idea behind transform compression is to exploit regularity within signals such that redundancy is minimized subject to a fidelity cost. Multimedia signals, in particular images and video, are well known to contain a diverse set of localized structures, leading to many different types of regularity and to nonstationary signal statistics. The proposed method designs sparse orthonormal transforms (SOT) that automatically exploit regularity over different signal structures and provides an adaptation method that determines the best representation over localized regions. Unlike earlier work that is motivated by linear approximation constructs and model-based designs that are limited to specific types of signal regularity, our work uses general nonlinear approximation ideas and a data-driven setup to significantly broaden its reach. We show that our SOT designs provide a safe and principled extension of the Karhunen-Loeve transform (KLT) by reducing to the KLT on Gaussian processes and by automatically exploiting non-Gaussian statistics to significantly improve over the KLT on more general processes. We provide an algebraic optimization framework that generates optimized designs for any desired transform structure (multi-resolution, block, lapped, etc.) with significantly better n-term approximation performance. For each structure, we propose a new prototype codec and test over a database of images. Simulation results show consistent increase in compression and approximation performance compared with conventional methods.
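
A simplified, single-transform sketch of the alternating idea behind sparse orthonormal transform design, assuming an L0-penalized squared-error cost: coefficients are hard-thresholded, then the orthonormal transform is updated by an orthogonal Procrustes (SVD) step. The paper's actual designs cover multiple transform structures and an adaptation stage not reproduced here.

```python
import numpy as np

def sparse_orthonormal_transform(Y, lam=0.5, iters=50, seed=0):
    """Alternate hard thresholding and a Procrustes update of an orthonormal G.

    Approximately minimizes ||Y - G C||_F^2 + lam * ||C||_0 over orthonormal G
    and coefficients C (columns of Y are vectorized blocks).
    """
    n = Y.shape[0]
    rng = np.random.default_rng(seed)
    G = np.linalg.qr(rng.standard_normal((n, n)))[0]
    thr = np.sqrt(lam)
    for _ in range(iters):
        C = G.T @ Y
        C[np.abs(C) < thr] = 0.0                 # hard thresholding (L0 proximal step)
        U, _, Vt = np.linalg.svd(Y @ C.T)        # Procrustes: argmax tr(G^T Y C^T)
        G = U @ Vt
    return G

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    Y = rng.laplace(size=(16, 5000))             # columns = vectorized residual blocks
    G = sparse_orthonormal_transform(Y)
    print(np.allclose(G.T @ G, np.eye(16)))      # True
```
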
Article
The directional intra prediction (IP) in H.264/AVC and HEVC tends to cause the residue to be anisotropic. To transform the IP residue, the Mode Dependent Directional Transform (MDDT), based on the Karhunen-Loève transform (KLT), can achieve better energy compaction than the DCT, with one transform assigned to each prediction mode. However, due to data variation, different residue blocks with the same IP mode may not have the same statistical properties. Instead of constraining one transform for each IP mode, in this paper we propose a novel rate-distortion optimized transform (RDOT) scheme which allows a set of specially trained transforms to be available to all modes, and each block can choose its preferred transform to minimize the rate-distortion (RD) cost. We define a cost function which is an estimate of the true RD cost and use a Lloyd-type algorithm (alternating transform optimization and data reclassification) to find the optimal set of transforms. The proposed RDOT scheme is implemented in the HM9.0 software of HEVC. Experimental results show that RDOT achieves a 1.6% BD-Rate reduction under the Intra Main condition and a 1.6% BD-Rate reduction under the Intra High Efficiency (HE) 10-bit condition.
Article
It is well known that the discrete cosine transform (DCT) and Karhunen–Loève transform (KLT) are two good representatives in image and video coding: the first can be implemented very efficiently, while the second offers the best R-D coding performance. In this work, we attempt to design new transforms with two goals: i) approaching the KLT's R-D performance and ii) keeping the implementation cost no greater than that of the DCT. To this end, we follow a cascade structure of multiple butterflies to develop an iterative algorithm: two out of N nodes are selected at each stage to form a Givens rotation (which is equivalent to a butterfly), and the best rotation angle is then determined by maximizing the resulting coding gain. We give closed-form solutions for the node selection as well as the angle determination, together with some design examples to demonstrate their superiority.
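
A hedged sketch of a greedy variant of the cascade idea: at each stage one Givens rotation (butterfly) decorrelates the most strongly correlated coefficient pair, using the Jacobi angle, which also maximizes that stage's coding-gain improvement under a Gaussian model. The paper's closed-form node-selection rule may differ from this greedy heuristic.

```python
import numpy as np

def butterfly_cascade(C, n_rotations=8):
    """Greedy design of a short cascade of Givens rotations (butterflies)."""
    C = C.copy().astype(float)
    n = C.shape[0]
    T = np.eye(n)
    for _ in range(n_rotations):
        off = np.abs(C - np.diag(np.diag(C)))
        i, j = np.unravel_index(off.argmax(), off.shape)   # most correlated pair
        theta = 0.5 * np.arctan2(2 * C[i, j], C[i, i] - C[j, j])
        G = np.eye(n)
        c, s = np.cos(theta), np.sin(theta)
        G[i, i] = G[j, j] = c
        G[i, j], G[j, i] = s, -s
        C = G @ C @ G.T                       # covariance after the butterfly
        T = G @ T
    # Coding gain = arithmetic mean / geometric mean of coefficient variances.
    gain = np.mean(np.diag(C)) / np.exp(np.mean(np.log(np.diag(C))))
    return T, gain

if __name__ == "__main__":
    idx = np.arange(8)
    C = 0.95 ** np.abs(np.subtract.outer(idx, idx))        # toy AR(1) covariance
    T, g = butterfly_cascade(C, n_rotations=12)
    print(f"coding gain after 12 butterflies: {10 * np.log10(g):.2f} dB")
```
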
Article
In this work, we propose a cascaded sparse/DCT (S/DCT) two-layer representation of prediction residuals and implement this idea on top of the state-of-the-art High Efficiency Video Coding (HEVC) standard. First, a dictionary is adaptively trained to contain featured patterns of residual signals so that a high portion of the energy in a structured residual can be efficiently coded via sparse coding. It is observed that the sparse representation alone is less effective in terms of R-D performance due to the side-information overhead at higher bit rates. To overcome this problem, the DCT representation is cascaded at the second stage. It is applied to the remaining signal to improve coding efficiency. The two representations successfully complement each other. Experimental results demonstrate that the proposed algorithm outperforms the HEVC reference codec HM5.0 under the Common Test Conditions [1].
Article
The Discrete Cosine Transform (DCT) is the orthogonal transform most commonly used in image and video compression. The motion-compensation residual (MC-residual) is also compressed with the DCT in most video codecs. However, the MC-residual has different characteristics from a natural image. In this paper, we develop a new orthogonal transform, the Rotated Orthogonal Transform (ROT), that can perform better on the MC-residual than the DCT for coding purposes. We derive the proposed ROT from an orthogonality-constrained L1-norm minimization problem, exploiting its sparsity-promoting property. Using the DCT matrix as the starting point, a better orthogonal transform matrix is derived. In addition, by exploiting inter-frame dependency and local motion activity, transmission of substantial side information is avoided. The experimental results confirm that, with small computational overhead, the ROT adapts to changes in the local spatial characteristics of the MC-residual frame and provides higher compression efficiency for the MC-residual than the DCT, especially for high/complex-motion videos.
Article
Transforms used in image coding are also commonly used to compress prediction residuals in video coding. Prediction residuals have different spatial characteristics from images, and it is useful to develop transforms that are adapted to prediction residuals. In this paper, we explore the differences between the characteristics of images and motion compensated prediction residuals by analyzing their local anisotropic characteristics and develop transforms adapted to the local anisotropic characteristics of these residuals. The analysis indicates that many regions of motion compensated prediction residuals have 1-D anisotropic characteristics and we propose to use 1-D directional transforms for these regions. We present experimental results with one example set of such transforms within the H.264/AVC codec and the results indicate that the proposed transforms can improve the compression efficiency of motion compensated prediction residuals over conventional transforms.
Article
We propose a direction-adaptive DWT (DA-DWT) that locally adapts the filtering directions to image content based on directional lifting. With the adaptive transform, energy compaction is improved for sharp image features. A mathematical analysis based on an anisotropic statistical image model is presented to quantify the theoretical gain achieved by adapting the filtering directions. The analysis indicates that the proposed DA-DWT is more effective than other lifting-based approaches. Experimental results report a gain of up to 2.5 dB in PSNR over the conventional DWT for typical test images. Subjectively, the reconstruction from the DA-DWT better represents the structure in the image and is visually more pleasing.
Article
The optimal linear block transform for coding images is well known to be the Karhunen-Loeve transformation (KLT). However, the assumption of stationarity in the optimality condition is far from valid for images. Images are composed of regions whose local statistics may vary widely across an image. While the use of adaptation can result in improved performance, there has been little investigation into the optimality of the criterion upon which the adaptation is based. In this paper we propose a new transform coding method in which the adaptation is optimal. The system is modular, consisting of a number of modules corresponding to different classes of the input data. Each module consists of a linear transformation, whose bases are calculated during an initial training period. The appropriate class for a given input vector is determined by the subspace classifier. The performance of the resulting adaptive system is shown to be superior to that of the optimal nonadaptive linear transformation. This method can also be used as a segmentor. The segmentation it performs is independent of variations in illumination. In addition, the resulting class representations are analogous to the arrangement of the directionally sensitive columns in the visual cortex.
Article
Discusses various aspects of transform coding, including: source coding, constrained source coding, the standard theoretical model for transform coding, entropy codes, Huffman codes, quantizers, uniform quantization, bit allocation, optimal transforms, transform visualization, partition cell shapes, autoregressive sources, transform optimization, synthesis transform optimization, orthogonality and independence, and departures from the standard model.
Article
This paper presents novel algorithms that iteratively converge to a local minimum of a real-valued function f(X) subject to the constraint that the columns of the complex-valued matrix X are mutually orthogonal and have unit norm. The algorithms are derived by reformulating the constrained optimization problem as an unconstrained one on a suitable manifold. This significantly reduces the dimensionality of the optimization problem. Pertinent features of the proposed framework are illustrated by using the framework to derive an algorithm for computing the eigenvector associated with either the largest or the smallest eigenvalue of a Hermitian matrix.
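
As a small real-valued analogue of the eigenvector example mentioned above, the sketch below maximizes the Rayleigh quotient on the unit sphere with projected gradients and a normalization retraction; the complex, multi-column (Stiefel) case of the paper follows the same pattern but is not reproduced here.

```python
import numpy as np

def dominant_eigvec(A, steps=500, seed=0):
    """Riemannian gradient ascent of the Rayleigh quotient x^T A x on the unit sphere.

    The unit-norm constraint is enforced by projecting the gradient onto the
    tangent space at x and renormalizing (a simple retraction); real symmetric
    case for brevity, with a conservative step size from the Frobenius norm.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    lr = 0.5 / np.linalg.norm(A)            # Frobenius norm bounds the spectrum
    for _ in range(steps):
        g = 2.0 * A @ x                     # Euclidean gradient of x^T A x
        g -= (x @ g) * x                    # tangent-space projection
        x += lr * g
        x /= np.linalg.norm(x)              # retraction back onto the sphere
    return x, float(x @ A @ x)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    M = rng.standard_normal((6, 6))
    A = M @ M.T                             # symmetric positive semidefinite test matrix
    v, lam = dominant_eigvec(A)
    print(round(lam, 4), round(np.linalg.eigvalsh(A)[-1], 4))   # should match
```
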
Article
A multiple-class approach is proposed for improving the performance of the Karhunen-Loeve transform (KLT) image compression technique. The classification is adaptively performed by suitable neural networks. Several examples are presented in order to show that the proposed method performs much better than the classical discrete cosine transform (DCT).
Article
The rate-distortion (R-D) optimization technique plays an important role in optimizing video encoders. Modeling the rate and distortion functions accurately with acceptable complexity helps to make the optimization more practical. In this paper, we propose a bit-rate estimation function and a distortion measure by modeling the transform coefficients with the spatial-domain variance. Furthermore, with quantization-based thresholding to determine the number, the absolute sum, and the squared sum of transform coefficients, a simplified transform-domain R-D measurement is introduced. The proposed algorithms can reduce the computational complexity of the R-D optimized mode decision by using the new cost function evolving from the simplified transform-domain R-D model. Based on the proposed estimations, a rate-control scheme in the macroblock layer is also proposed to improve the coding efficiency.
Article
In this correspondence we report results from our experiments to find useful measures of local image autocovariance parameters from small subblocks of data. Our criterion for the reliability of parameter estimates is that they should correlate with observed signal activity and yield high-quality results when used in adaptive processing. We describe a method for estimating the correlation parameters of first-order Markov (nonseparable exponential) autocovariance models. The method assumes that image data are stationary within N × N pixel subblocks. Values of the autocovariance parameters may be calculated at every pixel location. A value of N = 16 yields results which fit our criterion, even when the original data are degraded by blur and noise. An application to data compression is suggested.
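
A minimal sketch of block-local parameter estimation in the spirit of the correspondence, assuming stationarity within each N x N block: the sample variance and lag-1 horizontal and vertical correlation coefficients are computed per block, which suffice to fix a first-order Markov autocovariance model. The exact estimator used in the paper may differ.

```python
import numpy as np

def block_autocov_params(image, N=16):
    """Estimate local autocovariance parameters on N x N sub-blocks.

    Returns one (variance, rho_h, rho_v) triple per block; under a first-order
    Markov (exponential) model these determine the local autocovariance.
    """
    h, w = image.shape
    out = []
    for y in range(0, h - N + 1, N):
        for x in range(0, w - N + 1, N):
            b = image[y:y + N, x:x + N].astype(float)
            b = b - b.mean()
            var = np.mean(b * b) + 1e-12
            rho_h = np.mean(b[:, :-1] * b[:, 1:]) / var   # horizontal lag-1
            rho_v = np.mean(b[:-1, :] * b[1:, :]) / var   # vertical lag-1
            out.append((var, rho_h, rho_v))
    return np.array(out)

if __name__ == "__main__":
    rng = np.random.default_rng(9)
    img = rng.standard_normal((128, 128))
    for _ in range(4):                                     # crude low-pass filtering
        img = 0.25 * (img + np.roll(img, 1, 0) + np.roll(img, 1, 1)
                      + np.roll(np.roll(img, 1, 0), 1, 1))
    params = block_autocov_params(img, N=16)
    print(params.mean(axis=0))                             # average (var, rho_h, rho_v)
```
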
Article
We examine the performance of the Karhunen-Loeve transform (KLT) for transform coding applications. The KLT has long been viewed as the best available block transform for a system that orthogonally transforms a vector source, scalar quantizes the components of the transformed vector using optimal bit allocation, and then inverse transforms the vector. This paper treats fixed-rate and variable-rate transform codes of non-Gaussian sources. The fixed-rate approach uses an optimal fixed-rate scalar quantizer to describe the transform coefficients; the variable-rate approach uses a uniform scalar quantizer followed by an optimal entropy code, and each quantized component is encoded separately. Earlier work shows that for the variable-rate case there exist sources on which the KLT is not unique and the optimal quantization and coding stage matched to a "worst" KLT yields performance as much as 1.5 dB worse than the optimal quantization and coding stage matched to a "best" KLT. In this paper, we strengthen that result to show that in both the fixed-rate and the variable-rate coding frameworks there exist sources for which the performance penalty for using a "worst" KLT can be made arbitrarily large. Further, we demonstrate in both frameworks that there exist sources for which even a best KLT gives suboptimal performance. Finally, we show that even for vector sources where the KLT yields independent coefficients, the KLT can be suboptimal for fixed-rate coding.
Article
The Karhunen-Loeve transform (KLT) is optimal for transform coding of a Gaussian source. This is established for all scale-invariant quantizers, generalizing previous results. A backward adaptive technique for combating the data dependence of the KLT is proposed and analyzed. When the adapted transform converges to a KLT, the scheme is universal among transform coders. A variety of convergence results are proven.