Chapter

Graph Spectral Image and Video Compression


Abstract

This chapter presents methods for building graph Fourier transforms (GFTs) for image and video compression. A key insight is that classical transforms, such as the discrete sine and cosine transforms (DST/DCT) or the Karhunen–Loève transform (KLT), can be interpreted from a graph perspective. The chapter considers two sets of techniques for designing graphs, from which the associated GFTs are derived: graph-learning-oriented GFTs (GL-GFTs) and block-adaptive GFTs. The graph spectral approaches aim to find graph Laplacian matrices that represent the inverse covariances of the models of interest. The chapter also discusses more specific 1D line models, with rigorous derivations of two separate Gaussian Markov random fields for intra- and inter-predicted blocks. The experimental results demonstrate that GL-GFTs can provide considerable coding gains with respect to standard transform coding schemes based on the DCT. In comparison with KLTs obtained from sample covariances, GL-GFTs are more robust and generalize better.
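To make the Laplacian-as-inverse-covariance link concrete, here is a minimal numpy sketch (the 8-sample line graph and the small ε-regularization are illustrative choices, not the chapter's exact models): the KLT computed from a GMRF covariance and the GFT computed from the corresponding graph Laplacian share the same basis.

```python
import numpy as np

# Path-graph Laplacian for an 8-sample 1D block (unit edge weights).
N = 8
L = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
L[0, 0] = L[-1, -1] = 1.0

# A GMRF with precision Q = L + eps*I (small eps makes Q invertible).
eps = 1e-2
Q = L + eps * np.eye(N)
Sigma = np.linalg.inv(Q)

# GFT basis: eigenvectors of the (regularized) Laplacian, ascending eigenvalues.
_, U_gft = np.linalg.eigh(Q)
# KLT basis: eigenvectors of the covariance, reordered to descending eigenvalues.
_, U_klt = np.linalg.eigh(Sigma)
U_klt = U_klt[:, ::-1]

# The two bases coincide up to sign flips (per-column dot products are +-1).
print(np.allclose(np.abs(np.sum(U_gft * U_klt, axis=0)), 1.0))  # True
```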


References
Article
This paper introduces a novel graph signal processing framework for building graph-based models from classes of filtered signals. In our framework, graph-based modeling is formulated as a graph system identification problem, where the goal is to learn a weighted graph (a graph Laplacian matrix) and a graph-based filter (a function of graph Laplacian matrices). In order to solve the proposed problem, an algorithm is developed to jointly identify a graph and a graph-based filter (GBF) from multiple signal/data observations. Our algorithm is valid under the assumption that GBFs are one-to-one functions. The proposed approach can be applied to learn diffusion (heat) kernels, which are popular in various fields for modeling diffusion processes. In addition, for specific choices of graph-based filters, the proposed problem reduces to a graph Laplacian estimation problem. Our experimental results demonstrate that the proposed algorithm outperforms the current state-of-the-art methods. We also apply our framework to a real climate dataset for modeling temperature signals.
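As a small illustration of a graph-based filter of the kind described above, the sketch below applies a heat kernel h(L) = exp(-tL) to an impulse on a toy path graph; the graph, the scale t, and the signal are made up for the example, and no learning is performed here.

```python
import numpy as np
from scipy.linalg import expm

# Laplacian of a small 5-node path graph with unit weights (illustrative).
N = 5
L = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
L[0, 0] = L[-1, -1] = 1.0

# Heat-kernel graph-based filter h(L) = exp(-t L); it is one-to-one in the
# eigenvalues, which is the property the identification framework relies on.
t = 0.7
H = expm(-t * L)

# Diffuse a unit impulse placed on the middle node.
x = np.zeros(N)
x[2] = 1.0
print(H @ x)  # mass spreads symmetrically to the neighboring nodes
```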
Conference Paper
Block-based compression tends to be inefficient when blocks contain arbitrarily shaped discontinuities. Recently, graph-based approaches have been proposed to address this issue, but the cost of transmitting the graph topology often outweighs the gain of such techniques. In this work we propose a new Superpixel-driven Graph Transform (SDGT) that uses clusters of superpixels, which adhere well to edges in the image, as coding blocks, and computes a shape-adaptive graph transform inside these homogeneously colored regions. In this way, only the borders of the regions and the transform coefficients need to be transmitted, in place of the full structure of the graph. The proposed method is compared to the DCT, and the experimental results show that it outperforms the DCT both visually and in terms of PSNR.
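A rough sketch of the shape-adaptive idea (not the authors' SDGT pipeline): given a binary mask for an arbitrarily shaped region, build the 4-connected graph restricted to the region and use the eigenvectors of its Laplacian as the transform. The mask and pixel values below are hypothetical.

```python
import numpy as np

# Binary mask of an arbitrarily shaped region inside a 4x4 block (hypothetical).
mask = np.array([[1, 1, 0, 0],
                 [1, 1, 1, 0],
                 [0, 1, 1, 1],
                 [0, 0, 1, 1]], dtype=bool)

# Index the pixels inside the region and connect 4-neighbors within the mask.
idx = -np.ones(mask.shape, dtype=int)
idx[mask] = np.arange(mask.sum())
n = int(mask.sum())
W = np.zeros((n, n))
rows, cols = mask.shape
for r in range(rows):
    for c in range(cols):
        if not mask[r, c]:
            continue
        for dr, dc in ((0, 1), (1, 0)):
            rr, cc = r + dr, c + dc
            if rr < rows and cc < cols and mask[rr, cc]:
                W[idx[r, c], idx[rr, cc]] = W[idx[rr, cc], idx[r, c]] = 1.0

L = np.diag(W.sum(axis=1)) - W
_, U = np.linalg.eigh(L)            # shape-adaptive transform basis

pixels = np.linspace(10, 20, n)      # placeholder intensities of the region
coeffs = U.T @ pixels                # transform coefficients to be coded
```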
Article
Intra-prediction is employed in block-based image coding to reduce energy in the prediction residual before transform coding. Conventional intra-prediction schemes copy directly from known pixels across block boundaries as prediction. In this letter, we first cluster differences between neighboring pixel pairs. Then, for each pixel pair, we add the cluster mean to the known pixel for prediction of the neighboring unknown pixel. The cluster indices are transmitted per block, allowing the decoder to mimic the same intra-prediction. We then propose an optimized transform for the prediction residual, based on a generalized version of previously developed Graph Fourier Transform (GFT). Experimental results show that our generalized intra-prediction plus transform coding outperforms combinations of previous intra-prediction and ADST coding by 2.5 dB in PSNR on average.
Conference Paper
In video coding, motion compensation is an essential tool for obtaining residual block signals whose transform coefficients are encoded. This paper proposes novel graph-based transforms (GBTs) for coding inter-predicted residual block signals. Our contribution is twofold: (i) we develop edge-adaptive GBTs (EA-GBTs) derived from graphs estimated from residual blocks, and (ii) we design template-adaptive GBTs (TA-GBTs) by introducing simplified graph templates that generate different sets of GBTs with low transform signaling overhead. Our experimental results show that the proposed methods significantly outperform the traditional DCT and KLT in terms of rate-distortion performance.
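The following sketch illustrates the edge-adaptive construction in 1D under simple assumptions (unit weights, a single weakened link of weight 0.1 across the largest jump); it is not the paper's exact EA-GBT procedure, but it shows the energy-compaction effect against the DCT.

```python
import numpy as np
from scipy.fft import dct

# Piecewise-constant residual row with one discontinuity.
x = np.array([1., 1., 1., 1., 5., 5., 5., 5.])
N = len(x)

# Detect the edge and weaken the corresponding link (0.1 instead of 1).
w = np.ones(N - 1)
w[np.argmax(np.abs(np.diff(x)))] = 0.1

# Line-graph Laplacian with the edge-adaptive weights, and its GFT.
L = np.diag(np.r_[w, 0] + np.r_[0, w])
L -= np.diag(w, 1) + np.diag(w, -1)
_, U = np.linalg.eigh(L)

gft = U.T @ x
dct2 = dct(x, norm='ortho')
top2 = lambda c: np.sort(c**2)[::-1][:2].sum() / (c**2).sum()
# The EA-GBT typically packs almost all energy into 2 coefficients here.
print(top2(gft), top2(dct2))
```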
Conference Paper
The Karhunen-Loeve transform (KLT) is known to be optimal for decorrelating stationary Gaussian processes, and it provides effective transform coding of images. Although the KLT allows efficient representations for such signals, the transform itself is completely data-driven and computationally complex. This paper proposes a new class of transforms called graph template transforms (GTTs) that approximate the KLT by exploiting a priori information about the signals, represented by a graph template. In order to construct a GTT, (i) a design matrix leading to a class of transforms is defined, and then (ii) a constrained optimization framework is employed to learn graphs based on given graph templates that structure the a priori information. Our experimental results show that some instances of the proposed GTTs can closely approach the rate-distortion performance of the KLT with significantly less complexity.
Article
Piecewise smooth (PWS) images (e.g., depth maps or animation images) contain unique signal characteristics such as sharp object boundaries and slowly varying interior surfaces. Leveraging recent advances in graph signal processing, in this paper we propose to compress PWS images using suitable graph Fourier transforms (GFTs) to minimize the total signal representation cost of each pixel block, considering both the sparsity of the signal's transform coefficients and the compactness of the transform description. Unlike fixed transforms such as the discrete cosine transform (DCT), we can adapt the GFT to a particular class of pixel blocks. In particular, we select one among a defined search space of GFTs to minimize total representation cost via our proposed algorithms, leveraging graph optimization techniques such as spectral clustering and minimum graph cuts. Further, for practical implementation of the GFT we introduce two techniques to reduce computation complexity. First, at the encoder we low-pass filter and down-sample a high-resolution (HR) pixel block to obtain a low-resolution (LR) one, so that an LR-GFT can be employed. At the decoder, up-sampling and interpolation are performed adaptively along HR boundaries coded using arithmetic edge coding (AEC), so that sharp object boundaries are well preserved. Second, instead of computing the GFT from a graph in real time via eigen-decomposition, the most popular LR-GFTs are pre-computed and stored in a table for lookup during encoding and decoding. Using depth maps and computer-graphics images as examples of PWS images, experimental results show that our proposed multi-resolution (MR)-GFT scheme outperforms H.264 intra by 6.8 dB on average in PSNR at the same bit rate.
Conference Paper
Transform coding plays a crucial role in video coders. Recently, additional transforms based on the DST and the DCT have been included in the latest video coding standard, HEVC. Those transforms were introduced after a thorough analysis of the video signal properties. In this paper, we design additional transforms using an alternative learning approach. The appropriateness of this design relative to classical KLT learning is also shown. Subsequently, the additional designed transforms are applied to the latest HEVC scheme. Results show that coding performance is improved compared to the standard, and that it can be significantly further improved by using non-separable transforms. Bitrate reductions in the range of 2% over HEVC are achieved with the proposed transforms.
Article
Given i.i.d. observations of a random vector X ∈ ℝ^p, we study the problem of estimating both its covariance matrix Σ* and its inverse covariance or concentration matrix Θ* = (Σ*)^{-1}. When X is multivariate Gaussian, the non-zero structure of Θ* is specified by the graph of an associated Gaussian Markov random field, and a popular estimator for such sparse Θ* is the ℓ1-regularized Gaussian MLE. This estimator is sensible even for non-Gaussian X, since it corresponds to minimizing an ℓ1-penalized log-determinant Bregman divergence. We analyze its performance under high-dimensional scaling, in which the number of nodes in the graph p, the number of edges s, and the maximum node degree d are allowed to grow as a function of the sample size n. In addition to the parameters (p, s, d), our analysis identifies other key quantities that control rates: (a) the ℓ∞-operator norm of the true covariance matrix Σ*; (b) the ℓ∞-operator norm of the sub-matrix Γ*_{SS}, where S indexes the graph edges and Γ* = (Θ*)^{-1} ⊗ (Θ*)^{-1}; (c) a mutual incoherence or irrepresentability measure on the matrix Γ*; and (d) the rate of decay 1/f(n, δ) of the probabilities P[|Σ̂^n_{ij} − Σ*_{ij}| > δ], where Σ̂^n is the sample covariance based on n samples. Our first result establishes consistency of our estimate Θ̂ in the elementwise maximum norm. This in turn allows us to derive convergence rates in Frobenius and spectral norms, with improvements upon existing results for graphs with maximum node degree d = o(√s). In our second result, we show that with probability converging to one, the estimate Θ̂ correctly specifies the zero pattern of the concentration matrix Θ*. We illustrate our theoretical results via simulations for various graphs and problem parameters, showing good correspondences between the theoretical predictions and the behavior in simulations.
Article
In this article we provide an overview of rate-distortion (R-D) based optimization techniques and their practical application to image and video coding. We begin with a short discussion of classical rate-distortion theory, and then show how, in many practical coding scenarios such as standards-compliant coding environments, resource allocation can be put in an R-D framework. We then introduce two popular techniques for resource allocation, namely Lagrangian optimization and dynamic programming. After a discussion of these techniques as well as some of their extensions, we conclude with a quick review of the literature in these areas, citing a number of applications related to image and video compression and transmission.
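A toy illustration of Lagrangian mode selection as described above: the encoder picks the mode minimizing J = D + λR. The candidate (rate, distortion) pairs and the multiplier λ below are made-up numbers.

```python
# Lagrangian rate-distortion mode selection: minimize J = D + lambda * R.
# Rates are in bits, distortions in SSE; all values are illustrative only.
modes = {'intra_dc': (120, 40.0), 'intra_dir': (150, 28.0), 'skip': (5, 95.0)}
lam = 0.3  # Lagrange multiplier; codecs usually tie it to the quantizer step

best = min(modes, key=lambda m: modes[m][1] + lam * modes[m][0])
print(best)  # the mode with the smallest Lagrangian cost at this operating point
```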
Conference Paper
We propose a complete video encoder based on directional “non-separable” transforms that allow spatial and temporal correlation to be jointly exploited. These lifting-based wavelet transforms are applied on graphs that link pixels in a video sequence based on motion information. In this paper, we first consider a low-complexity version of this transform, which can operate on subgraphs without significant loss in performance. We then study coefficient reordering techniques that lead to a more realistic and efficient encoder than the one we presented in our earlier work. Our proposed technique shows encouraging results as compared to a similar scheme based on the DCT.
Conference Paper
In this work a new set of edge-adaptive transforms (EATs) is presented as an alternative to the standard DCTs used in image and video coding applications. These transforms avoid filtering across edges in each image block, and thus avoid creating large high-frequency coefficients. The transforms are combined with the DCT in H.264/AVC, and a transform mode selection algorithm is used to choose between DCT and EAT in an RD-optimized manner. Applied to coding the depth maps used for view synthesis in a multi-view video coding system, they provide up to 29% bit-rate reduction at a fixed quality in the synthesized views.
Article
This paper proposes a novel approach to jointly optimize spatial prediction and the choice of the subsequent transform in video and image compression. Under the assumption of a separable first-order Gauss-Markov model for the image signal, it is shown that the optimal Karhunen-Loeve Transform, given available partial boundary information, is well approximated by a close relative of the discrete sine transform (DST), with basis vectors that tend to vanish at the known boundary and maximize energy at the unknown boundary. The overall intraframe coding scheme thus switches between this variant of the DST named asymmetric DST (ADST), and traditional discrete cosine transform (DCT), depending on prediction direction and boundary information. The ADST is first compared with DCT in terms of coding gain under ideal model conditions and is demonstrated to provide significantly improved compression efficiency. The proposed adaptive prediction and transform scheme is then implemented within the H.264/AVC intra-mode framework and is experimentally shown to significantly outperform the standard intra coding mode. As an added benefit, it achieves substantial reduction in blocking artifacts due to the fact that the transform now adapts to the statistics of block edges. An integer version of this ADST is also proposed.
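For reference, a DST-7-type sine basis, to which the ADST is closely related, can be built directly. The sketch below (N = 8 chosen arbitrarily) checks orthonormality and shows the lowest-frequency basis vector vanishing toward the predicted boundary and peaking at the unknown one.

```python
import numpy as np

# DST-7-type basis: b_k(j) = sin(pi*(j+1)*(2k+1)/(2N+1)), j,k = 0..N-1,
# scaled by 2/sqrt(2N+1) to make the basis orthonormal.
N = 8
j, k = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
B = 2 / np.sqrt(2 * N + 1) * np.sin(np.pi * (j + 1) * (2 * k + 1) / (2 * N + 1))

print(np.allclose(B.T @ B, np.eye(N)))  # True: orthonormal basis
# First basis vector: small at the known (predicted) boundary, large at the far end.
print(B[:, 0].round(2))
```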
Article
We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm—the graphical lasso—that is remarkably fast: It solves a 1000-node problem (∼500000 parameters) in at most a minute and is 30–4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen and Bühlmann (2006). We illustrate the method on some cell-signaling data from proteomics.
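A minimal usage sketch of the graphical lasso via scikit-learn's GraphicalLasso; the path-GMRF ground truth, the sample size, and the penalty alpha are illustrative choices, not tuned values.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Sparse ground-truth precision: a tridiagonal (path-GMRF) matrix on 6 variables.
N = 6
Theta = 2 * np.eye(N) - 0.4 * (np.eye(N, k=1) + np.eye(N, k=-1))
X = rng.multivariate_normal(np.zeros(N), np.linalg.inv(Theta), size=2000)

# alpha is the l1 penalty weight; 0.05 is just an illustrative value.
model = GraphicalLasso(alpha=0.05).fit(X)
print(np.round(model.precision_, 2))  # estimate is (near-)tridiagonal
```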
Article
In our paper titled “Algebraic Signal Processing Theory: Foundation and 1-D Time,” appearing in this issue of the IEEE Transactions on Signal Processing, we presented the algebraic signal processing theory, an axiomatic and general framework for linear signal processing. The basic concept in this theory is the signal model defined as the triple (A, M, Φ), where A is a chosen algebra of filters, M an associated A-module of signals, and Φ is a generalization of the z-transform. Each signal model has its own associated set of basic SP concepts, including filtering, spectrum, and Fourier transform. Examples include infinite and finite discrete time, where these notions take their well-known forms. In this paper, we use the algebraic theory to develop infinite and finite space signal models. These models are based on a symmetric space shift operator, which is distinct from the standard time shift. We present the space signal processing concepts of filtering or convolution, “z-transform,” spectrum, and Fourier transform. For finite-length space signals, we obtain 16 variants of space models, which have the 16 discrete cosine and sine transforms (DCTs/DSTs) as Fourier transforms. Using this novel derivation, we provide missing signal processing concepts associated with the DCTs/DSTs, establish them as precise analogs to the DFT, get deep insight into their origin, and enable the easy derivation of many of their properties, including their fast algorithms.
Article
In many state-of-the-art compression systems, signal transformation is an integral part of the encoding and decoding process, where transforms provide compact representations for the signals of interest. This paper introduces a class of transforms called graph-based transforms (GBTs) for video compression, and proposes two different techniques to design GBTs. In the first technique, we formulate an optimization problem to learn graphs from data and provide solutions for optimal separable and nonseparable GBT designs, called GL-GBTs. The optimality of the proposed GL-GBTs is also theoretically analyzed based on Gaussian-Markov random field (GMRF) models for intra and inter predicted block signals. The second technique develops edge-adaptive GBTs (EA-GBTs) in order to flexibly adapt transforms to block signals with image edges (discontinuities). The advantages of EA-GBTs are both theoretically and empirically demonstrated. Our experimental results show that the proposed transforms can significantly outperform the traditional Karhunen-Loeve transform (KLT).
Conference Paper
In many video coding systems, separable transforms (such as two-dimensional DCT-2) have been used to code block residual signals obtained after prediction. This paper proposes a parametric approach to build graph-based separable transforms (GBSTs) for video coding. Specifically, a GBST is derived from a pair of line graphs, whose weights are determined based on two non-negative parameters. As certain choices of those parameters correspond to the discrete sine and cosine transform types used in recent video coding standards (including DCT-2, DST-7 and DCT-8), this paper further optimizes these graph parameters to better capture residual block statistics and improve video coding efficiency. The proposed GBSTs are tested on the Versatile Video Coding (VVC) reference software, and the experimental results show that about 0.4% average coding gain is achieved over the existing set of separable transforms constructed based on DCT-2, DST-7 and DCT-8 in VVC.
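The correspondence between line-graph parameters and trigonometric transforms can be checked numerically. In the sketch below (unit edge weights assumed), a zero self-loop yields the DCT-2 and a self-loop of weight 1 at the first node yields the DST-7; this mirrors the construction described above, though the exact parameterization in the paper may differ.

```python
import numpy as np

def line_gft(N, v0=0.0):
    # Generalized Laplacian of a unit-weight line graph with a self-loop of
    # weight v0 at node 0; returns its eigenvectors (the GFT basis).
    L = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
    L[0, 0] = 1.0 + v0
    L[-1, -1] = 1.0
    return np.linalg.eigh(L)[1]

N = 8
j, k = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')

# v0 = 0: plain path graph, whose GFT is the DCT-2 (up to sign flips).
dct2 = np.cos(np.pi * (j + 0.5) * k / N) * np.where(k == 0, 1.0, np.sqrt(2.0)) / np.sqrt(N)
print(np.allclose(np.abs(np.sum(line_gft(N, 0.0) * dct2, axis=0)), 1.0))

# v0 = 1 (self-loop equal to the edge weight): the GFT is the DST-7.
dst7 = 2 / np.sqrt(2 * N + 1) * np.sin(np.pi * (j + 1) * (2 * k + 1) / (2 * N + 1))
print(np.allclose(np.abs(np.sum(line_gft(N, 1.0) * dst7, axis=0)), 1.0))
```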
Article
In this paper, we propose a new graph-based transform and illustrate its potential application to signal compression. Our approach relies on the careful design of a graph that optimizes the overall rate-distortion performance through an effective graph-based transform. We introduce a novel graph estimation algorithm, which uncovers the connectivities between the graph signal values by taking into consideration the coding of both the signal and the graph topology in rate-distortion terms. In particular, we introduce a novel coding solution for the graph by treating the edge weights as another graph signal that lies on the dual graph. Then, the cost of the graph description is introduced in the optimization problem by minimizing the sparsity of the coefficients of its graph Fourier transform (GFT) on the dual graph. In this way, we obtain a convex optimization problem whose solution defines an efficient transform coding strategy. The proposed technique is a general framework that can be applied to different types of signals, and we show two possible application fields, namely natural image coding and piecewise smooth image coding. The experimental results show that the proposed graph-based transform outperforms classical fixed transforms such as the DCT for both natural and piecewise smooth images. In the case of depth map coding, the obtained results are even comparable to state-of-the-art graph-based coding methods that are specifically designed for depth map images.
Article
In this paper, we propose a new graph-based coding framework and illustrate its application to image compression. Our approach relies on the careful design of a graph that optimizes the overall rate-distortion performance through an effective graph-based transform. We introduce a novel graph estimation algorithm, which uncovers the connectivities between the graph signal values by taking into consideration the coding of both the signal and the graph topology in rate-distortion terms. In particular, we introduce a novel coding solution for the graph by treating the edge weights as another graph signal that lies on the dual graph. Then, the cost of the graph description is introduced in the optimization problem by minimizing the sparsity of the coefficients of its graph Fourier transform (GFT) on the dual graph. In this way, we obtain a convex optimization problem whose solution defines an efficient transform coding strategy. The proposed technique is a general framework that can be applied to different types of signals, and we show two possible application fields, namely natural image coding and piecewise smooth image coding. The experimental results show that the proposed method outperforms classical fixed transforms such as the DCT, and, in the case of depth map coding, the obtained results are even comparable to state-of-the-art graph-based coding methods that are specifically designed for depth map images.
Article
Graphs are fundamental mathematical structures used in various fields to represent data, signals and processes. In this paper, we propose a novel framework for learning/estimating graphs from data. The proposed framework includes (i) formulation of various graph learning problems, (ii) their probabilistic interpretations and (iii) efficient algorithms to solve them. We specifically focus on graph learning problems where the goal is to estimate a graph Laplacian matrix from some observed data under given structural constraints (e.g., graph connectivity and sparsity). Our experimental results demonstrate that the proposed algorithms outperform the current state-of-the-art methods in terms of graph learning performance.
Article
The construction of a meaningful graph plays a crucial role in the success of many graph-based representations and algorithms for handling structured data, especially in the emerging field of graph signal processing. However, a meaningful graph is not always readily available from the data, nor easy to define depending on the application domain. In particular, it is often desirable in graph signal processing applications that a graph is chosen such that the data admit certain regularity or smoothness on the graph. In this paper, we address the problem of learning graph Laplacians, which is equivalent to learning graph topologies, such that the input data form graph signals with smooth variations on the resulting topology. To this end, we adopt a factor analysis model for the graph signals and impose a Gaussian probabilistic prior on the latent variables that control these signals. We show that the Gaussian prior leads to an efficient representation that favors the smoothness property of the graph signals. We then propose an algorithm for learning graphs that enforces such property and is based on minimizing the variations of the signals on the learned graph. Experiments on both synthetic and real world data demonstrate that the proposed graph learning framework can efficiently infer meaningful graph topologies from signal observations under the smoothness prior.
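The smoothness prior at the heart of this formulation is the quadratic form x^T L x, which is small for signals that vary slowly over the graph. A small sketch on a path graph, with toy signals rather than the paper's data:

```python
import numpy as np

# Path-graph Laplacian on 16 nodes; x^T L x measures the variation of x on the graph.
N = 16
L = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
L[0, 0] = L[-1, -1] = 1.0

t = np.linspace(0, 1, N)
smooth = np.sin(np.pi * t)                            # slowly varying signal
rough = np.random.default_rng(1).standard_normal(N)   # white-noise signal

for name, x in (('smooth', smooth), ('rough', rough)):
    print(name, float(x @ L @ x))  # the smooth signal has a much smaller x^T L x
```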
Chapter
There is no better way to quantize a single vector than to use VQ with a codebook that is optimal for the probability distribution describing the random vector. However, direct use of VQ suffers from a serious complexity barrier that greatly limits its practical use as a complete and self-contained coding technique.
Conference Paper
H.265/HEVC intra coding allows up to 35 prediction modes. In this paper, we propose an intra-mode-dependent residual transform using a 2D-KLT for 4×4, 8×8, 16×16 and 32×32 blocks. Unlike H.265/HEVC and former standards, the transform is not separable and has a higher degree of freedom. It does not require a coefficient scanning process. Preliminary results demonstrate BD-rate gains of 2.30% (average excluding screen content), 2.35% (overall average) and up to 12.67% (maximum) compared to the HM10.0 anchor.
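A sketch of a non-separable 2D KLT of the kind described, trained here on synthetic blocks from a separable AR(1)-like model rather than real mode-dependent residuals:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 4x4 blocks from a separable AR(1)-like model (rho = 0.9, illustrative);
# real training would use residuals collected per intra prediction mode.
rho, n = 0.9, 4
R1 = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
C = np.kron(R1, R1)                      # 16x16 covariance of vectorized blocks
blocks = rng.multivariate_normal(np.zeros(n * n), C, size=5000)

# Non-separable 2D KLT: eigenvectors of the 16x16 sample covariance.
S = np.cov(blocks, rowvar=False)
_, U = np.linalg.eigh(S)
KLT = U[:, ::-1]                         # order the basis by decreasing variance

coeffs = KLT.T @ blocks[0]               # transform one vectorized 4x4 block
```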
Conference Paper
Google has recently finalized a next generation open-source video codec called VP9, as part of the libvpx repository of the WebM project (http://www.webmproject.org/). Starting from the VP8 video codec released by Google in 2010 as the baseline, various enhancements and new tools were added, resulting in the next-generation VP9 bit-stream. This paper provides a brief technical overview of VP9 along with comparisons with other state-of-the-art video codecs H.264/AVC and HEVC on standard test sets. Results show VP9 to be quite competitive with mainstream state-of-the-art codecs.
Article
Depth image compression is important for compact representation of 3D visual data in "texture-plus-depth" format, where texture and depth maps from one or more viewpoints are encoded and transmitted. A decoder can then synthesize a freely chosen virtual view via depth-image-based rendering (DIBR) using nearby coded texture and depth maps as reference. Further, depth information can be used in other image processing applications beyond view synthesis, such as object identification, segmentation, etc. In this paper, we leverage the observation that "neighboring pixels of similar depth have similar motion" to efficiently encode depth video. Specifically, we divide a depth block containing two zones of distinct values (e.g., foreground and background) into two arbitrarily shaped regions (sub-blocks) along the dividing boundary before performing separate motion prediction (MP). While such arbitrarily shaped sub-block MP can lead to very small prediction residuals (resulting in few bits required for residual coding), it incurs an overhead to transmit the dividing boundaries for sub-block identification at the decoder. To minimize this overhead, we first devise a scheme called arithmetic edge coding (AEC) to efficiently code boundaries that divide blocks into sub-blocks. Specifically, we propose to incorporate the boundary geometrical correlation in an adaptive arithmetic coder in the form of a statistical model. Then, we propose two optimization procedures to further improve the edge coding performance of AEC for a given depth image. The first procedure operates within a code block, and allows lossy compression of the detected block boundary to lower the cost of AEC, with an option to augment boundary depth pixel values matching the new boundary, given that the augmented pixels do not adversely affect synthesized view distortion. The second procedure operates across code blocks, and systematically identifies blocks along an object contour that should be coded using sub-block MP via a rate-distortion optimized trellis. Experimental results show overall bitrate reductions of up to 33% over classical H.264/AVC.
Article
The directional intra prediction (IP) in H.264/AVC and HEVC tends to make the residue anisotropic. To transform the IP residue, the Mode-Dependent Directional Transform (MDDT), based on the Karhunen-Loève transform (KLT), can achieve better energy compaction than the DCT, with one transform assigned to each prediction mode. However, due to data variation, different residue blocks with the same IP mode may not have the same statistical properties. Instead of constraining one transform to each IP mode, in this paper we propose a novel rate-distortion optimized transform (RDOT) scheme which makes a set of specially trained transforms available to all modes, so that each block can choose its preferred transform to minimize the rate-distortion (RD) cost. We define a cost function which is an estimate of the true RD cost and use a Lloyd-type algorithm (alternating between transform optimization and data reclassification) to find the optimal set of transforms. The proposed RDOT scheme is implemented in the HM9.0 software of HEVC. Experimental results show that RDOT achieves 1.6% BD-rate reduction under the Intra Main condition and 1.6% BD-rate reduction under the Intra High Efficiency (HE) 10-bit condition.
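A simplified Lloyd-type loop in the spirit of RDOT: blocks are reassigned to the cheapest transform, and each transform is refit as the KLT of its own cluster. The ℓ1 coefficient norm below is a crude stand-in for the paper's RD cost estimate, and the data are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
blocks = rng.standard_normal((400, 8))   # placeholder residual vectors
K = 2                                     # number of candidate transforms
T = [np.linalg.qr(rng.standard_normal((8, 8)))[0] for _ in range(K)]

def cost(x, U):
    # Simplified stand-in for the RD cost: l1 norm of the coefficients (rate proxy).
    return np.abs(U.T @ x).sum()

for _ in range(10):
    # (1) Reclassification: each block picks the transform with the lowest cost.
    labels = np.array([min(range(K), key=lambda i: cost(x, T[i])) for x in blocks])
    # (2) Transform update: refit each transform as the KLT of its own cluster.
    for i in range(K):
        members = blocks[labels == i]
        if len(members) > 1:
            T[i] = np.linalg.eigh(np.cov(members, rowvar=False))[1][:, ::-1]
```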
Article
High Efficiency Video Coding (HEVC) is currently being prepared as the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goal of the HEVC standardization effort is to enable significantly improved compression performance relative to existing standards, in the range of 50% bit-rate reduction for equal perceptual video quality. This paper provides an overview of the technical features and characteristics of the HEVC standard.
Conference Paper
In this paper, a novel intra coding scheme is proposed. The proposed scheme improves H.264 intra coding in three aspects: 1) H.264 intra prediction is enhanced with additional bi-directional intra prediction modes; 2) the H.264 integer transform is supplemented with directional transforms for some prediction modes; and 3) residual coefficient coding in CAVLC is improved. Compared to H.264, together these improvements bring on average 7% and 10% coding gains for CABAC and CAVLC, respectively, with an average coding gain of 12% for HD sequences.
Article
We propose a novel method for constructing wavelet transforms of functions defined on the vertices of an arbitrary finite weighted graph. Our approach is based on defining scaling using the graph analogue of the Fourier domain, namely the spectral decomposition of the discrete graph Laplacian L. Given a wavelet generating kernel g and a scale parameter t, we define the scaled wavelet operator T_g^t = g(tL). The spectral graph wavelets are then formed by localizing this operator by applying it to an indicator function. Subject to an admissibility condition on g, this procedure defines an invertible transform. We explore the localization properties of the wavelets in the limit of fine scales. Additionally, we present a fast Chebyshev polynomial approximation algorithm for computing the transform that avoids the need for diagonalizing L. We highlight potential applications of the transform through examples of wavelets on graphs corresponding to a variety of different problem domains.
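An exact (non-Chebyshev) evaluation of a spectral graph wavelet on a toy path graph; the kernel g(x) = x e^{-x}, the scale, and the center node are illustrative choices, not the paper's prescriptions.

```python
import numpy as np

# Spectral graph wavelet at scale t, centered at node n, computed exactly:
# psi_{t,n} = U g(t*Lambda) U^T delta_n, with kernel g(x) = x * exp(-x).
N = 12
L = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
L[0, 0] = L[-1, -1] = 1.0
lam, U = np.linalg.eigh(L)

g = lambda x: x * np.exp(-x)
t, n = 2.0, N // 2
delta = np.zeros(N)
delta[n] = 1.0
psi = U @ np.diag(g(t * lam)) @ U.T @ delta

print(np.round(psi, 3))  # localized, oscillating pattern around node n
# (The paper's Chebyshev approximation avoids this explicit eigendecomposition.)
```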
Article
Given a probability distribution in ℝ^n with general (non-white) covariance, a classical estimator of the covariance matrix is the sample covariance matrix obtained from a sample of N independent points. What is the optimal sample size N = N(n) that guarantees estimation with a fixed accuracy in the operator norm? Suppose the distribution is supported in a centered Euclidean ball of radius √n. We conjecture that the optimal sample size is N = O(n) for all distributions with finite fourth moment, and we prove this up to an iterated logarithmic factor. This problem is motivated by the optimal theorem of Rudelson, which states that N = O(n log n) for distributions with finite second moment, and a recent result of Adamczak, Litvak, Pajor and Tomczak-Jaegermann, which guarantees that N = O(n) for sub-exponential distributions.
Article
Principal components analysis (PCA) is a classic method for the reduction of dimensionality of data in the form of n observations (or cases) of a vector with p variables. Contemporary datasets often have p comparable with or even much larger than n. Our main assertions, in such settings, are (a) that some initial reduction in dimensionality is desirable before applying any PCA-type search for principal modes, and (b) that the initial reduction in dimensionality is best achieved by working in a basis in which the signals have a sparse representation. We describe a simple asymptotic model in which the estimate of the leading principal component vector via standard PCA is consistent if and only if p(n)/n → 0. We provide a simple algorithm for selecting a subset of coordinates with largest sample variances, and show that if PCA is done on the selected subset, then consistency is recovered, even if p(n) ≫ n.
Article
Statistical learning theory was introduced in the late 1960's. Until the 1990's it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the middle of the 1990's new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems. A more detailed overview of the theory (without proofs) can be found in Vapnik (1995). In Vapnik (1998) one can find detailed description of the theory (including proofs).
Article
Discusses various aspects of transform coding, including: source coding, constrained source coding, the standard theoretical model for transform coding, entropy codes, Huffman codes, quantizers, uniform quantization, bit allocation, optimal transforms, transform visualization, partition cell shapes, autoregressive sources, transform optimization, synthesis transform optimization, orthogonality and independence, and departures from the standard model.
Article
H.264/AVC is the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goals of the H.264/AVC standardization effort have been enhanced compression performance and provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "non-conversational" (storage, broadcast, or streaming) applications. H.264/AVC has achieved a significant improvement in rate-distortion efficiency relative to existing standards. This article provides an overview of the technical features of H.264/AVC, describes profiles and applications for the standard, and outlines the history of the standardization process.
Article
We examine the performance of the Karhunen-Loeve transform (KLT) for transform coding applications. The KLT has long been viewed as the best available block transform for a system that orthogonally transforms a vector source, scalar quantizes the components of the transformed vector using optimal bit allocation, and then inverse transforms the vector. This paper treats fixed-rate and variable-rate transform codes of non-Gaussian sources. The fixed-rate approach uses an optimal fixed-rate scalar quantizer to describe the transform coefficients; the variable-rate approach uses a uniform scalar quantizer followed by an optimal entropy code, and each quantized component is encoded separately. Earlier work shows that for the variable-rate case there exist sources on which the KLT is not unique and the optimal quantization and coding stage matched to a "worst" KLT yields performance as much as 1.5 dB worse than the optimal quantization and coding stage matched to a "best" KLT. In this paper, we strengthen that result to show that in both the fixed-rate and the variable-rate coding frameworks there exist sources for which the performance penalty for using a "worst" KLT can be made arbitrarily large. Further, we demonstrate in both frameworks that there exist sources for which even a best KLT gives suboptimal performance. Finally, we show that even for vector sources where the KLT yields independent coefficients, the KLT can be suboptimal for fixed-rate coding.
Article
Each Discrete Cosine Transform uses N real basis vectors whose components are cosines. In the DCT-4, for example, the jth component of v_k is cos((j + 1/2)(k + 1/2)π/N). These basis vectors are orthogonal, and the transform is extremely useful in image processing. If the vector x gives the intensities along a row of pixels, its cosine series Σ c_k v_k has the coefficients c_k = (x, v_k)/N. They are quickly computed from an FFT. But a direct proof of orthogonality, by calculating inner products, does not reveal how natural these cosine vectors are. We prove orthogonality in a different way: each DCT basis contains the eigenvectors of a symmetric "second difference" matrix. By varying the boundary conditions we get the established transforms DCT-1 through DCT-4. Other combinations lead to four additional cosine transforms. The type of boundary condition (Dirichlet or Neumann, centered at a meshpoint or a midpoint) determines the applications that are appropriate for each transform.
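This eigenvector characterization is easy to verify numerically. With Neumann-type boundary rows (which make the second-difference matrix exactly the path-graph Laplacian), the eigenvectors match the DCT-2 basis up to sign; N = 8 is an arbitrary choice.

```python
import numpy as np
from scipy.fft import dct

# Second-difference matrix with Neumann (reflecting, midpoint-centered) boundaries;
# this is the path-graph Laplacian, whose eigenvectors form the DCT-2 basis.
N = 8
A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
A[0, 0] = A[-1, -1] = 1.0
_, U = np.linalg.eigh(A)

# Orthonormal DCT-2 basis vectors as columns, built by transforming the identity.
D = dct(np.eye(N), norm='ortho', axis=0).T
print(np.allclose(np.abs(np.sum(U * D, axis=0)), 1.0))  # True: match up to sign
```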
IEEE International Conference on Image Processing (ICIP)
  • G. Fracastoro
  • F. Verdoja
  • M. Grangetto
  • E. Magli