Article

Analyzing the Optimality of Predictive Transform Coding Using Graph-Based Models

Abstract

In this letter, we provide a theoretical analysis of optimal predictive transform coding based on the Gaussian Markov random field (GMRF) model. It is shown that the eigen-analysis of the precision matrix of the GMRF model is optimal in decorrelating the signal. The resulting graph transform degenerates to the well-known 2-D discrete cosine transform (DCT) for a particular 2-D first-order GMRF, although it is not the unique optimal solution. Furthermore, we present an optimal scheme to perform predictive transform coding based on conditional probabilities of a GMRF model. Such an analysis can be applied to both motion prediction and intra-frame predictive coding, and may lead to improvements in coding efficiency in the future.
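As a minimal numerical sketch of the decorrelation property analyzed in the letter (illustrative code under assumed notation, not the authors' implementation): if $\mathbf{x}$ follows a GMRF with precision matrix $\mathbf{Q} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{\top}$, then the coefficients $\mathbf{y} = \mathbf{U}^{\top}\mathbf{x}$ have covariance $\mathbf{U}^{\top}\mathbf{Q}^{-1}\mathbf{U} = \boldsymbol{\Lambda}^{-1}$, which is diagonal, so the eigenvectors of the precision matrix fully decorrelate the signal.

```python
import numpy as np

# Sketch: eigenvectors of the GMRF precision matrix decorrelate the signal.
# The precision matrix here is a synthetic graph Laplacian plus a small ridge
# (the ridge makes Q strictly positive definite so the covariance exists).
rng = np.random.default_rng(0)
n = 6
W = np.triu(rng.uniform(0.5, 1.5, (n, n)), k=1)
W = W + W.T                                   # symmetric edge weights
Q = np.diag(W.sum(axis=1)) - W + 0.1 * np.eye(n)

eigval, U = np.linalg.eigh(Q)                 # graph transform basis (columns of U)
cov_x = np.linalg.inv(Q)                      # covariance of the GMRF
cov_y = U.T @ cov_x @ U                       # covariance of the transform coefficients

off_diag = cov_y - np.diag(np.diag(cov_y))
print(np.abs(off_diag).max())                     # numerically zero: coefficients are decorrelated
print(np.allclose(np.diag(cov_y), 1.0 / eigval))  # variances equal 1/eigenvalues
```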

... This operator is essential for graph spectral decomposition to derive the associated GFT basis. Zhang et al. in [30] formulate the attributes in a point cloud as signals following a Gaussian Markov Random Field (GMRF) model and construct graphs in the point cloud. Then, the redundancy among point cloud attribute signals can be optimally decorrelated by the eigenvector-based transform matrix derived from the graph Laplacian. ...
... Building on the insights from [30], it points out that when a signal follows a GMRF model, the eigenvectors of its precision matrix can optimally decorrelate the signal within an underlying graph structure. Once the spatial prediction on those GMRF signals is complete, with the constructed graph structure, the eigen-analysis of the GMRF model's graph Laplacian matrix can serve as the optimal transform on those signals' residuals to compact their energy for quantization and entropy coding. ...
... $(x_1, \ldots, x_m)^{\top}$ represent the point cloud attribute signals within a GMRF model. The density function of $\mathbf{x}$ with mean $\boldsymbol{\mu}$ and a precision matrix $\mathbf{Q}$ can be defined as [30]: ...
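The density itself is elided in the excerpt above; for reference, the standard GMRF density (as given in the GMRF literature, not quoted from [30]) has the form
\[
p(\mathbf{x}) = (2\pi)^{-m/2}\,|\mathbf{Q}|^{1/2}\exp\!\Big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\mathbf{Q}\,(\mathbf{x}-\boldsymbol{\mu})\Big),
\]
assuming $\mathbf{Q}$ is positive definite; when the precision is a singular graph Laplacian, the determinant is replaced by a pseudo-determinant.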
Article
Full-text available
There is a pressing need across various applications for efficiently compressing point clouds. While the Moving Picture Experts Group introduced the geometry-based point cloud compression (G-PCC) standard, its attribute compression scheme falls short of eliminating signal frequency-domain redundancy. This paper proposes a texture-guided graph transform optimization scheme for point cloud attribute compression. We formulate the attribute transform coding task as a graph optimization problem, considering both the decorrelation capability of the graph transform and the sparsity of the optimized graph within a tailored joint optimization framework. First, the point cloud is reorganized and segmented into local clusters using a Hilbert-based scheme, enhancing spatial correlation preservation. Second, the inter-cluster attribute prediction and intra-cluster prediction are conducted on local clusters to remove spatial redundancy and extract texture priors. Third, the underlying graph structure in each cluster is constructed in a joint rate–distortion–sparsity optimization process, guided by geometry structure and texture priors to achieve optimal coding performance. Finally, point cloud attributes are efficiently compressed with the optimized graph transform. Experimental results show the proposed scheme outperforms the state of the art with significant BD-BR gains, surpassing G-PCC by 31.02%, 30.71%, and 32.14% in BD-BR gains for Y, U, and V components, respectively. Subjective evaluation of the attribute reconstruction quality further validates the superiority of our scheme.
... In order to fully exploit intrinsic temporal correlations for compact representation, we propose optimal inter-prediction and predictive transform coding with refined motion estimation for attributes of dynamic point clouds. Firstly, assuming the Gaussian Markov Random Fields (GMRF) model [20] and a spatio-temporal graph representation for dynamic point clouds, we derive optimal inter-prediction and predictive transform coding for the prediction residual, which depends on the precision matrix in the GMRF model similar to [21]. However, it is often complicated to estimate the precision matrix statistically. ...
... GFT is a content-adaptive linear transform and has been shown to be superior in compressing certain types of signals, e.g. mesh geometry [36], depth maps [37] [38], and images/videos [21,39]. ...
... Given the reference set $x_{t-1}$, the inter-prediction problem is essentially predicting $x_t$ from $x_{t-1}$. As discussed in [21], the optimal inter-prediction is the conditional expectation $\mu_{x_t|x_{t-1}}$ of $x_t$ given $x_{t-1}$ under the GMRF model. Any other predictor yields a larger expected prediction error. ...
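For context, the statement above follows from the standard Gaussian conditioning identity (written here in generic block notation, not quoted from [21]): partitioning the joint precision matrix of $(\mathbf{x}_t, \mathbf{x}_{t-1})$ into blocks $\mathbf{Q}_{tt}$, $\mathbf{Q}_{t,t-1}$, and so on, one has
\[
\mathbf{x}_t \mid \mathbf{x}_{t-1} \sim \mathcal{N}\!\Big(\boldsymbol{\mu}_t - \mathbf{Q}_{tt}^{-1}\mathbf{Q}_{t,t-1}(\mathbf{x}_{t-1}-\boldsymbol{\mu}_{t-1}),\ \mathbf{Q}_{tt}^{-1}\Big),
\]
so the conditional mean is the minimum-MSE inter-predictor and the conditional precision $\mathbf{Q}_{tt}$ governs the residual that the subsequent transform decorrelates.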
Preprint
Full-text available
As 3D scanning devices and depth sensors advance, dynamic point clouds have attracted increasing attention as a format for 3D objects in motion, with applications in various fields such as tele-presence, navigation for autonomous driving and heritage reconstruction. Nevertheless, the tremendous amount of data in dynamic point clouds significantly burdens transmission and storage. We thus propose a complete compression framework for attributes of 3D dynamic point clouds, focusing on optimal inter-coding. Firstly, we derive the optimal inter-prediction and predictive transform coding assuming the Gaussian Markov Random Field model for attributes of dynamic point clouds, where the optimal predictive transform proves to be the Generalized Graph Fourier Transform (GGFT). Secondly, we propose refined motion estimation via efficient registration prior to inter-prediction, which searches the temporal correspondence between adjacent frames of point clouds. Finally, we construct a complete framework based on the optimal inter-coding and our previously proposed intra-coding, where we determine the optimal coding mode from rate-distortion optimization with the proposed offline-trained $\lambda$-Q model. Experimental results show that we achieve 13.4% bitrate reduction on average and up to 25.8% over the state-of-the-art Region-adaptive Hierarchical Transform method.
... In particular, spectral graph theory has been recently bridged with signal processing, where the graph is used to model local relations between signal samples [57,60]. As an example, graph-based signal processing is emerging as a novel approach in the design of energy compacting image transformations [27,28,39,64,70]. ...
... Our proposed graph-based approach is founded on two recent findings: First, Zhang and Florêncio [70] have shown that a Laplacian model can be used as an estimate of the precision matrix Q of an image, under the assumption that the image follows a Gaussian Markov random field (GMRF) model. This amounts to using a function of the partial correlation between nodes as graph weights. ...
... Although this formulation might look similar to that of RXD given in (4), some important differences have to be noted. First, the model used is not the inverse of the covariance matrix $C^{-1}$, but an arbitrary Laplacian model; this is a generalization over RXD, because if the image follows a GMRF model, then a Laplacian can be constructed to estimate the precision matrix [70], but if this is not the case, a Laplacian model can be computed according to any knowledge of the domain. Second, the Laplacian matrix can be used to capture both spatial and spectral characteristics, as we will detail in Sect. ...
Article
Full-text available
The Reed–Xiaoli detector (RXD) is recognized as the benchmark algorithm for image anomaly detection; however, it presents known limitations, namely the dependence on the image following a multivariate Gaussian model, the estimation and inversion of a high-dimensional covariance matrix, and the inability to effectively include spatial awareness in its evaluation. In this work, a novel graph-based solution to the image anomaly detection problem is proposed; leveraging the graph Fourier transform, we are able to overcome some of RXD's limitations while reducing computational cost at the same time. Tests over both hyperspectral and medical images, using both synthetic and real anomalies, prove that the proposed technique is able to obtain significant performance gains over other state-of-the-art algorithms.
... Graphs have indeed been shown to be useful tools to describe intrinsic image structures, hence to define the supports of de-correlating transforms. Fourier-like transforms, called graph Fourier transform (GFT) [1] and many variants [2], [3], [4], [5], [6], [7] have been shown to be powerful tools for coding piecewise smooth and natural 2D images. An interesting review can be found in [8]. ...
... By encoding pixel similarities into the weights associated to edges, the undirected graph encodes the image structure. A Fourier-like transform for graph signals called the graph Fourier transform (GFT) [1] and many variants [2], [3], [4], [5], [6], [7] have been used as adaptive transforms for coding piecewise smooth and natural images. A spectrum of graph frequencies can be defined through the eigen-decomposition of the graph Laplacian matrix $L$, defined as $L = D - A$, where $D$ is a diagonal degree matrix whose $i$-th diagonal element $D_{ii}$ is equal to the sum of the weights of all edges incident to node $i$. ...
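The construction described in this excerpt can be sketched in a few lines (an illustrative example on an arbitrary 4-node graph, not code from any cited work): build $L = D - A$ from the similarity weights, take its eigenvectors as the GFT basis, and transform the graph signal.

```python
import numpy as np

# Illustrative GFT construction: A encodes pixel/node similarities, L = D - A.
A = np.array([[0.0, 0.9, 0.0, 0.1],
              [0.9, 0.0, 0.8, 0.0],
              [0.0, 0.8, 0.0, 0.7],
              [0.1, 0.0, 0.7, 0.0]])
D = np.diag(A.sum(axis=1))        # degree matrix: D_ii = sum of incident edge weights
L = D - A                         # combinatorial graph Laplacian

freqs, U = np.linalg.eigh(L)      # graph frequencies and GFT basis (columns of U)

x = np.array([10.0, 10.2, 9.8, 3.0])   # a piecewise-smooth signal on the 4 nodes
x_hat = U.T @ x                        # forward GFT
x_rec = U @ x_hat                      # inverse GFT
print(np.allclose(x, x_rec))           # True: the GFT is orthonormal
print(np.round(x_hat, 2))              # the DC-like coefficient carries most of the energy
```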
Article
Graph-based transforms have been shown to be powerful tools in terms of image energy compaction. However, when the size of the support increases to best capture signal dependencies, the computation of the basis functions rapidly becomes intractable. This problem is particularly compelling for high-dimensional imaging data such as light fields. The use of local transforms with limited supports is a way to cope with this computational difficulty. Unfortunately, the locality of the support may not allow us to fully exploit long-term signal dependencies present in both the spatial and angular dimensions of light fields. This paper describes sampling and prediction schemes with local graph-based transforms that efficiently compact the signal energy and exploit dependencies beyond the local graph support. The proposed approach is investigated and is shown to be very efficient in the context of spatio-angular transforms for quasi-lossless compression of light fields.
... Thus, an important attribute of the GFT is related to its flexibility, since one can decide on the degree of accuracy with which image structures are represented on the graph [23]. It has also been shown [24] that the GFT not only approximates the KLT for a piece-wise first-order autoregressive process but also optimally decorrelates images following a Gauss-Markov random field model [25]. Consequently, ...
... While a more complex integration of post-transform encoder parts would be possible, due to the adaptive nature of the biorthogonal graph filterbanks, it is expected that these transforms perform competitively, regardless of the probabilistic characteristics of the pre-transform data. For theoretical results on transform performance, readers can see Reference [25], where it is proven that the GFT optimally decorrelates images following a Gauss-Markov random field model. ...
Article
Full-text available
Hyperspectral images are depictions of scenes represented across many bands of the electromagnetic spectrum. The large size of these images as well as their unique structure requires specialized data compression algorithms. The redundancies found between consecutive spectral components and within components themselves favor algorithms that exploit their particular structure. One novel technique with applications to hyperspectral compression is the use of spectral graph filterbanks such as the GraphBior transform, which leads to competitive results. Such existing graph-based filterbank transforms do not yield integer coefficients, making them appropriate only for lossy image compression schemes. We propose here two integer-to-integer transforms to be used in the biorthogonal graph filterbanks for the lossless compression of hyperspectral scenes: firstly, by applying a Triangular Elementary Rectangular Matrix decomposition to the GraphBior filters, and secondly, by adding rounding operations to the spectral graph lifting filters. We examine the merit of our contribution by testing its performance as a spatial transform on a corpus of hyperspectral images and report and analyze the results.
... The graph structure estimation of a GMRF model naturally boils down to the estimation of the precision matrix (inverse covariance matrix) by means of maximum likelihood estimation. As is pointed out in the literature, the precision matrix is popularly structured as a graph Laplacian [5,6], and the corresponding GMRF models are named Laplacian GMRF models. A graph Laplacian is a positive semidefinite (PSD) matrix with non-positive off-diagonal entries and a zero row-sum [7]: ...
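Written out, the constraint set this excerpt leads into is, in its standard form (not quoted verbatim from [7]),
\[
\mathcal{L} = \Big\{\, \mathbf{L} \succeq 0 \ :\ L_{ij} = L_{ji} \le 0 \ \text{for } i \ne j, \quad \mathbf{L}\mathbf{1} = \mathbf{0} \,\Big\},
\]
i.e., the set of symmetric positive semidefinite matrices with non-positive off-diagonal entries and zero row-sums, which is exactly the set of combinatorial graph Laplacians.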
... The penalty term $\|\operatorname{vec}(\Theta)\|_1$ promotes element-wise sparsity in $\Theta$ for the sake of data interpretability and avoiding potential singularity issues [13]. After these two pioneering works, Friedman et al. [14] came up with an efficient computational method to solve (5) and proposed the well-known GLasso algorithm, which is a coordinate descent procedure by nature. ...
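For orientation, the $\ell_1$-penalized maximum-likelihood problem referred to as (5) in this excerpt is commonly written in the graphical lasso literature as (standard form, not reproduced from the cited paper)
\[
\hat{\boldsymbol{\Theta}} = \arg\max_{\boldsymbol{\Theta} \succ 0}\ \log\det\boldsymbol{\Theta} - \operatorname{tr}(\mathbf{S}\boldsymbol{\Theta}) - \lambda\,\|\operatorname{vec}(\boldsymbol{\Theta})\|_1,
\]
where $\mathbf{S}$ is the sample covariance matrix and $\lambda \ge 0$ controls the sparsity of the estimated precision matrix.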
Article
Full-text available
In this paper, we study the graph Laplacian estimation problem under a given connectivity topology. We aim at enriching the unified graph learning framework proposed by Egilmez et al. and at improving the optimality performance of the Combinatorial Graph Laplacian (CGL) case. We apply the well-known Alternating Direction Method of Multipliers (ADMM) and Majorization-Minimization (MM) algorithmic frameworks and propose two algorithms, namely, GLE-ADMM and GLE-MM, for graph Laplacian estimation. Both algorithms can achieve an optimality gap as low as $10^{-4}$, around three orders of magnitude more accurate than the benchmark. In addition, we find that GLE-ADMM is more computationally efficient in a dense topology (e.g., an almost complete graph), while GLE-MM is more suitable for sparse graphs (e.g., trees). Furthermore, we consider exploiting the leading eigenvectors of the sample covariance matrix as a nominal eigensubspace and propose a third algorithm named GLENE, which is also based on ADMM. Numerical experiments show that the inclusion of a nominal eigensubspace significantly improves the estimation of the graph Laplacian when the sample size is smaller than or comparable to the problem dimension.
... The graph can then be used as a support for defining and computing de-correlation transforms, which is a critical step in image compression. Fourier-like transforms, called graph Fourier transform (GFT) [1] and many variants [2,3,4,5,6,7] have been shown to be powerful tools for coding piecewise smooth and natural 2D images. An interesting review can be found in [8]. ...
... By encoding pixel similarities into the weights associated to edges, the undirected graph encodes the image structure. A Fourier-like transform for graph signals called graph Fourier transform (GFT) [1] and many variants [2,3,4,5,6,7] have been used as adaptive transforms for coding piecewise smooth and natural images. ...
Preprint
Graph-based transforms have been shown to be powerful tools in terms of image energy compaction. However, when the support increases to best capture signal dependencies, the computation of the basis functions rapidly becomes intractable. This problem is particularly compelling for high-dimensional imaging data such as light fields. The use of local transforms with limited supports is a way to cope with this computational difficulty. Unfortunately, the locality of the support may not allow us to fully exploit long-term signal dependencies present in both the spatial and angular dimensions in the case of light fields. This paper describes sampling and prediction schemes with local graph-based transforms that efficiently compact the signal energy and exploit dependencies beyond the local graph support. The proposed approach is investigated and is shown to be very efficient in the context of spatio-angular transforms for quasi-lossless compression of light fields.
... GFT is a content-adaptive linear transform and has been shown to be very useful in compressing certain types of signals, e.g. mesh geometry [9], depth maps [10] [11], and other images/videos [12,13,14]. ...
... Herein, RAHT is a wavelet-based method. We take the 1D DCT method as a baseline, since the 1D DCT is a special case of the GFT, as analyzed in [12]. The MP3DG-PCC is a widely adopted open-source point cloud compression API introduced by the 3D Graphics (3DG) group of MPEG. ...
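The statement that the 1D DCT is a special case of the GFT can be verified directly: the DCT-II basis vectors are eigenvectors of the combinatorial Laplacian of an unweighted path (line) graph. A quick numerical check (the length N = 8 is an arbitrary choice for illustration):

```python
import numpy as np

N = 8
# Combinatorial Laplacian of an unweighted path graph with N nodes.
L = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
L[0, 0] = L[-1, -1] = 1

n = np.arange(N)
for k in range(N):
    u_k = np.cos(np.pi * k * (n + 0.5) / N)          # k-th DCT-II basis vector
    lam_k = 4 * np.sin(np.pi * k / (2 * N)) ** 2     # corresponding graph frequency
    assert np.allclose(L @ u_k, lam_k * u_k)
print("Every DCT-II basis vector is an eigenvector of the path-graph Laplacian.")
```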
... Some of these algorithms include Transform Coding, Predictive Coding, Wavelet Coding, Vector Quantization, and Fractal Compression. Predictive coding [24], [25], a spatial domain technique, removes redundancy within an image by de-correlating similar neighboring pixels. Transform coding [26]- [29] uses reversible linear transform coefficients to compress images. ...
... The method is distinguished from the others by computing the inverse of the covariance matrix using different techniques. These techniques involve Laplacian matrix construction together with a Cauchy function, which can estimate the precision matrix [26]. It is also used to capture both spatial and spectral features of the data [27,28]. ...
Article
Full-text available
Anomaly detection techniques have been widely studied by researchers to locate targets that stand out from their backgrounds. The low-rank and sparse matrix decomposition model (LSDM), on the other hand, is an encouraging method to take advantage of the low-rank property of hyperspectral images and extract information from both background and anomalies. In this work, a hybrid anomaly detection method is proposed by fusing the LSDM model with the Laplacian matrix to distinguish anomalies effectively. The proposed method consists of two steps. First, the high-dimensional data is decomposed into low-rank and sparse matrices by a robust subspace recovery algorithm as a preprocessing step. After the decomposition process, the Mahalanobis distance is applied to the sparse part of the data. Different from previous studies, the inverse of the covariance matrix is computed by the Laplacian matrix. The proposed approach achieves the best detection results, according to the experimental findings. The superiority of the proposed algorithm is highlighted by comparing it with state-of-the-art algorithms on four hyperspectral images.
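The core idea referenced in the excerpts above, i.e., using a graph Laplacian in place of the inverse covariance inside a Mahalanobis-style score, can be sketched as follows (an illustrative toy example with a synthetic Laplacian and synthetic data, not the paper's implementation):

```python
import numpy as np

def laplacian_scores(X, L):
    """Mahalanobis-style anomaly scores (x - mu)^T L (x - mu), with L replacing C^{-1}."""
    Xc = X - X.mean(axis=0)
    return np.einsum('ij,jk,ik->i', Xc, L, Xc)

rng = np.random.default_rng(1)
bands = 5
W = np.triu(rng.uniform(0.2, 1.0, (bands, bands)), k=1)
W = W + W.T
L = np.diag(W.sum(axis=1)) - W          # synthetic Laplacian over the spectral bands

background = rng.normal(0.0, 1.0, (200, bands))
anomalies = rng.normal(0.0, 1.0, (3, bands))
anomalies[:, 0] += 8.0                  # outliers in one spectral band
X = np.vstack([background, anomalies])

scores = laplacian_scores(X, L)
print(scores[:200].mean(), scores[200:].mean())   # anomalies score far higher on average
print(np.argsort(scores)[-3:])                    # rows 200-202 should rank on top
```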
... Therefore, the low order assumption from Proposition 3 may only be kept if the topology of these pixels is arranged to be close anyhow. This can for example be achieved by using graph-based models [107], which are not considered here, so larger neighborhoods must be assumed. A homogeneously textured region typically belongs to a single object such that motion of this region is also homogeneous. ...
Thesis
Full-text available
The analysis of clinical use cases suggests that one of the most important aspects in storing and transmitting medical image data still consists in compression, with its focus on time-efficiently reducing storage demand while maintaining image quality. Current appliances typically perform compression using common signal processing codecs such as those described in the JPEG or MPEG/ITU-T standards. As such codecs have been designed mostly for compression of camera-captured natural scenes, efficiency can be improved by exploitation of deviating image treatment and image characteristics in medicine. Using a bottom-up approach, this thesis introduces novel techniques both for pixel-wise prediction, mostly used in lossless and high-quality scenarios, as well as for frame-wise prediction, most useful in medium-quality scenarios. Because of diverse image characteristics across modalities, focus is restricted to computed tomography and, for frame prediction in particular, to dynamic 3-D+t cardiac acquisitions. Apart from traditional approaches, numerical optimization (foremost linear and nonlinear least-squares) but also discrete algorithms are utilized in order to find the context region, mean, and variance of predictions, determine local image structures, weight rate versus distortion, identify motion between frames, remove noise from predictors, etc. Backward-adaptive autoregression approaches are thoroughly compared and extended to 3-D images, adaptive context selection and boundary treatment, closed probability distribution estimation, and many other procedures in order to make them usable within real codecs, such as a massively parallel implementation on GPUs or the presented Open Source framework Vanilc. Vanilc's compression ratio is shown to beat all algorithms from the literature with implementations available for comparison that the author is aware of. In addition to alternative developments intended for use with small contexts, such as Burg or EMP predictors, a lossy application has also been designed, outperforming on the one hand established codecs like HM or VTM at high qualities, while featuring on the other hand a noise-removing behavior that in reality even enhances image quality, as proven by phantom reconstruction simulations. For medium-quality image compression, three deformation compensation methods are proposed to replace block-based compensation in dynamic data exhibiting heart movements. One of them models physiological 3-D muscle contractions and is again realized in Nvidia CUDA. Together with Vanilc compression of motion information and frequency-filtered combination with axially preceding slices, it surpasses the rate-distortion performance of modern inter predictors like the one realized in HM. A harmonized deformation inversion algorithm for applications like motion-compensated temporal filtering or intermediate image interpolation completes the thesis.
... If conditions (46) and (47) are true, then $T_a Q T_a = Q$. To show the converse, we use the same strategy used to prove Theorem 5, where we right-multiply by $P_\gamma$ and $P_{2-\gamma}$. ...
Preprint
Full-text available
We study the design of filter banks for signals defined on the nodes of graphs. We propose novel two-channel filter banks that can be applied to arbitrary graphs, given a positive semidefinite variation operator, while using downsampling operators on arbitrary vertex partitions. The proposed filter banks also satisfy several desirable properties, including perfect reconstruction and critical sampling, while having efficient implementations. Our results generalize previous approaches that are only valid for the normalized Laplacian of bipartite graphs. We consider graph Fourier transforms (GFTs) given by the generalized eigenvectors of the variation operator. This GFT basis is orthogonal in an alternative inner product space, which depends on the choices of downsampling sets and variation operators. We show that the spectral folding property of the normalized Laplacian of bipartite graphs, at the core of bipartite filter bank theory, can be generalized for the proposed GFT if the inner product matrix is chosen properly. We give a probabilistic interpretation to the proposed filter banks using Gaussian graphical models. We also study orthogonality properties of tree-structured filter banks, and propose a vertex partition algorithm for downsampling. We show that the proposed filter banks can be implemented efficiently on 3D point clouds, with hundreds of thousands of points (nodes), while also improving the color signal representation quality over competing state-of-the-art approaches.
... Assuming that the point cloud is octree-decomposed, each occupied leaf node can be interpreted as a representative voxel, as illustrated in the corresponding figure; each such voxel represents the $i$-th node of the graph, and a weight value connects the $i$-th and $j$-th nodes. The color signal defined on the constructed graph nodes can be transformed with the help of a precision matrix (Zhang & Florêncio, 2013) and further entropy coded. Although the graph transform, which is equivalent to the Karhunen-Loève Transform (KLT) (Zhang, Florencio, and Loop, 2014), has better performance than traditional DCT (Discrete Cosine Transform) methods, it creates isolated sub-graphs when coding sparse point clouds. ...
Thesis
With the rapid growth of multimedia content, 3D objects are becoming more and more popular. Most of the time, they are modeled as complex polygonal meshes or dense point clouds, providing immersive experiences in different industrial and consumer multimedia applications. The point cloud, which is easier to acquire than meshes and is widely applicable, has raised much interest in both the academic and commercial worlds. A point cloud is a set of points with different properties such as their geometrical locations and the associated attributes (e.g., color, material properties, etc.). The number of points within a point cloud can range from a thousand, to constitute simple 3D objects, up to billions, to realistically represent complex 3D scenes. Such huge amounts of data bring great technological challenges in terms of transmission, processing, and storage of point clouds. In recent years, numerous research works focused their efforts on the compression of meshes, while less was addressed for point clouds. We have identified two main approaches in the literature: a purely geometric one based on octree decomposition, and a hybrid one based on both geometry and video coding. The first approach can provide accurate 3D geometry information but contains weak temporal consistency. The second one can efficiently remove the temporal redundancy, yet a decrease of geometrical precision can be observed after the projection. Thus, the tradeoff between compression efficiency and accurate prediction needs to be optimized. We focused on exploring the temporal correlations between dynamic dense point clouds. We proposed different approaches to improve the compression performance of the MPEG (Moving Picture Experts Group) V-PCC (Video-based Point Cloud Compression) test model, which provides state-of-the-art compression on dynamic dense point clouds. First, an octree-based adaptive segmentation is proposed to cluster the points with different motion amplitudes into 3D cubes. Then, motion estimation is applied to these cubes using affine transformation. Gains in terms of rate-distortion (RD) performance have been observed in sequences with relatively low motion amplitudes. However, the cost of building an octree for the dense point cloud remains expensive, while the resulting octree structures contain poor temporal consistency for the sequences with higher motion amplitudes. An anatomical structure is then proposed to model the motion of the point clouds representing humanoids more inherently. With the help of 2D pose estimation tools, the motion is estimated from 14 anatomical segments using affine transformation. Moreover, we propose a novel solution for color prediction and discuss the residual coding from prediction. It is shown that instead of encoding redundant texture information, it is more valuable to code the residuals, which leads to a better RD performance. Although our contributions have improved the performance of the V-PCC test models, the temporal compression of dynamic point clouds remains a highly challenging task. Due to the limitations of the current acquisition technology, the acquired point clouds can be noisy in both geometry and attribute domains, which makes it challenging to achieve accurate motion estimation. In future studies, the technologies used for 3D meshes may be exploited and adapted to provide temporally consistent connectivity information between dynamic 3D point clouds.
... For instance, the combinatorial graph Laplacian corresponds to the precision matrix of an attractive, DC-intrinsic GMRF. Further, as the eigenvectors of the precision matrix (the inverse of the covariance matrix) constitute the basis of the KLT, the GFT approximates the KLT under a family of statistical processes, as proved in different ways in [10], [29]-[31]. This indicates the GFT is approximately the optimal linear transform for signal decorrelation, which is beneficial to the compression of geometric data, as will be discussed in Section IV-C2. ...
Article
Full-text available
Geometric data acquired from real-world scenes, e.g., 2D depth images, 3D point clouds, and 4D dynamic point clouds, have found a wide range of applications including immersive telepresence, autonomous driving, surveillance, etc. Due to irregular sampling patterns of most geometric data, traditional image/video processing methodologies are limited, while Graph Signal Processing (GSP)---a fast-developing field in the signal processing community---enables processing signals that reside on irregular domains and plays a critical role in numerous applications of geometric data from low-level processing to high-level analysis. To further advance the research in this field, we provide the first timely and comprehensive overview of GSP methodologies for geometric data in a unified manner by bridging the connections between geometric data and graphs, among the various geometric data modalities, and with spectral/nodal graph filtering techniques. We also discuss the recently developed Graph Neural Networks (GNNs) and interpret the operation of these networks from the perspective of GSP. We conclude with a brief discussion of open problems and challenges.
... when the underlying model is a form of the Gaussian Markov Random Field model [21]. Through $L$ and the related underlying graph $G$, we essentially model a reversed Karhunen-Loève stochastic process. ...
Preprint
Full-text available
Determining the number of groups or the dimension of a feature space related to an initial dataset projected onto the null-space of Laplace-Beltrami-type operators is a fundamental problem in applications exploiting spectral clustering techniques. This paper theoretically focuses on generalizing and providing minor comments to a previous work by Bruneau et al., who proposed a modification of the Bartlett test, commonly used in principal component analysis, to estimate the number of groups related to normalized spectral clustering approaches. The generalization is based on a relation between the distributions of the spectrum associated with a covariance matrix and a graph Laplacian, which allows us to use the modified Bartlett test for unnormalized spectral clustering as well. Other comments follow previous works by Lawley and James, which allow us to test subsets of eigenvalues by involving a likelihood ratio statistic and linkage factors. The handling of issues arising from the limits of floating-point arithmetic is demonstrated on benchmarks employing spectral clustering for 2-phase volumetric image segmentation. On the same problem, an analysis of spectral clustering in divide-and-merge settings is presented.
... The connectivity between adjacent nodes in both the spatial and spectral domains can be established, as shown in Figure 2b. Regarding the weight determination, partial correlation [39] might be used as the weight under the assumption that the image follows a Gaussian Markov random field (GMRF) model. However, this approach still needs to compute the estimate and inversion of the covariance matrix, which is computationally expensive. ...
Article
Full-text available
The accuracy of anomaly detection in hyperspectral images (HSIs) faces great challenges due to the high dimensionality, redundancy of data, and correlation of spectral bands. In this paper, to further improve the detection accuracy, we propose a novel anomaly detection method based on texture feature extraction and a graph dictionary-based low-rank decomposition (LRD). First, instead of using traditional clustering methods for the dictionary, the proposed method employs graph theory and designs a graph Laplacian matrix-based dictionary for LRD. The robust information of the background matrix in the LRD model is retained, and both the low-rank matrix and the sparse matrix are well separated while preserving the correlation of background pixels. To further improve the detection performance, we explore and extract texture features from HSIs and integrate them with the low-rank model to obtain the sparse components by decomposition. The detection results from feature maps are generated in order to suppress background components similar to anomalies in the sparse matrix and increase the strength of real anomalies. Experiments were run on one synthetic dataset and three real datasets to evaluate the performance. The results show that the proposed method yields competitive results in terms of average area under the curve (AUC) for the receiver operating characteristic (ROC), i.e., 0.9845, 0.9962, 0.9699, and 0.9900 for the different datasets, respectively. Compared with seven other state-of-the-art algorithms, our method yielded the highest average AUC for ROC on all datasets.
... For instance, the combinatorial graph Laplacian corresponds to the precision matrix of an attractive, DC-intrinsic GMRF. Further, as the eigenvectors of the precision matrix (the inverse of the covariance matrix) constitute the basis of the KLT, the GFT approximates the KLT under a family of statistical processes, as proved in different ways in [10], [31]-[33]. This indicates the GFT is approximately the optimal linear transform for signal decorrelation, which is beneficial to the compression of geometric data, as will be discussed in Section IV-C2. ...
Preprint
Full-text available
Geometric data acquired from real-world scenes, e.g., 2D depth images, 3D point clouds, and 4D dynamic point clouds, have found a wide range of applications including immersive telepresence, autonomous driving, surveillance, etc. Due to irregular sampling patterns of most geometric data, traditional image/video processing methodologies are limited, while Graph Signal Processing (GSP)---a fast-developing field in the signal processing community---enables processing signals that reside on irregular domains and plays a critical role in numerous applications of geometric data from low-level processing to high-level analysis. To further advance the research in this field, we provide the first timely and comprehensive overview of GSP methodologies for geometric data in a unified manner by bridging the connections between geometric data and graphs, among the various geometric data modalities, and with spectral/nodal graph filtering techniques. We also discuss the recently developed Graph Neural Networks (GNNs) and interpret the operation of these networks from the perspective of GSP. We conclude with a brief discussion of open problems and challenges.
... Unlike a general GGM with a general positive definite precision matrix, the precision matrix in the Laplacian constrained GGM enjoys the spectral property that its eigenvalues and eigenvectors can be interpreted as spectral frequencies and a Fourier basis (Shuman et al., 2013), which is very useful in computing the graph Fourier transform in graph signal processing (Shuman et al., 2013) and in graph convolutional networks (Bruna et al., 2014; Niepert et al., 2016; Ruiz et al., 2019). Dong et al. (2016); Egilmez et al. (2017); Gadde and Ortega (2015); Zhang and Florêncio (2012) formulated the graph signals as random variables under the Laplacian constrained GGM. The learned graph under the Laplacian constrained GGM favours smooth graph signal representations (Dong et al., 2016), since the graph Laplacian quadratic term quantifies the smoothness of graph signals (Kalofolias, 2016; Kumar et al., 2020). ...
Preprint
Full-text available
We consider the problem of learning a sparse graph under Laplacian constrained Gaussian graphical models. This problem can be formulated as a penalized maximum likelihood estimation of the precision matrix under Laplacian structural constraints. Like in the classical graphical lasso problem, recent works made use of the $\ell_1$-norm regularization with the goal of promoting sparsity in Laplacian structural precision matrix estimation. However, we find that the widely used $\ell_1$-norm is not effective in imposing a sparse solution in this problem. Through empirical evidence, we observe that the number of nonzero graph weights grows with the increase of the regularization parameter. From a theoretical perspective, we prove that a large regularization parameter will surprisingly lead to a fully connected graph. To address this issue, we propose a nonconvex estimation method by solving a sequence of weighted $\ell_1$-norm penalized sub-problems and prove that the statistical error of the proposed estimator matches the minimax lower bound. To solve each sub-problem, we develop a projected gradient descent algorithm that enjoys a linear convergence rate. Numerical experiments involving synthetic and real-world data sets from the recent COVID-19 pandemic and financial stock markets demonstrate the effectiveness of the proposed method. An open source $\mathsf{R}$ package containing the code for all the experiments is available at https://github.com/mirca/sparseGraph.
... Most image and video compression systems make use of transform coding, where correlations among pixels can be exploited in order to concentrate most signal energy in a few frequencies. The widely used discrete cosine transform (DCT) [1] has been shown to achieve optimal decorrelation when pixel data can be modeled as a Markov random field with high correlation [2]. In recent years, graph signal processing (GSP) tools [3,4,5] have been applied to image and video coding to enhance coding efficiency [6,7,8]. ...
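A quick 1-D illustration of this classical result (an illustrative check of my own, with $\rho = 0.95$ chosen arbitrarily): for a first-order autoregressive model with high correlation, the DCT nearly diagonalizes the covariance, i.e., it closely approximates the KLT.

```python
import numpy as np

N, rho = 8, 0.95
n = np.arange(N)
C = rho ** np.abs(n[:, None] - n[None, :])     # AR(1) covariance, C_ij = rho^|i-j|

# Orthonormal DCT-II matrix (rows are the basis vectors).
D = np.sqrt(2.0 / N) * np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / N)
D[0, :] /= np.sqrt(2.0)

C_dct = D @ C @ D.T                            # covariance of the DCT coefficients
off = C_dct - np.diag(np.diag(C_dct))
print(np.sum(off**2) / np.sum(C_dct**2))       # small fraction: the DCT nearly diagonalizes C
```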
Preprint
In image and video coding applications, distortion has been traditionally measured using mean square error (MSE), which suggests the use of orthogonal transforms, such as the discrete cosine transform (DCT). Perceptual metrics such as Structural Similarity (SSIM) are typically used after encoding, but not tied to the encoding process. In this paper, we consider an alternative framework where the goal is to optimize a weighted MSE metric, where different weights can be assigned to each pixel so as to reflect their relative importance in terms of perceptual image quality. For this purpose, we propose a novel transform coding scheme based on irregularity-aware graph Fourier transform (IAGFT), where the induced IAGFT is orthogonal, but the orthogonality is defined with respect to an inner product corresponding to the weighted MSE. We propose to use weights derived from local variances of the input image, such that the weighted MSE aligns with SSIM. In this way, the associated IAGFT can achieve a coding efficiency improvement in SSIM with respect to conventional transform coding based on DCT. Our experimental results show a compression gain in terms of multi-scale SSIM on test images.
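In the IAGFT construction this abstract refers to (stated here in generic notation as I understand it from the GSP literature, with $\mathbf{Q}$ denoting the diagonal matrix of per-pixel weights), the transform basis $\mathbf{U}$ solves a generalized eigenproblem rather than the ordinary one:
\[
\mathbf{L}\mathbf{u}_k = \lambda_k\,\mathbf{Q}\mathbf{u}_k, \qquad \mathbf{U}^{\top}\mathbf{Q}\,\mathbf{U} = \mathbf{I},
\]
so the basis is orthonormal with respect to the weighted inner product $\langle \mathbf{x}, \mathbf{y}\rangle_{\mathbf{Q}} = \mathbf{x}^{\top}\mathbf{Q}\,\mathbf{y}$, which is the inner product induced by the weighted MSE.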
... To reduce the amount of overhead, the existing studies [13], [14] use a fitting function based on the Gaussian Markov random field (GMRF) model [15], [16] to approximate the power information of linearly-transformed video signals with only a few parameters. Specifically, [13] realizes overhead reduction in soft delivery of single-view video using a GMRF-based fitting function. ...
Conference Paper
Full-text available
Soft delivery, i.e., analog transmission, has been proposed to provide graceful video/image quality even in unstable wireless channels. However, existing analog schemes require a significant amount of metadata for power allocation and decoding operations. It causes large overheads and quality degradation due to rate and power losses. Although the amount of overheads can be reduced by introducing Gaussian Markov random field (GMRF) model, the model mismatch can degrade reconstruction quality. In this paper, we propose a novel analog transmission scheme to simultaneously reduce the overheads and yield better reconstruction quality. The proposed scheme uses a deep neural network (DNN) for metadata compression and decompression. Specifically, the metadata is compressed into few variables using the proposed DNN-based metadata encoder before transmission. The variables are then transmitted and decompressed at the receiver for high-quality video/image reconstruction. Evaluations using test images demonstrate that our proposed scheme reduces overheads by 80.0 % with 11.2 dB improvement of reconstruction quality compared to the existing analog transmission schemes.
... Intuitive models are convenient to analyze since such graphs are usually not highly connected, and thus the precision matrix is sparse. The 1D-ADST [114] is a Graph Based Transform derived from the generalized graph Laplacian $L_g$ of a line graph whose weights are all equal to $w_u$, with a single self-loop on the first sample with the same weight $w_u$ [32]. ...
Thesis
Due to the large availability of new camera types capturing extra geometrical information, as well as the emergence of new image modalities such as light fields and omni-directional images, a huge amount of high-dimensional data has to be stored and delivered. The ever-growing streaming and storage requirements of these new image modalities require novel image coding tools that exploit the complex structure of those data. This thesis aims at exploring novel graph-based approaches for adapting traditional image transform coding techniques to the emerging data types where the sampled information lies on irregular structures. In a first contribution, novel local graph-based transforms are designed for compact light field representations. By leveraging a careful design of local transform supports and a local basis function optimization procedure, significant improvements in terms of energy compaction can be obtained. Nevertheless, the locality of the supports did not make it possible to exploit long-term dependencies of the signal. This led to a second contribution where different sampling strategies are investigated. Coupled with novel prediction methods, they led to very prominent results for quasi-lossless compression of light fields. The third part of the thesis focuses on the definition of rate-distortion optimized sub-graphs for the coding of omni-directional content. If we move further and give more degrees of freedom to the graphs we wish to use, we can learn or define a model (a set of weights on the edges) that might not be entirely reliable for transform design. The last part of the thesis is dedicated to theoretically analyzing the effect of this uncertainty on the efficiency of the graph transforms.
... In the literature, there are a few studies on model-based transform designs for image and video coding. In [25], [26], the authors present a graph-based probabilistic framework for predictive video coding and use it to justify the optimality of the DCT, yet optimal graph/transform design is outside their scope. In our previous work [27], we present a comparison of various instances of different graph learning problems for nonseparable image modeling. ...
Preprint
In many state-of-the-art compression systems, signal transformation is an integral part of the encoding and decoding process, where transforms provide compact representations for the signals of interest. This paper introduces a class of transforms called graph-based transforms (GBTs) for video compression, and proposes two different techniques to design GBTs. In the first technique, we formulate an optimization problem to learn graphs from data and provide solutions for optimal separable and nonseparable GBT designs, called GL-GBTs. The optimality of the proposed GL-GBTs is also theoretically analyzed based on Gaussian-Markov random field (GMRF) models for intra and inter predicted block signals. The second technique develops edge-adaptive GBTs (EA-GBTs) in order to flexibly adapt transforms to block signals with image edges (discontinuities). The advantages of EA-GBTs are both theoretically and empirically demonstrated. Our experimental results demonstrate that the proposed transforms can significantly outperform the traditional Karhunen-Loeve transform (KLT).
... This makes it possible to construct a set of graphs on small neighborhoods of the point cloud. The graph transform [Zhang and Florencio 2013], which is equivalent to the Karhunen-Loève Transform (KLT), is finally applied in order to decorrelate the geometry signal. However, the transform performance highly depends on the topological information of the resulting graphs. ...
Conference Paper
Full-text available
In recent years, 3D point clouds have enjoyed great popularity for representing both static and dynamic 3D objects. When compared to 3D meshes, they offer the advantage of providing a simpler, denser and more close-to-reality representation. However, point clouds always carry a huge amount of data. For a typical example of a point cloud with 0.7 million points per 3D frame at 30 fps, the raw point cloud video needs a bandwidth of around 500 MB/s. Thus, efficient compression methods are mandatory for ensuring the storage/transmission of such data, which include both geometry and attribute information. In the last few years, the issue of 3D point cloud compression (3D-PCC) has emerged as a new field of research. In addition, an ISO/MPEG standardization process on 3D-PCC is currently ongoing. In this paper, a comprehensive overview of the 3D-PCC state-of-the-art methods is proposed. Different families of approaches are identified, described in detail and summarized, including 1D traversal compression, 2D-oriented techniques, which leverage existing 2D image/video compression technologies, and finally purely 3D approaches, based on a direct analysis of the 3D data.
... For example, pixel data can be modeled by a 4-connected grid, where all nodes are connected to their 4 immediate neighbors only. When this grid is uniformly weighted, the 2D DCT is shown to be its GFT, and provides an optimal decorrelation of block data modeled by a 2D Gaussian Markov model [44]. In this paper, we focus on grids with nearly regular topologies (e.g., all internal nodes have the same number of neighbors) or particular symmetry properties. ...
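One standard way to see this DCT result (a sketch of the argument, not a quotation from [44]): the Laplacian of a uniformly weighted $N_1 \times N_2$ 4-connected grid is the Kronecker sum of two path-graph Laplacians,
\[
\mathbf{L}_{\mathrm{grid}} = \mathbf{L}_{N_1} \otimes \mathbf{I}_{N_2} + \mathbf{I}_{N_1} \otimes \mathbf{L}_{N_2},
\]
so its eigenvectors are Kronecker products of the path-graph eigenvectors, and since those are 1-D DCT-II basis vectors, the GFT of the uniform grid is the separable 2-D DCT.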
Preprint
The graph Fourier transform (GFT) is an important tool for graph signal processing, with applications ranging from graph-based image processing to spectral clustering. However, unlike the discrete Fourier transform, the GFT typically does not have a fast algorithm. In this work, we develop new approaches to accelerate the GFT computation. In particular, we show that Haar units (Givens rotations with angle $\pi/4$) can be used to reduce GFT computation cost when the graph is bipartite or satisfies certain symmetry properties based on node pairing. We also propose a graph decomposition method based on graph topological symmetry, which allows us to identify and exploit butterfly structures in stages. This method is particularly useful for graphs that are nearly regular or have some specific structures, e.g., line graphs, cycle graphs, grid graphs, and human skeletal graphs. Though butterfly stages based on graph topological symmetry cannot be used for general graphs, they are useful in applications, including video compression and human action analysis, where symmetric graphs, such as symmetric line graphs and human skeletal graphs, are used. Our proposed fast GFT implementations are shown to reduce computation costs significantly, in terms of both number of operations and empirical runtimes.
Chapter
The essence of transform coding is to transform signals from one domain (e.g., the time domain) to another domain (e.g., the frequency domain) with a set of orthogonal basis functions. The transform helps eliminate signal correlation and reduce data redundancy. In this chapter, we will introduce some important transforms, including the commonly used discrete cosine transform (DCT), the wavelet transform, and the graph Fourier transform (GFT). We also show several applications of transform-based methods in point cloud attribute compression.
Article
With the increasing attention in various 3D safety-critical applications, point cloud learning models have been shown to be vulnerable to adversarial attacks. Although existing 3D attack methods achieve high success rates, they delve into the data space with point-wise perturbation, which may neglect the geometric characteristics. Instead, we propose point cloud attacks from a new perspective—the graph spectral domain attack, aiming to perturb graph transform coefficients in the spectral domain that correspond to varying certain geometric structures. Specifically, leveraging on graph signal processing, we first adaptively transform the coordinates of points onto the spectral domain via graph Fourier transform (GFT) for compact representation. Then, we analyze the influence of different spectral bands on the geometric structure, based on which we propose to perturb the GFT coefficients via a learnable graph spectral filter. Considering the low-frequency components mainly contribute to the rough shape of the 3D object, we further introduce a low-frequency constraint to limit perturbations within imperceptible high-frequency components. Finally, the adversarial point cloud is generated by transforming the perturbed spectral representation back to the data domain via the inverse GFT. Experimental results demonstrate the effectiveness of the proposed attack in terms of both the imperceptibility and attack success rates.
Article
Adaptive transform coding is gaining more and more attention for better mining of image content than fixed transforms such as the discrete cosine transform (DCT). As a special case, graph transform learning establishes a novel paradigm for graph-based transforms. However, designing graph transform learning-based image codecs for natural image compression remains challenging, since graph representations cannot describe regular image samples as well as they describe graph-structured data. Therefore, in this paper, we propose a cross-channel graph-based transform (CCGBT) for natural color image compression. We observe that neighboring pixels having similar intensities should have similar values in the chroma channels, which means that the prominent structure of the luminance channel is related to the contours of the chrominance channels. The collaborative design of the learned graphs and their corresponding distinctive transforms rests on the assumption that a sufficiently small block can be considered smooth, while guaranteeing the compression of the luma and chroma signals at the cost of a small overhead for coding the description of the designed luma graph. In addition, a color image compression framework based on the CCGBT is designed for comparison against the DCT in the classic JPEG codec. The proposed method benefits from its flexible transform block design with arbitrary sizes, exploiting image content better than a fixed transform. The experimental results show that the unified graph-based transform outperforms the conventional DCT, while performing close to the discrete wavelet transform of JPEG2000 at high bit-rates.
Article
3-D point clouds facilitate 3-D visual applications with detailed information of objects and scenes but bring about enormous challenges to design efficient compression technologies. The irregular signal statistics and high-order geometric structures of 3-D point clouds cannot be fully exploited by existing sparse representation and deep learning based point cloud attribute compression schemes and graph dictionary learning paradigms. In this paper, we propose a novel $p$-Laplacian embedding graph dictionary learning framework that jointly exploits the varying signal statistics and high-order geometric structures for 3-D point cloud attribute compression. The proposed framework formulates a nonconvex minimization constrained by $p$-Laplacian embedding regularization to learn a graph dictionary varying smoothly along the high-order geometric structures. An efficient alternating optimization paradigm is developed by harnessing ADMM to solve the nonconvex minimization. To our best knowledge, this paper proposes the first graph dictionary learning framework for point cloud compression. Furthermore, we devise an efficient layered compression scheme that integrates the proposed framework to exploit the correlations of 3-D point clouds in a structured fashion. Experimental results demonstrate that the proposed framework is superior to state-of-the-art transform-based methods in $M$-term approximation and point cloud attribute compression and outperforms recent MPEG G-PCC reference software.
Article
Spectral photoacoustic imaging (PAI) is a new technology that is able to provide 3D geometric structure associated with 1D wavelength-dependent absorption information of the interior of a target in a non-invasive manner. It has potentially broad applications in clinical and medical diagnosis. Unfortunately, the usability of spectral PAI is severely affected by a time-consuming data scanning process and complex noise. Therefore in this study, we propose a reliability-aware restoration framework to recover clean 4D data from incomplete and noisy observations. To the best of our knowledge, this is the first attempt for the 4D spectral PA data restoration problem that solves data completion and denoising simultaneously. We first present a sequence of analyses, including modeling of data reliability in the depth and spectral domains, developing an adaptive correlation graph, and analyzing local patch orientation. On the basis of these analyses, we explore global sparsity and local self-similarity for restoration. We demonstrated the effectiveness of our proposed approach through experiments on real data captured from patients, where our approach outperformed the state-of-the-art methods in both objective evaluation and subjective assessment.
Article
The worldwide commercialization of fifth generation (5G) wireless networks is pushing toward the deployment of immersive and high-quality VR-based telepresence systems. Among them, 3D objects are generally digitized and represented as point clouds. However, realistically reconstructed 3D point clouds generally contain thousands up to millions of points, which brings a huge amount of data. Therefore, efficient compression of point clouds is an essential part of enabling emerging immersive 3D visual communication. In point cloud compression, the graph transform is an effective tool to compact the energy of color signals on the voxels in the 3D space. However, as the eigenbasis of the graph transform is obtained from the graph Laplacian of the constructed graph, the corresponding eigenvalues will be related to the probability distributions of the transformed coefficients, which finally affect the coding efficiency of entropy coding for the quantized coefficients. To overcome the interdependence between graph transform and entropy coding, this paper proposes a jointly optimized graph transform and entropy coding scheme for compressing point clouds. Firstly, we modify the traditional graph Laplacian constructed on the geometry of the point clouds by multiplying it by a color signal-related matrix. Secondly, we theoretically derive the expected rate and distortion induced by quantization of the graph transformed coefficients. Finally, we propose a Lagrangian multiplier based algorithm to derive the optimum scaling matrix given a quantization parameter. Experimental results are presented to demonstrate that the proposed joint graph transform and entropy coding scheme can significantly outperform its transform coding based counterparts in compressing the color attribute of point clouds.
Article
Full-text available
Hyperspectral anomaly detection is an alluring topic in hyperspectral image processing. As one of the most famous hyperspectral anomaly detection algorithms, the Reed-Xiaoli detector is widely studied because it is easy to understand and implement. However, the estimation of its inverse covariance matrix can be time-consuming and is easily corrupted by the anomalies. To address these problems, we propose a novel ensemble graph Laplacian-based anomaly detector comprising two main steps. First, a multiple random sampling strategy is applied to improve detection accuracy and robustness. Second, multiple detection results are obtained through a graph Laplacian-based solution and then fused through ensemble learning. Experimental results on one simulated and two real hyperspectral datasets demonstrate the superiority of the proposed method.
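For orientation, a minimal Python sketch of the classical Reed-Xiaoli score together with a random-subsampling ensemble is given below. The plain Mahalanobis-distance RX score stands in for the paper's graph Laplacian-based solution, and the fusion is a simple mean of the runs; both are simplifying assumptions.

import numpy as np

def rx_scores(pixels, background):
    # classical RX: Mahalanobis distance to the background statistics
    mu = background.mean(axis=0)
    cov = np.cov(background, rowvar=False) + 1e-6 * np.eye(background.shape[1])
    inv_cov = np.linalg.inv(cov)
    centered = pixels - mu
    return np.einsum('ij,jk,ik->i', centered, inv_cov, centered)

def ensemble_rx(pixels, n_runs=10, sample_ratio=0.2, seed=0):
    # multiple random background subsets, fused by averaging (illustrative fusion rule)
    rng = np.random.default_rng(seed)
    n = pixels.shape[0]
    scores = np.zeros(n)
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(sample_ratio * n), replace=False)
        scores += rx_scores(pixels, pixels[idx])
    return scores / n_runs

hsi = np.random.rand(1000, 50)                       # 1000 pixels, 50 bands (toy data)
print(ensemble_rx(hsi)[:5])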
Article
Block-based compression schemes have shown remarkable success in image and video coding. However, due to irregular sampling, existing tree-type block partition methods usually divide point clouds into clusters with few or spatially disjoint points, which hinders the subsequent transform from exploiting local correlation. Moreover, widely used optimal transforms such as the Discrete Cosine Transform (DCT) are derived under an assumed probability model and thus cannot adequately conform to the diverse statistical characteristics of point cloud attributes. To address these two problems, we propose a block-adaptive codec with an optimized transform for point cloud attribute compression, including Progressive Clustering (PC) for block partition and Region-Aware Signal Modeling (RASM) for the transform. PC is designed via a split-and-merge strategy. In the splitting phase, over-segmented clusters are obtained by iteratively searching the nearest neighbors to maintain the spatial continuity of points within one cluster. In the merging step, blocks are determined by merging over-segmented clusters under the guidance of minimizing texture complexity, which adjusts block sizes to adapt to texture variation. RASM adaptively captures the color correlations to respond to diverse statistical characteristics by optimizing the overall Rate-Distortion (RD) cost. The optimal transform bases are then obtained via the eigen-decomposition of the color correlation representation with respect to the Gaussian Markov Random Field, where the input colors are obtained by considering the linear relationship between geometry and color variations. The linear coefficients are also encoded so that, together with the geometry, the transform bases can be reconstructed at the decoder in the same way as at the encoder. In particular, we accelerate RASM by enforcing independence between signals in the local region, which hardly affects RD performance. Extensive experiments not only indicate the effectiveness of PC and RASM but also show the significant RD gains achieved by our approach compared with several state-of-the-art platforms.
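As a small illustration of the texture-complexity criterion that guides the merging step, the Python sketch below models complexity as the mean per-channel color variance and merges two clusters only if the merged block stays below a threshold. The complexity measure and threshold are assumptions; the full split-and-merge PC procedure and the RASM transform optimization are not reproduced.

import numpy as np

def texture_complexity(colors):
    # mean variance over color channels (illustrative complexity proxy)
    return float(np.mean(np.var(colors, axis=0)))

def should_merge(colors_a, colors_b, max_complexity=0.02):
    merged = np.vstack([colors_a, colors_b])
    return texture_complexity(merged) <= max_complexity

flat = 0.5 + 0.01 * np.random.randn(100, 3)          # two nearly flat clusters
print(should_merge(flat[:50], flat[50:]))            # True: merging keeps complexity low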
Article
Hyperspectral target detection in complex backgrounds is a challenging and important research topic in the remote sensing field. Traditional target detectors assume that the background spectrum obeys a Gaussian distribution. However, this assumption may not hold in real hyperspectral images. In addition, most existing target detection algorithms rarely make full use of background and spatial information. Therefore, a new detector combining a weighted Cauchy distance graph (WCDG) and local adaptive collaborative representation (CGCRD) is proposed. First, a WCDG similarity measure is designed. In order to adjust the effect of target pixels on the graph model, a weighted Cauchy distance Laplacian matrix is constructed and then applied to the matched filter detector. Second, a local adaptive collaborative representation strategy is developed. The penalty coefficient is weighted by the local spatial Euclidean distance combined with the Pearson correlation coefficient, and the detection result is then obtained from the residual. Finally, the two aforementioned strategies are fused to fully utilize the spatial and spectral information. A 176-band hyperspectral image (BIT-HSI-I) dataset is collected for the target detection task. The related algorithms are evaluated on the BIT-HSI-I dataset, and the detection results demonstrate that the proposed algorithm has better detection performance than other state-of-the-art algorithms.
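To hint at what a Cauchy-distance-weighted graph looks like, a minimal Python sketch follows: edge weights are given by a Cauchy kernel on spectral distances and assembled into a graph Laplacian. The kernel form and gamma value are assumptions standing in for the paper's weighted Cauchy distance; the target-aware weighting and the collaborative representation detector are omitted.

import numpy as np

def cauchy_laplacian(spectra, gamma=1.0):
    # Cauchy-kernel weights w_ij = 1 / (1 + ||s_i - s_j||^2 / gamma)
    d2 = np.sum((spectra[:, None, :] - spectra[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + d2 / gamma)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

S = np.random.rand(100, 176)                         # 100 pixels, 176 bands (as in BIT-HSI-I)
L = cauchy_laplacian(S)
print(L.shape)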
Article
Octree (OT) geometry partitioning has been acknowledged as an efficient representation in state-of-the-art point cloud compression (PCC) schemes. In this work, an adaptive geometry partition and coding scheme is proposed to improve the OT-based coding framework. First, quad-tree (QT) and binary-tree (BT) partitions are introduced as alternative geometry partition modes for the first time in the context of OT-based point cloud compression. The adaptive geometry partition scheme enables flexible three-dimensional (3D) space representations and higher coding efficiency. However, exhaustively searching for the optimal partition among all possible combinations of OT, QT and BT is impractical because the entire search space could be huge. Therefore, two hyper-parameters are introduced to specify the conditions under which QT and BT partitions are applied. Once the two parameters are determined, the partition mode can be derived from the geometric shape of the current coding node. To investigate the impact of different partition combinations on the coding gains, we conduct thorough mathematical and experimental analyses. Based on these analyses, an adaptive parameter selection scheme is presented to optimize coding efficiency adaptively, where multi-resolution features are extracted from the partition pyramid and a decision tree model is trained to select the optimal hyper-parameters. The proposed adaptive geometry partition scheme shows significant coding gains and has been adopted in the state-of-the-art MPEG Geometry-based PCC (G-PCC) standard. For sparser point clouds, the bit savings are up to 10.8% and 3.5% for lossy and lossless geometry coding, respectively, without a significant increase in complexity.
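A minimal Python sketch of how a partition mode could be read off the shape of the current node is shown below: cubic nodes take an OT split, nodes with two equally long axes take QT, and nodes with a single long axis take BT. This decision rule is a simplified assumption; the two hyper-parameters that gate when QT and BT are allowed, and the decision-tree-based selection of those parameters, are not reproduced.

def partition_mode(log2_dims):
    # log2_dims: (dx, dy, dz) log2 sizes of the current coding node
    dmax = max(log2_dims)
    n_long = sum(d == dmax for d in log2_dims)
    if n_long == 3:
        return 'OT'          # cubic node: regular octree split
    if n_long == 2:
        return 'QT'          # two long axes: split only those two
    return 'BT'              # a single long axis: split it alone

print(partition_mode((5, 5, 5)), partition_mode((5, 5, 1)), partition_mode((6, 3, 3)))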
Chapter
This chapter reviews well-established solutions to the problem of graph learning that adopt a statistical or physical perspective. The graph learning problem may consist of finding the optimal weights of the edges such that the resulting graph-based transforms, having been adapted to the actual image structure, lead to efficient transform coding of the image. The chapter examines a series of recent GSP-based approaches and shows how signal processing tools and concepts can be utilized to provide novel solutions to the graph learning problem. The smoothness property of the graph signal is associated with a multivariate Gaussian distribution, which also underlies classical approaches for learning graphical models, such as the graphical Lasso. Image processing can benefit significantly from graph learning techniques. The chapter discusses some general directions for future work, focusing on graph inference for image processing applications.
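Since the chapter connects graph-signal smoothness with the multivariate Gaussian model behind the graphical Lasso, a minimal Python sketch of that baseline is given here: the sparse precision matrix estimated by the graphical Lasso is read as a GMRF whose negated off-diagonal entries act as edge weights. The synthetic AR(1)-type covariance, the regularization value, and the thresholding of weights are illustrative assumptions.

import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
cov = 0.6 ** np.abs(np.subtract.outer(np.arange(16), np.arange(16)))   # AR(1)-type covariance
X = rng.multivariate_normal(np.zeros(16), cov, size=500)               # 500 graph-signal samples

model = GraphicalLasso(alpha=0.05).fit(X)
precision = model.precision_                  # sparse inverse covariance (GMRF precision)
W = -precision.copy()                         # negated off-diagonals act as edge weights
np.fill_diagonal(W, 0.0)
print(int(np.count_nonzero(W > 1e-3) // 2), "edges learned")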
Article
Full-text available
Image transforms are necessary for image and video compression. Analytic transforms are powerful tools for compacting natural signals. Various methods have been introduced to represent such data with a small number of bases, and several of these methods use machine learning, usually based on sparse coding, to outperform analytic transforms; they show sufficient data compaction abilities. However, these methods focus only on data compaction and reconstruction performance, without considering computational issues during implementation. We introduce a new framework for a more efficient transform based on the two-dimensional discrete cosine transform (DCT) and its characteristics. We aim to improve the data compaction ability of transforms to levels better than or similar to those of the DCT and other data-driven transforms, with fast and efficient implementation. We focus on the properties of the DCT, including horizontal and vertical directional information, and approximate the dominant direction using the transform. Our framework is designed by rotating some of the DCT bases to fit this direction. As expected, it achieves a transform design with minimal computation for efficient implementation. It does not require an iterative algorithm or a brute-force search to find the best transform matrix or other parameters, making it much faster than other methods. Our framework is 10 times faster than the steerable DCT (SDCT) and twice as fast as the eight-level SDCT, with minimal performance reduction. Experimental validation on various images indicates that the proposed method approaches the performance of the other transforms despite its faster implementation.
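For reference, the separable 2-D DCT that the framework starts from can be exercised with a few lines of Python; the directional rotation of selected DCT bases described in the paper is not shown, only the baseline transform and a simple energy-compaction check on a smooth toy block.

import numpy as np
from scipy.fft import dctn, idctn

block = np.outer(np.linspace(0, 1, 8), np.linspace(0, 1, 8))   # smooth toy 8x8 block
coeffs = dctn(block, norm='ortho')                             # separable 2-D DCT
print("energy in top-left 2x2:", float(np.sum(coeffs[:2, :2] ** 2) / np.sum(coeffs ** 2)))
recon = idctn(coeffs, norm='ortho')                            # perfect reconstruction
print("max reconstruction error:", float(np.abs(recon - block).max()))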
Article
In this article, a survey of point cloud compression (PCC) methods is presented, organizing them with respect to data structure, coding representation space, and prediction strategy. Two paramount families of approaches reported in the literature, the projection-based and octree-based methods, have proven efficient for encoding dense and sparse point clouds, respectively. These approaches are the pillars on which the Moving Picture Experts Group developed two PCC standards, published as final international standards in 2020 and early 2021, respectively, under the names video-based PCC and geometry-based PCC. After surveying current approaches to PCC, the technologies underlying the two standards are described in detail from an encoder perspective, providing guidance for potential standard implementors. In addition, experimental evaluations of the compression performance of both solutions are provided.
Article
In many state-of-the-art compression systems, signal transformation is an integral part of the encoding and decoding process, where transforms provide compact representations for the signals of interest. This paper introduces a class of transforms called graph-based transforms (GBTs) for video compression and proposes two different techniques to design GBTs. In the first technique, we formulate an optimization problem to learn graphs from data and provide solutions for optimal separable and nonseparable GBT designs, called GL-GBTs. The optimality of the proposed GL-GBTs is also theoretically analyzed based on Gaussian Markov random field (GMRF) models for intra- and inter-predicted block signals. The second technique develops edge-adaptive GBTs (EA-GBTs) in order to flexibly adapt transforms to block signals with image edges (discontinuities). The advantages of EA-GBTs are both theoretically and empirically demonstrated. Our experimental results show that the proposed transforms can significantly outperform the traditional Karhunen-Loève transform (KLT).
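The edge-adaptive idea can be illustrated with a minimal Python sketch on a 1-D block: a line graph with a weakened link at a discontinuity yields a Laplacian whose eigenvectors compact a piecewise-constant signal better than the uniform-weight (DCT-like) case. The weight values and the 1-D setting are assumptions; the GL-GBT graph learning and the GMRF-based optimality analysis are not reproduced.

import numpy as np

def line_graph_laplacian(n, weak_at=None, weak_w=0.1):
    # path-graph Laplacian; optionally weaken the link between nodes weak_at and weak_at+1
    W = np.zeros((n, n))
    for i in range(n - 1):
        w = weak_w if i == weak_at else 1.0
        W[i, i + 1] = W[i + 1, i] = w
    return np.diag(W.sum(axis=1)) - W

signal = np.concatenate([np.ones(4), 5.0 * np.ones(4)])        # block with a sharp edge
for name, L in [("uniform (DCT-like)", line_graph_laplacian(8)),
                ("edge-adaptive", line_graph_laplacian(8, weak_at=3))]:
    _, U = np.linalg.eigh(L)
    c = U.T @ signal
    top2 = np.sort(c ** 2)[-2:].sum() / np.sum(c ** 2)
    print(name, "energy in 2 largest coefficients:", round(float(top2), 4))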
Conference Paper
Graph learning is one of the most important tasks in machine learning, statistics, and signal processing. In this paper, we focus on the problem of learning the generalized graph Laplacian (GGL) and propose an efficient algorithm to solve it. We first fully exploit the sparsity structure hidden in the objective function by utilizing a soft-thresholding technique to transform the GGL problem into an equivalent one. Moreover, we propose a fast proximal point algorithm (PPA) to solve the transformed GGL problem and establish its linear convergence rate. Extensive numerical experiments on both synthetic and real data demonstrate that the soft-thresholding technique accelerates our PPA method and that PPA can outperform the current state-of-the-art method in terms of speed.
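As a pointer to the key ingredient, the soft-thresholding (shrinkage) operator used to expose the sparsity structure of the objective can be written in a few lines of Python; the reformulated GGL problem, the proximal point iterations, and the convergence analysis are not reproduced here.

import numpy as np

def soft_threshold(X, tau):
    # elementwise shrinkage: sign(x) * max(|x| - tau, 0)
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

A = np.array([[1.5, -0.2], [0.05, -2.0]])
print(soft_threshold(A, 0.1))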
Article
In this paper, we propose a new graph-based transform and illustrate its potential application to signal compression. Our approach relies on the careful design of a graph that optimizes the overall rate-distortion performance through an effective graph-based transform. We introduce a novel graph estimation algorithm, which uncovers the connectivities between the graph signal values by taking into consideration the coding of both the signal and the graph topology in rate-distortion terms. In particular, we introduce a novel coding solution for the graph by treating the edge weights as another graph signal that lies on the dual graph. Then, the cost of the graph description is introduced into the optimization problem by minimizing the sparsity of the coefficients of its graph Fourier transform (GFT) on the dual graph. In this way, we obtain a convex optimization problem whose solution defines an efficient transform coding strategy. The proposed technique is a general framework that can be applied to different types of signals, and we show two possible application fields, namely natural image coding and piecewise smooth image coding. The experimental results show that the proposed graph-based transform outperforms classical fixed transforms such as the DCT for both natural and piecewise smooth images. In the case of depth map coding, the obtained results are even comparable to those of state-of-the-art graph-based coding methods that are specifically designed for depth map images.
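To visualize the idea of treating the edge weights as a signal on a dual graph, the Python sketch below uses networkx's line graph (one node per edge of the original graph) and examines the GFT coefficients of the edge-weight signal on it. Using the line graph as the dual and random weights on a toy pixel grid are assumptions; the rate-distortion-driven graph estimation itself is not shown.

import numpy as np
import networkx as nx

G = nx.grid_2d_graph(4, 4)                           # toy 4x4 pixel-grid graph
for u, v in G.edges:
    G[u][v]['weight'] = float(np.random.rand())      # hypothetical learned edge weights

D = nx.line_graph(G)                                 # dual graph: one node per edge of G
L = nx.laplacian_matrix(D).toarray().astype(float)
_, U = np.linalg.eigh(L)

edge_signal = np.array([G[u][v]['weight'] for u, v in D.nodes])
gft_coeffs = U.T @ edge_signal                       # GFT of the edge-weight signal
top5 = np.sort(gft_coeffs ** 2)[-5:].sum() / np.sum(gft_coeffs ** 2)
print("fraction of energy in 5 largest GFT coefficients:", round(float(top5), 4))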
Article
The graph Fourier transform (GFT) is an important tool for graph signal processing, with applications ranging from graph-based image processing to spectral clustering. However, unlike the discrete Fourier transform, the GFT typically does not have a fast algorithm. In this work, we develop new approaches to accelerate the GFT computation. In particular, we show that Haar units (Givens rotations with angle $\pi/4$ ) can be used to reduce GFT computation cost when the graph is bipartite or satisfies certain symmetry properties based on node pairing. We also propose a graph decomposition method based on graph topological symmetry, which allows us to identify and exploit butterfly structures in stages. This method is particularly useful for graphs that are nearly regular or have some specific structures, e.g., line graphs, cycle graphs, grid graphs, and human skeletal graphs. Though butterfly stages based on graph topological symmetry cannot be used for general graphs, they are useful in applications, including video compression and human action analysis, where symmetric graphs, such as symmetric line graphs and human skeletal graphs, are used. Our proposed fast GFT implementations are shown to reduce computation costs significantly, in terms of both number of operations and empirical runtimes.
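The butterfly idea for symmetric graphs can be checked numerically with a short Python sketch: for a uniform path graph (which is symmetric under left-right node pairing), one stage of Haar units block-diagonalizes the Laplacian, so the GFT reduces to two half-size eigendecompositions. The uniform path graph is an illustrative assumption; the general symmetry detection and multi-stage decomposition of the paper are not reproduced.

import numpy as np

n, m = 8, 4
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0                  # symmetric line graph
L = np.diag(W.sum(axis=1)) - W

B = np.zeros((n, n))                                 # one butterfly stage pairing i with n-1-i
for k in range(m):
    B[k, k] = B[n - 1 - k, k] = 1 / np.sqrt(2)       # symmetric combinations
    B[k, m + k] = 1 / np.sqrt(2)
    B[n - 1 - k, m + k] = -1 / np.sqrt(2)            # antisymmetric combinations

M = B.T @ L @ B
print("off-diagonal block magnitude:", float(np.abs(M[:m, m:]).max()))   # ~0: block diagonal
evals = np.concatenate([np.linalg.eigvalsh(M[:m, :m]), np.linalg.eigvalsh(M[m:, m:])])
print(np.allclose(np.sort(evals), np.linalg.eigvalsh(L)))                # same spectrum as L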
Conference Paper
Full-text available
The wavefront pattern captures the unfolding of a parallel computation in which data elements are laid out as a logical multidimensional grid and the dependency graph favours a diagonal sweep across the grid. In the emerging area of spectral graph analysis, the computation often consists of a wavefront running over a tiled matrix, involving expensive linear algebra kernels. While these applications might benefit from parallel heterogeneous platforms (multi-core with GPUs), programming wavefront applications directly with high-performance linear algebra libraries yields code that is complex to write and optimize for the specific application. We advocate a methodology based on two abstractions (linear algebra and a parallel pattern-based run-time) that allows developers to write portable, self-configuring, and easy-to-profile code on hybrid platforms.
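The wavefront dependency structure itself fits in a few lines of Python: tiles on the same anti-diagonal depend only on earlier anti-diagonals, so they could be processed in parallel. The trivial per-tile kernel below is an assumption standing in for the expensive linear algebra kernels discussed in the paper.

import numpy as np

T = 4                                                # T x T grid of tiles
result = np.zeros((T, T))
for d in range(2 * T - 1):                           # sweep anti-diagonals in order
    for i in range(max(0, d - T + 1), min(d, T - 1) + 1):
        j = d - i
        north = result[i - 1, j] if i > 0 else 0.0
        west = result[i, j - 1] if j > 0 else 0.0
        result[i, j] = north + west + 1.0            # toy kernel using the N and W tiles
print(result)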
Conference Paper
Full-text available
This paper proposes a new approach to combined spatial (Intra) prediction and adaptive transform coding in block-based video and image compression. Context-adaptive spatial prediction from available, previously decoded boundaries of the block is followed by optimal transform coding of the prediction residual. The derivation of both the prediction and the adaptive transform for the prediction error assumes a separable first-order Gauss-Markov model for the image signal. The resulting optimal transform is shown to be a close relative of the sine transform, with phase and frequencies such that basis vectors tend to vanish at known boundaries and maximize energy at unknown boundaries. The overall scheme switches between the above sine-like transform and the discrete cosine transform (per direction, horizontal or vertical) depending on the prediction and boundary information. It is implemented within the H.264/AVC intra mode, is shown in experiments to significantly outperform the standard intra mode, and achieves a significant reduction of the blocking effect.
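The boundary behaviour described above can be seen in a standard DST-VII basis, generated by the short Python sketch below; the exact phase and normalization derived in the paper may differ, so this is only an indicative stand-in for the sine-like transform.

import numpy as np

def dst7(N):
    # orthonormal DST-VII: T[n, k] = sqrt(4/(2N+1)) * sin(pi * (2n+1)(k+1) / (2N+1))
    n, k = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
    return np.sqrt(4.0 / (2 * N + 1)) * np.sin(np.pi * (2 * n + 1) * (k + 1) / (2 * N + 1))

T = dst7(4)
print(np.allclose(T.T @ T, np.eye(4)))               # orthonormal basis
print(T[:, 0])                                       # first basis vector: smallest at the
                                                     # predicted boundary (n = 0), larger away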
Conference Paper
Full-text available
We propose a complete video encoder based on directional “non-separable” transforms that allow spatial and temporal correlation to be jointly exploited. These lifting-based wavelet transforms are applied on graphs that link pixels in a video sequence based on motion information. In this paper, we first consider a low-complexity version of this transform, which can operate on subgraphs without significant loss in performance. We then study coefficient reordering techniques that lead to a more realistic and efficient encoder than the one presented in our earlier work. Our proposed technique shows encouraging results compared with a comparable scheme based on the DCT.
Conference Paper
Full-text available
In this work, a new set of edge-adaptive transforms (EATs) is presented as an alternative to the standard DCTs used in image and video coding applications. These transforms avoid filtering across edges in each image block, thus avoiding the creation of large high-frequency coefficients. The transforms are then combined with the DCT in H.264/AVC, and a transform mode selection algorithm is used to choose between the DCT and an EAT in an RD-optimized manner. Applied to coding the depth maps used for view synthesis in a multi-view video coding system, the proposed transforms provide up to a 29% bit-rate reduction at a fixed quality in the synthesized views.
Article
Full-text available
Over the past two decades, there have been various studies on the distributions of the DCT coefficients for images. However, they have concentrated only on fitting the empirical data from some standard pictures with a variety of well-known statistical distributions and then comparing their goodness of fit. The Laplacian distribution is the dominant choice, balancing simplicity of the model with fidelity to the empirical data. Yet, to the best of our knowledge, there has been no mathematical justification of what gives rise to this distribution. We offer a rigorous mathematical analysis using a doubly stochastic model of the images, which not only provides the necessary theoretical explanation but also leads to insights about various other observations from the literature. This model also allows us to investigate how certain changes in the image statistics could affect the DCT coefficient distributions.
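The empirical observation being explained can be reproduced in miniature with Python: pool the AC DCT coefficients of 8x8 blocks and estimate the Laplacian scale by its maximum-likelihood formula. The synthetic smooth image below is an assumption standing in for natural images, and the doubly stochastic derivation itself is not shown.

import numpy as np
from scipy.fft import dctn

rng = np.random.default_rng(0)
img = np.cumsum(np.cumsum(rng.standard_normal((64, 64)), axis=0), axis=1)   # smooth-ish field
ac = []
for i in range(0, 64, 8):
    for j in range(0, 64, 8):
        c = dctn(img[i:i + 8, j:j + 8], norm='ortho')
        ac.append(c.ravel()[1:])                     # drop the DC coefficient
ac = np.concatenate(ac)
b = np.mean(np.abs(ac - np.median(ac)))              # Laplacian scale, maximum-likelihood estimate
print("Laplacian scale estimate b =", round(float(b), 3))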
Article
Full-text available
This paper presents multiresolution models for Gauss-Markov random fields (GMRFs) with applications to texture segmentation. Coarser resolution sample fields are obtained by subsampling the sample field at fine resolution. Although the Markov property is lost under such resolution transformation, coarse resolution non-Markov random fields can be effectively approximated by Markov fields. We present two techniques to estimate the GMRF parameters at coarser resolutions from the fine resolution parameters: one by minimizing the Kullback-Leibler distance and another based on local conditional distribution invariance. We also note that different GMRF parameters at the fine resolution can result in the same probability measure after subsampling, and we present the results for the first- and second-order cases. We apply this multiresolution model to texture segmentation. Different texture regions in an image are modeled by GMRFs and the associated parameters are assumed to be known. Parameters at lower resolutions are estimated from the fine resolution parameters. The coarsest resolution data is first segmented, and the segmentation results are propagated upward to the finer resolutions. We use iterated conditional modes (ICM) minimization at all resolutions. Our experiments with synthetic, Brodatz texture, and real satellite images show that the multiresolution technique results in better segmentation and requires less computation than the single-resolution algorithm.
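A minimal Python sketch of the basic objects involved is given below: a first-order GMRF is sampled on a small toroidal grid from its precision matrix, and a coarser field is obtained by subsampling (which, as the paper notes, is no longer exactly Markov). The grid size and parameters are illustrative assumptions; the parameter re-estimation at coarse resolutions and the ICM segmentation are not reproduced.

import numpy as np

n = 16
Q = 4.2 * np.eye(n * n)                              # precision of a first-order GMRF on an n x n torus
for r in range(n):
    for c in range(n):
        i = r * n + c
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            j = ((r + dr) % n) * n + (c + dc) % n
            Q[i, j] = -1.0                           # 4-neighbour interactions

z = np.random.standard_normal(n * n)
x = np.linalg.solve(np.linalg.cholesky(Q).T, z)      # sample x ~ N(0, Q^{-1})
field = x.reshape(n, n)
coarse = field[::2, ::2]                             # subsampled (coarser resolution) field
print(field.shape, coarse.shape)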
Article
High Efficiency Video Coding (HEVC) is currently being prepared as the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goal of the HEVC standardization effort is to enable significantly improved compression performance relative to existing standards, in the range of a 50% bit-rate reduction for equal perceptual video quality. This paper provides an overview of the technical features and characteristics of the HEVC standard.
Article
The use of mode-dependent transforms for coding directional intra prediction residuals has been previously shown to provide coding gains, but the transform matrices have to be derived from training. In this paper, we derive a set of separable mode-dependent transforms by using a simple separable, directional, and anisotropic image correlation model. Our analysis shows that only one additional transform, the odd type-3 discrete sine transform (ODST-3), is required for the optimal implementation of mode-dependent transforms. In addition, the four-point ODST-3 also has a structure that can be exploited to reduce the operation count of the transform operation. Experimental results show that in terms of coding efficiency, our proposed approach matches or improves upon the performance of a mode-dependent transforms approach that uses transform matrices obtained through training.
Article
The theoretical and practical aspects of transform coding systems for processing still or moving images are discussed. Among the specific topics considered are: the statistical properties of images; orthogonal transforms for image coding; and transform coefficient quantization and bit allocation. Some practical methods of image coding are described, including: interframe coding; intraframe coding; and transform coding of color data. The application of human visual models to the assessment of image quality is also discussed. Techniques for measuring rms error in coded images are given in an appendix.
Conference Paper
In this paper, a novel intra coding scheme is proposed. The proposed scheme improves H.264 intra coding in three aspects: 1) H.264 intra prediction is enhanced with additional bi-directional intra prediction modes; 2) the H.264 integer transform is supplemented with directional transforms for some prediction modes; and 3) residual coefficient coding in CAVLC is improved. Compared with H.264, together the improvements bring on average 7% and 10% coding gain for CABAC and CAVLC, respectively, with an average coding gain of 12% for HD sequences.
Conference Paper
In this paper, we derive separable KLTs for coding H.264/AVC intra prediction residuals, using a simple image correlation model. Our analysis shows that for some intra prediction modes, we can in fact just use the DCT for performing either the row-wise or column-wise transform. Furthermore, we compute the KLT that should be used based on the image correlation model, which happens to have sinusoidal terms. The 4×4 transform also has a structure that can be exploited to reduce the operation count of the transform operation. In our simplified implementation of mode-dependent directional transforms (MDDT), we only need to make use of two matrices: the DCT and the derived KLT. Our experimental results show that, in terms of coding efficiency, our proposed approach has similar performance to MDDT. More importantly, compared with MDDT, our approach requires no training and has lower computational and storage costs.
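The flavour of the derivation can be reproduced with a short Python sketch: build the classic rho^|i-j| correlation matrix, take its eigenvectors as the KLT, and compare them with the DCT basis. The stationary AR(1) model shown here omits the mode-dependent boundary conditioning used in the paper, so the comparison is only indicative.

import numpy as np
from scipy.fft import dct

N, rho = 4, 0.95
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))   # AR(1) correlation model
_, klt = np.linalg.eigh(R)
klt = klt[:, ::-1]                                   # order eigenvectors by decreasing eigenvalue

dct_mat = dct(np.eye(N), norm='ortho', axis=0)       # orthonormal DCT-II matrix (rows = basis vectors)
# absolute inner products between KLT and DCT basis vectors: close to a (signed) identity,
# showing how well the DCT approximates this KLT
print(np.abs(np.round(klt.T @ dct_mat.T, 2)))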
Article
H.264/AVC is the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goals of the H.264/AVC standardization effort have been enhanced compression performance and provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "nonconversational" (storage, broadcast, or streaming) applications. H.264/AVC has achieved a significant improvement in rate-distortion efficiency relative to existing standards. This article provides an overview of the technical features of H.264/AVC, describes profiles and applications for the standard, and outlines the history of the standardization process.
Article
A unified approach to the coder control of video coding standards such as MPEG-2, H.263, MPEG-4, and the draft video coding standard H.264/AVC (advanced video coding) is presented. The performance of the various standards is compared by means of PSNR and subjective testing results. The results indicate that H.264/AVC compliant encoders typically achieve essentially the same reproduction quality as encoders that are compliant with the previous standards while typically requiring 60% or less of the bit rate.
Article
The expectation maximization method for maximum likelihood image reconstruction in emission tomography, based on the Poisson distribution of the statistically independent components of the image and measurement vectors, is extended to maximum a posteriori image reconstruction using a multivariate Gaussian a priori probability distribution of the image vector. The approach is equivalent to penalized maximum likelihood estimation with a special choice of the penalty function. The expectation maximization method is applied to find the a posteriori probability maximizer. A simple iterative formula is derived for a penalty function that is a weighted sum of the squared deviations of image vector components from their a priori mean values. The method is demonstrated to be superior to pure likelihood maximization, in that the penalty function prevents the occurrence of irregular high-amplitude patterns in the image after a large number of iterations (the so-called "checkerboard effect" or "noise artifact").
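For context, the baseline ML-EM iteration that the paper extends can be written in a few lines of Python; the toy system matrix is an assumption, and the Gaussian-prior (MAP) modification of the update derived in the paper is not reproduced, only the underlying EM step.

import numpy as np

rng = np.random.default_rng(1)
A = rng.random((40, 20))                             # toy system (projection) matrix
x_true = rng.random(20) + 0.5
y = rng.poisson(A @ x_true).astype(float)            # Poisson-distributed measurements

x = np.ones(20)                                      # ML-EM iterations
sens = A.T @ np.ones(40)                             # sensitivity image
for _ in range(50):
    ratio = y / np.maximum(A @ x, 1e-12)
    x *= (A.T @ ratio) / sens                        # multiplicative EM update
print("relative error:", float(np.linalg.norm(x - x_true) / np.linalg.norm(x_true)))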
Article
The problem of texture classification arises in several disciplines such as remote sensing, computer vision, and image analysis. In this paper we present two feature extraction methods for the classification of textures using two-dimensional (2-D) Markov random field (MRF) models. It is assumed that the given M × M texture is generated by a Gaussian MRF model. In the first method, the least squares (LS) estimates of the model parameters are used as features. In the second method, using the notion of sufficient statistics, it is shown that the sample correlations over a symmetric window including the origin are optimal features for classification. Simple minimum distance classifiers using these two feature sets yield good classification accuracies for a seven-class problem.
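The first (least squares) feature set can be sketched directly in Python for a first-order neighborhood: regress each pixel on the sums of its horizontal and vertical neighbors and use the fitted coefficients as features. Toroidal boundary handling and the first-order neighborhood are simplifying assumptions; the sufficient-statistics features of the second method are not shown.

import numpy as np

def gmrf_ls_estimate(y):
    # LS estimate of (theta_h, theta_v) in y(s) ~ theta_h*q_h(s) + theta_v*q_v(s)
    q_h = np.roll(y, 1, axis=1) + np.roll(y, -1, axis=1)
    q_v = np.roll(y, 1, axis=0) + np.roll(y, -1, axis=0)
    Q = np.stack([q_h.ravel(), q_v.ravel()], axis=1)
    theta, *_ = np.linalg.lstsq(Q, y.ravel(), rcond=None)
    return theta

patch = np.random.rand(32, 32)                       # toy texture patch
print(gmrf_ls_estimate(patch))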
Article
A discrete cosine transform (DCT) is defined and an algorithm to compute it using the fast Fourier transform is developed. It is shown that the discrete cosine transform can be used in the area of digital processing for the purposes of pattern recognition and Wiener filtering. Its performance is compared with that of a class of orthogonal transforms and is found to compare closely to that of the Karhunen-Loève transform, which is known to be optimal. The performances of the Karhunen-Loève and discrete cosine transforms are also found to compare closely with respect to the rate-distortion criterion.
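One standard way to compute the DCT through the FFT, in the spirit of the algorithm described above, is to transform the mirrored sequence and rotate the result; the Python sketch below shows this construction (it is one common variant, not necessarily the paper's exact algorithm) and checks it against SciPy's unnormalized DCT-II.

import numpy as np
from scipy.fft import fft, dct

def dct2_via_fft(x):
    N = len(x)
    y = np.concatenate([x, x[::-1]])                 # mirror the sequence to length 2N
    Y = fft(y)[:N]
    k = np.arange(N)
    return np.real(np.exp(-1j * np.pi * k / (2 * N)) * Y)   # equals 2*sum_n x_n cos(pi*k*(2n+1)/(2N))

x = np.random.rand(8)
print(np.allclose(dct2_via_fft(x), dct(x, type=2)))  # matches SciPy's unnormalized DCT-II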
Article
Each Discrete Cosine Transform uses N real basis vectors whose components are cosines. In the DCT-4, for example, the jth component of $v_k$ is $\cos\big((j+\tfrac{1}{2})(k+\tfrac{1}{2})\tfrac{\pi}{N}\big)$. These basis vectors are orthogonal and the transform is extremely useful in image processing. If the vector x gives the intensities along a row of pixels, its cosine series $\sum_k c_k v_k$ has the coefficients $c_k = (x, v_k)/N$. They are quickly computed from an FFT. But a direct proof of orthogonality, by calculating inner products, does not reveal how natural these cosine vectors are. We prove orthogonality in a different way. Each DCT basis contains the eigenvectors of a symmetric "second difference" matrix. By varying the boundary conditions we get the established transforms DCT-1 through DCT-4. Other combinations lead to four additional cosine transforms. The type of boundary condition (Dirichlet or Neumann, centered at a meshpoint or a midpoint) determines the applications that are appropriate for each transform.
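The central observation is easy to verify numerically: the Python sketch below builds the second-difference matrix with Neumann (midpoint) boundary conditions, i.e. the path-graph Laplacian, and checks that its eigenvectors coincide with the DCT-2 basis vectors up to sign (N = 8 is an arbitrary illustrative size).

import numpy as np
from scipy.fft import dct

N = 8
A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
A[0, 0] = A[-1, -1] = 1                              # Neumann boundary conditions
_, evecs = np.linalg.eigh(A)                         # eigenvalues in ascending order

dct_basis = dct(np.eye(N), norm='ortho', axis=0)     # rows of this matrix are the DCT-2 vectors
# each eigenvector matches a DCT-2 basis vector up to sign
print(np.allclose(np.abs(evecs.T @ dct_basis.T), np.eye(N), atol=1e-8))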
A. Tanizawa, J. Yamaguchi, T. Shiodera, T. Chujoh, and T. Yamakage, "Improvement of intra coding by bidirectional intra prediction and 1 dimensional directional unified transform," Doc. JCTVC-B042, MPEG-H/JCT-VC, 2010.
A. Saxena and F. C. A. Fernandes, "Jointly optimal intra prediction and adaptive primary transform," Doc. JCTVC-C108, MPEG-H/JCT-VC, 2010.
H. Yang, J. Zhou, and H. Yu, "Simplified MDDT (SMDDT) for intra prediction residual," Doc. JCTVC-B039, MPEG-H/JCT-VC, 2010.