GRAPH-BASED TRANSFORMS FOR INTER PREDICTED VIDEO CODING
Hilmi E. Egilmez, Amir Said, Yung-Hsuan Chao, and Antonio Ortega
Signal & Image Processing Institute, University of Southern California, Los Angeles, CA, USA
Qualcomm Technologies, San Diego, CA, USA
hegilmez@usc.edu, asaid@qti.qualcomm.com, yunghsuc@usc.edu, ortega@sipi.usc.edu
ABSTRACT
In video coding, motion compensation is an essential tool to ob-
tain residual block signals whose transform coefficients are en-
coded. This paper proposes novel graph-based transforms (GBTs)
for coding inter-predicted residual block signals. Our contribu-
tion is twofold: (i) We develop edge adaptive GBTs (EA-GBTs)
derived from graphs estimated from residual blocks, and (ii) we de-
sign template adaptive GBTs (TA-GBTs) by introducing simplified
graph templates generating different sets of GBTs with low transform
signaling overhead. Our experimental results show that the proposed
methods significantly outperform the traditional DCT and KLT in terms
of rate-distortion performance.
Index Terms— Transform, signal processing on graphs, graph-
based transforms, video coding, video compression.
1. INTRODUCTION
In video coding standards including HEVC [1], inter-prediction is
a very important building block that significantly improves coding
efficiency by exploiting high temporal redundancy between video
blocks. In general, samples of residual blocks obtained from inter-
prediction have low energy, so their transform coefficients can be
efficiently encoded. However, some residual blocks may have high
energy due to high motion activity and occlusions, so that better en-
ergy compacting transforms are needed to improve coding gains.
Typically, in conventional video coding architectures as shown in
Fig. 1, a fixed transform such as discrete cosine transform (DCT) is
employed to accommodate complexity constraints of encoding. The
main problem of using a fixed block linear transform is the implicit
assumption that all residual blocks have the same isotropic statisti-
cal properties. Yet in practice, residual blocks can have very dif-
ferent statistical characteristics depending on video content. Better
compression can be achieved by using different transforms that can
adapt to statistical properties of residual blocks. However, such
adaptation requires encoding additional side information, called the
transform signaling overhead, that is used by the decoder to identify
the transforms used at the encoder. Therefore, it is important to design
transforms that adapt to common residual block characteristics with
low signaling overhead.
This paper presents two different types of transforms exploiting
statistical characteristics of inter-predicted residual blocks. The pro-
posed transforms fall into the category of graph-based transforms
(GBTs) where we first design a graph capturing some signal char-
acteristics observed from inter-predicted residual blocks, and asso-
ciated orthogonal transforms are then derived from the designated
graph. In our first design, which is edge adaptive GBT (EA-GBT),
This work has been supported in part by LG Electronics.
Fig. 1: An overall block diagram for hybrid video coding, using a
combination of predictive and transform coding.
we allow flexible adaptation for each residual block. Firstly, edge de-
tection is performed for each residual block, and based on detected
edges we construct a weighted graph which captures signal variation
characteristics in the block. Then, an EA-GBT is generated using
the weighted graph. Note that this method can create a large signaling
overhead, since the graph information has to be sent to the decoder.
Our second design proposes template adaptive GBTs
(TA-GBTs) which are derived based on a set of simplified graph
templates capturing basic statistical characteristics of inter-predicted
residual blocks. Thus, graph information can be efficiently sent to
the decoder by signaling the indexes of the corresponding graph templates.
By selecting different subsets of graph templates, the signaling over-
head can be significantly reduced without losing adaptivity, espe-
cially when a few graph templates are sufficient to capture block
signal characteristics.
In the literature, several adaptive transform approaches have
been proposed. Most similar to our work, Shen et al. [2] propose
edge-adaptive transforms (EATs) specifically for depth map com-
pression. Although our paper adopts some basic concepts originally
introduced in [2] for designing EA-GBTs, our graph construction
method is different. Hu et al. [3] extend EATs by optimizing weak-
link weights for piecewise smooth image compression. In both [2]
and [3], authors propose methods specific to depth map compres-
sion, but our work focuses on encoding inter-predicted residual
blocks. Related to inter-predicted coding, Liu and Flierl [4] propose
motion adaptive transforms based on vertex weighted graphs for
coding motion-connected pixels. Their approach is not block based
and in their graph construction, unlike in our work, vertex weights
are adjusted using a measure called motion scale factor. Most of
the related recent works are on intra-predicted adaptive transforms.
In [5], Takamura and Shimizu develop intra-mode dependent KLTs,
and Han et al. [6] introduce a hybrid DCT/ADST transform for
intra-predicted transform coding. To the best of our knowledge, our
paper is the first work that proposes GBTs for encoding inter-predicted
residual blocks by exploiting their statistical characteristics.
Fig. 2: Graphs (a) connecting each pixel with its four nearest neigh-
boring pixels (4-connected) and (b) connecting each pixel with pix-
els that are 1-hop away (8-connected).
The rest of the paper is organized as follows. In Section 2 we
introduce GBTs. Section 3 discusses inter-predicted residual signal
characteristics used in designing proposed GBTs. In Section 4, the
proposed EA-GBTs and TA-GBTs are described. The experimental
results are presented in Section 5, and Section 6 draws some conclu-
sions based on experimental results.
2. PRELIMINARIES
In graph signal processing [7, 8], signals are supported on an undi-
rected, weighted and connected graph, G(N , E, A), where signal
values are attached to the nodes of the graph (N) and its links (E)
capture relations among the signal's samples. The adjacency
matrix, A, represents the graph’s link weights. For a given graph,
G(N , E, A), we define graph-based transforms (GBTs) using its
combinatorial Laplacian,
L = D − A  (1)
where D is the diagonal degree matrix. In order to find the GBT as-
sociated with graph G, we perform eigen-decomposition of the graph
Laplacian, that is
L = T Λ T^t,  (2)
where the columns of T are the basis vectors of the corresponding
GBT. Since L is a real symmetric matrix, it has a complete set of
orthonormal eigenvectors.
A graph is completely defined by an adjacency matrix, so we can
create different transforms by designing graph-link weights (i.e., A).
For example, as shown in Fig. 2, an image block can be represented
as a graph, so that different connectivity patterns lead to different
interpretations in the graph transform domain.
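The construction in Eqs. (1)-(2) is straightforward to prototype. The following numpy sketch (function and variable names are ours, not from the paper) builds the combinatorial Laplacian of a uniformly weighted 4-connected grid graph and obtains the GBT basis by eigendecomposition:

```python
import numpy as np

def gbt_from_adjacency(A):
    """Derive the graph-based transform (GBT) from an adjacency matrix A.

    The combinatorial Laplacian L = D - A (Eq. (1)) is real and symmetric,
    so it has a complete set of orthonormal eigenvectors; the eigenvector
    matrix T of Eq. (2) gives the GBT basis vectors.
    """
    D = np.diag(A.sum(axis=1))        # diagonal degree matrix
    L = D - A                         # combinatorial Laplacian
    eigvals, T = np.linalg.eigh(L)    # eigh returns eigenvalues ascending
    return eigvals, T

def grid_adjacency(n):
    """4-connected grid graph for an n x n block, all link weights 1."""
    A = np.zeros((n * n, n * n))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:             # horizontal link
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < n:             # vertical link
                A[i, i + n] = A[i + n, i] = 1.0
    return A

eigvals, T = gbt_from_adjacency(grid_adjacency(8))
# T is orthonormal, and the first eigenvalue of a connected graph is zero.
assert np.allclose(T @ T.T, np.eye(64))
assert abs(eigvals[0]) < 1e-9
```

Since `np.linalg.eigh` returns eigenvalues in ascending order, the columns of T are already ordered by increasing graph frequency; for this uniform grid the eigenspaces coincide with those of the separable 2-D DCT, consistent with the note in Section 4.2.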
3. INTER-PREDICTED RESIDUAL BLOCK SIGNAL
CHARACTERISTICS
In this section, we investigate some statistical properties of inter-
predicted residual blocks that we consider in our transform designs.
In general, inter-predicted residual block signals have small valued
(low energy) samples because of high temporal redundancy among
video blocks. This is very important for effective compression, since
it leads to sparse quantized coefficients which can be encoded effi-
ciently. However, large prediction errors are possible in case of high
motion activity and occlusions which lead to large transform coeffi-
cients requiring more bits for encoding. Based on our observations
on residual block signals obtained using HEVC encoder (HM-14),
residual signal samples that are close to boundaries of the blocks
have larger values, mainly because of occlusions leading to partial
mismatches between reference and predicted blocks. Fig. 3 illus-
trates sample variance values calculated over 8 × 8 residual blocks
of the Harbour and Soccer sequences¹. Note that for both sequences,
sample variance (i.e., energy) is larger around the boundaries and
corners of the residual blocks.
Fig. 3: Sample variance values calculated over 8 × 8 residual blocks
for (a) Harbour and (b) Soccer.
Fig. 4: Similarity graphs for 8 × 8 residual blocks where partial
correlation values between nearest neighboring pixels are shown, for
(a) Harbour and (b) Soccer.
Moreover, Fig. 4(a) and (b) show similarity graphs trained for
8 × 8 inter-predicted residual blocks over Harbour and Soccer video
sequences, respectively. As a measure of inter-pixel similarity, par-
tial correlation values are calculated based on the precision matrix,
J, where J is defined as the inverse of the covariance matrix [9],
calculated for each video sequence. The weighted graphs demon-
strate that the similarity between the pixels near boundaries of a
residual block is smaller compared to the pixels around the center
of the block.
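As a concrete illustration of the similarity measure behind Fig. 4, the sketch below (our own toy code, not the HM-14 training pipeline) estimates a precision matrix from vectorized residual blocks and converts it to pairwise partial correlations:

```python
import numpy as np

def partial_correlations(samples):
    """Partial correlations from vectorized residual blocks.

    samples: (num_blocks, N) array, each row a vectorized block.
    The precision matrix J is the inverse of the sample covariance [9];
    the partial correlation between samples i and j is
    -J_ij / sqrt(J_ii * J_jj).
    """
    cov = np.cov(samples, rowvar=False)
    J = np.linalg.inv(cov)            # precision matrix
    d = np.sqrt(np.diag(J))
    P = -J / np.outer(d, d)
    np.fill_diagonal(P, 1.0)
    return P

# Toy usage with synthetic 2x2 "blocks" (N = 4) just to exercise the code;
# in the paper, partial correlations of nearest-neighbor pixels in 8x8
# blocks give the link weights of the similarity graphs in Fig. 4.
rng = np.random.default_rng(0)
blocks = rng.standard_normal((1000, 4))
P = partial_correlations(blocks)
assert np.allclose(P, P.T)            # partial correlations are symmetric
```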
It is important to note that the statistical characteristics of inter-
predicted residuals discussed in this section are not specific to Har-
bour and Soccer video sequences. According to our experiments,
these characteristics are fairly general and apply to different se-
quences and residual block sizes. These characteristics are exploited
in our GBT designs discussed in the next section.
4. PROPOSED GRAPH-BASED TRANSFORMS
4.1. Edge Adaptive GBT (EA-GBT)
In designing edge adaptive graph based transforms (EA-GBT), we
first (i) generate a uniformly weighted graph, then (ii) based on dif-
ferences between pixels (i.e., edges), graph links are pruned or their
weights are adjusted (weakened). By doing this, the transforms asso-
ciated with the designed graphs can exploit different block signal char-
acteristics, and therefore the GBTs provide a better representation of
residual signals. In particular, we propose the following steps
to construct EA-GBTs:
¹We show statistical properties of Harbour and Soccer sequences, since
both have high motion activity.
1. Based on the size of the residual block of interest, we create
a nearest neighbor (4-connected) graph with link weights all
equal to 1 as shown in Fig. 2(a) for 8 × 8 blocks.
2. Given a residual block, we apply Prewitt operator to calculate
gradient in vertical and horizontal direction.
3. We detect edges based on thresholding on gradient values.
4. Depending on angle value (directionality) of an edge, the
weights of some graph links are reduced.
5. Weak graph link weights can be chosen in the range [0, 1).
Based on our experiments, small weights provide better compres-
sion than assigning zero weights (which may lead to disconnected
components). To reduce signaling overhead, we experimentally
select a single weak link weight, set to 0.001.
6. After designing a graph, the associated GBT is constructed as
discussed in Section 2.
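The steps above can be sketched as follows. This is a simplified illustration with our own names and thresholds; in particular, it declares that a link crosses an edge whenever its two endpoint pixels differ by more than a threshold, rather than running the full Prewitt-gradient edge detection of steps 2-4:

```python
import numpy as np

WEAK = 0.001  # the single weak link weight selected in step 5

def ea_graph(block, thresh):
    """Build an EA-GBT graph and transform for an n x n residual block.

    Simplification (ours): a 4-connected link is weakened when the
    absolute difference of its two endpoint pixels exceeds `thresh`;
    all other links keep weight 1 (steps 1 and 5). Step 6 then derives
    the GBT from the combinatorial Laplacian.
    """
    n = block.shape[0]
    A = np.zeros((n * n, n * n))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            for dr, dc in ((0, 1), (1, 0)):      # right and down neighbors
                rr, cc = r + dr, c + dc
                if rr < n and cc < n:
                    j = rr * n + cc
                    w = WEAK if abs(block[r, c] - block[rr, cc]) > thresh else 1.0
                    A[i, j] = A[j, i] = w
    L = np.diag(A.sum(axis=1)) - A               # step 6: Laplacian ...
    _, T = np.linalg.eigh(L)                     # ... and its GBT basis
    return A, T

# A block with a sharp vertical edge: links crossing it are weakened.
block = np.zeros((4, 4))
block[:, 2:] = 100.0
A, T = ea_graph(block, thresh=50.0)
assert A[1, 2] == WEAK   # link crossing the edge, pixels (0,1)-(0,2)
assert A[0, 1] == 1.0    # link inside the flat region
```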
Fig. 5 illustrates a sample graph design, obtained by the procedure
above, where the link weights of the original 4-connected graph are
weakened based on the edges observed in a given residual block. Thus,
the transforms associated with the constructed graphs can adapt to
different residual block signals. Although the resulting transforms can
provide efficient coding for transform coefficients, the overall cod-
ing performance may not be sufficient due to signaling overhead
of graph information, especially if multiple weak link weights are
used. To address this problem, we propose to use a single weak link
weight so that an edge-map codec such as arithmetic edge encoder
(AEC) [10] can be employed to efficiently send graph information.
In addition, based on our experiments, signaling graph information
for small blocks (e.g., 4 × 4) may result in excessive bit overheads.
In order to efficiently encode graph information for such blocks, we
propose to combine the graphs obtained from neighboring blocks
and then the combined graph is encoded using the AEC encoder.
4.2. Template Adaptive GBT (TA-GBT)
In this section, we propose a fixed set of GBTs derived from a set
of graph templates considering the inter-predicted residual signal
characteristics discussed in Section 3. The main observation we
exploit in our design is that sharp transitions (i.e., most of the en-
ergy) appear around the corners of inter-predicted residual blocks.
This is mainly due to mismatched regions (i.e., occlusions) in inter-
prediction. The basic building blocks of the proposed graph template
construction are as follows:
1. We choose a base graph that is a uniformly weighted graph,
G_uni, where two examples are shown in Fig. 2. In this work,
we employ a nearest-neighbor image model, so the 4-connected
grid graph is used (see Fig. 2(a)).
2. By adjusting a subset of links' weights in G_uni, K different
graphs are constructed. These different graphs are called graph
templates {G_k}, k = 1, ..., K, which define GBTs.
3. The statistical properties of inter-predicted residual blocks
can be captured by reducing the weights of links in G_uni con-
necting pixels around the corners of a transform block. Par-
ticularly in this work, K = 16 templates are generated by
repeating different combinations of a rectangular pattern to
denote weak links around the corners of the graph G_uni.
4. For a selected set of graph templates, the associated GBTs are
constructed as discussed in Section 2.
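Under the assumption that each rectangular pattern weakens the links inside an m × m corner region (the pattern size and weak weight below are our illustrative choices, not values from the paper), the 16 templates can be generated as the 2⁴ subsets of the four corners:

```python
import numpy as np

def template_graphs(n=8, m=2, weak=0.1):
    """Generate K = 16 graph templates for an n x n block.

    Each template weakens the links among pixels in some subset of the
    four m x m corner regions of the uniform 4-connected grid; the 16
    subsets of the corners give 16 templates, and template 1 (the empty
    subset) reduces to the plain grid whose GBT is the 2-D DCT.
    The corner size m and the weak weight are our assumptions.
    """
    corners = [(0, 0), (0, n - m), (n - m, 0), (n - m, n - m)]
    templates = []
    for mask in range(16):                 # which corners to weaken
        weak_px = set()
        for b, (r0, c0) in enumerate(corners):
            if mask & (1 << b):
                weak_px |= {(r, c) for r in range(r0, r0 + m)
                                   for c in range(c0, c0 + m)}
        A = np.zeros((n * n, n * n))
        for r in range(n):
            for c in range(n):
                i = r * n + c
                for dr, dc in ((0, 1), (1, 0)):
                    rr, cc = r + dr, c + dc
                    if rr < n and cc < n:
                        j = rr * n + cc
                        w = weak if ((r, c) in weak_px and
                                     (rr, cc) in weak_px) else 1.0
                        A[i, j] = A[j, i] = w
        templates.append(A)
    return templates

templates = template_graphs()
assert len(templates) == 16
assert templates[0][0, 1] == 1.0   # template 1: uniform grid (2-D DCT)
assert templates[1][0, 1] == 0.1   # top-left corner link weakened
```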
(a) Inter-predicted residual block (b) Associated graph
Fig. 5: An example of edge adaptive graph construction based on a
residual block signal. The graph’s weak links correspond to sharp
transitions (i.e., edges) in the residual block.
Fig. 6: Graph templates for 8×8 blocks with index {1,2,3,...,15,16}.
Fig. 6 shows five of the sixteen graph templates designed for 8 × 8
transform blocks which lead to 16 different GBTs. Similarly, we
also generate 16 transforms for 4 × 4 residual blocks. Note that the
first template corresponds to traditional 2-D DCT [9].
In order to adaptively select the best transform, we introduce a
graph Laplacian based quadratic cost which measures residual signal
variation on a given graph. Formally, for a given residual block sig-
nal d we select the transform whose associated graph representation
(G
k
) solves the following optimization problem,
minimize over k:  d^t L(G_k) d = d^t T Λ T^t d = a^t Λ a = Σ_{i=1}^{N} λ_i a_i²  (3)
where L(G_k) is the combinatorial graph Laplacian of graph G_k, a is
the vector of transform coefficients, N is the number of samples in
the residual block, λ_i denotes the eigenvalues of the graph Laplacian
in increasing order (i.e., λ_i ≤ λ_{i+1} for i ∈ {1, ..., N−1}), and a_i is
the transform coefficient associated with λ_i. This criterion is a way
of measuring energy compaction: the larger λ_i is, the larger the
penalty on its transform coefficient. Since the first eigenvalue,
λ_1, is zero [7], then

Σ_{i=1}^{N} λ_i a_i² = Σ_{i=2}^{N} λ_i a_i²,  (4)

which induces no penalty for the transform coefficient a_1 (i.e., the DC
component).
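The selection rule in Eq. (3) reduces to evaluating one quadratic form per candidate graph. A minimal sketch (the toy 3-sample path graphs below are our own example, not from the paper):

```python
import numpy as np

def select_template(d, laplacians):
    """Pick the graph template minimizing the quadratic cost of Eq. (3),
    d^t L(G_k) d, for a vectorized residual block d.

    Since d^t L d = sum_i lambda_i * a_i^2, a graph whose weak links
    align with the block's sharp transitions yields a smaller cost,
    i.e., better energy compaction.
    """
    costs = [d @ L @ d for L in laplacians]
    return int(np.argmin(costs))

# Two path-graph Laplacians over 3 samples: a uniform one, and one whose
# middle link is weak (weight 0.1). A signal with a sharp jump in the
# middle is cheaper on the graph whose weak link straddles the jump.
L_uniform = np.array([[ 1., -1.,  0. ],
                      [-1.,  2., -1. ],
                      [ 0., -1.,  1. ]])
L_weak    = np.array([[ 1., -1.,   0. ],
                      [-1.,  1.1, -0.1],
                      [ 0., -0.1,  0.1]])
d = np.array([0., 0., 10.])               # jump between samples 2 and 3
assert select_template(d, [L_uniform, L_weak]) == 1
```

The cost on `L_weak` is 0.1 · 10² = 10 versus 10² = 100 on `L_uniform`, so the template with the matching weak link wins.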
5. RESULTS
In this section, we compare the rate-distortion (RD) performance of
the proposed transforms by benchmarking against DCT and KLT. In
our simulations, we generate residual block signals for five test se-
quences, Foreman, Mobile, City, Harbour and Soccer, using HEVC
(HM-14) encoder where transform units are fixed to either 4 × 4
or 8 × 8. We test the performance of different transforms on inter-
predicted residual blocks only. After transforming residual blocks,
the transform coefficients are uniformly quantized and then encoded
using a symbol grouping-based arithmetic entropy encoder called
AGP, which uses an amplitude group partition technique to effi-
ciently encode image transform coefficients [11]. The AGP encoder
allows us to fairly compare the rate-distortion performance of differ-
ent transforms since AGP can flexibly learn and exploit amplitude
Table 1: Percentage reduction in bitrate (bits/pixel) with respect to the average bitrate obtained using DCT. Foreman and Mobile are 352×288; City, Harbour, and Soccer are 704×576.

PSNR                            4 × 4 block transform                              8 × 8 block transform
(dB)  Transform    Foreman  Mobile    City  Harbour  Soccer  Average | Foreman  Mobile   City  Harbour  Soccer  Average
32    EA-GBT(RO)    -90.44   18.40    3.98    -0.30    7.28     7.63 |     N/A     N/A    N/A     N/A     N/A     N/A
      EA-GBT       -423.02  -21.46  -99.45   -97.66  -66.38   -67.26 | -159.07   13.54  -5.71  -12.64    9.18    0.93
      TA-GBT          0.77   11.76    6.38     4.46    7.67     7.68 |    0.41    7.11   7.69    3.98    8.55    6.77
      KLT             2.75    3.32    7.40     2.35    5.42     4.19 |    7.24    2.29   7.86    1.77    7.13    4.02
34    EA-GBT(RO)      7.19   15.13   18.43    12.57   17.61    15.17 |     N/A     N/A    N/A     N/A     N/A     N/A
      EA-GBT        -62.04  -14.37  -36.62   -46.78  -25.45   -30.25 |    6.99    8.63  12.84    5.75   15.01    9.97
      TA-GBT          7.61    8.88    5.83     6.02    7.09     6.23 |    8.30    4.65   3.80    4.35    4.18    4.20
      KLT            -0.20    1.49    5.38     5.74    4.13     3.40 |    1.34    1.21   4.31    3.73    3.90    2.65
36    EA-GBT(RO)     20.44   10.76   13.79    10.95   11.93    12.61 |     N/A     N/A    N/A     N/A     N/A     N/A
      EA-GBT        -15.31  -12.76  -23.49   -30.20  -18.79   -19.58 |   21.26    4.89   7.03    4.88    6.63    7.42
      TA-GBT          7.06    6.76    4.58     0.17    4.97     4.77 |    4.19    3.03   1.11    2.17    1.53    1.67
      KLT             2.01    0.58    3.85     4.63    3.28     2.66 |   -0.53    0.60   3.30    4.68    3.77    2.24
Fig. 7: Average PSNR vs. BPP results (left) for 4×4 blocks and (right) for 8×8 blocks. EA-GBT(RO) corresponds to the method (only
applied to 4 × 4 blocks) that reduces signaling overhead of EA-GBT by combining graph information at neighboring blocks.
distribution of transform coefficients. For ordering of the quantized
coefficients, we employ zig-zag scanning for DCT coefficients, while
KLT and GBT coefficients are ordered by descending and ascending
eigenvalues, respectively. To send transform signaling information
for EA-GBT, we use the arithmetic edge codec (AEC) [10]
to efficiently code graph information. To further reduce the over-
head of graph coding for 4 × 4 blocks (EA-GBT(RO)), we combine
the graphs obtained from neighboring blocks and the resulting larger
graph is encoded using AEC. For TA-GBT, the transform indexes
are signaled as the side information. After decoding the quantized
transform coefficients using AGP decoder, we reconstruct the video
blocks and measure PSNR with respect to the original video blocks.
The average RD performances of different transforms are pre-
sented in Fig. 7 in terms of PSNR and total bits spent per-pixel
(BPP) for encoding quantized transform coefficients, motion vectors
and transform signaling overheads. More comprehensive results are
available in Table 1 where we show percent bit reductions for each
video sequence gained by using GBTs and KLT at different PSNR
values (i.e., 32, 34 and 36 dBs) with respect to using DCT. Aver-
age percent reductions (corresponding to Fig. 7) are also given in
Table 1. Note that positive values in the table mean that better RD
performance is achieved compared to using DCT. According to
these results:
• For 4 × 4 blocks, the RD performance of EA-GBT is the worst
among all transforms due to the excessive graph signaling over-
head. However, the signaling overhead of EA-GBT is signif-
icantly reduced by combining the graph information of neigh-
boring blocks (see EA-GBT(RO) in Table 1 and in Fig. 7).
• EA-GBT(RO) and EA-GBT outperform all other transforms at
high-rate coding of 4 × 4 and 8 × 8 blocks, respectively. On the
other hand, TA-GBT provides a reasonable coding gain for both
low-rate and high-rate coding with respect to DCT.
6. CONCLUSIONS
In this paper, we have proposed two novel transforms, EA-GBT and
TA-GBT, for inter-predicted residual block signals, and their rate-
distortion (RD) performance is compared against traditional DCT
and KLT. The inspection of the experimental results leads us to the
following conclusions:
• The proposed EA-GBT provides a 9.9% coding gain at 34 dB
PSNR with respect to DCT for 8 × 8 residual blocks. For 4 × 4
blocks, a 15.2% gain can be achieved using EA-GBT(RO). How-
ever, at low bitrates corresponding to 30-32 dB PSNR, the graph
signaling overhead exceeds the bit reduction gained using EA-GBT.
Therefore, we propose to use TA-GBT for coding at low bitrates.
• The proposed TA-GBT nicely captures the characteristics of 4 × 4
residual blocks with low transform signaling overhead. At 34 dB
PSNR, it provides a 6.2% bitrate reduction with respect to DCT on
average. For 8 × 8 blocks, the reduction is smaller, at 4.2%. Using
more graph templates can improve the coding gain, since more
diverse signal characteristics can be captured in 8 × 8 or larger
blocks.
• For 4 × 4 blocks, it is inefficient to directly send graphs as the
side information. By exploiting the graph information from the
neighboring blocks, we show that the signaling overhead can be
significantly reduced.
7. REFERENCES
[1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) Standard," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012.
[2] G. Shen, W.-S. Kim, S. Narang, A. Ortega, J. Lee, and H. Wey, "Edge-adaptive transforms for efficient depth map coding," in Picture Coding Symposium (PCS), Dec. 2010, pp. 566–569.
[3] W. Hu, G. Cheung, A. Ortega, and O. Au, "Multi-resolution graph Fourier transform for compression of piecewise smooth images," IEEE Trans. Image Process., vol. PP, no. 99, pp. 1–1, 2014.
[4] D. Liu and M. Flierl, "Motion-adaptive transforms based on the Laplacian of vertex-weighted graphs," in Data Compression Conference (DCC), Mar. 2014, pp. 53–62.
[5] S. Takamura and A. Shimizu, "On intra coding using mode dependent 2D-KLT," in Proc. 30th Picture Coding Symp., San Jose, CA, Dec. 2013, pp. 137–140.
[6] J. Han, A. Saxena, V. Melkote, and K. Rose, "Jointly optimized spatial prediction and block transform for video and image coding," IEEE Trans. Image Process., vol. 21, no. 4, pp. 1874–1884, Apr. 2012.
[7] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs," IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, May 2013.
[8] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs," IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, Apr. 2013.
[9] C. Zhang and D. Florencio, "Analyzing the optimality of predictive transform coding using graph-based models," IEEE Signal Process. Lett., vol. 20, no. 1, pp. 106–109, 2013.
[10] I. Daribo, D. Florencio, and G. Cheung, "Arbitrarily shaped motion prediction for depth video compression using arithmetic edge coding," IEEE Trans. Image Process., vol. 23, no. 11, pp. 4696–4708, Nov. 2014.
[11] A. Said and W. A. Pearlman, "Low-complexity waveform coding via alphabet and sample-set partitioning," in SPIE Visual Communications and Image Processing, 1997, pp. 25–37.
... coding [17,25,26]. GBT was combined with SVD for pictures of the respective labor [2,27]. Wang et al. [28] introduced an image encryption mechanism based on multiple chaos. ...
Article
Full-text available
The use of Internet technology has led to the availability of different multimedia data in various formats. The unapproved customers misuse multimedia information by conveying them on various web objections to acquire cash deceptively without the first copyright holder’s intervention. Due to the rise in cases of COVID-19, lots of patient information are leaked without their knowledge, so an intelligent technique is required to protect the integrity of patient data by placing an invisible signal known as a watermark on the medical images. In this paper, a new method of watermarking is proposed on both standard and medical images. The paper addresses the use of digital rights management in medical field applications such as embedding the watermark in medical images related to neurodegenerative disorders, lung disorders, and heart issues. The various quality parameters are used to figure out the evaluation of the developed method. In addition, the testing of the watermarking scheme is done by applying various signal processing attacks.
... Recently, graph-based signal processing techniques have gained the attention of researchers. One of the applications of graphical processing is the graph-oriented conversion, which is often used to compress information [6,7]. A method for audio compression by Graph-based Transform is reported in [8], which proposes this method over a popular conventional method, namely DCT (discrete cosine transformation). ...
... GSP techniques have been used in various applications in the past decade, e.g., sensor networks [18], biological networks, brain connectivity [19], Electrocardiogram (ECG) signal analysis [20], image, and video processing [21]. Specifically, researchers have shown that GSP can be a prospective field for detecting anomalies in different types of networks [22], [23]. ...
Preprint
Smart grids are large and complex cyber physical infrastructures that require real-time monitoring for ensuring the security and reliability of the system. Monitoring the smart grid involves analyzing continuous data-stream from various measurement devices deployed throughout the system, which are topologically distributed and structurally interrelated. In this paper, graph signal processing (GSP) has been used to represent and analyze the power grid measurement data. It is shown that GSP can enable various analyses for the power grid's structured data and dynamics of its interconnected components. Particularly, the effects of various cyber and physical stresses in the power grid are evaluated and discussed both in the vertex and the graph-frequency domains of the signals. Several techniques for detecting and locating cyber and physical stresses based on GSP techniques have been presented and their performances have been evaluated and compared. The presented study shows that GSP can be a promising approach for analyzing the power grid's data.
Article
Adaptive transform coding is gaining more and more attention for better mining of image content over fixed transforms such as discrete cosine transform (DCT). As a special case, graph transform learning establishes a novel paradigm for the graph-based transforms. However, there still exists a challenge for graph transform learning-based image codecs design on natural image compression, and graph representation cannot describe regular image samples well over graph-structured data. Therefore, in this paper, we propose a cross-channel graph-based transform (CCGBT) for natural color image compression. We observe that neighboring pixels having similar intensities should have similar values in the chroma channels, which means that the prominent structure of the luminance channel is related to the contours of the chrominance channels. A collaborative design of the learned graphs and their corresponding distinctive transforms lies in the assumption that a sufficiently small block can be considered smooth, meanwhile, guaranteeing the compression of the luma and chroma signals at the cost of a small overhead for coding the description of the designed luma graph. In addition, a color image compression framework based on the CCGBT is designed for comparing DCT on the classic JPEG codec. The proposed method benefits from its flexible transform block design on arbitrary sizes to exploit image content better than the fixed transform. The experimental results show that the unified graph-based transform outperforms conventional DCT, while close to discrete wavelet transform on JPEG2000 at high bit-rates.
Article
Monitoring the smart grid involves analyzing continuous data-stream from various measurement devices deployed throughout the system, which are topologically distributed and structurally interrelated. In this paper, a graph signal processing (GSP) framework is used to represent and analyze the inter-related smart grid measurement data for security and reliability analyses. The effects of various cyber and physical stresses in the system are evaluated in different GSP domains including vertex domain, graph-frequency domain, and the joint vertex-frequency domain. Two novel techniques based on vertex-frequency energy distribution, and the local smoothness of graph signals are proposed and their performance have been evaluated for detecting and locating various cyber and physical stresses. Based on the presented analyses, the proposed techniques show promising performance for detecting sophisticated stresses with no sharp changes at the onset, for detecting abrupt load changes, and also for locating stresses.
Chapter
This chapter reviews well‐established solutions to the problem of graph learning that adopt a statistical or physical perspective. The graph learning problem may consist of finding the optimal weights of the edges such that the resulting graph‐based transforms, having been adapted to the actual image structure, may lead to efficient transform coding of the image. The chapter examines a series of recent GSP‐based approaches and shows how signal processing tools and concepts can be utilized to provide novel solutions to the graph learning problem. The smoothness property of the graph signal is associated with a multivariate Gaussian distribution, which also underlies the idea of classical approaches for learning graphical models, such as the graphical Lasso. Image processing can benefit significantly from graph learning technique. The chapter discusses some general directions for future work by focusing more on graph inference for image processing applications.
Chapter
This chapter presents methods for building graph Fourier transforms (GFTs) for image and video compression. A key insight is that classical transforms, such as the discrete sine/cosine transform (DCT) or the Karhunen–Loeve transform (KLT), can be interpreted from a graph perspective. The chapter considers two sets of techniques for designing graphs, from which the associated GFTs are derived: Graph learning oriented GFT (GL‐GFT), and Block‐adaptive GFT. The graph spectral approaches aim to find graph Laplacian matrices, which denote the inverse covariances for the models of interest. The chapter discusses more specific 1D line models, with rigorous derivations of two separate Gaussian Markov random fields for intra‐ and inter‐predicted blocks. The experimental results demonstrated that GL‐GFTs can provide considerable coding gains with respect to standard transform coding schemes using/DCT. In comparison with the KLTs obtained from sample covariances, GL‐GFTs are more robust and provide better generalization.
Article
In many state-of-the-art compression systems, signal transformation is an integral part of the encoding and decoding process, where transforms provide compact representations for the signals of interest. This paper introduces a class of transforms called graph-based transforms (GBTs) for video compression, and proposes two different techniques to design GBTs. In the first technique, we formulate an optimization problem to learn graphs from data and provide solutions for optimal separable and nonseparable GBT designs, called GL-GBTs. The optimality of the proposed GL-GBTs is also theoretically analyzed based on Gaussian-Markov random field (GMRF) models for intra and inter predicted block signals. The second technique develops edge-adaptive GBTs (EA-GBTs) in order to flexibly adapt transforms to block signals with image edges (discontinuities). The advantages of EA-GBTs are both theoretically and empirically demonstrated. Our experimental results show that the proposed transforms can significantly outperform the traditional Karhunen-Loeve transform (KLT).
Conference Paper
In many video coding systems, separable transforms (such as two-dimensional DCT-2) have been used to code block residual signals obtained after prediction. This paper proposes a parametric approach to build graph-based separable transforms (GBSTs) for video coding. Specifically, a GBST is derived from a pair of line graphs, whose weights are determined based on two non-negative parameters. As certain choices of those parameters correspond to the discrete sine and cosine transform types used in recent video coding standards (including DCT-2, DST-7 and DCT-8), this paper further optimizes these graph parameters to better capture residual block statistics and improve video coding efficiency. The proposed GBSTs are tested on the Versatile Video Coding (VVC) reference software, and the experimental results show that about 0.4% average coding gain is achieved over the existing set of separable transforms constructed based on DCT-2, DST-7 and DCT-8 in VVC.
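A minimal numpy sketch of the two-parameter line-graph construction behind the GBST: the graph is a path with uniform edge weight plus optional self-loops at the end vertices. Per the standard characterization in the GBST literature, no self-loops yields DCT-2, while a self-loop matching the edge weight at the first vertex yields DST-7 (and at the last vertex, DCT-8); the check below covers the DST-7 case. The weight values are illustrative:

```python
import numpy as np

def gbst_line_laplacian(n, we=1.0, v0=0.0, vN=0.0):
    """Generalized line-graph Laplacian L = D - W + V with uniform edge
    weight we and self-loop weights v0 / vN at the first / last vertex."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i + 1] = L[i + 1, i] = -we
        L[i, i] += we
        L[i + 1, i + 1] += we
    L[0, 0] += v0
    L[n - 1, n - 1] += vN
    return L

n = 8
# Self-loop at the first vertex (v0 = we) reproduces DST-7 up to column signs
U = np.linalg.eigh(gbst_line_laplacian(n, we=1.0, v0=1.0))[1]
k, m = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
S7 = 2.0 / np.sqrt(2 * n + 1) * np.sin(np.pi * (2 * k + 1) * (m + 1) / (2 * n + 1))
for i in range(n):
    assert min(np.linalg.norm(U[:, i] - S7[i]),
               np.linalg.norm(U[:, i] + S7[i])) < 1e-6
```

Tuning `we`, `v0` and `vN` to residual statistics, rather than fixing them at these special values, is the optimization the abstract refers to.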
Article
Piecewise smooth (PWS) images (e.g., depth maps or animation images) contain unique signal characteristics such as sharp object boundaries and slowly-varying interior surfaces. Leveraging recent advances in graph signal processing, in this paper we propose to compress PWS images using suitable graph Fourier transforms (GFTs) to minimize the total signal representation cost of each pixel block, considering both the sparsity of the signal's transform coefficients and the compactness of the transform description. Unlike fixed transforms such as the discrete cosine transform (DCT), we can adapt the GFT to a particular class of pixel blocks. In particular, we select one among a defined search space of GFTs to minimize total representation cost via our proposed algorithms, leveraging graph optimization techniques such as spectral clustering and minimum graph cuts. Further, for practical implementation of the GFT we introduce two techniques to reduce computation complexity. First, at the encoder we low-pass filter and down-sample a high-resolution (HR) pixel block to obtain a low-resolution (LR) one, so that an LR-GFT can be employed. At the decoder, up-sampling and interpolation are performed adaptively along HR boundaries coded using arithmetic edge coding (AEC), so that sharp object boundaries can be well preserved. Second, instead of computing the GFT from a graph in real time via eigen-decomposition, the most popular LR-GFTs are pre-computed and stored in a table for lookup during encoding and decoding. Using depth maps and computer-graphics images as examples of PWS images, experimental results show that our proposed multi-resolution (MR)-GFT scheme outperforms H.264 intra by 6.8 dB on average in PSNR at the same bit rate.
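The second complexity-reduction technique above (pre-computing popular GFTs for table lookup instead of eigen-decomposing in real time) can be sketched as a cache keyed by the block's edge pattern; the `cached_gft` name and the 1-D line-graph setting are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from functools import lru_cache

@lru_cache(maxsize=None)
def cached_gft(broken_links):
    """GFT of an 8-point line graph with the links in `broken_links` cut.
    Each distinct edge pattern is eigen-decomposed once, then looked up."""
    n = 8
    L = np.zeros((n, n))
    for i in range(n - 1):
        if i in broken_links:
            continue
        L[i, i + 1] = L[i + 1, i] = -1.0
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
    return np.linalg.eigh(L)[1]

U1 = cached_gft((3,))
U2 = cached_gft((3,))   # second call is a table lookup, not a new eigh
assert U1 is U2
```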
Article
In social settings, individuals interact through webs of relationships. Each individual is a node in a complex network (or graph) of interdependencies and generates data, lots of data. We label the data by its source, or formally stated, we index the data by the nodes of the graph. The resulting signals (data indexed by the nodes) are far removed from time or image signals indexed by well ordered time samples or pixels. DSP, discrete signal processing, provides a comprehensive, elegant, and efficient methodology to describe, represent, transform, analyze, process, or synthesize these well ordered time or image signals. This paper extends to signals on graphs DSP and its basic tenets, including filters, convolution, z-transform, impulse response, spectral representation, Fourier transform, frequency response, and illustrates DSP on graphs by classifying blogs, linear predicting and compressing data from irregularly located weather stations, or predicting behavior of customers of a mobile service provider.
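A small numpy sketch of the DSP-on-graphs tenets mentioned above, using the directed cycle graph as a sanity check: there the graph shift reduces to the classical unit delay and graph filtering to circular convolution (a textbook special case, not the paper's general setting):

```python
import numpy as np

# Adjacency matrix of a directed cycle: the graph-shift analogue of z^{-1}
n = 6
A = np.roll(np.eye(n), 1, axis=0)   # (A x)[i] = x[i-1 mod n]

def graph_filter(taps, A):
    """An FIR graph filter h(A) = sum_k taps[k] * A^k."""
    H = np.zeros_like(A)
    P = np.eye(len(A))
    for t in taps:
        H += t * P
        P = P @ A
    return H

H = graph_filter([0.5, 0.3, 0.2], A)
# Shift-invariance: a polynomial in A commutes with the graph shift
assert np.allclose(H @ A, A @ H)

# On the cycle graph the GFT is the DFT, so filtering is circular convolution
x = np.random.default_rng(0).standard_normal(n)
y = H @ x
h_pad = np.array([0.5, 0.3, 0.2] + [0.0] * (n - 3))
y_ref = np.real(np.fft.ifft(np.fft.fft(h_pad) * np.fft.fft(x)))
assert np.allclose(y, y_ref)
```

For an arbitrary graph the same polynomial-in-the-shift construction defines filtering, with the eigenbasis of A playing the role of the Fourier basis.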
Conference Paper
In this work a new set of edge-adaptive transforms (EATs) is presented as an alternative to the standard DCTs used in image and video coding applications. These transforms avoid filtering across edges in each image block and thus avoid creating large high-frequency coefficients. The EATs are combined with the DCT in H.264/AVC, and a transform mode selection algorithm is used to choose between DCT and EAT in an RD-optimized manner. Applied to coding depth maps used for view synthesis in a multi-view video coding system, they provide up to 29% bit-rate reduction for a fixed quality in the synthesized views.
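The effect described above (not filtering across edges, so no large high-frequency coefficients are created) can be sketched in 1-D: cutting the graph link at a discontinuity makes the Laplacian block-diagonal, so a step signal is representable with at most two GFT coefficients, whereas the DCT spreads it over many. The construction below is an illustrative toy:

```python
import numpy as np

def path_laplacian(n, broken=()):
    """Path-graph Laplacian; links listed in `broken` (between i and i+1)
    are removed, so the transform never filters across those positions."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        if i in broken:
            continue
        L[i, i + 1] = L[i + 1, i] = -1.0
        L[i, i] += 1.0
        L[i + 1, i + 1] += 1.0
    return L

n = 8
x = np.array([10.0] * 4 + [60.0] * 4)        # sharp discontinuity at 3|4

# DCT (GFT of the unbroken path) spreads the step over many coefficients
c_dct = np.linalg.eigh(path_laplacian(n))[1].T @ x
# Edge-adaptive GFT: cutting the link at the discontinuity captures the
# step entirely within the two per-segment "DC" eigenvectors
c_eat = np.linalg.eigh(path_laplacian(n, broken=(3,)))[1].T @ x

nnz = lambda c: int(np.sum(np.abs(c) > 1e-9))
assert nnz(c_eat) <= 2 and nnz(c_dct) > 2
```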
Article
This paper proposes a novel approach to jointly optimize spatial prediction and the choice of the subsequent transform in video and image compression. Under the assumption of a separable first-order Gauss-Markov model for the image signal, it is shown that the optimal Karhunen-Loeve Transform, given available partial boundary information, is well approximated by a close relative of the discrete sine transform (DST), with basis vectors that tend to vanish at the known boundary and maximize energy at the unknown boundary. The overall intraframe coding scheme thus switches between this variant of the DST named asymmetric DST (ADST), and traditional discrete cosine transform (DCT), depending on prediction direction and boundary information. The ADST is first compared with DCT in terms of coding gain under ideal model conditions and is demonstrated to provide significantly improved compression efficiency. The proposed adaptive prediction and transform scheme is then implemented within the H.264/AVC intra-mode framework and is experimentally shown to significantly outperform the standard intra coding mode. As an added benefit, it achieves substantial reduction in blocking artifacts due to the fact that the transform now adapts to the statistics of block edges. An integer version of this ADST is also proposed.
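A sketch of the coding-gain comparison above, modeling the ADST as DST-7 (a common identification; the paper's integer variant differs) and the residual as an AR(1) process predicted from a known left boundary:

```python
import numpy as np

n, rho = 8, 0.95
# Precision matrix of the residual r_k = rho*r_{k-1} + e_k with r_0 = 0
# (AR(1) source predicted from a known left boundary, unit noise variance)
Q = np.diag([1 + rho**2] * (n - 1) + [1.0])
Q += np.diag([-rho] * (n - 1), 1) + np.diag([-rho] * (n - 1), -1)
C = np.linalg.inv(Q)                         # residual covariance

k, m = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
# ADST modeled here as DST-7: basis vanishing at the known boundary
ADST = 2 / np.sqrt(2 * n + 1) * np.sin(np.pi * (2 * k + 1) * (m + 1) / (2 * n + 1))
DCT = np.sqrt(2 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
DCT[0] /= np.sqrt(2)

def coding_gain(T, C):
    """Transform coding gain: arithmetic over geometric mean of coefficient variances."""
    v = np.diag(T @ C @ T.T)
    return v.mean() / np.exp(np.log(v).mean())

assert coding_gain(ADST, C) > coding_gain(DCT, C)
```

As rho approaches 1, Q tends exactly to the matrix diagonalized by DST-7, which is why the sine transform approximates the boundary-conditioned KLT so well.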
Conference Paper
We propose a new low-complexity entropy-coding method for coding waveform signals. It is based on the combination of two schemes: (1) an alphabet partitioning method to reduce the complexity of the entropy-coding process; (2) a new recursive set-partitioning entropy-coding process that achieves rates smaller than first-order entropy even with fast adaptive Huffman codecs. Numerical results from its application to lossy and lossless image compression show the efficacy of the new method, comparable to the best known methods.
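The alphabet-partitioning idea, splitting each symbol into a small, entropy-coded set index plus a raw offset, can be sketched as follows; the exponential set sizes used here are a hypothetical choice for illustration, not necessarily the paper's partition:

```python
def partition(symbol):
    """Split a non-negative integer into (set_index, offset).
    Set 0 = {0}; set k = [2^(k-1), 2^k), so the offset needs k-1 raw bits.
    Only the small set_index alphabet is entropy coded."""
    set_index = symbol.bit_length()
    offset = symbol - (1 << (set_index - 1)) if set_index > 0 else 0
    return set_index, offset

def unpartition(set_index, offset):
    """Inverse mapping: recover the symbol from its set index and offset."""
    return (1 << (set_index - 1)) + offset if set_index > 0 else 0

# Round-trip check over a range of symbol values
for s in range(1000):
    assert unpartition(*partition(s)) == s
```

Because large-magnitude symbols are rare in transform-coefficient data, shrinking the entropy coder's alphabet to the set indices costs little in rate while greatly reducing coder complexity.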
Conference Paper
The H.265/HEVC intra coding scheme allows up to 35 prediction modes. In this paper, we propose an intra-mode-dependent residual transform using a 2D-KLT for 4×4, 8×8, 16×16 and 32×32 blocks. Unlike H.265/HEVC and earlier standards, the transform is not separable and has a higher degree of freedom. It does not require a coefficient scanning process. Preliminary results demonstrate BD-rate gains of 2.30% (average excluding screen content), 2.35% (overall average) and up to 12.67% (maximum) compared to the HM10.0 anchor.
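A sketch of deriving a nonseparable 2D-KLT from training blocks: vectorize the blocks, estimate a sample covariance, and eigendecompose it. The synthetic correlated blocks below stand in for mode-dependent residuals and are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical training set: 4x4 residual blocks for one intra mode,
# drawn from a smooth synthetic model just for illustration
blocks = rng.standard_normal((5000, 4, 4))
blocks = blocks.cumsum(axis=1).cumsum(axis=2)   # spatially correlated samples

X = blocks.reshape(len(blocks), -1)             # vectorize: nonseparable view
C = np.cov(X, rowvar=False)                     # 16x16 sample covariance
_, U = np.linalg.eigh(C)
KLT = U[:, ::-1].T                              # rows = basis, high variance first

# 2D-KLT coefficients are (sample-)decorrelated by construction
coef_cov = KLT @ C @ KLT.T
assert np.allclose(coef_cov, np.diag(np.diag(coef_cov)), atol=1e-8)
```

Ordering the basis by descending variance is also what removes the need for a separate coefficient scanning pattern.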
Article
Depth image compression is important for compact representation of 3D visual data in "texture-plus-depth" format, where texture and depth maps from one or more viewpoints are encoded and transmitted. A decoder can then synthesize a freely chosen virtual view via depth-image-based rendering (DIBR), using nearby coded texture and depth maps as reference. Further, depth information can be used in other image processing applications beyond view synthesis, such as object identification, segmentation, etc. In this paper, we leverage the observation that "neighboring pixels of similar depth have similar motion" to efficiently encode depth video. Specifically, we divide a depth block containing two zones of distinct values (e.g., foreground and background) into two arbitrarily shaped regions (sub-blocks) along the dividing boundary before performing separate motion prediction (MP). While such arbitrarily shaped sub-block MP can lead to very small prediction residuals (resulting in few bits required for residual coding), it incurs an overhead to transmit the dividing boundaries for sub-block identification at the decoder. To minimize this overhead, we first devise a scheme called arithmetic edge coding (AEC) to efficiently code boundaries that divide blocks into sub-blocks. Specifically, we propose to incorporate the boundary's geometrical correlation into an adaptive arithmetic coder in the form of a statistical model. Then, we propose two optimization procedures to further improve the edge coding performance of AEC for a given depth image. The first procedure operates within a code block and allows lossy compression of the detected block boundary to lower the cost of AEC, with an option to augment boundary depth pixel values to match the new boundary, provided the augmented pixels do not adversely affect synthesized view distortion. The second procedure operates across code blocks, and systematically identifies blocks along an object contour that should be coded using sub-block MP via a rate-distortion optimized trellis. Experimental results show an average overall bitrate reduction of up to 33% over classical H.264/AVC.
Conference Paper
Motion information in image sequences connects pixels that are highly correlated. In this paper, we consider vertex-weighted graphs that are formed by motion vector information. The vertex weights are defined by scale factors which are introduced to improve the energy compaction of motion-adaptive transforms. Further, we relate the vertex-weighted graph to a subspace constraint of the transform. Finally, we propose a subspace-constrained transform (SCT) that achieves optimal energy compaction for the given constraint. The subspace constraint is derived from the underlying motion information only and requires no additional information. Experimental results on energy compaction confirm that the motion-adaptive SCT outperforms motion-compensated orthogonal transforms while approaching the theoretical performance of the Karhunen Loeve Transform (KLT) along given motion trajectories.
Article
In this letter, we provide a theoretical analysis of optimal predictive transform coding based on the Gaussian Markov random field (GMRF) model. It is shown that the eigen-analysis of the precision matrix of the GMRF model is optimal in decorrelating the signal. The resulting graph transform degenerates to the well-known 2-D discrete cosine transform (DCT) for a particular 2-D first order GMRF, although it is not a unique optimal solution. Furthermore, we present an optimal scheme to perform predictive transform coding based on conditional probabilities of a GMRF model. Such an analysis can be applied to both motion prediction and intra-frame predictive coding, and may lead to improvements in coding efficiency in the future.
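The degenerate case noted in the abstract can be checked numerically: for an (intrinsic) GMRF whose precision matrix is the Laplacian of the 4-connected grid, the separable 2-D DCT diagonalizes the precision, i.e., it is the corresponding graph transform. A small sketch:

```python
import numpy as np

def path_laplacian(n):
    """1-D path-graph Laplacian with unit edge weights."""
    L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1.0
    return L

n = 4
# Precision of a first-order 2-D GMRF on the 4-connected grid:
# the Kronecker sum of two 1-D path Laplacians
L2 = np.kron(path_laplacian(n), np.eye(n)) + np.kron(np.eye(n), path_laplacian(n))

# Separable 2-D DCT-2 basis (rows = vectorized basis images)
k, m = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
D = np.sqrt(2 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
D[0] /= np.sqrt(2)
T = np.kron(D, D)

# The 2-D DCT diagonalizes this precision matrix, i.e., it is its GFT
M = T @ L2 @ T.T
assert np.allclose(M, np.diag(np.diag(M)), atol=1e-10)
```

Since eigenvectors of the precision and of the covariance coincide, diagonalizing the precision is exactly the decorrelating transform the letter analyzes.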
Article
High Efficiency Video Coding (HEVC) is currently being prepared as the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goal of the HEVC standardization effort is to enable significantly improved compression performance relative to existing standards, in the range of a 50% bit-rate reduction for equal perceptual video quality. This paper provides an overview of the technical features and characteristics of the HEVC standard.