GRAPH-BASED TRANSFORMS FOR INTER PREDICTED VIDEO CODING
Hilmi E. Egilmez†, Amir Said∗, Yung-Hsuan Chao† and Antonio Ortega†
†Signal & Image Processing Institute, University of Southern California, Los Angeles, CA, USA
∗Qualcomm Technologies, San Diego, CA, USA
hegilmez@usc.edu, asaid@qti.qualcomm.com, yunghsuc@usc.edu, ortega@sipi.usc.edu
ABSTRACT
In video coding, motion compensation is an essential tool to ob-
tain residual block signals whose transform coefficients are en-
coded. This paper proposes novel graph-based transforms (GBTs)
for coding inter-predicted residual block signals. Our contribu-
tion is twofold: (i) We develop edge adaptive GBTs (EA-GBTs)
derived from graphs estimated from residual blocks, and (ii) we design template adaptive GBTs (TA-GBTs) by introducing simplified graph templates that generate different sets of GBTs with low transform signaling overhead. Our experimental results show that the proposed methods significantly outperform the traditional DCT and KLT in terms of rate-distortion performance.
Index Terms— Transform, signal processing on graphs, graph-
based transforms, video coding, video compression.
1. INTRODUCTION
In video coding standards including HEVC [1], inter-prediction is
a very important building block that significantly improves coding
efficiency by exploiting high temporal redundancy between video
blocks. In general, samples of residual blocks obtained from inter-
prediction have low energy, so their transform coefficients can be
efficiently encoded. However, some residual blocks may have high
energy due to high motion activity and occlusions, so that better en-
ergy compacting transforms are needed to improve coding gains.
Typically, in conventional video coding architectures as shown in
Fig. 1, a fixed transform such as discrete cosine transform (DCT) is
employed to accommodate complexity constraints of encoding. The
main problem of using a fixed block linear transform is the implicit
assumption that all residual blocks have the same isotropic statisti-
cal properties. Yet in practice, residual blocks can have very dif-
ferent statistical characteristics depending on video content. Better
compression can be achieved by using different transforms that can
adapt to the statistical properties of residual blocks. However, such adaptation requires encoding additional side information, called transform signaling overhead, that the decoder uses to identify the transforms chosen at the encoder. Therefore, it is important to design transforms that adapt to common residual block characteristics with low signaling overhead.
This paper presents two different types of transforms exploiting the statistical characteristics of inter-predicted residual blocks. The proposed transforms fall into the category of graph-based transforms (GBTs): we first design a graph capturing signal characteristics observed in inter-predicted residual blocks, and the associated orthogonal transforms are then derived from the designed graph. In our first design, the edge adaptive GBT (EA-GBT),
This work has been supported in part by LG Electronics.
Fig. 1: An overall block diagram for hybrid video coding, using a
combination of predictive and transform coding.
we allow flexible adaptation for each residual block. Firstly, edge de-
tection is performed for each residual block, and based on detected
edges we construct a weighted graph which captures signal variation
characteristics in the block. Then, an EA-GBT is generated using
the weighted graph. Note that, this method can create a large sig-
naling overhead depending on the graph information has to be sent
to the decoder. Our second design proposes template adaptive GBTs
(TA-GBTs) which are derived based on a set of simplified graph
templates capturing basic statistical characteristics of inter-predicted
residual blocks. Thus, graph information can be efficiently sent to
the decoder via signaling indexes of corresponding graph templates.
By selecting different subsets of graph templates, the signaling over-
head can be significantly reduced without losing adaptivity, espe-
cially when a few graph templates are sufficient to capture block
signal characteristics.
In the literature, several adaptive transform approaches have been proposed. Most similar to our work, Shen et al. [2] propose edge adaptive transforms (EATs) specifically for depth map compression. Although our paper adopts some basic concepts originally introduced in [2] for designing EA-GBTs, our graph construction method is different. Hu et al. [3] extend EATs by optimizing weak-link weights for piecewise smooth image compression. In both [2] and [3], the authors propose methods specific to depth map compression, whereas our work focuses on encoding inter-predicted residual blocks. Related to inter-predicted coding, Liu and Flierl [4] propose motion adaptive transforms based on vertex-weighted graphs for coding motion-connected pixels. Their approach is not block based, and in their graph construction, unlike in our work, vertex weights are adjusted using a measure called the motion scale factor. Most of the related recent works are on intra-predicted adaptive transforms. In [5], Takamura and Shimizu develop intra-mode dependent KLTs, and Han et al. [6] introduce a hybrid DCT/ADST transform for intra-
Fig. 2: Graphs (a) connecting each pixel with its four nearest neighboring pixels (4-connected) and (b) connecting each pixel with pixels that are 1-hop away (8-connected).
predicted transform coding. To the best of our knowledge, ours is the first work that proposes GBTs for encoding inter-predicted residual blocks by exploiting their statistical characteristics.
The rest of the paper is organized as follows. In Section 2 we
introduce GBTs. Section 3 discusses inter-predicted residual signal
characteristics used in designing proposed GBTs. In Section 4, the
proposed EA-GBTs and TA-GBTs are described. The experimental
results are presented in Section 5, and Section 6 draws some conclu-
sions based on experimental results.
2. PRELIMINARIES
In graph signal processing [7, 8], signals are supported on an undirected, weighted and connected graph, G(N, E, A), where signal values are attached to the nodes (N) of the graph and its links (E) capture relations among the signal's samples. The adjacency matrix, A, represents the graph's link weights. For a given graph,
G(N , E, A), we define graph-based transforms (GBTs) using its
combinatorial Laplacian,
L = D − A (1)
where D is the diagonal degree matrix. In order to find the GBT as-
sociated with graph G, we perform eigen-decomposition of the graph
Laplacian, that is
L = TΛTᵀ, (2)
where the columns of T are the basis vectors of the corresponding
GBT. Since L is a real symmetric matrix, it has a complete set of
orthonormal eigenvectors.
A graph is completely defined by an adjacency matrix, so we can
create different transforms by designing graph-link weights (i.e., A).
For example as shown in Fig. 2, an image block can be represented
as a graph so that different connectivity patterns lead to different
interpretations in graph transform domain.
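The construction in Eqs. (1)-(2) can be sketched in a few lines of NumPy. This is an illustrative implementation (the function and variable names are our own, not the authors' code):

```python
import numpy as np

def grid_adjacency(n, weight=1.0):
    """Adjacency matrix of a 4-connected n x n pixel grid with uniform link weights."""
    N = n * n
    A = np.zeros((N, N))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:                      # link to right neighbor
                A[i, i + 1] = A[i + 1, i] = weight
            if r + 1 < n:                      # link to bottom neighbor
                A[i, i + n] = A[i + n, i] = weight
    return A

def gbt(A):
    """GBT of a graph: eigendecomposition of L = D - A (Eqs. 1-2)."""
    L = np.diag(A.sum(axis=1)) - A             # combinatorial Laplacian
    eigvals, T = np.linalg.eigh(L)             # eigh: ascending eigenvalues, orthonormal T
    return eigvals, T                          # columns of T are the GBT basis vectors

eigvals, T = gbt(grid_adjacency(8))
coeffs = T.T @ np.ones(64)                     # forward transform of a flat 8 x 8 block
```

Since the Laplacian is real and symmetric, `eigh` returns a complete orthonormal basis with eigenvalues in ascending order; a flat block compacts entirely into the first (DC) coefficient, and for this uniform 4-connected grid the resulting GBT coincides with the traditional 2-D DCT [9].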
3. INTER-PREDICTED RESIDUAL BLOCK SIGNAL
CHARACTERISTICS
In this section, we investigate some statistical properties of inter-
predicted residual blocks that we consider in our transform designs.
In general, inter-predicted residual block signals have small valued
(low energy) samples because of high temporal redundancy among
video blocks. This is very important for effective compression, since
it leads to sparse quantized coefficients which can be encoded effi-
ciently. However, large prediction errors are possible in case of high
motion activity and occlusions, which lead to large transform coefficients requiring more bits for encoding. Based on our observations of residual block signals obtained using the HEVC encoder (HM-14), residual signal samples that are close to the boundaries of the blocks
Fig. 3: Sample variance values calculated over 8 × 8 residual blocks for (a) Harbour and (b) Soccer.
Fig. 4: Similarity graphs for 8 × 8 residual blocks of (a) Harbour and (b) Soccer, where partial correlation values between nearest neighboring pixels are shown.
have larger values mainly because of occlusions leading to partial
mismatches between reference and predicted blocks. Fig. 3 illus-
trates sample variance values calculated over 8 × 8 residual blocks
of Harbour and Soccer sequences.¹ Note that for both sequences,
sample variance (i.e., energy) is larger around the boundaries and
corners of the residual blocks.
Moreover, Fig. 4(a) and (b) show similarity graphs trained for
8 × 8 inter-predicted residual blocks over Harbour and Soccer video
sequences, respectively. As a measure of inter-pixel similarity, par-
tial correlation values are calculated based on the precision matrix,
J, where J is defined as the inverse of the covariance matrix [9],
calculated for each video sequence. The weighted graphs demon-
strate that the similarity between the pixels near boundaries of a
residual block is smaller compared to the pixels around the center
of the block.
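The partial-correlation measure underlying Fig. 4 can be reproduced on toy data. This sketch uses an AR(1) covariance as a stand-in for statistics estimated from actual residual blocks, and the helper name is ours:

```python
import numpy as np

def partial_correlations(cov):
    """Partial correlations rho_ij = -J_ij / sqrt(J_ii * J_jj), where J = cov^{-1}
    is the precision matrix; these serve as link weights of a similarity graph."""
    J = np.linalg.inv(cov)
    d = np.sqrt(np.diag(J))
    rho = -J / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

# Toy AR(1) covariance (correlation 0.9**|i-j|) standing in for statistics
# estimated over residual blocks of a video sequence.
cov = np.array([[0.9 ** abs(i - j) for j in range(4)] for i in range(4)])
rho = partial_correlations(cov)
# For an AR(1) model only adjacent samples have nonzero partial correlation,
# so the similarity graph reduces to a weighted path.
```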
It is important to note that the statistical characteristics of inter-predicted residuals discussed in this section are not specific to the Harbour and Soccer video sequences. According to our experiments, these characteristics are fairly general and apply to different sequences and residual block sizes. They are exploited in our GBT designs, discussed in the next section.
4. PROPOSED GRAPH-BASED TRANSFORMS
4.1. Edge Adaptive GBT (EA-GBT)
In designing edge adaptive graph-based transforms (EA-GBTs), we first (i) generate a uniformly weighted graph; then (ii) based on differences between pixels (i.e., edges), graph links are pruned or their weights are adjusted (weakened). The transforms associated with the designed graphs can thus exploit different block signal characteristics, so GBTs provide a better representation of residual signals. In particular, we propose the following steps to construct EA-GBTs:
¹We show statistical properties of the Harbour and Soccer sequences, since both have high motion activity.
1. Based on the size of the residual block of interest, we create
a nearest neighbor (4-connected) graph with link weights all
equal to 1 as shown in Fig. 2(a) for 8 × 8 blocks.
2. Given a residual block, we apply the Prewitt operator to calculate gradients in the vertical and horizontal directions.
3. We detect edges by thresholding the gradient values.
4. Depending on the angle (directionality) of an edge, the weights of some graph links are reduced.
5. Weak graph link weights can be chosen in the range [0, 1). Based on our experiments, small weights provide better compression than assigning zero weights (which may lead to disconnected components). To reduce signaling overhead, we experimentally select a single weak link weight, set to 0.001.
6. After designing a graph, the associated GBT is constructed as
discussed in Section 2.
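The steps above can be sketched as follows, under the simplifying assumption that weakening any link whose endpoint pixels differ by more than a threshold approximates the Prewitt-based edge detection of steps 2-4; the threshold value and helper names are illustrative, while the weak weight 0.001 follows step 5:

```python
import numpy as np

WEAK = 0.001  # single weak-link weight (step 5)

def ea_graph(block, thresh):
    """Simplified EA-GBT graph: uniformly weighted 4-connected grid (step 1) whose
    links are weakened when their endpoint pixels differ by more than `thresh`
    (a stand-in for the thresholded Prewitt edge detection of steps 2-4)."""
    n = block.shape[0]
    A = np.zeros((n * n, n * n))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            for r2, c2 in ((r, c + 1), (r + 1, c)):   # right and bottom neighbors
                if r2 < n and c2 < n:
                    j = r2 * n + c2
                    diff = abs(float(block[r, c]) - float(block[r2, c2]))
                    w = WEAK if diff > thresh else 1.0
                    A[i, j] = A[j, i] = w
    return A

def ea_gbt(block, thresh=30.0):
    """EA-GBT basis from the edge-adapted graph (step 6)."""
    A = ea_graph(block, thresh)
    L = np.diag(A.sum(axis=1)) - A
    _, T = np.linalg.eigh(L)
    return T

# A block with a sharp vertical edge compacts into very few EA-GBT coefficients.
block = np.zeros((8, 8))
block[:, 4:] = 100.0
T = ea_gbt(block)
c = T.T @ block.flatten()
top2 = np.sort(c ** 2)[::-1][:2].sum() / (c ** 2).sum()
```

Because the weak links keep the graph connected while nearly decoupling the two sides of the edge, almost all of the signal energy falls into the first two basis vectors.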
Fig. 5 illustrates a sample graph design obtained by the procedure above, where link weights of the original 4-connected graph are weakened based on the edges observed in a given residual block. Thus, the transforms associated with the constructed graphs can adapt to different residual block signals. Although the resulting transforms can provide efficient coding of transform coefficients, the overall coding performance may not be sufficient due to the signaling overhead of the graph information, especially if multiple weak link weights are used. To address this problem, we propose to use a single weak link weight so that an edge-map codec such as the arithmetic edge encoder (AEC) [10] can be employed to efficiently send the graph information.
In addition, based on our experiments, signaling graph information for small blocks (e.g., 4 × 4) may result in excessive bit overhead. In order to efficiently encode graph information for such blocks, we propose to combine the graphs obtained from neighboring blocks; the combined graph is then encoded using AEC.
4.2. Template Adaptive GBT (TA-GBT)
In this section, we propose a fixed set of GBTs derived from a set of graph templates, based on the inter-predicted residual signal characteristics discussed in Section 3. The main observation we exploit in our design is that sharp transitions (i.e., most of the energy) appear around the corners of inter-predicted residual blocks. This is mainly due to mismatched regions (i.e., occlusions) in inter-prediction. The basic building blocks of the proposed graph template construction are as follows:
1. We choose a base graph that is a uniformly weighted graph, G_uni; two examples are shown in Fig. 2. In this work, we employ a nearest-neighbor image model, so the 4-connected grid graph is used (see Fig. 2(a)).
2. By adjusting the weights of a subset of links in G_uni, K different graphs are constructed. These graphs are called graph templates {G_k}_{k=1}^K, and they define the GBTs.
3. The statistical properties of inter-predicted residual blocks can be captured by reducing the weights of links in G_uni connecting pixels around the corners of a transform block. Particularly, in this work K = 16 templates are generated by repeating different combinations of a rectangular pattern to denote weak links around the corners of the graph G_uni.
4. For a selected set of graph templates, the associated GBTs are constructed as discussed in Section 2.
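One plausible reading of step 3 is that each of the K = 16 templates corresponds to a subset of the four block corners whose links are weakened (2^4 = 16 combinations). The sketch below follows that assumption; the patch size and the weak weight 0.1 are our own illustrative choices, not values from the paper:

```python
import numpy as np
from itertools import product

def template_graph(n, corners, patch=2, weak=0.1):
    """4-connected grid whose links are weakened inside `patch` x `patch` regions
    at the selected corners; `corners` = (top-left, top-right, bottom-left,
    bottom-right) flags. Patch size and weak weight are illustrative choices."""
    mask = np.zeros((n, n), dtype=bool)
    tl, tr, bl, br = corners
    if tl: mask[:patch, :patch] = True
    if tr: mask[:patch, n - patch:] = True
    if bl: mask[n - patch:, :patch] = True
    if br: mask[n - patch:, n - patch:] = True
    A = np.zeros((n * n, n * n))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            for r2, c2 in ((r, c + 1), (r + 1, c)):   # right and bottom neighbors
                if r2 < n and c2 < n:
                    j = r2 * n + c2
                    w = weak if (mask[r, c] and mask[r2, c2]) else 1.0
                    A[i, j] = A[j, i] = w
    return A

# K = 16 templates, one per subset of the four corners; the template with no
# weakened corner is the uniform grid, whose GBT is the 2-D DCT (template 1).
templates = [template_graph(8, bits) for bits in product((0, 1), repeat=4)]
```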
Fig. 5: An example of edge adaptive graph construction based on a residual block signal: (a) inter-predicted residual block, (b) associated graph. The graph's weak links correspond to sharp transitions (i.e., edges) in the residual block.
Fig. 6: Graph templates for 8×8 blocks with index {1,2,3,...,15,16}.
Fig. 6 shows five of the sixteen graph templates designed for 8 × 8
transform blocks which lead to 16 different GBTs. Similarly, we
also generate 16 transforms for 4 × 4 residual blocks. Note that the
first template corresponds to traditional 2-D DCT [9].
In order to adaptively select the best transform, we introduce a
graph Laplacian based quadratic cost which measures residual signal
variation on a given graph. Formally, for a given residual block signal d, we select the transform whose associated graph representation (G_k) solves the following optimization problem:

minimize_k  dᵀL(G_k)d = dᵀTΛTᵀd = aᵀΛa = ∑_{i=1}^{N} λ_i a_i²   (3)
where L(G_k) is the combinatorial graph Laplacian of graph G_k, a is the vector of transform coefficients, N is the number of samples in the residual block, λ_i denotes the i-th eigenvalue of the graph Laplacian in increasing order (i.e., λ_i ≤ λ_{i+1} for i ∈ {1, ..., N − 1}), and a_i is the transform coefficient associated with λ_i. This criterion is a way of measuring energy compaction: the larger λ_i is, the larger the penalty on its transform coefficient. Since the first eigenvalue, λ_1, is zero [7],

∑_{i=1}^{N} λ_i a_i² = ∑_{i=2}^{N} λ_i a_i²,   (4)

which induces no penalty for the transform coefficient a_1 (i.e., the DC component).
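Template selection by the quadratic cost of Eq. (3) reduces to evaluating dᵀL(G_k)d for each candidate Laplacian and taking the minimum. A minimal sketch on a toy path graph (function names are ours):

```python
import numpy as np

def select_template(d, laplacians):
    """Pick the template minimizing the quadratic cost of Eq. (3):
    lower graph variation means better energy compaction for this block."""
    costs = [float(d @ L @ d) for L in laplacians]
    return int(np.argmin(costs)), costs

def path_laplacian(weights):
    """Combinatorial Laplacian of a path graph with the given link weights."""
    n = len(weights) + 1
    A = np.zeros((n, n))
    for i, w in enumerate(weights):
        A[i, i + 1] = A[i + 1, i] = w
    return np.diag(A.sum(axis=1)) - A

d = np.array([0.0, 0.0, 1.0, 1.0])             # step-shaped residual signal
L_uniform = path_laplacian([1.0, 1.0, 1.0])    # uniform path graph
L_weak = path_laplacian([1.0, 0.1, 1.0])       # weak middle link
idx, costs = select_template(d, [L_uniform, L_weak])
# idx == 1: the template whose weak link matches the signal's discontinuity wins
```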
5. RESULTS
In this section, we compare the rate-distortion (RD) performance of the proposed transforms against the DCT and KLT. In our simulations, we generate residual block signals for five test sequences (Foreman, Mobile, City, Harbour and Soccer) using the HEVC (HM-14) encoder, with transform units fixed to either 4 × 4 or 8 × 8. We test the performance of the different transforms on inter-predicted residual blocks only. After transforming the residual blocks, the transform coefficients are uniformly quantized and then encoded using a symbol-grouping-based arithmetic entropy encoder called AGP, which uses an amplitude group partitioning technique to efficiently encode image transform coefficients [11]. The AGP encoder allows us to fairly compare the rate-distortion performance of different transforms, since AGP can flexibly learn and exploit the amplitude
Table 1: Percentage reduction in bitrate (bits/pixel) with respect to the average bitrate obtained using DCT. Foreman, Mobile and City are 352×288; Harbour and Soccer are 704×576. EA-GBT(RO) is applied to 4 × 4 blocks only.

4 × 4 block transform:
PSNR (dB)  Transform     Foreman   Mobile     City  Harbour   Soccer  Average
32         EA-GBT(RO)     -90.44    18.40     3.98    -0.30     7.28     7.63
32         EA-GBT        -423.02   -21.46   -99.45   -97.66   -66.38   -67.26
32         TA-GBT           0.77    11.76     6.38     4.46     7.67     7.68
32         KLT              2.75     3.32     7.40     2.35     5.42     4.19
34         EA-GBT(RO)       7.19    15.13    18.43    12.57    17.61    15.17
34         EA-GBT         -62.04   -14.37   -36.62   -46.78   -25.45   -30.25
34         TA-GBT           7.61     8.88     5.83     6.02     7.09     6.23
34         KLT             -0.20     1.49     5.38     5.74     4.13     3.40
36         EA-GBT(RO)      20.44    10.76    13.79    10.95    11.93    12.61
36         EA-GBT         -15.31   -12.76   -23.49   -30.20   -18.79   -19.58
36         TA-GBT           7.06     6.76     4.58     0.17     4.97     4.77
36         KLT              2.01     0.58     3.85     4.63     3.28     2.66

8 × 8 block transform:
PSNR (dB)  Transform     Foreman   Mobile     City  Harbour   Soccer  Average
32         EA-GBT        -159.07    13.54    -5.71   -12.64     9.18     0.93
32         TA-GBT           0.41     7.11     7.69     3.98     8.55     6.77
32         KLT              7.24     2.29     7.86     1.77     7.13     4.02
34         EA-GBT           6.99     8.63    12.84     5.75    15.01     9.97
34         TA-GBT           8.30     4.65     3.80     4.35     4.18     4.20
34         KLT              1.34     1.21     4.31     3.73     3.90     2.65
36         EA-GBT          21.26     4.89     7.03     4.88     6.63     7.42
36         TA-GBT           4.19     3.03     1.11     2.17     1.53     1.67
36         KLT             -0.53     0.60     3.30     4.68     3.77     2.24
Fig. 7: Average PSNR vs. BPP results (left) for 4×4 blocks and (right) for 8×8 blocks. EA-GBT(RO) corresponds to the method (only
applied to 4 × 4 blocks) that reduces signaling overhead of EA-GBT by combining graph information at neighboring blocks.
distribution of transform coefficients. For ordering of the quantized coefficients, we employ zig-zag scanning for DCT coefficients, while KLT and GBT coefficients are ordered by descending and ascending eigenvalue, respectively. To signal the transform for EA-GBT, we use the arithmetic edge codec (AEC) [10] to efficiently code the graph information. To further reduce the overhead of graph coding for 4 × 4 blocks (EA-GBT(RO)), we combine the graphs obtained from neighboring blocks, and the resulting larger graph is encoded using AEC. For TA-GBT, the transform indexes
are signaled as the side information. After decoding the quantized
transform coefficients using AGP decoder, we reconstruct the video
blocks and measure PSNR with respect to the original video blocks.
The average RD performances of different transforms are pre-
sented in Fig. 7 in terms of PSNR and total bits spent per-pixel
(BPP) for encoding quantized transform coefficients, motion vectors
and transform signaling overheads. More comprehensive results are
available in Table 1, where we show the percentage bit reduction for each video sequence gained by using GBTs and the KLT at different PSNR values (i.e., 32, 34 and 36 dB) with respect to using the DCT. Average percentage reductions (corresponding to Fig. 7) are also given in Table 1. Note that positive values in the table mean that better RD performance is achieved compared to the DCT. According to these results:
• For 4 × 4 blocks, RD performance of EA-GBT is the worst
among all transforms due to the excessive graph signaling over-
head. However, the signaling overhead of EA-GBT is signif-
icantly reduced by combining the graph information of neigh-
boring blocks (see EA-GBT(RO) in Table 1 and in Fig. 7).
• EA-GBT(RO) and EA-GBT outperform all other transforms at
high-rate coding of 4 × 4 and 8 × 8 blocks, respectively. On the
other hand, TA-GBT provides a reasonable coding gain for both
low-rate and high-rate coding with respect to DCT.
6. CONCLUSIONS
In this paper, we have proposed two novel transforms, EA-GBT and TA-GBT, for inter-predicted residual block signals, and compared their rate-distortion (RD) performance against the traditional DCT and KLT. Inspection of the experimental results leads us to the following conclusions:
• The proposed EA-GBT provides a 9.9% coding gain at 34 dB PSNR with respect to the DCT for 8 × 8 residual blocks. For 4 × 4 blocks, a 15.2% gain can be achieved using EA-GBT(RO). However, at low bitrates corresponding to 30-32 dB PSNR, the graph signaling overhead exceeds the bit reduction gained using EA-GBT. Therefore, we propose to use TA-GBT for coding at low bitrates.
• The proposed TA-GBT nicely captures the characteristics of 4 × 4 residual blocks with low transform signaling overhead. At 34 dB PSNR, it provides a 6.2% bitrate reduction on average with respect to the DCT. For 8 × 8 blocks the reduction is smaller, at 4.2%. Using more graph templates could improve the coding gain, since more distinct signal characteristics can be captured in 8 × 8 or larger blocks.
• For 4 × 4 blocks, it is inefficient to directly send graphs as side information. By exploiting the graph information from neighboring blocks, we show that the signaling overhead can be significantly reduced.
7. REFERENCES
[1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand,
“Overview of the High Efficiency Video Coding (HEVC) Stan-
dard,” IEEE Trans. Circuits Syst. Video Technol., vol. 22,
no. 12, pp. 1649–1668, Dec. 2012.
[2] G. Shen, W.-S. Kim, S. Narang, A. Ortega, J. Lee, and H. Wey,
“Edge-adaptive transforms for efficient depth map coding,” in
Picture Coding Symposium (PCS), 2010, Dec 2010, pp. 566–
569.
[3] W. Hu, G. Cheung, A. Ortega, and O. Au, “Multi-resolution graph Fourier transform for compression of piecewise smooth images,” IEEE Trans. Image Process., vol. PP, no. 99, pp. 1–1, 2014.
[4] D. Liu and M. Flierl, “Motion-adaptive transforms based on the Laplacian of vertex-weighted graphs,” in Data Compression Conference (DCC), Mar. 2014, pp. 53–62.
[5] S. Takamura and A. Shimizu, “On intra coding using mode
dependent 2D-KLT,” in Proc. 30th Picture Coding Symp., San
Jose, CA, Dec. 2013, pp. 137–140.
[6] J. Han, A. Saxena, V. Melkote, and K. Rose, “Jointly opti-
mized spatial prediction and block transform for video and im-
age coding,” IEEE Trans. Image Process., vol. 21, no. 4, pp.
1874–1884, Apr. 2012.
[7] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and
P. Vandergheynst, “The emerging field of signal processing on
graphs,” IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98,
May 2013.
[8] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing
on graphs,” IEEE Trans. Signal Process., vol. 61, no. 7, pp.
1644–1656, Apr. 2013.
[9] C. Zhang and D. Florencio, “Analyzing the optimality of pre-
dictive transform coding using graph-based models,” IEEE
Signal Process. Lett., vol. 20, no. 1, pp. 106–109, 2013.
[10] I. Daribo, D. Florencio, and G. Cheung, “Arbitrarily shaped motion prediction for depth video compression using arithmetic edge coding,” IEEE Trans. Image Process., vol. 23, no. 11, pp. 4696–4708, Nov. 2014.
[11] A. Said and W. A. Pearlman, “Low-complexity waveform cod-
ing via alphabet and sample-set partitioning,” in SPIE Visual
Communications and Image Processing, 1997, pp. 25–37.