BLIND DEPTH QUALITY ASSESSMENT USING HISTOGRAM SHAPE
ANALYSIS
Muhammad Shahid Farid, Maurizio Lucenteforte, Marco Grangetto
Dipartimento di Informatica, Università degli Studi di Torino
Corso Svizzera 185, 10149 Torino, Italy
ABSTRACT
Multiview videos plus depth (MVD) is a popular 3D video rep-
resentation where pixel depth information is exploited to gener-
ate additional views to provide 3D experience. Quality assess-
ment of MVD data is of paramount importance since the latest
research results show that existing 2D quality metrics are not suit-
able for MVD. This paper focuses on depth quality assessment
and presents a novel algorithm to estimate the distortion in depth
videos induced by compression. The proposed algorithm is no-
reference and does not require any prior training or modeling. The
proposed method is based solely on the statistical analysis of the
compression-sensitive pixels of depth images. Experimental results obtained on a standard MVD dataset show that the proposed algorithm exhibits a very high correlation with conventional full-reference metrics.
Index Terms — Depth image quality metric, Free-viewpoint
TV, Depth image based rendering, Quality assessment
1. INTRODUCTION
Multiview-video-plus-depth format for 3D content representation
has been adopted for current and future 3D television technolo-
gies e.g. free-viewpoint television (FTV) [1] and Super Multiview
(SMV) displays [2]. The gray scale depth image represents the
per pixel depth value of the corresponding texture image which
is exploited to generate novel views through depth image based
rendering (DIBR) [3]. In the MVD format only a few views with their associated depth maps are coded and transmitted.
The compression of MVD data is an important task in the 3D television framework and much attention has been devoted to this research area. To efficiently compress MVD data, various coding formats have been proposed and new tools have been developed, e.g., [4–7]. Advanced Video Coding (H.264/AVC) [8] has been used in the past to encode the texture and depth videos independently, a scheme also known as simulcast coding. The novel High
Efficiency Video Coding (HEVC) [9] is the current state of the
art video coding tool. The Joint Collaborative Team on 3D Video
Coding Extension Development (JCT-3V) has recently developed
extensions of HEVC to efficiently encode multiview videos and
MVD data. Multiview-HEVC (MV-HEVC) [10] extends the HEVC
syntax to encode MVD without additional coding tools whereas
3D-HEVC [11] is expressly dedicated to the design of novel
coding techniques for MVD. 3D-HEVC encodes the base view
with its depth map using unmodified HEVC whereas the depen-
dent views and their depth maps are encoded by exploiting ad-
ditional coding tools. 3D-HEVC achieves the best compression
This work was partially supported by Università degli Studi di Torino
and Compagnia di San Paolo under project AMALFI (ORTO119W8J).
ratio for MVD data [11]. To achieve autostereoscopy additional
intermediate viewpoints can be generated with DIBR on the re-
ceiver side. Given a DIBR algorithm, the perceptual quality of the
rendered images depends on both texture and depth image quality.
Quality of the depth map is particularly important as the com-
pression artifacts in depth maps can cause structural and textural
distortions in the synthesized image [12–14] resulting in poor 3D
experience.
3D image and video quality assessment is a more difficult and
complex problem compared to its 2D counterpart. Earlier, 2D im-
age quality metrics have been used to quantify the quality of 3D
images (video plus depth) and stereoscopic images. In this con-
text, the 2D metrics have been used in two ways: some metrics
estimate the quality by assessing each texture image separately
and aggregating the values without considering the depth images.
Others exploit the depth maps in addition to texture image quality to predict the overall quality. However, due to the different nature of acquisition, representation, transmission and rendering, 3D images are affected by different types of quality artifacts [15, 16].
Recent studies [17, 18] tested various existing 2D image quality
metrics to assess the quality of stereoscopic and 3D images and
concluded that none of the existing 2D quality metrics is suitable
in this context.
Ekmekcioglu et al. [19] proposed a 3D quality assessment algorithm based on weighted PSNR and SSIM [20]. They propose
to weight each pixel quality value (PSNR or SSIM) with the corre-
sponding depth value to increase the contribution of pixels closer
to the camera; indeed, according to their study the closer the pixel,
the larger the impact on visual perception. The 3D QA proposed
in [21] combines SSIM and C4 [22] with disparity estimation to
compute a single quality metric. The two measures are then inte-
grated (globally or locally) to obtain the final quality value. Boev
et al. [23] proposed a full-reference multi-scale stereo video QA
algorithm that computes the monoscopic artifacts from the texture
images and stereoscopic artifacts from the disparity images. Cy-
clopean images are constructed from the reference and the test
stereopairs with block based matching; SSIM is used to quan-
tify the monoscopic artifacts (2D artifacts like blur, noise, etc).
The perceptual disparity maps computed for test and reference
stereopairs are compared to estimate the binocular distortions (e.g.
keystone, color distortion).
Most existing 3D quality metrics are full reference and few
consider depth maps in the evaluation. As already described, qual-
ity of depth images is very important due to their role in interme-
diate view generation. Moreover, no-reference 3D quality evalua-
tion is of fundamental importance since the corresponding original
views may not be available; indeed, cost, hardware and bandwidth
constraints usually impose to capture a limited set of views and
the quality of the synthesized view must be estimated in absence
of the corresponding reference. Furthermore, as the depth im-
ages are gray scale textureless images usually consisting of large
homogeneous or linearly changing regions with sharp edges rep-
resenting objects’ boundaries, the conventional 2D visual quality
metrics such as SSIM [20] are not effective to assess the quality
of depth images. As an answer to the mentioned issues, this paper
proposes ‘Blind depth quality metric’ (BDQM), a no-reference al-
gorithm to assess the quality of compressed depth images. The
major contributions of the paper are:
• the proposal of a novel no-reference depth quality metric
BDQM for blind evaluation of depth compression artifacts;
• the shape of the histogram of compression sensitive depth
pixels is exploited to estimate the depth quality; in partic-
ular, we show that as the compression ratio is increased
the histogram around depth transitions flattens because of
smoothing;
• BDQM is used to predict the quality of depth images un-
dergoing HEVC compression at various bitrates.
The rest of the paper is organized as follows. In Sect. 2 the pro-
posed algorithm is described. In Sect. 3 experimental results and
comparisons with existing techniques are presented. The research
is concluded in Sect. 4 with a discussion on its various aspects and
possible applications as future work.
2. PROPOSED DEPTH IMAGE QUALITY METRIC
The proposed quality metric works in two steps: first, the compression sensitivity map (CSM) of the depth image is computed
to locate the pixels which are the most susceptible to compression
artifacts. Second, for each compression sensitive pixel (CSP) a
histogram of the neighborhood is constructed and analyzed to de-
termine the quality index. BDQM builds on the key observation
that the histogram around a CSP gets flattened when increasing
the amount of compression; indeed, compression mostly affects
the sharp discontinuities of the depth image. The proposed algo-
rithm exploits the shape of the histogram to predict depth quality.
The following subsections describe each step in detail.
2.1. Computing compression sensitivity map
It is well known that the boundary regions between objects at different depth levels are highly susceptible to compression artifacts compared to the flat homogeneous areas of depth images. Therefore, the magnitude of the depth gradient is a simple and effective means for evaluating compression sensitivity. Let I be an
M × N depth image. The compression sensitivity map (CSM) of
I is computed from its gradient magnitudes as:
CSM = √(G_x² + G_y²)   (1)

where G_x and G_y are the gradients along the horizontal and vertical directions, which can be computed with Sobel filters.
The gradient magnitude can be used to select the compression-sensitive depth pixels that will be used to estimate the quality index in the following section. Fig. 1a shows a depth image from the Poznan Street sequence (view 5, first frame) and its corresponding gradient representing the CSM (Fig. 1b). The most compression-sensitive pixels are located by thresholding, i.e., dropping the pixels with CSM ≤ τ; note that this choice also has a positive side effect, since it dramatically reduces the computational cost of the whole metric. As an example, Fig. 1c shows the CSM after thresholding with τ = 4.
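As a sketch, the CSM computation and the thresholding step can be reproduced in a few lines (an illustrative NumPy/SciPy implementation, not the authors' code; the function name and the toy depth map are our own):

```python
import numpy as np
from scipy import ndimage

def compression_sensitivity_map(depth, tau=4):
    """Compute the CSM of Eq. (1) via Sobel gradients and
    threshold it to keep the compression-sensitive pixels."""
    d = depth.astype(np.float64)
    gx = ndimage.sobel(d, axis=1)   # horizontal gradient G_x
    gy = ndimage.sobel(d, axis=0)   # vertical gradient G_y
    csm = np.sqrt(gx ** 2 + gy ** 2)
    return csm, csm > tau           # map and boolean CSP mask

# toy 8x8 "depth map" with a single vertical depth discontinuity
depth = np.zeros((8, 8))
depth[:, 4:] = 100.0
csm, mask = compression_sensitivity_map(depth, tau=4)
# CSPs cluster on the discontinuity; flat regions are discarded
```

On this toy input the mask selects only the pixels adjacent to the depth step, mirroring the behaviour shown in Fig. 1c.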
Figure 1. Depth saliency detection: a depth image (a), its CSM (b), thresholded CSM (τ = 4) (c).
2.2. Depth quality index
The CSM computed in the previous step is used to estimate the
quality of the depth image. The CSPs defined above belong to
the sharp discontinuities representing the boundaries between two
usually very flat or linearly changing regions at different depth
levels. To quantify the effect of compression the neighborhood
of the CSPs is examined to determine the smoothness induced by
quantization. A local histogram is constructed and analyzed to in-
fer the presence of compression effects. As the CSPs lie on or in
the proximity of the boundary between two different depth levels,
the histogram appears to be very peaked around two bins. In pres-
ence of compression, the depth transitions tend to be smoothed
and the effect can be captured by a local histogram where the two
peaks are less pronounced and the values are more equally dis-
tributed in between. Fig. 2 shows a sample histogram of a CSP
(neighborhood of size 15×15) from Poznan Street sequence com-
pressed with HEVC with quantization parameter QP=30 (Fig. 2a)
and QP=42 (Fig. 2b), respectively. The histogram is computed
onto 10 equal bins. Two very high peaks with values above 85
can observed in Fig. 2a showing that the depth values are concen-
trated around two bins whereas the rest of the histogram is very
sparse and almost empty. In Fig. 2b it can be noted that the his-
togram of the same region exhibits lower peaks and higher valley
between them when QP=42: a drop of 30 and 15 can be observed
in the two peaks respectively along with increased values of the
bins in-between. We can conclude that higher compression makes
the histogram flatter.
To predict the quality of a depth image we propose to estimate
the histogram dispersion by measuring the area which lies on top
Figure 2. Histogram of a salient pixel from the Poznan Street test sequence, view 5, frame 1, at QP=30 (a) and QP=42 (b); bin index on the x-axis, bin size on the y-axis.
Figure 3. Predicting the quality index: (a) QP=30, Q_i = 675; (b) QP=42, Q_i = 525; (c) QP=46, Q_i = 375.
of the histogram curve (see the gray area in Fig. 3): the larger the area, the less compressed the depth. An area value is associated with each CSP and these values are then averaged to compute the final quality
index. Let S be the set of CSPs of depth image I and let p_i ∈ S be a CSP with coordinates (x, y), 1 ≤ x ≤ M, 1 ≤ y ≤ N. For each p_i ∈ S, we select a patch P_i of size w × w centered at (x, y) and calculate the corresponding local histogram. Let H_i^κ denote the histogram distribution of patch P_i with κ equally sized bins. The quality index Q_i of p_i is defined as:

Q_i = Σ_{t=1}^{κ} [max(H_i^κ) − H_i^κ(t)]   (2)
Fig. 3 graphically illustrates the proposed quality index. The figure shows the distribution curves of a sample CSP of the first frame of the Poznan Street test sequence coded with HEVC at different QPs. The blue line represents the histogram distribution, and the area above it is shaded in gray. One can note that, as conjectured above, the histogram area decreases as QP increases. Finally, the Q_i values of all CSPs are averaged to obtain the quality of depth image I.
BDQM = (1/|S|) Σ_{i=1}^{|S|} Q_i   (3)
where |S| denotes the cardinality of S. The blind depth quality metric (BDQM) is computed for each frame of the depth video and the values are averaged to predict the quality of a whole video sequence. BDQM is a quality measure: the larger its value, the better the quality of the depth map.
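The whole per-frame metric of Eqs. (1)–(3) can be sketched as follows (an illustrative implementation under stated assumptions: np.gradient stands in for Sobel filtering, border patches are simply skipped, and the defaults τ = 5, w = 15, κ = 10 mirror the parameters used in the experiments of Sect. 3):

```python
import numpy as np

def bdqm(depth, tau=5, w=15, kappa=10):
    """Sketch of BDQM: per-CSP histogram score of Eq. (2),
    averaged over all CSPs as in Eq. (3)."""
    d = depth.astype(np.float64)
    gy, gx = np.gradient(d)              # simple gradient in lieu of Sobel
    csm = np.sqrt(gx ** 2 + gy ** 2)     # Eq. (1)
    r = w // 2
    scores = []
    for y, x in zip(*np.nonzero(csm > tau)):
        if y < r or x < r or y >= d.shape[0] - r or x >= d.shape[1] - r:
            continue                     # skip patches crossing the border
        patch = d[y - r:y + r + 1, x - r:x + r + 1]
        hist, _ = np.histogram(patch, bins=kappa)
        scores.append(np.sum(hist.max() - hist))   # Eq. (2)
    return float(np.mean(scores)) if scores else 0.0  # Eq. (3)

# a sharp depth edge should score higher than a smoothed one
sharp = np.zeros((41, 41))
sharp[:, 20:] = 100.0
cols = np.arange(41, dtype=np.float64)
smooth = np.tile(np.clip((cols - 14) / 12, 0, 1) * 100, (41, 1))
```

On such synthetic data the sharp edge scores higher than the smoothed one, consistent with the histogram-flattening argument above.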
3. EXPERIMENTAL EVALUATION
In this section the proposed BDQM is tested on a number of stan-
dard depth videos undergoing HEVC compression. Each depth
video sequence is encoded at six different compression levels, namely QP = {26, 30, 34, 38, 42, 46}, using version HM 11.0 of the HEVC reference software with the Main profile. We selected HEVC as a benchmark for depth coding since the most promising future 3D video coding standards, e.g., 3D-HEVC, build on it. The
goal of our analysis here is to show that the no-reference BDQM can compete with full-reference metrics. Since depth maps are textureless gray-scale images, visual image quality metrics are not effective in assessing their quality. The Peak Signal-to-Noise Ratio (PSNR) is usually used to evaluate the quality of depth maps. We compare BDQM with PSNR to evaluate its performance. In the
following we employ 5 depth videos from standard sequences in
the MPEG and HHI datasets (see details in Tab. 1). The coded
depth quality is evaluated using the proposed BDQM with param-
eters w = 15, τ = 5 and κ = 10 and compared with the PSNR
computed versus the uncoded reference.
To evaluate the performance of BDQM we chose Pearson lin-
Table 1. Test dataset details: number of frames in the video (#F), view
number (V) and frame rate (FR).
Sequence #F V View Size FR Provider
Poznan Hall2 200 7 1920 × 1088 25 Poznan Univ. of Tech.
Poznan Street 250 5 1920 × 1088 25 Poznan Univ. of Tech.
Kendo 300 1 1024 × 768 30 Nagoya University
Balloons 300 1 1024 × 768 30 Nagoya University
Book Arrival 100 10 1024 × 768 16 Fraunhofer HHI
Table 2. Performance Evaluation of proposed BDQM.
Sequence PLCC RMSE MAE
Poznan Hall2 0.9808 0.6056 0.5131
Poznan Street 0.9941 0.2438 0.2036
Kendo 0.9985 0.1588 0.1276
Balloons 0.9978 0.1554 0.1466
Book Arrival 0.9889 0.3187 0.2796
Average: 0.9920 0.2965 0.2541
ear correlation coefficient (PLCC) for prediction accuracy test and
Spearman rank order correlation coefficient (SROCC) and Kendall
rank order correlation coefficient (KROCC) for the prediction monotonicity test. To estimate the prediction error we compute the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Before computing these performance parameters, following the Video Quality Experts Group (VQEG) recommendations [24], the BDQM-predicted scores Q are mapped to PSNR with a monotonic nonlinear regression function. The following logistic function, outlined in [25], is used for the regression mapping:
Q_p = β_1 (1/2 − 1/(1 + exp(β_2 (Q − β_3)))) + β_4 Q + β_5   (4)
where Q_p is the mapped score and β_1, ..., β_5 are the regression model parameters.
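The regression mapping of Eq. (4) and the agreement statistics can be sketched as follows (illustrative code with synthetic score pairs standing in for real BDQM/PSNR data; the initial guesses passed to curve_fit are our own choice):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import spearmanr, kendalltau

def logistic_map(q, b1, b2, b3, b4, b5):
    """Monotonic logistic regression function of Eq. (4)."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (q - b3)))) + b4 * q + b5

# synthetic BDQM scores and PSNR values (illustrative only)
q = np.linspace(35, 75, 12)
psnr = 30 + 0.3 * q - 1.0 * np.tanh(0.1 * (q - 55))

# fit the five regression parameters, then map the scores
beta, _ = curve_fit(logistic_map, q, psnr,
                    p0=[1.0, 0.1, 55.0, 0.3, 30.0], maxfev=20000)
q_mapped = logistic_map(q, *beta)

plcc = np.corrcoef(q_mapped, psnr)[0, 1]        # prediction accuracy
srocc, _ = spearmanr(q, psnr)                   # monotonicity
krocc, _ = kendalltau(q, psnr)
rmse = np.sqrt(np.mean((q_mapped - psnr) ** 2))  # prediction error
mae = np.mean(np.abs(q_mapped - psnr))
```

Note that SROCC and KROCC are computed on the raw scores, since rank correlations are invariant to the monotonic mapping; only PLCC, RMSE and MAE use the mapped scores.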
The performance parameters discussed above are reported in
Tab. 2 for each test sequence. The table shows that the proposed
BDQM achieves very high correlation with PSNR in every experi-
ment with an average PLCC of 0.9920. The SROCC and KROCC
are equal to 1 in all experiments as the predicted scores are monotonic. The average prediction error in terms of RMSE and MAE turns out to be 0.29 and 0.25, respectively. All the collected results
demonstrate the accuracy of the proposed quality metric. To fur-
ther evaluate the reliability of BDQM the performance parameters
have been computed over the entire dataset, i.e., without considering the 5 videos as separate experiments; such an approach allows one to understand if BDQM can be used not only to rank the
quality of different compression levels of the same content but also
to compare different scores of different videos. The results of this
global analysis are shown in Tab. 3. The PLCC achieved over the entire dataset turns out to be 0.9076, showing again high correlation
between BDQM and PSNR. The values of SROCC and KROCC
are equal to 0.8439 and 0.7089 respectively, demonstrating the
good monotonicity between the two metrics also when BDQM is
used to compare different video contents. Clearly, the statistics
presented in Tab. 2 and 3 show that the quality scores predicted
by the proposed metric are quite accurate and reliable. Finally,
in Fig. 4 we show the scatter plot of the predicted scores versus
PSNR over the complete dataset to let the reader visually appreci-
ate the obtained level of correlation. More details on experimental
evaluation and a software release of the proposed BDQM metric
can be found at: http://www.di.unito.it/~farid/3DQA/BDQM.html.
Table 3. Performance of BDQM over entire dataset.
PLCC SROCC KROCC RMSE MAE
0.9076 0.8439 0.7089 1.7498 1.4902
Figure 4. Scatter plot of BDQM (x-axis) versus PSNR (y-axis) over the entire dataset, with the curve fitted by the logistic function.
4. CONCLUSIONS AND FUTURE WORK
In this paper a novel no-reference metric able to rank the compression artifacts of depth maps has been presented. The proposed algorithm builds on the observation that depth images are characterized by flat regions with sharp boundaries that are potentially blurred by compression. The proposed algorithm
estimates depth quality by measuring the blurriness of the com-
pression sensitive regions of the depth image using a histogram
based approach. The experimental results show that BDQM exhibits high prediction accuracy when compared to the full-reference PSNR metric.
BDQM can be integrated with no-reference image quality metrics to design novel 3D image quality scores that, in addition to the texture image, also consider the depth image to better estimate the overall quality. Another future application that we foresee is
the use of BDQM within the rate distortion optimization stage of
depth map compression algorithms. Since BDQM is based on the
estimation of the quality of sharp transitions in the depth map it
is expected to be a valuable instrument for predicting textural and
structural distortions in synthesized images.
5. REFERENCES
[1] M. Tanimoto, “FTV: Free-viewpoint Television,” Signal
Process.-Image Commun., vol. 27, no. 6, pp. 555 – 570,
2012.
[2] M.P. Tehrani et al., “Proposal to consider a new work item
and its use case - rei : An ultra-multiview 3D display,”
ISO/IEC JTC1/SC29/WG11/m30022, July-Aug 2013.
[3] C. Fehn, “Depth-image-based rendering (DIBR), compres-
sion, and transmission for a new approach on 3D-TV,” in
SPIE Electron. Imaging, 2004, pp. 93–104.
[4] M. Domanski et al., “High efficiency 3D video coding us-
ing new tools based on view synthesis,” IEEE Trans. Image
Process., vol. 22, no. 9, pp. 3517–3527, 2013.
[5] M.S. Farid et al., “Panorama view with spatiotemporal oc-
clusion compensation for 3D video coding,” IEEE Trans.
Image Process., vol. 24, no. 1, pp. 205–219, Jan 2015.
[6] T. Maugey, A. Ortega, and P. Frossard, “Graph-based repre-
sentation for multiview image geometry,” IEEE Trans. Im-
age Process., vol. 24, no. 5, pp. 1573–1586, May 2015.
[7] M.S. Farid et al., “A panoramic 3D video coding with di-
rectional depth aided inpainting,” in Proc. Int. Conf. Image
Process. (ICIP), Oct 2014, pp. 3233–3237.
[8] T. Wiegand et al., “Overview of the H.264/AVC video cod-
ing standard,” IEEE Trans. Circuits Syst. Video Technol.,
vol. 13, no. 7, pp. 560–576, July 2003.
[9] G.J. Sullivan et al., “Overview of the high efficiency video
coding (HEVC) standard,” IEEE Trans. Circuits Syst. Video
Technol., vol. 22, no. 12, pp. 1649–1668, 2012.
[10] G.J. Sullivan et al., “Standardized Extensions of High Effi-
ciency Video Coding (HEVC),” IEEE J. Sel. Topics Signal
Process., vol. 7, no. 6, pp. 1001–1016, Dec 2013.
[11] K. Muller et al., “3D High-Efficiency Video Coding for
Multi-View Video and Depth Data,” IEEE Trans. Image Pro-
cess., vol. 22, no. 9, pp. 3366–3378, Sept 2013.
[12] P. Merkle et al., “The effects of multiview depth video com-
pression on multiview rendering,” Signal Processing: Image
Communication, vol. 24, no. 1, pp. 73–88, 2009.
[13] M.S. Farid, M. Lucenteforte, and M. Grangetto, “Edges
shape enforcement for visual enhancement of depth image
based rendering,” in IEEE 15th Int. Workshop Multimedia
Signal Process. (MMSP), 2013, pp. 406–411.
[14] M.S. Farid, M. Lucenteforte, and M. Grangetto, “Edge en-
hancement of depth based rendered images,” in Proc. Int.
Conf. Image Process. (ICIP), 2014, pp. 5452 – 5456.
[15] Q. Huynh-Thu, P. Le Callet, and M. Barkowsky, “Video
quality assessment: From 2d to 3d - challenges and future
trends,” in Proc. ICIP, Sept 2010, pp. 4025–4028.
[16] F. Speranza et al., “Effect of disparity and motion on visual
comfort of stereoscopic images,” in SPIE Electron. Imaging,
2006.
[17] E. Bosc et al., “Towards a new quality metric for 3-d synthe-
sized view assessment,” IEEE J. Sel. Topics Signal Process.,
vol. 5, no. 7, pp. 1332–1343, Nov 2011.
[18] P. Hanhart and T. Ebrahimi, “Quality assessment of a stereo
pair formed from decoded and synthesized views using ob-
jective metrics,” in Proc. 3DTV-CON, Oct 2012, pp. 1–4.
[19] E. Ekmekcioglu et al., “Depth based perceptual quality as-
sessment for synthesised camera viewpoints,” in User Cen-
tric Media, vol. 60, pp. 76–83. 2012.
[20] Z. Wang et al., “Image quality assessment: from error visi-
bility to structural similarity,” IEEE Trans. Image Process.,
vol. 13, no. 4, pp. 600–612, April 2004.
[21] A. Benoit, P. Le Callet, P. Campisi, and R. Cousseau, “Qual-
ity assessment of stereoscopic images,” EURASIP J. Image
Video Process., vol. 2008, 2009.
[22] M. Carnec, P. Le Callet, and D. Barba, “An image quality
assessment method based on perception of structural infor-
mation,” in Proc. ICIP, Sept 2003, vol. 3, pp. 2284–2298.
[23] A. Boev et al., “Towards compound stereo-video qual-
ity metric: a specific encoder-based framework,” in IEEE
Southwest Symp. Image Anal. Interp., 2006, pp. 218–222.
[24] VQEG, “RRNR-TV Group Test Plan,” 2007, Version 2.2.
[25] H.R. Sheikh, M.F. Sabir, and A.C. Bovik, “A statistical eval-
uation of recent full reference image quality assessment al-
gorithms,” IEEE Trans. Image Process., vol. 15, no. 11, pp.
3440–3451, Nov 2006.