BLIND DEPTH QUALITY ASSESSMENT USING HISTOGRAM SHAPE
ANALYSIS
Muhammad Shahid Farid, Maurizio Lucenteforte, Marco Grangetto
Dipartimento di Informatica, Università degli Studi di Torino
Corso Svizzera 185, 10149 Torino, Italy
ABSTRACT
Multiview videos plus depth (MVD) is a popular 3D video rep-
resentation where pixel depth information is exploited to gener-
ate additional views to provide 3D experience. Quality assess-
ment of MVD data is of paramount importance since the latest
research results show that existing 2D quality metrics are not suit-
able for MVD. This paper focuses on depth quality assessment
and presents a novel algorithm to estimate the distortion in depth
videos induced by compression. The proposed algorithm is no-
reference and does not require any prior training or modeling. The
proposed method is based solely on the statistical analysis of the
compression sensitive pixels of depth images. The experimental
results worked out on a standard MVD dataset show that the pro-
posed algorithm exhibits a very high correlation with conventional
full-reference metrics.
Index Terms: Depth image quality metric, Free-viewpoint TV, Depth image based rendering, Quality assessment
1. INTRODUCTION
Multiview-video-plus-depth format for 3D content representation
has been adopted for current and future 3D television technolo-
gies e.g. free-viewpoint television (FTV) [1] and Super Multiview
(SMV) displays [2]. The gray scale depth image represents the
per pixel depth value of the corresponding texture image which
is exploited to generate novel views through depth image based
rendering (DIBR) [3]. In the MVD format only a few views with their
associated depth maps are coded and transmitted.
The compression of MVD data is indeed an important task
in the 3D television framework and much attention has been devoted
to this research area. To efficiently compress MVD data various
coding formats have been proposed and new tools have been de-
veloped, e.g., [4–7]. Advanced Video Coding (H.264/AVC) [8]
has been used in the past to encode the texture videos and depth videos
independently, a scheme also known as simulcast coding. The novel High
Efficiency Video Coding (HEVC) [9] is the current state of the
art video coding tool. The Joint Collaborative Team on 3D Video
Coding Extension Development (JCT-3V) has recently developed
extensions of HEVC to efficiently encode multiview videos and
MVD data. Multiview-HEVC (MV-HEVC) [10] extends the HEVC
syntax to encode MVD without additional coding tools whereas
3D-HEVC [11] is expressly dedicated to the design of novel
coding techniques for MVD. 3D-HEVC encodes the base view
with its depth map using unmodified HEVC whereas the depen-
dent views and their depth maps are encoded by exploiting ad-
ditional coding tools. 3D-HEVC achieves the best compression
ratio for MVD data [11]. (This work was partially supported by
Università degli Studi di Torino and Compagnia di San Paolo under
project AMALFI, ORTO119W8J.) To achieve autostereoscopy additional
intermediate viewpoints can be generated with DIBR on the re-
ceiver side. Given a DIBR algorithm, the perceptual quality of the
rendered images depends on both texture and depth image quality.
Quality of the depth map is particularly important as the com-
pression artifacts in depth maps can cause structural and textural
distortions in the synthesized image [12–14] resulting in poor 3D
experience.
3D image and video quality assessment is a more difficult and
complex problem compared to its 2D counterpart. Earlier, 2D im-
age quality metrics have been used to quantify the quality of 3D
images (video plus depth) and stereoscopic images. In this con-
text, the 2D metrics have been used in two ways: some metrics
estimate the quality by assessing each texture image separately
and aggregating the values without considering the depth images.
Others exploit the depth maps in addition to texture image qual-
ity to predict the overall quality. However, due to the different nature
of acquisition, representation, transmission and rendering, 3D im-
ages are affected by different types of quality artifacts [15, 16].
Recent studies [17, 18] tested various existing 2D image quality
metrics to assess the quality of stereoscopic and 3D images and
concluded that none of the existing 2D quality metrics is suitable
in this context.
Ekmekcioglu et al. [19] proposed a 3D quality assessment al-
gorithm based on weighted PSNR and SSIM [20]. They propose
to weight each pixel quality value (PSNR or SSIM) with the corre-
sponding depth value to increase the contribution of pixels closer
to the camera; indeed, according to their study the closer the pixel,
the larger the impact on visual perception. The 3D QA proposed
in [21] combines SSIM and C4 [22] with disparity estimation to
compute a single quality metric. The two measures are then inte-
grated (globally or locally) to obtain the final quality value. Boev
et al. [23] proposed a full-reference multi-scale stereo video QA
algorithm that computes the monoscopic artifacts from the texture
images and stereoscopic artifacts from the disparity images. Cy-
clopean images are constructed from the reference and the test
stereopairs with block based matching; SSIM is used to quan-
tify the monoscopic artifacts (2D artifacts like blur, noise, etc.).
The perceptual disparity maps computed for test and reference
stereopairs are compared to estimate the binocular distortions (e.g.
keystone, color distortion).
Most existing 3D quality metrics are full reference and few
consider depth maps in the evaluation. As already described, qual-
ity of depth images is very important due to their role in interme-
diate view generation. Moreover, no-reference 3D quality evalua-
tion is of fundamental importance since the corresponding original
views may not be available; indeed, cost, hardware and bandwidth
constraints usually impose to capture a limited set of views and
the quality of the synthesized view must be estimated in absence
of the corresponding reference. Furthermore, as the depth im-
ages are gray scale textureless images usually consisting of large
homogeneous or linearly changing regions with sharp edges rep-
resenting objects’ boundaries, the conventional 2D visual quality
metrics such as SSIM [20] are not effective to assess the quality
of depth images. As an answer to the mentioned issues, this paper
proposes ‘Blind depth quality metric’ (BDQM), a no-reference al-
gorithm to assess the quality of compressed depth images. The
major contributions of the paper are:
- the proposal of a novel no-reference depth quality metric, BDQM, for blind evaluation of depth compression artifacts;
- the shape of the histogram of compression sensitive depth pixels is exploited to estimate the depth quality; in particular, we show that as the compression ratio increases, the histogram around depth transitions flattens because of smoothing;
- BDQM is used to predict the quality of depth images undergoing HEVC compression at various bitrates.
The rest of the paper is organized as follows. In Sect. 2 the pro-
posed algorithm is described. In Sect. 3 experimental results and
comparisons with existing techniques are presented. The research
is concluded in Sect. 4 with a discussion on its various aspects and
possible applications as future work.
2. PROPOSED DEPTH IMAGE QUALITY METRIC
The proposed quality metric works in two steps: first, the com-
pression sensitivity map (CSM) of the depth image is computed
to locate the pixels which are the most susceptible to compression
artifacts. Second, for each compression sensitive pixel (CSP) a
histogram of the neighborhood is constructed and analyzed to de-
termine the quality index. BDQM builds on the key observation
that the histogram around a CSP gets flattened when increasing
the amount of compression; indeed, compression mostly affects
the sharp discontinuities of the depth image. The proposed algo-
rithm exploits the shape of the histogram to predict depth quality.
The following subsections describe each step in detail.
2.1. Computing compression sensitivity map
It is well known that the boundary regions between objects at dif-
ferent depth levels are highly susceptible to compression artifact
compared to the flat homogeneous areas of depth images. There-
fore, the magnitude of the depth gradient can be a simple and ef-
fective means for evaluating compression sensitivity. Let I be an
M × N depth image. The compression sensitivity map (CSM) of
I is computed from its gradient magnitudes as:
CSM = \sqrt{G_x^2 + G_y^2}    (1)
where G_x and G_y are the gradients along the horizontal and vertical
directions, which can be computed with Sobel filters.
The gradient magnitude can be used to select the compres-
sion sensitive depth pixels that will be used to estimate the qual-
ity index in the following section. Fig. 1a shows a depth image
from Poznan Street sequence (View 5, 1st Frame) and its corre-
sponding gradient representing the CSM (Fig. 1b). Thresholding,
i.e. dropping the pixels whose CSM value falls below the threshold
τ, is used to locate the most compression sensitive pixels; please
note that this choice has also
a positive side effect since it dramatically reduces the computa-
tional cost of the whole metric. As an example, Fig. 1c shows the
CSM after thresholding with τ = 4.
Figure 1. Depth saliency detection: a depth image (a), its CSM (b), thresholded CSM (τ = 4) (c).
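The CSM computation and thresholding of this section can be sketched in a few lines of stdlib-only Python; the Sobel kernels are the standard ones, while the toy depth image and threshold value are purely illustrative:

```python
import math

def sobel_csm(depth):
    """Compression sensitivity map (Eq. 1): gradient magnitude via Sobel filters.
    Border pixels are left at zero for simplicity."""
    M, N = len(depth), len(depth[0])
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal Sobel kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical Sobel kernel
    csm = [[0.0] * N for _ in range(M)]
    for i in range(1, M - 1):
        for j in range(1, N - 1):
            gx = sum(kx[u][v] * depth[i - 1 + u][j - 1 + v]
                     for u in range(3) for v in range(3))
            gy = sum(ky[u][v] * depth[i - 1 + u][j - 1 + v]
                     for u in range(3) for v in range(3))
            csm[i][j] = math.sqrt(gx * gx + gy * gy)
    return csm

def select_csps(csm, tau):
    """Compression sensitive pixels: keep only pixels whose CSM exceeds tau."""
    return [(i, j) for i, row in enumerate(csm)
            for j, g in enumerate(row) if g > tau]

# Toy 8x8 depth image: a flat region at depth 40 beside a flat region at depth 90.
depth = [[40] * 4 + [90] * 4 for _ in range(8)]
csps = select_csps(sobel_csm(depth), tau=4)   # only pixels along the depth edge
```

As expected, the selected CSPs all sit on the vertical depth discontinuity, while the flat regions are dropped, which also keeps the metric computationally cheap.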
2.2. Depth quality index
The CSM computed in the previous step is used to estimate the
quality of the depth image. The CSPs defined above belong to
the sharp discontinuities representing the boundaries between two
usually very flat or linearly changing regions at different depth
levels. To quantify the effect of compression the neighborhood
of the CSPs is examined to determine the smoothness induced by
quantization. A local histogram is constructed and analyzed to in-
fer the presence of compression effects. As the CSPs lie on or in
the proximity of the boundary between two different depth levels,
the histogram appears to be very peaked around two bins. In pres-
ence of compression, the depth transitions tend to be smoothed
and the effect can be captured by a local histogram where the two
peaks are less pronounced and the values are more equally dis-
tributed in between. Fig. 2 shows a sample histogram of a CSP
(neighborhood of size 15×15) from Poznan Street sequence com-
pressed with HEVC with quantization parameter QP=30 (Fig. 2a)
and QP=42 (Fig. 2b), respectively. The histogram is computed
over 10 equal bins. Two very high peaks with values above 85
can be observed in Fig. 2a, showing that the depth values are concen-
trated around two bins whereas the rest of the histogram is very
sparse and almost empty. In Fig. 2b it can be noted that the his-
togram of the same region exhibits lower peaks and a higher valley
between them when QP=42: drops of 30 and 15 can be observed
in the two peaks, respectively, along with increased values of the
bins in-between. We can conclude that higher compression makes
the histogram flatter.
To predict the quality of a depth image we propose to estimate
the histogram dispersion by measuring the area which lies on top
Figure 2. Histogram of a salient pixel from Poznan Street test sequence, view 5, frame 1 at QP=30 (a), and QP=42 (b).
Figure 3. Predicting the quality index: (a) QP=30, Q_i = 675; (b) QP=42, Q_i = 525; (c) QP=46, Q_i = 375.
of the histogram curve (see gray area in Fig. 3): the larger the
area, the less compressed is depth. An area value is associated to
each CSP and then averaged together to compute the final quality
index. Let S be the set of CSPs of depth image I and let p_i ∈ S be
a CSP with coordinates (x, y), 1 ≤ x ≤ M, 1 ≤ y ≤ N.
For each p_i ∈ S, we select a patch P_i of size w × w centered
at (x, y) and calculate the corresponding local histogram. Let H_i^κ
denote the histogram distribution of patch P_i with κ equally sized
bins. The quality index Q_i of p_i is defined as:
Q_i = \sum_{t=1}^{\kappa} [\max(H_i^\kappa) - H_i^\kappa(t)]    (2)
Fig. 3 graphically shows the proposed quality index. The figure
shows distribution curves of a sample CSP of the first frame of
Poznan Street test sequence coded by HEVC with different QP.
The blue line represents the histogram distribution whereas the
area inside the curve is shadowed in gray. One can note that, as
we conjectured above, the histogram area is decreasing when QP
increases. Finally, the Q_i values of all CSPs are averaged to obtain
the quality of depth image I:
BDQM = \frac{1}{|S|} \sum_{i=1}^{|S|} Q_i    (3)
where |S| represents the size of S. Blind depth quality metric
(BDQM) is computed for each frame of the depth video and the
values are averaged to predict the quality of a whole video se-
quence. BDQM is a quality measure that means the larger the
value of BDQM is, the better the quality of the depth map.
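Equations (2) and (3) can be sketched as follows. This is a minimal stdlib-only illustration with a toy 5×5 patch (the experiments in Sect. 3 use w = 15) and a hypothetical bin mapping for 8-bit depth values:

```python
def local_histogram(depth, x, y, w=5, kappa=10, max_val=256):
    """Histogram with kappa equal bins of the w x w patch centred at (x, y).
    8-bit depth values are assumed; the bin mapping is an illustrative choice."""
    r = w // 2
    hist = [0] * kappa
    for i in range(x - r, x + r + 1):
        for j in range(y - r, y + r + 1):
            hist[min(kappa - 1, depth[i][j] * kappa // max_val)] += 1
    return hist

def quality_index(hist):
    """Eq. (2): total gap between the histogram's peak and every bin.
    A sharp (bimodal) histogram yields a large value; a flat one a small value."""
    return sum(max(hist) - h for h in hist)

def bdqm(depth, csps, w=5, kappa=10):
    """Eq. (3): average per-CSP quality index; larger means better depth quality."""
    scores = [quality_index(local_histogram(depth, x, y, w, kappa))
              for (x, y) in csps]
    return sum(scores) / len(scores)

# A sharp depth edge vs. the same edge smoothed (as compression tends to do),
# evaluated at a single CSP sitting on the boundary at (8, 8).
sharp = [[40 if j < 8 else 200 for j in range(16)] for _ in range(16)]
ramp = {6: 40, 7: 80, 8: 120, 9: 160, 10: 200}
smooth = [[ramp.get(j, 40 if j < 8 else 200) for j in range(16)] for _ in range(16)]
q_sharp = bdqm(sharp, [(8, 8)])    # peaked histogram -> large quality index
q_smooth = bdqm(smooth, [(8, 8)])  # flattened histogram -> small quality index
```

The sharp edge concentrates the patch histogram in two bins and scores high, while the smoothed edge spreads the same pixels over several bins and scores low, mirroring the behavior conjectured above.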
3. EXPERIMENTAL EVALUATION
In this section the proposed BDQM is tested on a number of stan-
dard depth videos undergoing HEVC compression. Each depth
video sequence is encoded at 6 different compression levels, namely
QP={26,30,34,38,42,46} using version HM 11.0 of the HEVC
reference software with Main profile. We selected HEVC as a
benchmark for depth coding since the most promising future 3D
video coding standards, e.g. 3D-HEVC, build on it. The
goal of our analysis here is to show that the no-reference BDQM
can compete with full-reference metrics. Since depth maps are
textureless gray-scale images, visual image quality metrics are
not effective at assessing their quality. The Peak Signal to Noise Ratio
(PSNR) is usually used to evaluate the quality of depth maps. We
compare BDQM with PSNR to evaluate its performance. In the
following we employ 5 depth videos from standard sequences in
the MPEG and HHI datasets (see details in Tab. 1). The coded
depth quality is evaluated using the proposed BDQM with param-
eters w = 15, τ = 5 and κ = 10 and compared with the PSNR
computed versus the uncoded reference.
To evaluate the performance of BDQM we chose Pearson lin-
Table 1. Test dataset details: number of frames in the video (#F), view
number (V) and frame rate (FR).
Sequence #F V View Size FR Provider
Poznan Hall2 200 7 1920 × 1088 25 Poznan Univ. of Tech.
Poznan Street 250 5 1920 × 1088 25 Poznan Univ. of Tech.
Kendo 300 1 1024 × 768 30 Nagoya University
Balloons 300 1 1024 × 768 30 Nagoya University
Book Arrival 100 10 1024 × 768 16 Fraunhofer HHI
Table 2. Performance evaluation of the proposed BDQM.
Sequence PLCC RMSE MAE
Poznan Hall2 0.9808 0.6056 0.5131
Poznan Street 0.9941 0.2438 0.2036
Kendo 0.9985 0.1588 0.1276
Balloons 0.9978 0.1554 0.1466
Book Arrival 0.9889 0.3187 0.2796
Average: 0.9920 0.2965 0.2541
ear correlation coefficient (PLCC) for the prediction accuracy test, and
Spearman rank order correlation coefficient (SROCC) and Kendall
rank order correlation coefficient (KROCC) for the prediction mono-
tonicity test. To estimate the prediction error we compute Root
Mean Square Error (RMSE) and Mean Absolute Error (MAE)
measures. Before computing these performance parameters, ac-
cording to Video Quality Expert Group (VQEG) recommenda-
tions [24] the BDQM predicted scores Q are mapped to PSNR
with a monotonic nonlinear regression function. The following
logistic function outlined in [25] is used for regression mapping:
Q_p = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + \exp(\beta_2 (Q - \beta_3))} \right) + \beta_4 Q + \beta_5    (4)
where Q_p is the mapped score and \beta_1, …, \beta_5 are the regression
model parameters.
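A minimal sketch of this logistic mapping, using only the standard library; the beta values below are illustrative placeholders, not the fitted parameters of the actual regression (which would be estimated against the PSNR scores):

```python
import math

# Illustrative beta values only -- NOT fitted parameters. With b1, b2 > 0
# the mapping is monotonically increasing, as required by VQEG.
BETAS = (10.0, 0.1, 50.0, 0.2, 30.0)

def logistic_map(q, b1, b2, b3, b4, b5):
    """Monotonic nonlinear regression function (Eq. 4) mapping raw quality
    scores Q onto the scale of the target metric (here, PSNR)."""
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (q - b3)))) + b4 * q + b5

# Map raw BDQM scores (roughly the 35-75 range seen in Fig. 4) to the PSNR scale.
mapped = [logistic_map(q, *BETAS) for q in range(35, 76)]
```

Monotonicity of the mapping matters because it preserves the rank order of the predicted scores, so SROCC and KROCC are unaffected by the regression.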
The performance parameters discussed above are reported in
Tab. 2 for each test sequence. The table shows that the proposed
BDQM achieves very high correlation with PSNR in every experi-
ment with an average PLCC of 0.9920. The SROCC and KROCC
are equal to 1 in all experiments as the predicted scores are mono-
tonic. The average prediction error in terms of RMSE and MAE
turns out to be 0.29 and 0.25, respectively. All the collected results
demonstrate the accuracy of the proposed quality metric. To fur-
ther evaluate the reliability of BDQM the performance parameters
have been computed over the entire dataset, i.e. without consid-
ering the 5 videos as separate experiments; such an approach al-
lows one to understand if BDQM can be used not only to rank the
quality of different compression levels of the same content but also
to compare different scores of different videos. The results of this
global analysis are shown in Tab. 3. The PLCC achieved over the
entire dataset turns out to be 0.9076, showing again high correlation
between BDQM and PSNR. The values of SROCC and KROCC
are equal to 0.8439 and 0.7089 respectively, demonstrating the
good monotonicity between the two metrics also when BDQM is
used to compare different video contents. Clearly, the statistics
presented in Tab. 2 and 3 show that the quality scores predicted
by the proposed metric are quite accurate and reliable. Finally,
in Fig. 4 we show the scatter plot of the predicted scores versus
PSNR over the complete dataset to let the reader visually appreci-
ate the obtained level of correlation. More details on experimental
evaluation and a software release of the proposed BDQM metric
can be found at: http://www.di.unito.it/~farid/3DQA/BDQM.html.
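As a concrete sketch, the correlation and error measures used in this evaluation can be computed with a few stdlib-only helpers; the simple ranking used for SROCC assumes there are no tied scores:

```python
import math

def plcc(x, y):
    """Pearson linear correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rmse(x, y):
    """Root mean square prediction error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def mae(x, y):
    """Mean absolute prediction error."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks.
    This simple ranking assumes no tied scores."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    return plcc(ranks(x), ranks(y))
```

In practice these are computed between the logistic-mapped BDQM scores and the full-reference PSNR values of each coded depth sequence.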
Table 3. Performance of BDQM over the entire dataset.
PLCC SROCC KROCC RMSE MAE
0.9076 0.8439 0.7089 1.7498 1.4902
Figure 4. Scatter plot of BDQM versus PSNR over the entire dataset and curve fitted with the logistic function.
4. CONCLUSIONS AND FUTURE WORK
In this paper a novel no-reference metric able to rank the com-
pression artifacts of depth maps has been presented. The pro-
posed algorithm leverages the observation that depth images
are characterized by flat regions with sharp boundaries that are
potentially blurred after compression. The proposed algorithm
estimates depth quality by measuring the blurriness of the com-
pression sensitive regions of the depth image using a histogram
based approach. The experimental results show that BDQM ex-
hibits high prediction accuracy when compared to the full-reference
PSNR metric.
BDQM can be integrated with no-reference image quality met-
rics to design novel 3D image quality scores that, in addition to
the texture images, also consider the depth images to better estimate
the overall quality. Another future application that we foresee is
the use of BDQM within the rate distortion optimization stage of
depth map compression algorithms. Since BDQM is based on the
estimation of the quality of sharp transitions in the depth map it
is expected to be a valuable instrument for predicting textural and
structural distortions in synthesized images.
5. REFERENCES
[1] M. Tanimoto, “FTV: Free-viewpoint Television,” Signal Process.-Image Commun., vol. 27, no. 6, pp. 555–570, 2012.
[2] M.P. Tehrani et al., “Proposal to consider a new work item and its use case - Rei: An ultra-multiview 3D display,” ISO/IEC JTC1/SC29/WG11/m30022, July-Aug 2013.
[3] C. Fehn, “Depth-image-based rendering (DIBR), compression, and transmission for a new approach on 3D-TV,” in SPIE Electron. Imaging, 2004, pp. 93–104.
[4] M. Domanski et al., “High efficiency 3D video coding using new tools based on view synthesis,” IEEE Trans. Image Process., vol. 22, no. 9, pp. 3517–3527, 2013.
[5] M.S. Farid et al., “Panorama view with spatiotemporal occlusion compensation for 3D video coding,” IEEE Trans. Image Process., vol. 24, no. 1, pp. 205–219, Jan 2015.
[6] T. Maugey, A. Ortega, and P. Frossard, “Graph-based representation for multiview image geometry,” IEEE Trans. Image Process., vol. 24, no. 5, pp. 1573–1586, May 2015.
[7] M.S. Farid et al., “A panoramic 3D video coding with directional depth aided inpainting,” in Proc. Int. Conf. Image Process. (ICIP), Oct 2014, pp. 3233–3237.
[8] T. Wiegand et al., “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, July 2003.
[9] G.J. Sullivan et al., “Overview of the high efficiency video coding (HEVC) standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, 2012.
[10] G.J. Sullivan et al., “Standardized extensions of High Efficiency Video Coding (HEVC),” IEEE J. Sel. Topics Signal Process., vol. 7, no. 6, pp. 1001–1016, Dec 2013.
[11] K. Muller et al., “3D High-Efficiency Video Coding for multi-view video and depth data,” IEEE Trans. Image Process., vol. 22, no. 9, pp. 3366–3378, Sept 2013.
[12] P. Merkle et al., “The effects of multiview depth video compression on multiview rendering,” Signal Process.-Image Commun., vol. 24, no. 1, pp. 73–88, 2009.
[13] M.S. Farid, M. Lucenteforte, and M. Grangetto, “Edges shape enforcement for visual enhancement of depth image based rendering,” in IEEE 15th Int. Workshop Multimedia Signal Process. (MMSP), 2013, pp. 406–411.
[14] M.S. Farid, M. Lucenteforte, and M. Grangetto, “Edge enhancement of depth based rendered images,” in Proc. Int. Conf. Image Process. (ICIP), 2014, pp. 5452–5456.
[15] Q. Huynh-Thu, P. Le Callet, and M. Barkowsky, “Video quality assessment: From 2D to 3D - challenges and future trends,” in Proc. ICIP, Sept 2010, pp. 4025–4028.
[16] F. Speranza et al., “Effect of disparity and motion on visual comfort of stereoscopic images,” in SPIE Electron. Imaging, 2006.
[17] E. Bosc et al., “Towards a new quality metric for 3-D synthesized view assessment,” IEEE J. Sel. Topics Signal Process., vol. 5, no. 7, pp. 1332–1343, Nov 2011.
[18] P. Hanhart and T. Ebrahimi, “Quality assessment of a stereo pair formed from decoded and synthesized views using objective metrics,” in Proc. 3DTV-CON, Oct 2012, pp. 1–4.
[19] E. Ekmekcioglu et al., “Depth based perceptual quality assessment for synthesised camera viewpoints,” in User Centric Media, vol. 60, pp. 76–83, 2012.
[20] Z. Wang et al., “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, April 2004.
[21] A. Benoit, P. Le Callet, P. Campisi, and R. Cousseau, “Quality assessment of stereoscopic images,” EURASIP J. Image Video Process., vol. 2008, 2009.
[22] M. Carnec, P. Le Callet, and D. Barba, “An image quality assessment method based on perception of structural information,” in Proc. ICIP, Sept 2003, vol. 3, pp. 2284–2298.
[23] A. Boev et al., “Towards compound stereo-video quality metric: a specific encoder-based framework,” in IEEE Southwest Symp. Image Anal. Interp., 2006, pp. 218–222.
[24] VQEG, “RRNR-TV Group Test Plan,” Version 2.2, 2007.
[25] H.R. Sheikh, M.F. Sabir, and A.C. Bovik, “A statistical evaluation of recent full reference image quality assessment algorithms,” IEEE Trans. Image Process., vol. 15, no. 11, pp. 3440–3451, Nov 2006.
... Moreover, the recent researches on the quality assessment of DIBR-synthesized images agree that in addition to texture images, the quality of the depth map must also be considered for a true judgment 3D quality [32,33]. In [34], a blind depth quality assessment metric (BDQM) is presented to predict the compression distortion in depth videos in the absence of reference videos. The present research is based on BDQM; however, it considers the visual attributes such as depth saliency and viewing conditions while evaluating the quality of depth videos. ...
... All the 3D-IQA metrics described in this section target estimating the quality of the 3D images using the texture and depth images. Blind depth quality metric (BDQM) [34] is the only metric which is specifically designed to estimate the quality of the depth maps. It is a reference-less metric which constructs a local histogram at each compression-sensitive pixel in the depth image and analyzes its shape to quantify the compression distortion. ...
... The DSM map is thresholded to drop the less distortion-sensitive pixels. In the third step, the BDQM [34] algorithm is used to estimate the distortion value for each distortion-sensitive pixel. Finally, the distortion values are averaged to calculate the overall quality of the depth image. ...
Article
Full-text available
Multiview video plus depth (MVD) is the most popular 3D video format due to its efficient compression and provision for novel view generation enabling the free-viewpoint applications. In addition to color images, MVD format provides depth maps which are exploited to generate intermediate virtual views using the depth image-based rendering (DIBR) techniques. Compression affects the quality of the depth maps which in turn may introduce various structural and textural distortions in the DIBR-synthesized images. Estimation of the compression-related distortion in depth maps is very important for a high-quality 3D experience. The task becomes challenging when the corresponding reference depth maps are unavailable, e.g., when evaluating the quality on the decoder side. In this paper, we present a no-reference quality assessment algorithm to estimate the distortion in the depth maps induced by compression. The proposed algorithm exploits the depth saliency and local statistical characteristics of the depth maps to predict the compression distortion. The proposed ‘depth distortion evaluator’ (DDE) is evaluated on depth videos from standard MVD database compressed with the state-of-the-art high-efficiency video coding at various quality levels. The results demonstrate that DDE can be used to effectively estimate the compression distortion in depth videos.
... Manuscript Up to now, only several metrics have been proposed to evaluate the quality of depth images. Farid et al. proposed a Blind Depth Quality Metric (BDQM) based on the sharpness of the compression sensitive regions [4]. In [5], Xiang et al. first measured the misalignment error between the edges of color and depth images. ...
... To demonstrate the advantages of the proposed DSS metric, three recent depth image quality metrics are compared, including BDQM [4], BPR [5] and SEP [7]. Furthermore, five generalpurpose no-reference (NR) IQA metrics for natural images are also employed for comparison, including NIQE [22], BRISQUE [23], BLIINDS-II [24], DESIQUE [25], and BIQA [26]. ...
Article
Full-text available
The quality of depth images is crucial for virtual view synthesis. However, the quality assessment of depth images is still largely unexplored. This letter presents a blind quality metric of Depth image based on Structural Statistics (DSS). The design philosophy is inspired by the fact that structural distortion in the depth images usually leads to geometric distortion, which is the main cause for degraded quality of synthesized views. Specifically, the statistical features for shape and orientation are calculated based on discrete orthogonal moments and gradients, generating two groups of quality-aware features. Then, the quality model is built from the extracted statistical features using a regression module. The experimental results demonstrate the effectiveness of the proposed metric.
... The DDM locates the noise sensitive pixels in the original depth maps and use them to evaluate the quality of the distorted depth maps using histogram shape analysis technique. A similar approach for totally blind evaluation of depth maps quality has been recently proposed in [43] in the particular case of HEVC depth compression only. ...
Article
Full-text available
Three Dimensional (3D) image quality assessment is a challenging problem as compared to 2D images due to their different nature of acquisition, representation, coding, and display. The additional dimension of depth in multiview video plus depth (MVD) format is exploited to obtain images at novel intermediate viewpoints using depth image based rendering (DIBR) techniques, enabling 3D television and free-viewpoint television (FTV) applications. Depth maps introduce various quality artifacts in the DIBR-synthesized (virtual) images. In this paper, we propose a novel methodology to evaluate the quality of synthesized views in absence of the corresponding original reference views. It computes the statistical characteristics of the side views from whom the virtual view is generated, and fuses this information to estimate the statistical characteristics of the cyclopean image which are compared to those of the synthesized image to evaluate its quality. In addition to texture images, the proposed algorithm also considers the depth maps in evaluating the quality of the synthesized images. The algorithm blends two quality metrics, one estimating the texture distortion in the synthesized texture image induced by compression, transmission, 3D warping, or other causes and the second one determining the distortion of the depth maps. The two metrics are combined to obtain an overall quality assessment of the synthesized image. The proposed Synthesized Image Quality Metric (SIQM) is tested on the challenging MCL-3D and SIAT-3D datasets. The evaluation results show that the proposed metric significantly improves over state-of-the-art 3D image quality assessment algorithms.
... View synthesis quality assessment (VSQA) algorithm [23] combines SSIM with a weighting map which is computed from the contrast, orientation and texture maps of the reference and the synthesized images. A blind quality assessment algorithm to assess the compression distortion in the depth maps is proposed in [24]. A morphological wavelet based peak signal-tonoise ratio measure (MW-PSNR) is proposed in [25] which estimates the structural distortion in the synthesized image to assess its quality. ...
Conference Paper
Full-text available
Multiview video plus depth (MVD) is the most popular 3D video format where the texture images contain the color information and the depth maps represent the geometry of the scene. The depth maps are exploited to obtain intermediate views to enable 3D-TV and free-viewpoint applications using the depth image based rendering (DIBR) techniques. DIBR is used to get an estimate of the intermediate views but has to cope with depth errors, occlusions, imprecise camera parameters , re-interpolation, to mention a few issues. Therefore , being able to evaluate the true perceptual quality of synthesized images is of paramount importance for a high quality 3D experience. In this paper, we present a novel algorithm to assess the quality of the synthesized images in the absence of the corresponding references. The algorithm uses the original views from which the virtual image is generated to estimate the distortion induced by the DIBR process. In particular, a block-based perceptual feature matching based on signal phase congruency metric is devised to estimate the synthesis distortion. The experiments worked out on standard DIBR synthesized database show that the proposed algorithm achieves high correlation with the subjective ratings and out-performs the existing 3D quality assessment algorithms. Index Terms— Quality assessment, depth image based rendering, view synthesis, Free-viewpoint TV
Article
Virtual view synthesis has been increasingly popular due to the wide applications of multi-view and free-viewpoint videos. In view synthesis, texture images are rendered to generate the new viewpoint with the guidance of the depth images. The quality of depth images is vital for generating high-quality synthesized views. While the impact of texture image and the rendering process on the quality of the synthesized view has been extensively studied, the quality evaluation of depth images remains largely unexplored. With this motivation, this paper presents a no-reference image quality index for depth maps by modeling the statistics of edge profiles (SEP) in a multi-scale framework. The Canny operator is first utilized to locate the edges in depth images. Then the edge profiles are constructed, based on which the first-order and second-order statistical features are extracted for portraying the distortions in depth images. Finally, the random forest is employed for building the quality assessment model for depth maps. Experiments are conducted on two annotated view synthesis image/video quality databases. The experimental results and comparisons demonstrate that the proposed metric outperforms the relevant state-of-the-art quality metrics by a large margin. Furthermore, it has better generalization ability.
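The SEP abstract above describes a pipeline of edge localization, profile statistics, and a random-forest regressor. The exact multi-scale edge-profile features are not reproducible from the abstract alone; the following is only a minimal NumPy sketch of the general flavour, substituting a simple gradient-magnitude threshold for the Canny operator and computing first-order statistics (mean, standard deviation, skewness, kurtosis) over the edge responses. The function name and the threshold value are illustrative assumptions, not the paper's method.

```python
import numpy as np

def edge_stat_features(depth, grad_thresh=8.0):
    """Toy edge-statistics features for a depth map.

    Stand-in sketch: finite-difference gradients replace the Canny
    edge detector used by SEP, and grad_thresh is an arbitrary
    illustrative value. Returns [mean, std, skewness, kurtosis] of
    the gradient magnitudes at edge pixels (zeros if no edges).
    """
    d = np.asarray(depth, dtype=np.float64)
    gy, gx = np.gradient(d)
    mag = np.hypot(gx, gy)
    edge_vals = mag[mag > grad_thresh]  # crude stand-in for an edge map
    if edge_vals.size == 0:
        return np.zeros(4)
    mu = edge_vals.mean()
    sd = edge_vals.std()
    # Higher-order moments; small epsilon guards the flat-profile case
    skew = ((edge_vals - mu) ** 3).mean() / (sd ** 3 + 1e-12)
    kurt = ((edge_vals - mu) ** 4).mean() / (sd ** 4 + 1e-12)
    return np.array([mu, sd, skew, kurt])
```

In the paper such features would then be fed, across scales, to a random forest trained against subjective quality scores; the sketch stops at feature extraction.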
Article
In three-dimensional (3D) video systems, automatically predicting the quality of synthesized 3D video based on the input color and depth videos is an urgent but very difficult task, while the existing full-reference methods usually measure the perceptual quality of the synthesized video. In this paper, a high efficiency view synthesis quality prediction (HEVSQP) metric for view synthesis is proposed. Based on the derived VSQP model that quantifies the influences of color and depth distortions and their interactions in determining the perceptual quality of 3D synthesized video, color-involved VSQP (CI-VSQP) and depth-involved VSQP (DI-VSQP) indexes are predicted respectively, and are combined to yield a HEVSQP index. Experimental results on our constructed NBU-3D Synthesized Video Quality Database demonstrate that the proposed HEVSQP performs well on the entire synthesized video quality database, compared with other FR and no-reference video quality assessment (VQA) metrics.
Conference Paper
Full-text available
Depth image based rendering is a well-known technology for the generation of virtual views in between a limited set of views acquired by a camera array. Intermediate views are rendered by warping image pixels based on their depth. Nonetheless, depth maps are usually imperfect as they need to be estimated through stereo matching algorithms; moreover, for representation and transmission requirements depth values are obviously quantized. Such depth representation errors translate into a warping error when generating intermediate views, thus impacting the rendered image quality. We observe that depth errors turn out to be very critical when they affect the object contours, since in such a case they cause significant structural distortion in the warped objects. This paper presents an algorithm to improve the visual quality of the synthesized views by enforcing the shape of the edges in presence of erroneous depth estimates. We show that it is possible to significantly improve the visual quality of the interpolated view by enforcing prior knowledge on the admissible deformations of edges under projective transformation. Both visual and objective results show that the proposed approach is very effective.
Conference Paper
Full-text available
The success of 3D and free-viewpoint television largely depends on the efficient representation and compression of 3D video in addition to viable rendering methods. This paper presents a novel 3D video coding technique based on the creation of a panorama view to compact the information of a stereoscopic pair. The panorama view represents the information that would be visible to a virtual camera with a larger field of view embracing all the available views. The information in the panorama view is then used to estimate any intermediate view using depth image based rendering. Furthermore, to fill the disocclusions in the reconstructed view a directional depth aided fast marching inpainting technique is presented. The panorama view and corresponding depth map are amenable to standard video compression. In this paper we show that using the novel HEVC standard the proposed 3D video format can be compressed very efficiently.
Article
Full-text available
The future of novel 3D display technologies largely depends on the design of efficient techniques for 3D video representation and coding. Recently, multiple view plus depth video formats have attracted many research efforts since they enable intermediate view estimation and permit to efficiently represent and compress 3D video sequences. In this paper we present Spatio-Temporal Occlusion compensation with Panorama view (STOP), a novel 3D video coding technique based on the creation of a panorama view and occlusion coding in terms of spatio-temporal offsets. The panorama picture represents most of the visual information acquired from multiple views using a single virtual view, characterized by a larger field of view. Encoding the panorama video with the state-of-the-art HEVC codec and representing occlusions with simple spatio-temporal ancillary information, STOP achieves a high compression ratio and good visual quality, with results competitive with respect to existing techniques. Moreover, STOP enables free-viewpoint 3D TV applications whilst allowing legacy displays to get a bi-dimensional service by using a standard video codec and simple cropping operations.
Conference Paper
Full-text available
When a stereo pair is formed from a decoded view and a synthesized view, it is unclear how the overall quality of the stereo pair should be assessed through objective quality metrics. In this paper, this problem is addressed considering a 3D video represented in the format of multiview video plus depth. The performance of different state-of-the-art 2D quality metrics is analyzed in terms of correlation with subjective perception of video quality. A set of subjective data collected through formal subjective evaluation tests is used as benchmark. Results show that the measured quality of the decoded view has the highest correlation with perceived quality. If the objective quality assessment is based on the measured quality of the synthesized view, it is suggested to use VIF, VQM, MS-SSIM, or SSIM since they significantly outperform other objective metrics, including PSNR.
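Benchmarking studies like the one above score a metric by how well its outputs correlate with subjective ratings. As a minimal sketch of that evaluation step (not the study's own code), the standard figures of merit are the Pearson linear correlation coefficient and the Spearman rank-order correlation coefficient; the rank computation below is a simplification that ignores ties.

```python
import numpy as np

def pearson_corr(scores, mos):
    """Pearson linear correlation between metric scores and mean
    opinion scores (MOS): measures prediction accuracy."""
    return float(np.corrcoef(scores, mos)[0, 1])

def spearman_corr(scores, mos):
    """Spearman rank correlation: Pearson correlation of the ranks,
    measuring prediction monotonicity. Tie handling is omitted for
    brevity (a full implementation would average tied ranks)."""
    rank_s = np.argsort(np.argsort(scores))  # rank of each element
    rank_m = np.argsort(np.argsort(mos))
    return pearson_corr(rank_s, rank_m)
```

A metric with a monotonic but nonlinear relation to the subjective scores keeps a high Spearman value while its Pearson value drops, which is why both are usually reported.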
Article
Full-text available
This paper describes extensions to the High Efficiency Video Coding (HEVC) standard that are active areas of current development in the relevant international standardization committees. While the first version of HEVC is sufficient to cover a wide range of applications, needs for enhancing the standard in several ways have been identified, including work on range extensions for color format and bit depth enhancement, embedded-bitstream scalability, and 3D video. The standardization of extensions in each of these areas will be completed in 2014, and further work is also planned. The design for these extensions represents the latest state of the art for video coding and its applications.
Conference Paper
Full-text available
Depth image based rendering of intermediate views with high visual quality remains a challenging goal in presence of estimated and quantized depth values. Among the other rendering artifacts, we observed that edges are usually affected by significant warping errors. In particular, because of depth estimation inaccuracy around object boundaries, the edges may completely lose their original shape during the warping process. Nonetheless, edges represent one of the most important cues for the human visual system. In this paper a novel technique aiming at improving the edge rendering is presented. As opposed to previous approaches, the technique exploits only texture information, thus avoiding possible errors in depth estimation. The idea is based on the enforcement of prior knowledge of the edge shape under projective transformation. The proposed algorithm works in two steps: first the damaged edges of the warped image are detected, then these latter are corrected so as to better approximate their shape in the reference view. Finally the corrected edges are rendered within the intermediate image without introducing noticeable texture artifacts. The proposed algorithm has been tested on a variety of standard video sequences exhibiting excellent results in terms of rendered image visual quality.
Article
Full-text available
In this paper, we propose a new representation for multiview image sets. Our approach relies on graphs to describe geometry information in a compact and controllable way. The links of the graph connect pixels in different images and describe the proximity between pixels in the 3D space. These connections are dependent on the geometry of the scene and provide the right amount of information that is necessary for coding and reconstructing multiple views. This multiview image representation is very compact and adapts the transmitted geometry information as a function of the complexity of the prediction performed at the decoder side. To achieve this, our graph-based representation (GBR) adapts the accuracy of the geometry representation, in contrast with depth coding, which directly compresses the original geometry signal with losses. We present the principles of the GBR and we build a complete prototype coding scheme for multiview images. Experimental results demonstrate the potential of this new representation as compared to a depth-based approach. GBR can achieve a gain of 2 dB in reconstructed quality over depth-based schemes operating at similar rates.
Article
Objective methods for assessing perceptual image quality have traditionally attempted to quantify the visibility of errors between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MatLab implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
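The SSIM formula from the abstract above combines luminance, contrast, and structure comparisons. The reference implementation slides a Gaussian window over the images and averages the local scores; the single-window version below is a deliberate simplification to show the core expression, with the standard constants C1 = (0.01·L)² and C2 = (0.03·L)².

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Global (single-window) SSIM between two grayscale images.

    Simplified sketch: the standard metric computes this expression
    over local windows and averages; here the whole image is one
    window, so results will differ from the reference implementation.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Stability constants from the SSIM paper: C1=(K1*L)^2, C2=(K2*L)^2
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    # Combined luminance/contrast/structure comparison
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

Identical images score exactly 1.0; the score decreases as structural agreement degrades, which is the property the MVD quality studies above rely on.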
Conference Paper
This paper considers the visual quality assessment for view synthesis in the context of the 3D video delivery chain. It is targeted to perceptually quantify the reconstruction quality of synthesised camera viewpoints. It is needed for developing better QoE models related to 3D-TV, as well as for a better representation of the effect of depth maps on view synthesis quality. In this paper, existing 2D video quality assessment methods, like PSNR and SSIM, are extended to assess the perceived quality of synthesised viewpoints based on the depth range. The performance of the extended assessment techniques is measured by correlating multiple sample video assessment scores with Video Quality Metric (VQM) scores, which are a robust reflector of real subjective opinions.
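For reference, the PSNR baseline that the depth-range extensions above build upon is straightforward: it maps the mean squared error between a reference and a distorted frame onto a logarithmic decibel scale. The sketch below is the textbook definition, not the paper's extended variant (the depth-range weighting is specific to that work and not reproduced here).

```python
import numpy as np

def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a
    distorted image; higher is better, inf for identical inputs."""
    ref = np.asarray(ref, dtype=np.float64)
    dist = np.asarray(dist, dtype=np.float64)
    mse = np.mean((ref - dist) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```

Because PSNR treats every pixel error equally, it cannot account for the fact that depth errors near object boundaries produce far more visible synthesis artifacts than errors in smooth regions, which motivates the depth-aware extensions discussed in the abstract.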