BLIND DEPTH QUALITY ASSESSMENT USING HISTOGRAM SHAPE
ANALYSIS
Muhammad Shahid Farid, Maurizio Lucenteforte, Marco Grangetto
Dipartimento di Informatica, Università degli Studi di Torino
Corso Svizzera 185, 10149 Torino, Italy
ABSTRACT
Multiview videos plus depth (MVD) is a popular 3D video rep-
resentation where pixel depth information is exploited to gener-
ate additional views to provide 3D experience. Quality assess-
ment of MVD data is of paramount importance since the latest
research results show that existing 2D quality metrics are not suit-
able for MVD. This paper focuses on depth quality assessment
and presents a novel algorithm to estimate the distortion in depth
videos induced by compression. The proposed algorithm is no-
reference and does not require any prior training or modeling. The
proposed method is based solely on the statistical analysis of the
compression-sensitive pixels of depth images. Experimental results obtained on a standard MVD dataset show that the proposed algorithm exhibits a very high correlation with conventional full-reference metrics.
Index Terms — Depth image quality metric, Free-viewpoint
TV, Depth image based rendering, Quality assessment
1. INTRODUCTION
Multiview-video-plus-depth format for 3D content representation
has been adopted for current and future 3D television technolo-
gies e.g. free-viewpoint television (FTV) [1] and Super Multiview
(SMV) displays [2]. The gray scale depth image represents the
per pixel depth value of the corresponding texture image which
is exploited to generate novel views through depth image based
rendering (DIBR) [3]. In the MVD format only a few views with their associated depth maps are coded and transmitted.
The compression of MVD data is an important task in the 3D television framework and much attention has been devoted to this research area. To efficiently compress MVD data, various coding formats have been proposed and new tools have been developed, e.g., [4–7]. Advanced Video Coding (H.264/AVC) [8] has been used in the past to encode the texture and depth videos independently, a scheme also known as simulcast coding. The novel High
Efficiency Video Coding (HEVC) [9] is the current state of the
art video coding tool. The Joint Collaborative Team on 3D Video
Coding Extension Development (JCT-3V) has recently developed
extensions of HEVC to efficiently encode multiview videos and
MVD data. Multiview-HEVC (MV-HEVC) [10] extends the HEVC
syntax to encode MVD without additional coding tools whereas
3D-HEVC [11] is expressly dedicated to the design of novel
coding techniques for MVD. 3D-HEVC encodes the base view
with its depth map using unmodified HEVC whereas the depen-
dent views and their depth maps are encoded by exploiting ad-
ditional coding tools. 3D-HEVC achieves the best compression
This work was partially supported by Università degli Studi di Torino
and Compagnia di San Paolo under project AMALFI (ORTO119W8J).
ratio for MVD data [11]. To achieve autostereoscopy additional
intermediate viewpoints can be generated with DIBR on the re-
ceiver side. Given a DIBR algorithm, the perceptual quality of the
rendered images depends on both texture and depth image quality.
Quality of the depth map is particularly important as the com-
pression artifacts in depth maps can cause structural and textural
distortions in the synthesized image [12–14] resulting in poor 3D
experience.
3D image and video quality assessment is a more difficult and
complex problem compared to its 2D counterpart. Earlier, 2D im-
age quality metrics have been used to quantify the quality of 3D
images (video plus depth) and stereoscopic images. In this con-
text, the 2D metrics have been used in two ways: some metrics
estimate the quality by assessing each texture image separately
and aggregating the values without considering the depth images.
Others exploit the depth maps in addition to texture image quality to predict the overall quality. However, due to the different nature of acquisition, representation, transmission and rendering, 3D images are affected by different types of quality artifacts [15, 16].
Recent studies [17, 18] tested various existing 2D image quality
metrics to assess the quality of stereoscopic and 3D images and
concluded that none of the existing 2D quality metrics is suitable
in this context.
Ekmekcioglu et al. [19] proposed a 3D quality assessment algorithm based on weighted PSNR and SSIM [20]. They propose
to weight each pixel quality value (PSNR or SSIM) with the corre-
sponding depth value to increase the contribution of pixels closer
to the camera; indeed, according to their study the closer the pixel,
the larger the impact on visual perception. The 3D QA proposed
in [21] combines SSIM and C4 [22] with disparity estimation to
compute a single quality metric. The two measures are then inte-
grated (globally or locally) to obtain the final quality value. Boev
et al. [23] proposed a full-reference multi-scale stereo video QA
algorithm that computes the monoscopic artifacts from the texture
images and stereoscopic artifacts from the disparity images. Cy-
clopean images are constructed from the reference and the test
stereopairs with block based matching; SSIM is used to quan-
tify the monoscopic artifacts (2D artifacts like blur, noise, etc).
The perceptual disparity maps computed for test and reference
stereopairs are compared to estimate the binocular distortions (e.g.
keystone, color distortion).
Most existing 3D quality metrics are full reference and few
consider depth maps in the evaluation. As already described, qual-
ity of depth images is very important due to their role in interme-
diate view generation. Moreover, no-reference 3D quality evalua-
tion is of fundamental importance since the corresponding original
views may not be available; indeed, cost, hardware and bandwidth
constraints usually impose to capture a limited set of views and
the quality of the synthesized view must be estimated in absence
of the corresponding reference. Furthermore, as the depth im-
ages are gray scale textureless images usually consisting of large
homogeneous or linearly changing regions with sharp edges rep-
resenting objects’ boundaries, the conventional 2D visual quality
metrics such as SSIM [20] are not effective to assess the quality
of depth images. As an answer to the mentioned issues, this paper
proposes ‘Blind depth quality metric’ (BDQM), a no-reference al-
gorithm to assess the quality of compressed depth images. The
major contributions of the paper are:
• the proposal of a novel no-reference depth quality metric
BDQM for blind evaluation of depth compression artifacts;
• the shape of the histogram of compression sensitive depth
pixels is exploited to estimate the depth quality; in partic-
ular, we show that as the compression ratio is increased
the histogram around depth transitions flattens because of
smoothing;
• BDQM is used to predict the quality of depth images un-
dergoing HEVC compression at various bitrates.
The rest of the paper is organized as follows. In Sect. 2 the pro-
posed algorithm is described. In Sect. 3 experimental results and
comparisons with existing techniques are presented. The research
is concluded in Sect. 4 with a discussion on its various aspects and
possible applications as future work.
2. PROPOSED DEPTH IMAGE QUALITY METRIC
The proposed quality metric works in two steps: first, the compression sensitivity map (CSM) of the depth image is computed
to locate the pixels which are the most susceptible to compression
artifacts. Second, for each compression sensitive pixel (CSP) a
histogram of the neighborhood is constructed and analyzed to de-
termine the quality index. BDQM builds on the key observation
that the histogram around a CSP gets flattened when increasing
the amount of compression; indeed, compression mostly affects
the sharp discontinuities of the depth image. The proposed algo-
rithm exploits the shape of the histogram to predict depth quality.
The following subsections describe each step in detail.
2.1. Computing compression sensitivity map
It is well known that the boundary regions between objects at different depth levels are highly susceptible to compression artifacts compared to the flat homogeneous areas of depth images. Therefore, the magnitude of the depth gradient is a simple and effective means for evaluating compression sensitivity. Let I be an
M × N depth image. The compression sensitivity map (CSM) of
I is computed from its gradient magnitudes as:
CSM = √(G_x² + G_y²)   (1)

where G_x and G_y are the gradients along the horizontal and vertical directions, which can be computed with Sobel filters.
The gradient magnitude can be used to select the compression-sensitive depth pixels that will be used to estimate the quality index in the following section. Fig. 1a shows a depth image from the Poznan Street sequence (view 5, first frame) and its corresponding gradient representing the CSM (Fig. 1b). The most compression-sensitive pixels are located by thresholding, i.e., dropping the pixels with CSM ≤ τ; note that this choice also has a positive side effect, since it dramatically reduces the computational cost of the whole metric. As an example, Fig. 1c shows the CSM after thresholding with τ = 4.
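As a sketch, the CSM computation and the thresholding step can be reproduced in a few lines (an illustrative NumPy/SciPy implementation, not the authors' code; the function name and the toy depth map are our own):

```python
import numpy as np
from scipy import ndimage

def compression_sensitivity_map(depth, tau=4):
    """Compute the CSM of Eq. (1) via Sobel gradients and
    threshold it to keep the compression-sensitive pixels."""
    d = depth.astype(np.float64)
    gx = ndimage.sobel(d, axis=1)   # horizontal gradient G_x
    gy = ndimage.sobel(d, axis=0)   # vertical gradient G_y
    csm = np.sqrt(gx ** 2 + gy ** 2)
    return csm, csm > tau           # map and boolean CSP mask

# toy 8x8 "depth map" with a single vertical depth discontinuity
depth = np.zeros((8, 8))
depth[:, 4:] = 100.0
csm, mask = compression_sensitivity_map(depth, tau=4)
# CSPs cluster on the discontinuity; flat regions are discarded
```

On this toy input the mask selects only the pixels adjacent to the depth step, mirroring the behaviour shown in Fig. 1c.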
Figure 1. Depth saliency detection: a depth image (a), its CSM (b), thresholded CSM (τ = 4) (c).
2.2. Depth quality index
The CSM computed in the previous step is used to estimate the
quality of the depth image. The CSPs defined above belong to
the sharp discontinuities representing the boundaries between two
usually very flat or linearly changing regions at different depth
levels. To quantify the effect of compression the neighborhood
of the CSPs is examined to determine the smoothness induced by
quantization. A local histogram is constructed and analyzed to in-
fer the presence of compression effects. As the CSPs lie on or in
the proximity of the boundary between two different depth levels,
the histogram appears to be very peaked around two bins. In pres-
ence of compression, the depth transitions tend to be smoothed
and the effect can be captured by a local histogram where the two
peaks are less pronounced and the values are more equally dis-
tributed in between. Fig. 2 shows a sample histogram of a CSP
(neighborhood of size 15×15) from Poznan Street sequence com-
pressed with HEVC with quantization parameter QP=30 (Fig. 2a)
and QP=42 (Fig. 2b), respectively. The histogram is computed
onto 10 equal bins. Two very high peaks with values above 85
can observed in Fig. 2a showing that the depth values are concen-
trated around two bins whereas the rest of the histogram is very
sparse and almost empty. In Fig. 2b it can be noted that the his-
togram of the same region exhibits lower peaks and higher valley
between them when QP=42: a drop of 30 and 15 can be observed
in the two peaks respectively along with increased values of the
bins in-between. We can conclude that higher compression makes
the histogram flatter.
To predict the quality of a depth image we propose to estimate
the histogram dispersion by measuring the area which lies on top
Figure 2. Histogram of a salient pixel from the Poznan Street test sequence, view 5, frame 1, at QP=30 (a) and QP=42 (b); bin index on the x-axis, bin size on the y-axis.
Figure 3. Predicting the quality index: (a) QP=30, Q_i = 675; (b) QP=42, Q_i = 525; (c) QP=46, Q_i = 375.
of the histogram curve (see the gray area in Fig. 3): the larger the area, the less compressed the depth. An area value is associated with each CSP and these values are then averaged to compute the final quality
index. Let S be the set of CSPs of depth image I and let p_i ∈ S be a CSP with coordinates (x, y), 1 ≤ x ≤ M, 1 ≤ y ≤ N. For each p_i ∈ S, we select a patch P_i of size w × w centered at (x, y) and calculate the corresponding local histogram. Let H_i^κ denote the histogram distribution of patch P_i with κ equally sized bins. The quality index Q_i of p_i is defined as:

Q_i = Σ_{t=1}^{κ} [max(H_i^κ) − H_i^κ(t)]   (2)
Fig. 3 graphically illustrates the proposed quality index. The figure shows the distribution curves of a sample CSP of the first frame of the Poznan Street test sequence coded with HEVC at different QPs. The blue line represents the histogram distribution, and the area above it is shaded in gray. One can note that, as conjectured above, the histogram area decreases as QP increases. Finally, the Q_i values of all CSPs are averaged to obtain the quality of depth image I.
BDQM = (1/|S|) Σ_{i=1}^{|S|} Q_i   (3)
where |S| denotes the cardinality of S. The blind depth quality metric (BDQM) is computed for each frame of the depth video and the values are averaged to predict the quality of a whole video sequence. BDQM is a quality measure: the larger its value, the better the quality of the depth map.
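The whole per-frame metric of Eqs. (1)–(3) can be sketched as follows (an illustrative implementation under stated assumptions: np.gradient stands in for Sobel filtering, border patches are simply skipped, and the defaults τ = 5, w = 15, κ = 10 mirror the parameters used in the experiments of Sect. 3):

```python
import numpy as np

def bdqm(depth, tau=5, w=15, kappa=10):
    """Sketch of BDQM: per-CSP histogram score of Eq. (2),
    averaged over all CSPs as in Eq. (3)."""
    d = depth.astype(np.float64)
    gy, gx = np.gradient(d)              # simple gradient in lieu of Sobel
    csm = np.sqrt(gx ** 2 + gy ** 2)     # Eq. (1)
    r = w // 2
    scores = []
    for y, x in zip(*np.nonzero(csm > tau)):
        if y < r or x < r or y >= d.shape[0] - r or x >= d.shape[1] - r:
            continue                     # skip patches crossing the border
        patch = d[y - r:y + r + 1, x - r:x + r + 1]
        hist, _ = np.histogram(patch, bins=kappa)
        scores.append(np.sum(hist.max() - hist))   # Eq. (2)
    return float(np.mean(scores)) if scores else 0.0  # Eq. (3)

# a sharp depth edge should score higher than a smoothed one
sharp = np.zeros((41, 41))
sharp[:, 20:] = 100.0
cols = np.arange(41, dtype=np.float64)
smooth = np.tile(np.clip((cols - 14) / 12, 0, 1) * 100, (41, 1))
```

On such synthetic data the sharp edge scores higher than the smoothed one, consistent with the histogram-flattening argument above.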
3. EXPERIMENTAL EVALUATION
In this section the proposed BDQM is tested on a number of stan-
dard depth videos undergoing HEVC compression. Each depth
video sequence is encoded at six different compression levels, namely QP = {26, 30, 34, 38, 42, 46}, using version HM 11.0 of the HEVC reference software with the Main profile. We selected HEVC as a benchmark for depth coding since the most promising future 3D video coding standards, e.g., 3D-HEVC, build on it. The
goal of our analysis here is to show that the no-reference BDQM can compete with full-reference metrics. Since depth maps are textureless gray-scale images, visual image quality metrics are not effective in assessing their quality. The Peak Signal-to-Noise Ratio (PSNR) is usually used to evaluate the quality of depth maps. We compare BDQM with PSNR to evaluate its performance. In the
following we employ 5 depth videos from standard sequences in
the MPEG and HHI datasets (see details in Tab. 1). The coded
depth quality is evaluated using the proposed BDQM with param-
eters w = 15, τ = 5 and κ = 10 and compared with the PSNR
computed versus the uncoded reference.
To evaluate the performance of BDQM we chose Pearson lin-
Table 1. Test dataset details: number of frames in the video (#F), view
number (V) and frame rate (FR).
Sequence #F V View Size FR Provider
Poznan Hall2 200 7 1920 × 1088 25 Poznan Univ. of Tech.
Poznan Street 250 5 1920 × 1088 25 Poznan Univ. of Tech.
Kendo 300 1 1024 × 768 30 Nagoya University
Balloons 300 1 1024 × 768 30 Nagoya University
Book Arrival 100 10 1024 × 768 16 Fraunhofer HHI
Table 2. Performance Evaluation of proposed BDQM.
Sequence PLCC RMSE MAE
Poznan Hall2 0.9808 0.6056 0.5131
Poznan Street 0.9941 0.2438 0.2036
Kendo 0.9985 0.1588 0.1276
Balloons 0.9978 0.1554 0.1466
Book Arrival 0.9889 0.3187 0.2796
Average: 0.9920 0.2965 0.2541
ear correlation coefficient (PLCC) for prediction accuracy test and
Spearman rank order correlation coefficient (SROCC) and Kendall
rank order correlation coefficient (KROCC) for the prediction monotonicity test. To estimate the prediction error we compute the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). Before computing these performance parameters, following the Video Quality Experts Group (VQEG) recommendations [24], the BDQM-predicted scores Q are mapped to PSNR with a monotonic nonlinear regression function. The following logistic function, outlined in [25], is used for the regression mapping:
Q_p = β_1 (1/2 − 1/(1 + exp(β_2 (Q − β_3)))) + β_4 Q + β_5   (4)
where Q_p is the mapped score and β_1, ..., β_5 are the regression model parameters.
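The regression mapping of Eq. (4) and the agreement statistics can be sketched as follows (illustrative code with synthetic score pairs standing in for real BDQM/PSNR data; the initial guesses passed to curve_fit are our own choice):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import spearmanr, kendalltau

def logistic_map(q, b1, b2, b3, b4, b5):
    """Monotonic logistic regression function of Eq. (4)."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (q - b3)))) + b4 * q + b5

# synthetic BDQM scores and PSNR values (illustrative only)
q = np.linspace(35, 75, 12)
psnr = 30 + 0.3 * q - 1.0 * np.tanh(0.1 * (q - 55))

# fit the five regression parameters, then map the scores
beta, _ = curve_fit(logistic_map, q, psnr,
                    p0=[1.0, 0.1, 55.0, 0.3, 30.0], maxfev=20000)
q_mapped = logistic_map(q, *beta)

plcc = np.corrcoef(q_mapped, psnr)[0, 1]        # prediction accuracy
srocc, _ = spearmanr(q, psnr)                   # monotonicity
krocc, _ = kendalltau(q, psnr)
rmse = np.sqrt(np.mean((q_mapped - psnr) ** 2))  # prediction error
mae = np.mean(np.abs(q_mapped - psnr))
```

Note that SROCC and KROCC are computed on the raw scores, since rank correlations are invariant to the monotonic mapping; only PLCC, RMSE and MAE use the mapped scores.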
The performance parameters discussed above are reported in
Tab. 2 for each test sequence. The table shows that the proposed
BDQM achieves very high correlation with PSNR in every experi-
ment with an average PLCC of 0.9920. The SROCC and KROCC
are equal to 1 in all experiments as the predicted scores are monotonic. The average prediction error in terms of RMSE and MAE turns out to be 0.29 and 0.25, respectively. All the collected results
demonstrate the accuracy of the proposed quality metric. To fur-
ther evaluate the reliability of BDQM the performance parameters
have been computed over the entire dataset, i.e., without considering the 5 videos as separate experiments; such an approach allows one to understand if BDQM can be used not only to rank the
quality of different compression levels of the same content but also
to compare different scores of different videos. The results of this
global analysis are shown in Tab. 3. The PLCC achieved over the entire dataset turns out to be 0.9076, showing again high correlation
between BDQM and PSNR. The values of SROCC and KROCC
are equal to 0.8439 and 0.7089 respectively, demonstrating the
good monotonicity between the two metrics also when BDQM is
used to compare different video contents. Clearly, the statistics
presented in Tab. 2 and 3 show that the quality scores predicted
by the proposed metric are quite accurate and reliable. Finally,
in Fig. 4 we show the scatter plot of the predicted scores versus
PSNR over the complete dataset to let the reader visually appreci-
ate the obtained level of correlation. More details on experimental
evaluation and a software release of the proposed BDQM metric
can be found at: http://www.di.unito.it/~farid/3DQA/BDQM.html.
Table 3. Performance of BDQM over entire dataset.
PLCC SROCC KROCC RMSE MAE
0.9076 0.8439 0.7089 1.7498 1.4902
Figure 4. Scatter plot of BDQM (x-axis) versus PSNR (y-axis) over the entire dataset, with the curve fitted by the logistic function.
4. CONCLUSIONS AND FUTURE WORK
In this paper a novel no-reference metric able to rank the compression artifacts of depth maps has been presented. The proposed algorithm builds on the observation that depth images are characterized by flat regions with sharp boundaries that are potentially blurred by compression. The proposed algorithm
estimates depth quality by measuring the blurriness of the com-
pression sensitive regions of the depth image using a histogram
based approach. The experimental results show that BDQM exhibits high prediction accuracy when compared to the full-reference PSNR metric.
BDQM can be integrated with no-reference image quality metrics to design novel 3D image quality scores that, in addition to the texture image, also consider the depth image to better estimate the overall quality. Another future application that we foresee is
the use of BDQM within the rate distortion optimization stage of
depth map compression algorithms. Since BDQM is based on the
estimation of the quality of sharp transitions in the depth map it
is expected to be a valuable instrument for predicting textural and
structural distortions in synthesized images.
5. REFERENCES
[1] M. Tanimoto, “FTV: Free-viewpoint Television,” Signal
Process.-Image Commun., vol. 27, no. 6, pp. 555 – 570,
2012.
[2] M.P. Tehrani et al., “Proposal to consider a new work item
and its use case - rei : An ultra-multiview 3D display,”
ISO/IEC JTC1/SC29/WG11/m30022, July-Aug 2013.
[3] C. Fehn, “Depth-image-based rendering (DIBR), compres-
sion, and transmission for a new approach on 3D-TV,” in
SPIE Electron. Imaging, 2004, pp. 93–104.
[4] M. Domanski et al., “High efficiency 3D video coding us-
ing new tools based on view synthesis,” IEEE Trans. Image
Process., vol. 22, no. 9, pp. 3517–3527, 2013.
[5] M.S. Farid et al., “Panorama view with spatiotemporal oc-
clusion compensation for 3D video coding,” IEEE Trans.
Image Process., vol. 24, no. 1, pp. 205–219, Jan 2015.
[6] T. Maugey, A. Ortega, and P. Frossard, “Graph-based repre-
sentation for multiview image geometry,” IEEE Trans. Im-
age Process., vol. 24, no. 5, pp. 1573–1586, May 2015.
[7] M.S. Farid et al., “A panoramic 3D video coding with di-
rectional depth aided inpainting,” in Proc. Int. Conf. Image
Process. (ICIP), Oct 2014, pp. 3233–3237.
[8] T. Wiegand et al., “Overview of the H.264/AVC video cod-
ing standard,” IEEE Trans. Circuits Syst. Video Technol.,
vol. 13, no. 7, pp. 560–576, July 2003.
[9] G.J. Sullivan et al., “Overview of the high efficiency video
coding (HEVC) standard,” IEEE Trans. Circuits Syst. Video
Technol., vol. 22, no. 12, pp. 1649–1668, 2012.
[10] G.J. Sullivan et al., “Standardized Extensions of High Effi-
ciency Video Coding (HEVC),” IEEE J. Sel. Topics Signal
Process., vol. 7, no. 6, pp. 1001–1016, Dec 2013.
[11] K. Muller et al., “3D High-Efficiency Video Coding for
Multi-View Video and Depth Data,” IEEE Trans. Image Pro-
cess., vol. 22, no. 9, pp. 3366–3378, Sept 2013.
[12] P. Merkle et al., “The effects of multiview depth video com-
pression on multiview rendering,” Signal Processing: Image
Communication, vol. 24, no. 1, pp. 73–88, 2009.
[13] M.S. Farid, M. Lucenteforte, and M. Grangetto, “Edges
shape enforcement for visual enhancement of depth image
based rendering,” in IEEE 15th Int. Workshop Multimedia
Signal Process. (MMSP), 2013, pp. 406–411.
[14] M.S. Farid, M. Lucenteforte, and M. Grangetto, “Edge en-
hancement of depth based rendered images,” in Proc. Int.
Conf. Image Process. (ICIP), 2014, pp. 5452 – 5456.
[15] Q. Huynh-Thu, P. Le Callet, and M. Barkowsky, “Video
quality assessment: From 2d to 3d - challenges and future
trends,” in Proc. ICIP, Sept 2010, pp. 4025–4028.
[16] F. Speranza et al., “Effect of disparity and motion on visual
comfort of stereoscopic images,” in SPIE Electron. Imaging,
2006.
[17] E. Bosc et al., “Towards a new quality metric for 3-d synthe-
sized view assessment,” IEEE J. Sel. Topics Signal Process.,
vol. 5, no. 7, pp. 1332–1343, Nov 2011.
[18] P. Hanhart and T. Ebrahimi, “Quality assessment of a stereo
pair formed from decoded and synthesized views using ob-
jective metrics,” in Proc. 3DTV-CON, Oct 2012, pp. 1–4.
[19] E. Ekmekcioglu et al., “Depth based perceptual quality as-
sessment for synthesised camera viewpoints,” in User Cen-
tric Media, vol. 60, pp. 76–83. 2012.
[20] Z. Wang et al., “Image quality assessment: from error visi-
bility to structural similarity,” IEEE Trans. Image Process.,
vol. 13, no. 4, pp. 600–612, April 2004.
[21] A. Benoit, P. Le Callet, P. Campisi, and R. Cousseau, “Qual-
ity assessment of stereoscopic images,” EURASIP J. Image
Video Process., vol. 2008, 2009.
[22] M. Carnec, P. Le Callet, and D. Barba, “An image quality
assessment method based on perception of structural infor-
mation,” in Proc. ICIP, Sept 2003, vol. 3, pp. 2284–2298.
[23] A. Boev et al., “Towards compound stereo-video qual-
ity metric: a specific encoder-based framework,” in IEEE
Southwest Symp. Image Anal. Interp., 2006, pp. 218–222.
[24] VQEG, “RRNR-TV Group Test Plan,” 2007, Version 2.2.
[25] H.R. Sheikh, M.F. Sabir, and A.C. Bovik, “A statistical eval-
uation of recent full reference image quality assessment al-
gorithms,” IEEE Trans. Image Process., vol. 15, no. 11, pp.
3440–3451, Nov 2006.