684 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 2, FEBRUARY 2014
Gradient Magnitude Similarity Deviation: A Highly
Efficient Perceptual Image Quality Index
Wufeng Xue, Lei Zhang, Member, IEEE, Xuanqin Mou, Member, IEEE, and Alan C. Bovik, Fellow, IEEE
Abstract—It is an important task to faithfully evaluate the
perceptual quality of output images in many applications, such
as image compression, image restoration, and multimedia stream-
ing. A good image quality assessment (IQA) model should not
only deliver high quality prediction accuracy, but also be com-
putationally efficient. The efficiency of IQA metrics is becoming
particularly important due to the increasing proliferation of high-
volume visual data in high-speed networks. We present a new
effective and efficient IQA model, called gradient magnitude
similarity deviation (GMSD). The image gradients are sensitive to
image distortions, while different local structures in a distorted
image suffer different degrees of degradations. This motivates
us to explore the use of the global variation of a gradient-based
local quality map for overall image quality prediction. We find
that the pixel-wise gradient magnitude similarity (GMS) between
the reference and distorted images combined with a novel
pooling strategy—the standard deviation of the GMS map—can
accurately predict perceptual image quality. The resulting GMSD
algorithm is much faster than most state-of-the-art IQA methods,
and delivers highly competitive prediction accuracy. MATLAB
source code of GMSD can be downloaded at http://www4.comp.
polyu.edu.hk/~cslzhang/IQA/GMSD/GMSD.htm.
Index Terms—Gradient magnitude similarity, image quality
assessment, standard deviation pooling, full reference.
I. INTRODUCTION
IT IS an indispensable step to evaluate the quality of
output images in many image processing applications such
as image acquisition, compression, restoration, transmission,
etc. Since human beings are the ultimate observers of the
processed images and thus the judges of image quality, it
is highly desired to develop automatic approaches that can
predict perceptual image quality consistently with human
subjective evaluation.

[Manuscript received February 28, 2013; revised August 14, 2013 and November 13, 2013; accepted November 14, 2013. Date of publication December 3, 2013; date of current version December 24, 2013. This work was supported in part by the Natural Science Foundation of China under Grants 90920003 and 61172163, and in part by the HK RGC General Research Fund under Grant PolyU 5315/12E. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Damon M. Chandler. W. Xue is with the Institute of Image Processing and Pattern Recognition, Xi'an Jiaotong University, Xi'an 710049, China, and also with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong (e-mail: xwolfs@hotmail.com). L. Zhang is with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong (e-mail: cslzhang@comp.polyu.edu.hk). X. Mou is with the Institute of Image Processing and Pattern Recognition, Xi'an Jiaotong University, Xi'an 710049, China (e-mail: xqmou@mail.xjtu.edu.cn). A. C. Bovik is with the Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712 USA (e-mail: bovik@ece.utexas.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2013.2293423]

The traditional mean square error (MSE)
or peak signal to noise ratio (PSNR) correlates poorly with
human perception, and hence researchers have been devoting
much effort in developing advanced perception-driven image
quality assessment (IQA) models [2], [25]. IQA models can be
classified [3] into full reference (FR) ones, where the pristine
reference image is available, no reference ones, where the
reference image is not available, and reduced reference ones,
where partial information of the reference image is available.
This paper focuses on FR-IQA models, which are widely
used to evaluate image processing algorithms by measuring
the quality of their output images. A good FR-IQA model
can shape many image processing algorithms, as well as their
implementations and optimization procedures [1]. Generally
speaking, there are two strategies for FR-IQA model design.
The first strategy follows a bottom-up framework [3], [30],
which simulates the various processing stages in the visual
pathway of human visual system (HVS), including visual
masking effect [32], contrast sensitivity [33], just noticeable
differences [34], etc. However, HVS is too complex and
our current knowledge about it is far from enough to con-
struct an accurate bottom-up IQA framework. The second
strategy adopts a top-down framework [3], [30], [4]–[8],
which aims to model the overall function of HVS based
on some global assumptions on it. Many FR-IQA models
follow this framework. The well-known Structure SIMilarity
(SSIM) index [8] and its variants, Multi-Scale SSIM
(MS-SSIM) [17] and Information Weighted SSIM (IW-SSIM)
[16], assume that HVS tends to perceive the local structures in
an image when evaluating its quality. The Visual Information
Fidelity (VIF) [23] and Information Fidelity Criteria (IFC)
[22] treat HVS as a communication channel and they predict
the subjective image quality by computing how much the
information within the perceived reference image is preserved
in the perceived distorted one. Other state-of-the-art FR-IQA
models that follow the top-down framework include Ratio of
Non-shift Edges (rNSE) [18], [24], Feature SIMilarity (FSIM)
[7], etc. A comprehensive survey and comparison of state-of-
the-art IQA models can be found in [14] and [30].
Aside from the two different strategies for FR-IQA model
design, many IQA models share a common two-step frame-
work [4]–[8], [16] as illustrated in Fig. 1. First, a local quality
map (LQM) is computed by locally comparing the distorted
image with the reference image via some similarity function.
Then a single overall quality score is computed from the
LQM via some pooling strategy. The simplest and widely used
pooling strategy is average pooling, i.e., taking the average
1057-7149 © 2013 IEEE
Fig. 1. The flowchart of a class of two-step FR-IQA models.
of local quality values as the overall quality prediction score.
Since different regions may contribute differently to the overall
perception of an image’s quality, the local quality values
can be weighted to produce the final quality score. Example
weighting strategies include local measures of information
content [9], [16], content-based partitioning [19], assumed
visual fixation [20], visual attention [10] and distortion based
weighting [9], [10], [29]. Compared with average pooling,
weighted pooling can improve the IQA accuracy to some
extent; however, it may be costly to compute the weights.
Moreover, weighted pooling complicates the pooling process
and can make the predicted quality scores more nonlinear w.r.t.
the subjective quality scores (as shown in Fig. 5).
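The two-step framework just described can be sketched in a few lines. This is an illustrative outline only, not code from the paper; the pixel-wise similarity ratio below is a placeholder assumption standing in for whatever similarity function a concrete FR-IQA model defines:

```python
import numpy as np

def local_quality_map(ref, dist, c=1e-4):
    # Step 1: compare the distorted image against the reference through
    # a similarity function to obtain a local quality map (LQM).
    # This pixel-wise ratio is a placeholder, not a specific published model.
    return (2 * ref * dist + c) / (ref**2 + dist**2 + c)

def pool_average(lqm):
    # Step 2: collapse the LQM into a single overall quality score.
    # Average pooling is the simplest strategy discussed in the text.
    return float(lqm.mean())

ref = np.ones((4, 4))
dist = np.ones((4, 4))
score = pool_average(local_quality_map(ref, dist))
# identical images give similarity 1 at every pixel, so the score is 1.0
```

Weighted pooling would replace `pool_average` with a weighted mean, at the cost of computing the weights.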
In practice, an IQA model should be not only effective
(i.e., having high quality prediction accuracy) but also effi-
cient (i.e., having low computational complexity). With the
increasing ubiquity of digital imaging and communication
technologies in our daily life, there is a vast and growing
amount of visual data to be evaluated. Therefore, efficiency
has become a critical issue of IQA algorithms. Unfortunately,
effectiveness and efficiency are hard to achieve simultaneously,
and most previous IQA algorithms can reach only one of the
two goals. To help fill this need, in this
paper we develop an efficient FR-IQA model, called gradient
magnitude similarity deviation (GMSD). GMSD computes
the LQM by comparing the gradient magnitude maps of the
reference and distorted images, and uses standard deviation
as the pooling strategy to compute the final quality score.
The proposed GMSD is much faster than most state-of-the-art
FR-IQA methods, but supplies surprisingly competitive quality
prediction performance.
Using image gradient to design IQA models is not new. The
image gradient is a popular feature in IQA [4]–[7], [15], [19]
since it can effectively capture image local structures, to
which the HVS is highly sensitive. The most commonly
encountered image distortions, including noise corruption,
blur and compression artifacts, will lead to highly visible
structural changes that “pop out” of the gradient domain. Most
gradient based FR-IQA models [5]–[7], [15] were inspired
by SSIM [8]. They first compute the similarity between
the gradients of reference and distorted images, and then
compute some additional information, such as the difference
of gradient orientation, luminance similarity and phase con-
gruency similarity, to combine with the gradient similarity for
pooling. However, the computation of such additional infor-
mation can be expensive and often yields small performance
improvement.
Without using any additional information, we find that using
the image gradient magnitude alone can still yield highly
accurate quality prediction. The image gradient magnitude
is responsive to artifacts introduced by compression, blur or
additive noise, etc. (Please refer to Fig. 2 for some exam-
ples.) In the proposed GMSD model, the pixel-wise similarity
between the gradient magnitude maps of reference and dis-
torted images is computed as the LQM of the distorted image.
Natural images usually have diverse local structures, and
different structures suffer different degradations in gradient
magnitude. Based on the idea that the global variation of local
quality degradation can reflect the image quality, we propose
to compute the standard deviation of the gradient magnitude
similarity induced LQM to predict the overall image quality
score. The proposed standard deviation pooling based GMSD
model leads to higher accuracy than all state-of-the-art IQA
metrics we can find, and it is very efficient, making large scale
real time IQA possible.
The rest of the paper is organized as follows. Section II
presents the development of GMSD in detail. Section III
presents extensive experimental results, discussions and com-
putational complexity analysis of the proposed GMSD model.
Finally, Section IV concludes the paper.
II. GRADIENT MAGNITUDE SIMILARITY DEVIATION
A. Gradient Magnitude Similarity
The image gradient has been employed for FR-IQA in
different ways [3]–[7], [15]. Most gradient based FR-IQA
methods adopt a similarity function which is similar to that in
SSIM [8] to compute gradient similarity. In SSIM, three types
of similarities are computed: luminance similarity (LS), con-
trast similarity (CS) and structural similarity (SS). The product
of the three similarities is used to predict the image local qual-
ity at a position. Inspired by SSIM, Chen et al. proposed gra-
dient SSIM (G-SSIM) [6]. They retained the LS term of SSIM
but applied the CS and SS similarities to the gradient mag-
nitude maps of reference image (denoted by r) and distorted
image (denoted by d). As in SSIM, average pooling is used in
G-SSIM to yield the final quality score. Cheng et al. [5]
proposed a geometric structure distortion (GSD) metric to
predict image quality, which computes the similarity between
the gradient magnitude maps, the gradient orientation maps
and contrasts of r and d. Average pooling is also used in
GSD. Liu et al. [15] also followed the framework of SSIM.
They predicted the image quality using a weighted summation
(i.e., a weighted pooling strategy is used) of the squared lumi-
nance difference and the gradient similarity. Zhang et al. [7]
combined the similarities of phase congruency maps and gra-
dient magnitude maps between r and d. A phase congruency
based weighted pooling method is used to produce the final
quality score. The resulting Feature SIMilarity (FSIM) model
is among the leading FR-IQA models in terms of prediction
accuracy. However, the computation of phase congruency
features is very costly.
For digital images, the gradient magnitude is defined as the
root mean square of image directional gradients along two
orthogonal directions. The gradient is usually computed by
convolving an image with a linear filter such as the classic
Roberts, Sobel, Scharr and Prewitt filters or some task-specific
Fig. 2. Examples of reference (r) and distorted (d) images, their gradient magnitude images (m_r and m_d), and the associated gradient magnitude similarity (GMS) maps, where a brighter gray level means higher similarity. The highlighted regions (marked by red curves) show clear structural degradations in the gradient magnitude domain. From top to bottom, the four types of distortions are additive white noise (AWN), JPEG compression, JPEG2000 compression, and Gaussian blur (GB). For each type of distortion, two images with different contents are selected from the LIVE database [11]. For each distorted image, its subjective quality score (DMOS) and GMSD index are listed. Note that distorted images with similar DMOS scores have similar GMSD indices, though their contents are totally different.
ones [26]–[28]. For simplicity of computation and to introduce
a modicum of noise-insensitivity, we utilize the Prewitt filter
to calculate the gradient because it is the simplest one among
the 3 × 3 template gradient filters. By using other filters such
as the Sobel and Scharr filters, the proposed method will have
similar IQA results.

Fig. 3. Comparison between GMSM and GMSD as a subjective quality indicator. Note that like DMOS, GMSD is a distortion index (a lower DMOS/GMSD value means higher quality), while GMSM is a quality index (a higher GMSM value means higher quality). (a) Original image Fishing, its Gaussian noise contaminated version (DMOS=0.4403; GMSM=0.8853; GMSD=0.1420), and their gradient similarity map. (b) Original image Flower, its blurred version (DMOS=0.7785; GMSM=0.8745; GMSD=0.1946), and their gradient similarity map. Based on the subjective DMOS, image Fishing has much higher quality than image Flower. GMSD gives the correct judgment but GMSM fails.

The Prewitt filters along the horizontal (x) and vertical (y) directions are defined as:

$$h_x = \begin{bmatrix} 1/3 & 0 & -1/3 \\ 1/3 & 0 & -1/3 \\ 1/3 & 0 & -1/3 \end{bmatrix}, \quad h_y = \begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 0 & 0 & 0 \\ -1/3 & -1/3 & -1/3 \end{bmatrix} \tag{1}$$
Convolving h_x and h_y with the reference and distorted images yields the horizontal and vertical gradient images of r and d. The gradient magnitudes of r and d at location i, denoted by m_r(i) and m_d(i), are computed as follows:

$$m_r(i) = \sqrt{(r \otimes h_x)^2(i) + (r \otimes h_y)^2(i)} \tag{2}$$

$$m_d(i) = \sqrt{(d \otimes h_x)^2(i) + (d \otimes h_y)^2(i)} \tag{3}$$

where the symbol $\otimes$ denotes the convolution operation.
With the gradient magnitude images m_r and m_d in hand, the gradient magnitude similarity (GMS) map is computed as follows:

$$\mathrm{GMS}(i) = \frac{2\, m_r(i)\, m_d(i) + c}{m_r^2(i) + m_d^2(i) + c} \tag{4}$$

where c is a positive constant that supplies numerical stability. (The selection of c will be discussed in Section III-B.) The GMS map is computed in a pixel-wise manner; nonetheless, please note that a value m_r(i) or m_d(i) in the gradient magnitude image is computed from a small local patch in the original image r or d.
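As a concrete illustration, Eqs. (1)-(4) can be sketched in a few lines of NumPy. This is our own sketch, not the authors' MATLAB code: zero padding at the image border is an assumption (the paper does not specify the border treatment), and correlation is used in place of true convolution, which only flips the sign of the directional gradients and leaves the magnitudes unchanged:

```python
import numpy as np

# Prewitt filters of Eq. (1): h_x responds to horizontal intensity
# changes, h_y to vertical ones.
H_X = np.array([[1, 0, -1],
                [1, 0, -1],
                [1, 0, -1]]) / 3.0
H_Y = np.array([[ 1,  1,  1],
                [ 0,  0,  0],
                [-1, -1, -1]]) / 3.0

def filter3x3(img, k):
    # Same-size 3x3 filtering with zero padding (border handling is an
    # assumption; the paper does not specify it).
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def gradient_magnitude(img):
    # Eqs. (2)-(3): root of the summed squares of directional gradients.
    return np.sqrt(filter3x3(img, H_X)**2 + filter3x3(img, H_Y)**2)

def gms_map(ref, dist, c=0.0026):
    # Eq. (4): pixel-wise gradient magnitude similarity, in [0, 1],
    # equal to 1 wherever the two gradient magnitudes coincide.
    mr, md = gradient_magnitude(ref), gradient_magnitude(dist)
    return (2 * mr * md + c) / (mr**2 + md**2 + c)
```

Since 2ab ≤ a² + b² for any nonnegative a and b, every GMS value lies in [0, 1], with 1 attained exactly where the two gradient magnitudes agree.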
The GMS map serves as the local quality map (LQM) of the
distorted image d. Clearly, if m_r(i) and m_d(i) are the same,
GMS(i) will achieve the maximal value 1. Let’s use some
examples to analyze the GMS induced LQM. The most com-
monly encountered distortions in many real image processing
systems are JPEG compression, JPEG2000 compression, addi-
tive white noise (AWN) and Gaussian blur (GB). In Fig. 2, for
each of the four types of distortions, two reference images with
different contents and their corresponding distorted images
are shown (the images are selected from the LIVE database
[11]). Their gradient magnitude images (m_r and m_d) and the
corresponding GMS maps are also shown. In the GMS map,
the brighter the gray level, the higher the similarity, and thus
the higher the predicted local quality. These images contain
a variety of important structures such as large scale edges,
smooth areas and fine textures, etc. A good IQA model should
be adaptable to the broad array of possible natural scenes and
local structures.
In Fig. 2, examples of structure degradation are shown in
the gradient magnitude domain. Typical areas are highlighted
with red curves. From the first group, it can be seen that the
artifacts caused by AWN are masked in the large structure
and texture areas, while the artifacts are more visible in flat
areas. This is broadly consistent with human perception. In the
second group, the degradations caused by JPEG compression
are mainly blocking effects (see the background area of
image parrots and the wall area of image house) and loss
of fine details. Clearly, the GMS map is highly responsive to
these distortions. Regarding JPEG2000 compression, artifacts
are introduced in the vicinity of edge structures and in the
textured areas. Regarding GB, the whole GMS map is clearly
changed after image distortion. All these observations imply
that the image gradient magnitude is a highly relevant feature
for the task of IQA.
B. Pooling With Standard Deviation
The LQM reflects the local quality of each small patch
in the distorted image. The image overall quality score can
then be estimated from the LQM via a pooling stage. The
most commonly used pooling strategy is average pooling, i.e.,
simply averaging the LQM values as the final IQA score. We
refer to the IQA model by applying average pooling to the
GMS map as Gradient Magnitude Similarity Mean (GMSM):
$$\mathrm{GMSM} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{GMS}(i) \tag{5}$$
where N is the total number of pixels in the image. Clearly,
a higher GMSM score means higher image quality. Average
pooling assumes that each pixel has the same importance
in estimating the overall image quality. As introduced in
Section I, researchers have devoted much effort to design
weighted pooling methods ([9], [10], [16], [19], [20], and
[29]); however, the improvement brought by weighted pooling
over average pooling is not always significant [31] and the
computation of weights can be costly.
We propose a new pooling strategy with the GMS map.
A natural image generally has a variety of local structures
in its scene. When an image is distorted, the different local
structures will suffer different degradations in gradient mag-
nitude. This is an inherent property of natural images. For
example, the distortions introduced by JPEG2000 compres-
sion include blocking, ringing, blurring, etc. Blurring will
cause less quality degradation in flat areas than in textured
areas, while blocking will cause higher quality degradation
in flat areas than in textured areas. However, the average
pooling strategy ignores this fact and it cannot reflect how
the local quality degradation varies. Based on the idea that
the global variation of image local quality degradation can
reflect its overall quality, we propose to compute the stan-
dard deviation of the GMS map and take it as the final
IQA index, namely Gradient Magnitude Similarity Deviation
(GMSD):
$$\mathrm{GMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{GMS}(i) - \mathrm{GMSM} \right)^2} \tag{6}$$
Note that the value of GMSD reflects the range of distortion
severities in an image. The higher the GMSD score, the larger
the distortion range, and thus the lower the image perceptual
quality.
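With a GMS map in hand, the two pooling strategies of Eqs. (5) and (6) each reduce to a single NumPy call. The two contrived maps below are illustrative only: they share the same mean, so average pooling (GMSM) cannot distinguish them, while deviation pooling (GMSD) can:

```python
import numpy as np

def gmsm(gms):
    # Eq. (5): average pooling of the GMS map (higher = better quality).
    return float(gms.mean())

def gmsd(gms):
    # Eq. (6): standard deviation pooling (higher = wider spread of local
    # degradation, hence lower predicted quality). np.std divides by N,
    # matching the 1/N in Eq. (6).
    return float(gms.std())

# Uniformly degraded map vs. a map with the same mean but varying
# local quality.
uniform = np.full(100, 0.9)
varying = np.concatenate([np.full(50, 1.0), np.full(50, 0.8)])
assert np.isclose(gmsm(uniform), gmsm(varying))  # GMSM: indistinguishable
# gmsd separates them: 0.0 for the uniform map, 0.1 for the varying one
```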
In Fig. 3, we show two reference images from the CSIQ
database [12], their distorted images and the corresponding
GMS maps. The first image Fishing is corrupted by additive
white noise, and the second image Flower is Gaussian blurred.
From the GMS map of distorted image Fishing, one can see
that its local quality is more homogenous, while from the
GMS map of distorted image Flower, one can see that its
local quality in the center area is much worse than at other
areas. The human subjective DMOS scores of the two distorted
images are 0.4403 and 0.7785, respectively, indicating that the
quality of the first image is obviously better than the second
one. (Note that like GMSD, DMOS also measures distortion;
the lower it is, the better the image quality.) By using GMSM,
however, the predicted quality scores of the two images are
0.8853 and 0.8745, respectively, indicating that the perceptual
quality of the first image is similar to the second one, which
is inconsistent with the subjective DMOS scores.
By using GMSD, the predicted quality scores of the two
images are 0.1420 and 0.1946, respectively, which is a con-
sistent judgment relative to the subjective DMOS scores, i.e.,
the first distorted image has better quality than the second
one. More examples of the consistency between GMSD and
DMOS can be found in Fig. 2. For each distortion type, the
two images of different contents have similar DMOS scores,
while their GMSD indices are also very close. These examples
validate that the deviation pooling strategy coupled with the
GMS quality map can accurately predict the perceptual image
quality.
III. EXPERIMENTAL RESULTS AND ANALYSIS
A. Databases and Evaluation Protocols
The performance of an IQA model is typically evaluated
from three aspects regarding its prediction power [21]: predic-
tion accuracy, prediction monotonicity, and prediction consis-
tency. The computation of these indices requires a regression
procedure to reduce the nonlinearity of predicted scores. We
denote by Q, Q_p and S the vectors of the original IQA scores, the IQA scores after regression and the subjective scores, respectively. The logistic regression function is employed for the nonlinear regression [21]:

$$Q_p = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + \exp\left(\beta_2 (Q - \beta_3)\right)} \right) + \beta_4 Q + \beta_5 \tag{7}$$

where β_1, β_2, β_3, β_4 and β_5 are regression model parameters.
After the regression, three correspondence indices can be computed for performance evaluation [21]. The first one is the Pearson linear Correlation Coefficient (PCC) between Q_p and S, which evaluates the prediction accuracy:

$$\mathrm{PCC}(Q_p, S) = \frac{\bar{Q}_p^T \bar{S}}{\sqrt{\bar{Q}_p^T \bar{Q}_p}\, \sqrt{\bar{S}^T \bar{S}}} \tag{8}$$

where Q̄_p and S̄ are the mean-removed vectors of Q_p and S, respectively, and the superscript T denotes transpose. The second index is the Spearman Rank order Correlation coefficient (SRC) between Q and S, which evaluates the prediction monotonicity:

$$\mathrm{SRC}(Q, S) = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \tag{9}$$

where d_i is the difference between the ranks of each pair of samples in Q and S, and n is the total number of samples. Note that the logistic regression does not affect the SRC index, and we can compute it before regression. The third index is the root mean square error (RMSE) between Q_p and S, which evaluates the prediction consistency:

$$\mathrm{RMSE}(Q_p, S) = \sqrt{(Q_p - S)^T (Q_p - S)/n} \tag{10}$$
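The three correspondence indices of Eqs. (8)-(10) can be sketched directly from their definitions. This sketch omits the logistic regression of Eq. (7), and its naive ranking assumes no tied scores (production code would use average ranks for ties, e.g. scipy.stats.spearmanr):

```python
import numpy as np

def pcc(qp, s):
    # Eq. (8): Pearson correlation between the mean-removed vectors.
    qb, sb = qp - qp.mean(), s - s.mean()
    return float((qb @ sb) / np.sqrt((qb @ qb) * (sb @ sb)))

def src(q, s):
    # Eq. (9): Spearman rank order correlation. The double argsort gives
    # each element's rank; this simple form assumes no tied values.
    rank = lambda x: np.argsort(np.argsort(x)).astype(float)
    d = rank(q) - rank(s)
    n = len(q)
    return float(1 - 6 * np.sum(d**2) / (n * (n**2 - 1)))

def rmse(qp, s):
    # Eq. (10): root mean square error of the regressed scores.
    return float(np.sqrt(np.mean((qp - s)**2)))
```

For perfectly monotonic predictions `src` returns 1, and reversing the order returns -1, regardless of any monotonic nonlinearity in the raw scores.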
With the SRC, PCC and RMSE indices, we evaluate the
IQA models on three large scale and publicly accessible IQA
databases: LIVE [11], CSIQ [12], and TID2008 [13]. The
LIVE database consists of 779 distorted images generated
from 29 reference images. Five types of distortions are applied
to the reference images at various levels: JPEG2000 com-
pression, JPEG compression, additive white noise (AWN),
Gaussian blur (GB) and simulated fast fading Rayleigh chan-
nel (FF). These distortions reflect a broad range of image
impairments, for example, edge smoothing, block artifacts and
random noise. The CSIQ database consists of 30 reference
images and their distorted counterparts with six types of
distortions at five different distortion levels. The six types
of distortions include JPEG2000, JPEG, AWN, GB, global
contrast decrements (CTD), and additive pink Gaussian noise
(PGN). There are a total of 886 distorted images in it. The
TID2008 database is the largest IQA database to date. It has
1,700 distorted images, generated from 25 reference images
with 17 types of distortions at 4 levels. Please refer to [13]
for details of the distortions. Each image in these databases has
been evaluated by human subjects under controlled conditions,
and then assigned a quantitative subjective quality score: Mean
Opinion Score (MOS) or Difference MOS (DMOS).
To demonstrate the performance of GMSD, we com-
pare it with 11 state-of-the-art and representative FR-IQA
models, including PSNR, IFC [22], VIF [23], SSIM [8],
MS-SSIM [17], MAD [12], FSIM [7], IW-SSIM [16],
G-SSIM [6], GSD [5] and GS [15]. Among them, FSIM,
G-SSIM, GSD and GS explicitly exploit gradient information.
Except for G-SSIM and GSD, which are implemented by us,
the source codes of all the other models were obtained from the
original authors. To more clearly demonstrate the effectiveness
of the proposed deviation pooling strategy, we also present the
results of GMSM which uses average pooling. As in most of
the previous literature [7], [8], [16], [17], all of the competing
algorithms are applied to the luminance channel of the test
images.
B. Implementation of GMSD
The only parameter in the proposed GMSM and GMSD
models is the constant c in Eq. (4). Apart from ensuring the
numerical stability, the constant c also plays a role in mediat-
ing the contrast response in low gradient areas. We normalize
the pixel values of 8-bit luminance image into range [0, 1].
Fig. 4 plots the SRC curves against c by applying GMSD to the
LIVE, CSIQ and TID2008 databases. One can see that for all
the databases, GMSD shows similar preference to the value
of c. In our implementation, we set c=0.0026. In addition,
as in the implementations of SSIM [8] and FSIM [7], the
images r and d are first filtered by a 2 × 2 average filter, and
then down-sampled by a factor of 2. MATLAB source code
that implements GMSD can be downloaded at http://www4.
comp.polyu.edu.hk/~cslzhang/IQA/GMSD/GMSD.htm.
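Putting Section II together with the implementation details above gives a compact end-to-end sketch. This is a NumPy approximation of the authors' MATLAB implementation, with zero border padding as our own assumption, so its scores may differ slightly from the reference code:

```python
import numpy as np

def prewitt_magnitude(img):
    # Gradient magnitude via the 3x3 Prewitt filters of Eq. (1) and
    # Eqs. (2)-(3). Zero padding at the border is an assumption.
    hx = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]]) / 3.0
    hy = hx.T  # transposing h_x gives the vertical Prewitt filter
    p = np.pad(img, 1)
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(3):
        for dx in range(3):
            win = p[dy:dy + h, dx:dx + w]
            gx += hx[dy, dx] * win
            gy += hy[dy, dx] * win
    return np.sqrt(gx**2 + gy**2)

def gmsd(ref, dist, c=0.0026):
    # Section III-B choices: luminance in [0, 1], a 2x2 average filter,
    # then downsampling by a factor of 2, then Eq. (4) and Eq. (6).
    def shrink(x):
        h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
        x = x[:h, :w]
        return (x[0::2, 0::2] + x[0::2, 1::2] +
                x[1::2, 0::2] + x[1::2, 1::2]) / 4.0
    mr = prewitt_magnitude(shrink(ref))
    md = prewitt_magnitude(shrink(dist))
    gms = (2 * mr * md + c) / (mr**2 + md**2 + c)
    return float(gms.std())  # lower GMSD predicts higher quality
```

Identical inputs give a GMS map of all ones and hence a GMSD of exactly zero; any distortion that perturbs the gradient field unevenly raises the score.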
C. Performance Comparison
In Table I, we compare the competing IQA models’ perfor-
mance on each of the three IQA databases in terms of SRC,
Fig. 4. The performance of GMSD in terms of SRC vs. constant c on the
three databases.
PCC and RMSE. The top three models for each evaluation
criterion are shown in boldface. We can see that the top
models are mostly GMSD (8 times), MAD (6 times), FSIM
(5 times) and VIF (5 times). In terms of all the three criteria
(SRC, PCC and RMSE), the proposed GMSD outperforms
all the other models on the TID2008 and CSIQ databases.
On the LIVE database, MAD performs the best, and VIF,
FSIM and GMSD perform almost the same. Compared with
gradient based models such as GSD, G-SSIM and GS, GMSD
outperforms them by a large margin. Compared with GMSM,
the superiority of GMSD is obvious, demonstrating that the
proposed deviation pooling strategy works much better than
the average pooling strategy on the GMS induced LQM. The
FSIM algorithm also employs gradient similarity. It has similar
results to GMSD on the LIVE and TID2008 databases, but
lags GMSD on the CSIQ database with a lower SRC/PCC
and larger RMSE.
In Fig. 5, we show the scatter plots of predicted quality
scores against subjective DMOS scores for some representative
models (PSNR, VIF, GS, IW-SSIM, MS-SSIM, MAD, FSIM,
GMSM and GMSD) on the CSIQ database, which has six
types of distortions (AWN, JPEG, JPEG2000, PGN, GB and
CTD). One can observe that for FSIM, MAD, MS-SSIM,
GMSM, IW-SSIM and GS, the distribution of predicted scores
on the CTD distortion deviates much from the distributions on
other types of distortions, degrading their overall performance.
When the distortion is severe (i.e., large DMOS values), GS,
GMSM and PSNR yield less accurate quality predictions. The
information fidelity based VIF performs very well on the
LIVE database but not very well on the CSIQ and TID2008
databases. This is mainly because VIF does not predict the
images’ quality consistently across different distortion types
on these two databases, as can be observed from the scatter
plots with CSIQ database in Fig. 5.
In Table I, we also show the weighted average of SRC
and PCC scores by the competing FR-IQA models over
the three databases, where the weights were determined by
the sizes (i.e., number of images) of the three databases.
According to this, the top 3 models are GMSD, FSIM and
IW-SSIM. Overall, the proposed GMSD achieves outstanding
and consistent performance across the three databases.
In order to make statistically meaningful conclusions on
the models’ performance, we further conducted a series
of hypothesis tests based on the prediction residuals of
TABLE I
PERFORMANCE OF THE PROPOSED GMSD AND THE OTHER ELEVEN COMPETING FR-IQA MODELS IN TERMS OF SRC, PCC, AND RMSE ON THE 3 DATABASES. THE TOP THREE MODELS FOR EACH CRITERION ARE SHOWN IN BOLDFACE
Fig. 5. Scatter plots of predicted quality scores against the subjective quality scores (DMOS) by representative FR-IQA models on the CSIQ database.
The six types of distortions are represented by markers of different shapes and colors.
each model after nonlinear regression. The results of sig-
nificance tests are shown in Fig. 6. By assuming that the
model’s prediction residuals follow the Gaussian distribution
(the Jarque-Bera test [35] shows that only 3 models on LIVE
and 4 models on CSIQ violate this assumption), we apply
the left-tailed F-test to the residuals of every two models
to be compared. A value of H=1 for the left-tailed F-test
at a significance level of 0.05 means that the first model
Fig. 6. The results of statistical significance tests of the competing IQA models on the (a) LIVE, (b) CSIQ and (c) TID2008 databases. A value of ‘1’
(highlighted in green) indicates that the model in the row is significantly better than the model in the column, while a value of ‘0’ (highlighted in red) indicates
that the first model is not significantly better than the second one. Note that the proposed GMSD is significantly better than most of the competitors on all
the three databases, while no IQA model is significantly better than GMSD.
(indicated by the row in Fig. 6) has better IQA performance
than the second model (indicated by the column in Fig. 6)
with a confidence greater than 95%. A value of H=0 means
that the first model is not significantly better than the second
one. If H=0 always holds no matter which one of the two
models is taken as the first one, then the two models have
no significant difference in performance. Fig. 6(a)–(c) show
the significance test results on the LIVE, CSIQ and TID2008
databases, respectively. We see that on the LIVE database,
GMSD, VIF, GMSM and FSIM all perform very well and they
have no significant difference, while MAD performs the best
on this database. On the CSIQ database, GMSD is significantly
better than all the other models except for MAD. On the
TID2008 database, GMSD is significantly better than all the
other IQA models except for FSIM. Note that on all the three
databases, no IQA model performs significantly better than
GMSD except that MAD is significantly better than GMSD
on LIVE.
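For illustration, the left-tailed F-test described above can be sketched in Python with SciPy (this is an illustrative sketch, not the code used in the paper; the significance level of 0.05 matches the text, and the synthetic residuals are made up):

```python
import numpy as np
from scipy.stats import f

def left_tailed_f_test(res_first, res_second, alpha=0.05):
    """Return H = 1 if the first model's residual variance is
    significantly smaller than the second's (i.e., the first model
    predicts perceptual quality better), else H = 0."""
    var1 = np.var(res_first, ddof=1)
    var2 = np.var(res_second, ddof=1)
    F = var1 / var2
    df1, df2 = len(res_first) - 1, len(res_second) - 1
    p = f.cdf(F, df1, df2)  # left-tail probability of the F statistic
    return int(p < alpha)

rng = np.random.default_rng(0)
res_a = rng.normal(0.0, 1.0, 500)  # tighter residuals: better model
res_b = rng.normal(0.0, 2.0, 500)  # looser residuals: worse model
print(left_tailed_f_test(res_a, res_b))  # 1: A significantly better
print(left_tailed_f_test(res_b, res_a))  # 0
```

Note that, as in the paper, H = 0 in both directions would indicate no significant difference between the two models.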
D. Performance Comparison on Individual Distortion Types
To more comprehensively evaluate an IQA model’s ability
to predict image quality degradations caused by specific types
of distortions, we compare the performance of competing
methods on each type of distortion. The results are listed
in Table II. To save space, only the SRC scores are shown.
There are a total of 28 groups of distorted images in the three
databases. In Table II, we use boldface font to highlight the
top 3 models in each group. One can see that GMSD is among
the top 3 models 14 times, followed by VIF and GS, which are
among the top 3 models 13 times and 11 times, respectively.
However, neither GS nor VIF ranks among the top 3 in terms
of overall performance on the 3 databases. The classical PSNR
also ranks among the top 3 for 8 groups, all of which consist of noise-contaminated images.
PSNR is, indeed, an effective measure of perceptual quality of
noisy images. However, PSNR is not able to faithfully measure
the quality of images impaired by other types of distortions.
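As a reminder of the criterion used in Table II, SRC is the Spearman rank-order correlation between objective scores and subjective DMOS values; a minimal sketch with made-up scores (not data from the table):

```python
import numpy as np
from scipy.stats import spearmanr

# Illustrative values only; for GMSD a larger score and a larger DMOS
# both indicate worse quality, so a good metric yields high positive SRC.
pred_gmsd = np.array([0.10, 0.25, 0.12, 0.40, 0.33])
dmos      = np.array([12.0, 55.0, 20.0, 80.0, 61.0])

src, _ = spearmanr(pred_gmsd, dmos)
print(src)  # 1.0: the two rankings agree perfectly in this toy example
```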
Generally speaking, performing well on specific types of
distortions does not guarantee that an IQA model will perform
well on the whole database with a broad spectrum of distortion
types. A good IQA model should also predict the image quality
consistently across different types of distortions. Referring to
the scatter plots in Fig. 5, it can be seen that the scatter
plot of GMSD is more concentrated across different groups
of distortion types. For example, its points corresponding to
JPEG2000 and PGN distortions are very close to each other.
However, the points corresponding to JPEG2000 and PGN for
VIF are relatively far from each other. We can have similar
observations for GS on the distortion types of PGN and CTD.
This explains why some IQA models perform well on many
individual types of distortions but not on the entire
databases; that is, these IQA models behave
rather differently on different types of distortions, which can
be attributed to the different ranges of quality scores for those
distortion types [43].
The gradient-based models G-SSIM and GSD show good
performance neither on many individual types of
distortions nor on the entire databases. G-SSIM computes the
local variance and covariance of gradient magnitude to gauge
contrast and structure similarities. This may not be an effective
use of gradient information. The gradient magnitude describes
the local contrast of image intensity; however, the image
local structures with different distortions may have similar
variance of gradient magnitude, making G-SSIM less effective
at distinguishing between those distortions. GSD combines the orientation
differences of gradient, the contrast similarity and the gradient
similarity; however, these kinds of information overlap with
each other, making GSD less discriminative of image
quality. GMSD only uses the gradient magnitude information
but achieves highly competitive results against the competing
methods. This validates that gradient magnitude, coupled with
the deviation pooling strategy, can serve as an excellent
predictive image quality feature.
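The GMS-plus-deviation-pooling idea can be sketched in a few lines of Python (an illustrative sketch, not the released MATLAB implementation: we use the 3 × 3 Prewitt templates, take images normalized to [0, 1], set the stability constant c = 0.0026 accordingly, and omit the 2× downsampling preprocessing, all of which should be treated as assumptions here):

```python
import numpy as np
from scipy.ndimage import convolve

def gmsd(ref, dst, c=0.0026):
    """Gradient magnitude similarity deviation (sketch).
    ref, dst: float arrays in [0, 1]; c stabilizes the GMS ratio."""
    hx = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float) / 3.0  # Prewitt, horizontal
    hy = hx.T                                       # Prewitt, vertical

    def grad_mag(img):
        gx = convolve(img, hx, mode='nearest')
        gy = convolve(img, hy, mode='nearest')
        return np.sqrt(gx ** 2 + gy ** 2)

    m_r, m_d = grad_mag(ref), grad_mag(dst)
    gms = (2.0 * m_r * m_d + c) / (m_r ** 2 + m_d ** 2 + c)  # GMS map
    return gms.std()  # deviation pooling: larger value = worse quality

rng = np.random.default_rng(0)
ref = rng.random((64, 64))
dst = np.clip(ref + rng.normal(0, 0.1, ref.shape), 0, 1)
print(gmsd(ref, ref))       # 0.0 for identical images
print(gmsd(ref, dst) > 0)   # True for a distorted image
```

For identical images the GMS map is 1 everywhere, so its standard deviation, and hence GMSD, is exactly zero.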
E. Standard Deviation Pooling on Other IQA Models
As shown in previous sections, the method of standard
deviation (SD) pooling applied to the GMS map leads to
significantly elevated performance of image quality prediction.
TABLE II: PERFORMANCE COMPARISON OF THE IQA MODELS ON EACH INDIVIDUAL DISTORTION TYPE IN TERMS OF SRC
It is therefore natural to wonder whether the SD pooling
strategy can deliver similar performance improvement on other
IQA models. To explore this, we modified six representative
FR-IQA methods, all of which are able to generate an LQM
of the test image: MSE (which is equivalent to PSNR but
can produce an LQM), SSIM [8], MS-SSIM [17], FSIM [7],
G-SSIM [6] and GSD [5]. The original pooling strategies of
these methods are either average pooling or weighted pooling.
For MSE, SSIM, G-SSIM, GSD and FSIM, we directly applied
the SD pooling to their LQMs to yield the predicted quality
scores. For MS-SSIM, we applied SD pooling to its LQM on
each scale, and then computed the product of the predicted
scores on all scales as the final score. In Table III, the
SRC results of these methods by using their nominal pooling
strategies and the SD pooling strategy are listed.
Table III makes it clear that except for MSE, all the other
IQA methods fail to gain in performance by using SD pooling
instead of their nominal pooling strategies. The reason may be
that in these methods, the LQM is generated using multiple,
diverse types of features. The interaction between these features
may complicate the estimation of local image quality, so
that SD pooling is no longer effective. By contrast, MSE and GMSD
use only one type of feature, the original intensity and the gradient
magnitude, respectively, to calculate the LQM.
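The experiment above amounts to swapping only the pooling stage. For example, for MSE, whose LQM is simply the pixel-wise squared-error map, the comparison of nominal (average) pooling against SD pooling can be sketched as follows (illustrative code, not the paper's evaluation scripts):

```python
import numpy as np

def mean_pooling(lqm):
    return lqm.mean()   # nominal pooling: average local quality

def sd_pooling(lqm):
    return lqm.std()    # pool the variation of local quality instead

rng = np.random.default_rng(1)
ref = rng.random((64, 64))
dst = np.clip(ref + rng.normal(0, 0.05, ref.shape), 0, 1)

lqm_mse = (ref - dst) ** 2  # MSE's local quality map
print(mean_pooling(lqm_mse), sd_pooling(lqm_mse))
```

For MS-SSIM, as described above, the same SD pooling would be applied per scale and the per-scale scores multiplied to give the final index.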
F. Complexity
In applications such as real-time image/video quality monitoring and prediction, the complexity of implemented IQA
models becomes crucial. We thus analyze the computational
complexity of GMSD, and then compare the competing IQA
models in terms of running time.
Suppose that an image has N pixels. The classical PSNR has
the lowest complexity, and it only requires N multiplications
and 2N additions. The main operations in the proposed GMSD
model include calculating image gradients (by convolving
the image with two 3 × 3 template integer filters), thereby
producing gradient magnitude maps, generating the GMS map,
TABLE III: SRC RESULTS OF SD POOLING ON SOME REPRESENTATIVE IQA MODELS
TABLE IV: RUNNING TIME OF THE COMPETING IQA MODELS
and deviation pooling. Overall, it requires 19N multiplications
and 16N additions to yield the final quality score. Meanwhile,
it only needs to store at most 4 directional gradient images
(each of size N) in memory (at the gradient calculation
stage). Therefore, both the time and memory complexities
of GMSD are O(N). In other words, the time and memory
costs of GMSD scale linearly with image size. This is a
very attractive property since image resolutions have been
rapidly increasing with the development of digital imaging
technologies. In addition, the computation of image gradients
and GMS map can be parallelized by partitioning the reference
and distorted images into blocks if the image size is very large.
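A quick, non-rigorous way to probe this linear scaling is to time a gradient-magnitude pipeline at growing resolutions; the snippet below uses a simple stand-in metric (not the released GMSD code), so the absolute timings are meaningless but the growth with pixel count should be roughly linear:

```python
import time
import numpy as np

def toy_gradient_metric(img):
    # Stand-in O(N) pipeline: gradients -> magnitude -> deviation pooling
    gx, gy = np.gradient(img)
    return np.sqrt(gx ** 2 + gy ** 2).std()

for n in (256, 512, 1024):
    img = np.random.default_rng(0).random((n, n))
    t0 = time.perf_counter()
    score = toy_gradient_metric(img)
    dt = time.perf_counter() - t0
    print(f"{n}x{n}: {dt:.4f} s, score={score:.4f}")
```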
Table IV shows the running time of the 13 IQA models
on an image of size 512 × 512. All algorithms were run
on a ThinkPad T420S notebook with an Intel Core i7-2600M
CPU @ 2.7 GHz and 4 GB RAM. The software platform used
to run all algorithms was MATLAB R2010a (7.10). Apart
from G-SSIM and GSD, the MATLAB source codes of all
the other methods were obtained from the original authors.
(It should be noted that whether the code is optimized may
affect the running time of an algorithm.) Clearly, PSNR is the
fastest, followed by GMSM and GMSD. Specifically, GMSD takes
only 0.0110 seconds to process an image of size
512 × 512, which is 3.5 times faster than SSIM, 47.9 times
faster than FSIM, and 106.7 times faster than VIF.
G. Discussions
Apart from pure quality assessment tasks, an IQA algorithm
is expected to find broader use in many other
applications. According to [1], the most
common applications of IQA algorithms can be categorized
as follows: 1) quality monitoring; 2) performance evaluation;
3) system optimization; and 4) perceptual fidelity criteria on
visual signals. Quality monitoring is usually conducted by
using no reference IQA models, while FR-IQA models can be
applied to the other three categories. Certainly, SSIM proved
to be a milestone in the development of FR-IQA models. It
has been widely and successfully used in the performance
evaluation of many image processing systems and algorithms,
such as image compression, restoration and communication,
etc. Apart from performance evaluation, thus far, SSIM is not
yet pervasively used in other applications. The reason may
be two-fold, as discussed below. The proposed GMSD model
might alleviate these problems associated with SSIM, and has
the potential to be used in a wider variety of
image processing applications.
First, SSIM is difficult to optimize when it is used as a
fidelity criterion on visual signals. This largely restricts its
applications in designing image processing algorithms such
as image compression and restoration. Recently, some works
[36]–[38] have been reported to adopt SSIM for image/video
perceptual compression. However, these methods are not “one-
pass” and they have high complexity. Compared with SSIM,
the formulation of GMSD is much simpler. The calculation
mainly involves the gradient magnitude maps of the reference and
distorted images, and the correlation between the two maps. GMSD
can be more easily optimized than SSIM, and it has greater
potential to be adopted as a fidelity criterion for designing
perceptual image compression and restoration algorithms, as
well as for optimizing network coding and resource allocation
problems.
Second, the time and memory complexity of SSIM is
relatively high, restricting its use in applications where low-
cost and real-time implementation is required. GMSD is much
faster and more scalable than SSIM, and it can be easily
adopted for tasks such as real time performance evaluation,
system optimization, etc. Considering that mobile and portable
devices are becoming much more popular, the merits of
simplicity, low complexity and high accuracy of GMSD make
it very attractive and competitive for mobile applications.
In addition, it should be noted that with the rapid development of digital image acquisition and display technologies,
and the increasing popularity of mobile devices and websites
such as YouTube and Facebook, current IQA databases may
not fully represent the way that human subjects view digital
images. On the other hand, the current databases, including the
three largest ones TID2008, LIVE and CSIQ, mainly focus on
a few classical distortion types, and the images therein undergo
only a single type of distortion. Therefore, there is a demand
to establish new IQA databases, which should contain images
with multiple types of distortions [40], images collected from
mobile devices [41], and images of high definition.
IV. CONCLUSION
The usefulness and effectiveness of image gradient for full
reference image quality assessment (FR-IQA) were studied in
this paper. We devised a simple FR-IQA model called gradient
magnitude similarity deviation (GMSD), where the pixel-wise
gradient magnitude similarity (GMS) is used to capture image
local quality, and the standard deviation of the overall GMS
map is computed as the final image quality index. Such a
standard deviation based pooling strategy is based on the
consideration that the variation of local quality, which arises
from the diversity of image local structures, is highly relevant
to subjective image quality. Compared with state-of-the-art
FR-IQA models, the proposed GMSD model performs better
in terms of both accuracy and efficiency, making GMSD an
ideal choice for high performance IQA applications.
REFERENCES
[1] Z. Wang, Applications of objective image quality assessment methods
[applications corner], IEEE Signal Process. Mag., vol. 28, no. 6,
pp. 137–142, Nov. 2011.
[2] B. Girod, “What’s wrong with mean-squared error?” in Digital
Images and Human Vision. Cambridge, MA, USA: MIT Press, 1993,
pp. 207–220.
[3] Z. Wang and A. C. Bovik, “Modern image quality assessment, Synth.
Lect. Image, Video, Multimedia Process., vol. 2, no. 1, pp. 1–156, 2006.
[4] D. O. Kim, H. S. Han, and R. H. Park, “Gradient information-based
image quality metric,” IEEE Trans. Consum. Electron., vol. 56, no. 2,
pp. 930–936, May 2010.
[5] G. Q. Cheng, J. C. Huang, C. Zhu, Z. Liu, and L. Z. Cheng, “Perceptual
image quality assessment using a geometric structural distortion model,
in Proc. 17th IEEE ICIP, Sep. 2010, pp. 325–328.
[6] G. H. Chen, C. L. Yang, and S. L. Xie, “Gradient-based structural
similarity for image quality assessment, in Proc. 13th IEEE Int. Conf.
Image Process., Oct. 2006, pp. 2929–2932.
[7] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity
index for image quality assessment, IEEE Trans. Image Process.,
vol. 20, no. 8, pp. 2378–2386, Aug. 2011.
[8] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image
quality assessment: From error visibility to structural similarity, IEEE
Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[9] Z. Wang and X. Shang, “Spatial pooling strategies for perceptual image
quality assessment, in Proc. IEEE Int. Conf. Image Process., Sep. 2006,
pp. 2945–2948.
[10] A. K. Moorthy and A. C. Bovik, “Visual importance pooling for image
quality assessment, IEEE J. Sel. Topics Signal Process., vol. 3, no. 2,
pp. 193–201, Apr. 2009.
[11] H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik. (2005). LIVE
Image Quality Assessment Database Release 2 [Online]. Available:
http://live.ece.utexas.edu/research/quality
[12] E. C. Larson and D. M. Chandler, “Most apparent distortion:
Full-reference image quality assessment and the role of strategy, J.
Electron. Imaging, vol. 19, no. 1, pp. 011006-1–011006-21, Jan. 2010.
[13] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, and
F. Battisti, “TID2008—A database for evaluation of full-reference visual
quality assessment metrics, Adv. Modern Radio Electron., vol. 10, no. 4,
pp. 30–45, 2009.
[14] L. Zhang, L. Zhang, X. Mou, and D. Zhang, A comprehensive eval-
uation of full reference image quality assessment algorithms, in Proc.
19th IEEE ICIP, Oct. 2012, pp. 1477–1480.
[15] A. Liu, W. Lin, and M. Narwaria, “Image quality assessment based
on gradient similarity, IEEE Trans. Image Process., vol. 21, no. 4,
pp. 1500–1512, Apr. 2012.
[16] Z. Wang and Q. Li, “Information content weighting for perceptual
image quality assessment, IEEE Trans. Image Process., vol. 20, no. 5,
pp. 1185–1198, May 2011.
[17] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale struc-
tural similarity for image quality assessment, in Proc. IEEE 37th
Conf. Rec. Asilomar Conf. Signals, Syst. Comput., vol. 2. Nov. 2003,
pp. 1398–1402.
[18] M. Zhang, X. Mou, and L. Zhang, “Non-shift edge based ratio
(NSER): An image quality assessment metric based on early vision
features, IEEE Signal Process. Lett., vol. 18, no. 5, pp. 315–318,
May 2011.
[19] C. F. Li and A. C. Bovik, “Content-partitioned structural similarity index
for image quality assessment,” Signal Process., Image Commun., vol. 25,
no. 7, pp. 517–526, Aug. 2010.
[20] Y. Tong, H. Konik, F. A. Cheikh, and A. Tremeau, “Full reference image
quality assessment based on saliency map analysis, J. Imaging Sci.,
vol. 54, no. 3, pp. 30503-1–30503-14, May 2010.
[21] (2003, Aug.). Final Report From the Video Quality Experts Group on
the Validation of Objective Models of Video Quality Assessment—Phase
II [Online]. Available: http://www.vqeg.org/
[22] H. R. Sheikh, A. C. Bovik, and G. de Veciana, An information fidelity
criterion for image quality assessment using natural scene statistics,
IEEE Trans. Image Process., vol. 14, no. 12, pp. 2117–2128, Dec. 2005.
[23] H. R. Sheikh and A. C. Bovik, “Image information and visual quality,
IEEE Trans. Image Process., vol. 15, no. 2, pp. 430–444, Feb. 2006.
[24] W. Xue and X. Mou, An image quality assessment metric based on
non-shift edge, in Proc. 18th IEEE ICIP, Sep. 2011, pp. 3309–3312.
[25] Z. Wang and A. C. Bovik, “Mean squared error: Love it or leave it?—
A new look at signal fidelity measures, IEEE Signal Process. Mag.,
vol. 26, no. 1, pp. 98–117, Jan. 2009.
[26] A. L. Neuenschwander, M. M. Crawford, L. A. Magruder, C. A. Weed,
R. Cannata, D. Fried, et al., “Terrain classification of LADAR data
over Haitian urban environments using a lower envelope follower and
adaptive gradient operator, Proc. SPIE 7684, Laser Radar Technology
and Applications XV, 768408, May 2010.
[27] S. A. Coleman, B. W. Scotney, and S. Suganthan, “Multi-scale edge
detection on range and intensity images, Pattern Recognit., vol. 44,
no. 4, pp. 821–838, Apr. 2011.
[28] N. Ehsan and R. K. Ward, An efficient method for robust gradient
estimation of RGB color images, in Proc. 16th IEEE ICIP, Nov. 2009,
pp. 701–704.
[29] J. Park, K. Seshadrinathan, S. Lee, and A. C. Bovik, “VQpooling: Video
quality pooling adaptive to perceptual distortion severity, IEEE Trans.
Image Process., vol. 22, no. 2, pp. 610–620, Feb. 2013.
[30] W. Lin and C.-C. Jay Kuo, “Perceptual visual quality metrics:
A survey,” J. Vis. Commun. Image Represent., vol. 22, no. 4,
pp. 297–312, May 2011.
[31] A. Ninassi, O. Le Meur, P. Le Callet, and D. Barba, “Does where you
gaze on an image affect your perception of quality? Applying visual
attention to image quality metric, in Proc. IEEE ICIP, vol. 2. Oct. 2007,
pp. 169–172.
[32] J. Ross and H. D. Speed, “Contrast adaptation and contrast masking in
human vision,” Proc. Biol. Sci., vol. 246, no. 1315, pp. 61–69,
Oct. 1991.
[33] S. J. Daly, Application of a noise-adaptive contrast sensitivity function
to image data compression, Opt. Eng., vol. 29, no. 8, pp. 977–987,
Aug. 1990.
[34] J. Lubin, A human vision system model for objective picture quality
measurements, in Proc. IBC, Jan. 1997, pp. 498–503.
[35] C. M. Jarque and A. K. Bera, “Efficient tests for normality, homoscedas-
ticity and serial independence of regression residuals,” Econ. Lett., vol. 6,
no. 3, pp. 255–259, 1980.
[36] C. Wang, X. Mou, W. Hong, and L. Zhang, “Block-layer bit allocation
for quality constrained video encoding based on constant perceptual
quality, Proc. SPIE 8666, Visual Information Processing and Commu-
nication IV, 86660J, Feb. 2013.
[37] T.-S. Ou, Y.-H. Huang, and H. H. Chen, “SSIM-based perceptual rate
control for video coding, IEEE Trans. Circuits Syst. Video Technol.,
vol. 21, no. 5, pp. 682–691, May 2011.
[38] S. Wang, A. Rehman, Z. Wang, S. Ma, and W. Gao, “SSIM-motivated
rate-distortion optimization for video coding,” IEEE Trans. Circuits Syst.
Video Technol., vol. 22, no. 4, pp. 516–529, Apr. 2012.
[39] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. (2009).
The SSIM Index for Image Quality Assessment [Online]. Available:
http://www.cns.nyu.edu/lcv/ssim/ssim.m
[40] D. Jayaraman, A. Mittal, A. K. Moorthy, and A. C. Bovik.
(2012, Feb. 13). LIVE Multiply Distorted Image Quality
Database [Online]. Available: http://live.ece.utexas.edu/research/
quality/live_multidistortedimage.html
[41] A. K. Moorthy, L. K. Choi, A. C. Bovik, and G. de Veciana, “Video
quality assessment on mobile devices: Subjective, behavioral and objective
studies, IEEE J. Sel. Topics Signal Process., vol. 6, no. 6, pp. 652–671,
Oct. 2012.
[42] M.-J. Chen and A. C. Bovik, “Fast structural similarity index algorithm,
J. Real-Time Image Process., vol. 6, no. 4, pp. 281–287, 2011.
[43] R. Soundararajan and A. C. Bovik, “RRED indices: Reduced reference
entropic differencing for image quality assessment, IEEE Trans. Image
Process., vol. 21, no. 2, pp. 517–526, Feb. 2012.
Wufeng Xue received the B.Sc. degree in automatic
engineering from the School of Electronic and Infor-
mation Engineering, Xi’an Jiaotong University, Xi’an,
China, in 2009. He is currently pursuing the Ph.D.
degree with the Institute of Image Processing and
Pattern Recognition, Xi’an Jiaotong University. His
research interest focuses on perceptual quality of
visual signals.
Lei Zhang (M’04) received the B.Sc. degree from
the Shenyang Institute of Aeronautical Engineer-
ing, Shenyang, China, in 1995, and the M.Sc. and
Ph.D. degrees in control theory and engineering
from Northwestern Polytechnical University, Xi’an,
China, in 1998 and 2001, respectively. From 2001 to
2002, he was a Research Associate with the Depart-
ment of Computing, The Hong Kong Polytechnic
University. From 2003 to 2006, he was a Post-
Doctoral Fellow with the Department of Electrical
and Computer Engineering, McMaster University,
Canada. In 2006, he joined the Department of Computing, The Hong Kong
Polytechnic University, as an Assistant Professor. Since 2010, he has been
an Associate Professor with the same department. His research interests
include image and video processing, computer vision, pattern recognition, and
biometrics. He is an Associate Editor of the IEEE TRANSACTIONS ON
CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, the IEEE TRANSACTIONS
ON SYSTEMS, MAN, AND CYBERNETICS, PART C, and Image and Vision
Computing, and the Guest Editor of several special issues in international
journals. He received the 2013 Outstanding Award in Research and Scholarly
Activities, Faculty of Engineering, PolyU.
Xuanqin Mou (M’08) has been with the Insti-
tute of Image Processing and Pattern Recogni-
tion (IPPR), Electronic and Information Engineering
School, Xi’an Jiaotong University, since 1987. He
has been an Associate Professor since 1997, and a
Professor since 2002. He is currently the Director of
IPPR, and served as the member of the 12th Expert
Evaluation Committee for the National Natural Sci-
ence Foundation of China, the Member of the 5th
and 6th Executive Committee of China Society of
Image and Graphics, the Vice President of Shaanxi
Image and Graphics Association. He has authored or co-authored more than
200 peer-reviewed journal or conference papers. He has supervised more than
70 master and doctoral students. He has been granted as the Yung Wing Award
for Excellence in Education, the KC Wong Education Award, the Technology
Academy Award for Invention by the Ministry of Education of China, and
the Technology Academy Awards from the Government of Shaanxi Province,
China.
Alan C. Bovik (S’80–M’81–SM’89–F’96) is the
Curry/Cullen Trust Endowed Chair Professor with
the University of Texas at Austin, where he is the
Director of the Laboratory for Image and Video
Engineering, Department of Electrical and Computer
Engineering and the Center for Perceptual Systems,
Institute for Neuroscience. His research interests
include image and video processing and visual per-
ception. He has published more than 650 technical
articles and holds four U.S. patents. His several
books include the recent companion volumes The
Essential Guides to Image and Video Processing (Academic Press, 2009).
He has received numerous awards, including the IEEE Signal Processing
Society Best Paper Award in 2009, Education Award in 2007, Technical
Achievement Award in 2005, Meritorious Service Award in 1998, Honorary
Membership in the Society for Imaging Science and Technology in 2013, the
SPIE Technology Achievement Award in 2012, and the IS&T/SPIE Imaging
Scientist of the Year in 2011. He is a fellow of the Optical Society of
America and the Society of Photo-Optical and Instrumentation Engineers. He
co-founded and served as an Editor-in-Chief of the IEEE TRANSACTIONS
ON IMAGE PROCESSING from 1996 to 2002 and founded and served as the first
General Chairman of the IEEE International Conference on Image Processing,
Austin, TX, USA, in 1994.
... La similarité des images consiste à rechercher des images similaires à partir d'une image de référence. Ainsi, la notion de similarité d'image a été utilisée, dans les dernières années, dans plusieurs domaines : évaluation de qualité [6,7,8,9], analyse et recherche d'images [10,11], recalage [12,13], classication [14,15,16], détection [17,18], etc. Dans diverses applications où les informations partielles de l'image de référence sont uniquement accessibles, l' évaluation de la qualité des images à référence réduite constitue une solution pratique. Pour trouver une mesure de qualité adéquate, les auteurs de [6] ont estimé l'indice de similarité structurelle, qui est une mesure de référence complète largement utilisée dans la littérature. ...
... Ils développent ainsi une mesure de distorsion en suivant la philosophie de la construction de l'indice de similarité structurelle. En raison de la sensibilité des gradients de l'image aux distortions et que diérentes structures locales dans une image déformée subissent diérents degrés de dégradation Xue et al. [7] ont proposé un modèle appelé la déviation de similarité de l'amplitude du gradient pour évaluer la qualité globale de l'image. La similarité de l'amplitude du gradient pixel par pixel entre l'image de référence et l'image déformée, combinée avec la déviation standard de la carte de similarité de l'amplitude du gradient prédit avec précision la qualité perceptive de l'image. ...
Thesis
Full-text available
En raison de l’augmentation considérable des images dans la vie quotidienne, de nombreuses applications nécessitent une étude sur leur similarité. La Carte des Dissimilarités Locales (CDL) est une mesure, construite autour de la distance de Hausdorff, qui est très efficace pour localiser et quantifier les différences de structures entre les images. Cette mesure a été proposée par Baudrier et al. [1]. Avant cela, aucune solution spécifiquement locale n’a été proposée par la communauté scientifique. À partir d’une CDL, il est cependant difficile d’interpréter et de prendre une décision sur la similarité entre deux images. De plus, la mesure est mise en échec sur des images contenant à la fois des structures et des textures et le comportement statistique des valeurs de la CDL n’a jamais été étudié. Tout cela limitait ses domaines d’application. Cette thèse propose d’abord une distribution statistique pour modéliser les valeurs des niveaux de gris des CDL des images structurelles. Les deux paramètres de la distribution sont pertinents pour discriminer les paires d’images en classes similaires et dissimilaires. Des modèles d’apprentissage automatique et des tests statistiques sont utilisés pour classer les paires d’images. Mais, avant d’aborder les tests, une extension de l’approche au problème de classification d’images multi-classes est proposée. Ensuite, les mesures d’informations telles que l’Information Mutuelle (IM) et l’Information Disjointe (ID) sont utilisées pour adapter la CDL sur des images avec un mélange de structures et de textures. Nous proposons, enfin, d’appliquer la mesure au problème de détection de changements sur des séries d’images. Nous savons aussi que, de nos jours, de nombreuses images numériques sont falsifiées pour de la propagande ou pour cacher des informations importantes. La détection de ces falsifications intéresse donc de nombreux acteurs majeurs de la sécurité. 
Dans cette thèse, nous nous intéressons uniquement à la détection de falsifications par copier-coller. Toutes nos approches sont basées uniquement sur la CDL et essentiellement sur les deux paramètres de la distribution proposée. Elles sont pertinentes et certaines méthodes sont même comparées avec des approches d’apprentissage profond de l’état de l’art.
... Three commonly used objective image quality assessment metrics are used for quantitative evaluation of reconstruction performance: root mean square error (RMSE), peak signalto-noise ratio (PSNR) and structural similarity index (SSIM) [44]. Additionally, we have introduced two assessment metrics in our experiments that have been demonstrated to be consistent with subjective evaluations [45], namely Visual Information Fidelity (VIF) [46] and Gradient Magnitude Similarity Deviation (GMSD) [47]. Higher values of PSNR, SSIM and VIF, and lower values of RMSE and GMSD indicate better performance. ...
Article
Full-text available
X-ray computed tomography (CT) imaging technology has become an indispensable diagnostic tool in clinical examination. However, it poses a risk of ionizing radiation, making the reduction of radiation dose one of the current research hotspots in CT imaging. Sparse-view imaging, as one of the main methods for reducing radiation dose, has made significant progress in recent years. In particular, sparse-view reconstruction methods based on deep learning have shown promising results. Nevertheless, efficiently recovering image details under ultra-sparse conditions remains a challenge. To address this challenge, this paper proposes a high-frequency enhanced and attention-guided learning Network (HEAL). HEAL includes three optimization strategies to achieve detail enhancement: Firstly, we introduce a dual-domain progressive enhancement module, which leverages fidelity constraints within each domain and consistency constraints across domains to effectively narrow the solution space. Secondly, we incorporate both channel and spatial attention mechanisms to improve the network’s feature-scaling process. Finally, we propose a high-frequency component enhancement regularization term that integrates residual learning with direction-weighted total variation, utilizing directional cues to effectively distinguish between noise and textures. The HEAL network is trained, validated and tested under different ultra-sparse configurations of 60 views and 30 views, demonstrating its advantages in reconstruction accuracy and detail enhancement.
... This motivates us to design a framework to allocate more computational resources to non-flat regions and fewer resources for flat regions to provide adequate effort to each region. 1 Structural similarity index (SSIM) globally assesses the image quality by comparing luminance, contrast and structure. 2 Gradient Magnitude Similarity Deviation (GMSD) [28] and feature similarity (FSIM) [29] can measure the significance of local structure. To address the abovementioned challenges, we present a framework named Spatial-Temporal hierARchical Reinforcement Learning (STAR-RL) for interpretable pathology image super-resolution, which reformulates image SR problem as the Markov decision process and attempts to tackle it with hierarchical reinforcement learning. ...
Preprint
Full-text available
Pathology image are essential for accurately interpreting lesion cells in cytopathology screening, but acquiring high-resolution digital slides requires specialized equipment and long scanning times. Though super-resolution (SR) techniques can alleviate this problem, existing deep learning models recover pathology image in a black-box manner, which can lead to untruthful biological details and misdiagnosis. Additionally, current methods allocate the same computational resources to recover each pixel of pathology image, leading to the sub-optimal recovery issue due to the large variation of pathology image. In this paper, we propose the first hierarchical reinforcement learning framework named Spatial-Temporal hierARchical Reinforcement Learning (STAR-RL), mainly for addressing the aforementioned issues in pathology image super-resolution problem. We reformulate the SR problem as a Markov decision process of interpretable operations and adopt the hierarchical recovery mechanism in patch level, to avoid sub-optimal recovery. Specifically, the higher-level spatial manager is proposed to pick out the most corrupted patch for the lower-level patch worker. Moreover, the higher-level temporal manager is advanced to evaluate the selected patch and determine whether the optimization should be stopped earlier, thereby avoiding the over-processed problem. Under the guidance of spatial-temporal managers, the lower-level patch worker processes the selected patch with pixel-wise interpretable actions at each time step. Experimental results on medical images degraded by different kernels show the effectiveness of STAR-RL. Furthermore, STAR-RL validates the promotion in tumor diagnosis with a large margin and shows generalizability under various degradations. The source code is available at https://github.com/CUHK-AIM-Group/STAR-RL.
... The former includes the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) [60]: PSNR measures the ratio between the maximum possible power of a signal and the power of the noise, while SSIM accounts for differences in luminance, contrast, and structural information between two images. The latter includes the gradient magnitude similarity deviation (GMSD) [61] and the discrete cosine transform-based sub-bands similarity index (DSS) [62], which derive from image gradients and structural information, respectively, to approximate the human visual system. For ISICDM-20, the absolute error (AE) between the average CT value of reconstructed images and reference NDCT images is computed. ...
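The PSNR measure described here reduces to a few lines: the ratio of the peak signal power to the mean squared error, in decibels. A minimal sketch, where the `peak` default of 255 assumes 8-bit-range images (adjust for other bit depths):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means the test image is closer to the reference."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: noise power is zero
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, a uniform error equal to one tenth of the peak gives a power ratio of 100 and hence 20 dB.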
Article
Full-text available
Low-dose computed tomography (LDCT) image reconstruction techniques can reduce patient radiation exposure while maintaining acceptable imaging quality. Deep learning is widely used in this problem, but the performance of testing data (a.k.a. target domain) is often degraded in clinical scenarios due to the variations that were not encountered in training data (a.k.a. source domain). Unsupervised domain adaptation (UDA) of LDCT reconstruction has been proposed to solve this problem through distribution alignment. However, existing UDA methods fail to explore the usage of uncertainty quantification, which is crucial for reliable intelligent medical systems in clinical scenarios with unexpected variations. Moreover, existing direct alignment for different patients would lead to content mismatch issues. To address these issues, we propose to leverage a probabilistic reconstruction framework to conduct a joint discrepancy minimization between source and target domains in both the latent and image spaces. In the latent space, we devise a Bayesian uncertainty alignment to reduce the epistemic gap between the two domains. This approach reduces the uncertainty level of target domain data, making it more likely to render well-reconstructed results on target domains. In the image space, we propose a sharpness-aware distribution alignment to achieve a match of second-order information, which can ensure that the reconstructed images from the target domain have similar sharpness to normal-dose CT images from the source domain. Experimental results on two simulated datasets and one clinical low-dose imaging dataset show that our proposed method outperforms other methods in quantitative and visualized performance.
... Two categories of metrics are adopted to evaluate the performance of the algorithms, i.e., pixel-based and perception-based metrics. Specifically, pixel-based metrics comprise PSNR, SSIM, the feature similarity index measure (FSIM), and the gradient magnitude similarity deviation (GMSD) [67], which focus on the per-pixel similarity between the restored image and the ground-truth image. The perception-based metrics primarily measure image quality according to human perceptual preference. ...
Preprint
Full-text available
Atmospheric turbulence is a major factor in image degradation issues such as blurring, distortion and intensity fluctuations when monitoring long-range targets. The randomness, spatiotemporal variation and perturbations of turbulence make it challenging to restore vision-friendly and credible images from degraded image sequences. In this work, we address the problem by proposing a deformation-aware image restoration algorithm based on quasiconformal geometry and pulse-coupled neural network (PCNN). To accurately measure the magnitude of geometric deformation caused by turbulence, the deformation within degraded images is specified in a non-conformal distortion that disrupts local geometry. The Beltrami coefficient uniquely associated with the quasiconformal maps is applied to quantify the average distortion degree. The deformation-aware measurement minimizes registration errors in aligning degraded images by more reliable reconstruction of reference frames. Additionally, an improved PCNN model inspired by the primary visual cortex is developed to boost the perceptual quality of the restored image with lucky image fusion. The absence of manual parameter tuning and the ability to simultaneously process image sequences in the PCNN model enhance the robustness of the restoration algorithm. The performance of our algorithm is validated by experiments on physically simulated and real data, which contain 220 sequences with 22928 frames. The results show that our algorithm can yield a superior restoration through atmospheric turbulence compared with several state-of-the-art methods. The code is available at https://github.com/whuluojia/ImTurb.
Article
In recent years, many standardized algorithms for point cloud compression (PCC) have been developed, achieving remarkable compression ratios. To provide guidance for rate-distortion optimization and codec evaluation, point cloud quality assessment (PCQA) has become a critical problem for PCC. Therefore, in order to achieve a more consistent correlation with human visual perception of a compressed point cloud, we propose a full-reference PCQA algorithm tailored for static point clouds in this paper, which can jointly measure geometry and attribute deformations. Specifically, we assume that the quality decision of compressed point clouds is determined by both global appearance (e.g., density, contrast, complexity) and local details (e.g., gradient, hole). Motivated by the nature of compression distortions and the properties of the human visual system, we derive perceptually effective features for the above two categories, such as content complexity, luminance/geometry gradient, and hole probability. By systematically incorporating measurements of variations in the local and global characteristics, we derive an effective quality index for the input compressed point clouds. Extensive experiments and analyses conducted on popular PCQA databases show the superiority of the proposed method in evaluating compression distortions. Subsequent investigations validate the efficacy of different components within the model design.
Article
Full-text available
In response to the 2010 Haiti earthquake, the ALIRT ladar system was tasked with collecting surveys to support disaster relief efforts. Standard methodologies for classifying the ladar data as ground, vegetation, or man-made features failed to produce an accurate representation of the underlying terrain surface. The majority of these methods rely primarily on gradient-based operations that often perform well in areas of low topographic relief, but frequently fail in areas of high topographic relief or dense urban environments. An alternative approach based on an adaptive lower envelope follower (ALEF), with an adaptive gradient operation to accommodate local slope and roughness, was investigated for recovering the ground surface from the ladar data. This technique was successful in classifying terrain in the urban and rural areas of Haiti over which the ALIRT data had been acquired.
Article
Objective methods for assessing perceptual image quality have traditionally attempted to quantify the visibility of errors between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MatLab implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
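In its simplest global form, the structural similarity idea described in this abstract folds luminance, contrast, and structure comparisons into a single formula over whole-image statistics. The sketch below is that single-window reduction only (an assumption for illustration): the published index computes the same quantity locally in a sliding window and averages the resulting map; `L`, `k1`, and `k2` follow the paper's conventions for 8-bit images.

```python
import numpy as np

def ssim_global(x, y, L=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over whole-image statistics (the paper averages local windows)."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2  # stabilizers for near-zero denominators
    mx, my = x.mean(), y.mean()            # luminance terms
    vx, vy = x.var(), y.var()              # contrast terms
    cov = ((x - mx) * (y - my)).mean()     # structure (covariance) term
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

An identical pair scores exactly 1; any luminance, contrast, or structural degradation pulls the score below 1.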
Conference Paper
Measurement of image quality is crucial for many image-processing algorithms. Traditionally, image quality assessment algorithms predict visual quality by comparing a distorted image against a reference image, typically by modeling the human visual system (HVS), or by using arbitrary signal fidelity criteria. We adopt a new paradigm for image quality assessment. We propose an information fidelity criterion that quantifies the Shannon information that is shared between the reference and distorted images relative to the information contained in the reference image itself. We use natural scene statistics (NSS) modeling in concert with an image degradation model and an HVS model. We demonstrate the performance of our algorithm by testing it on a data set of 779 images, and show that our method is competitive with state of the art quality assessment methods, and outperforms them in our simulations.
Conference Paper
Recent years have witnessed a growing interest in developing objective image quality assessment (IQA) algorithms that can measure image quality consistently with subjective evaluations. For the full reference (FR) IQA problem, great progress has been made in the past decade. On the other hand, several new large-scale image datasets have been released in recent years for evaluating FR IQA methods. Meanwhile, no work has been reported that evaluates and compares the performance of state-of-the-art and representative FR IQA methods on all the available datasets. In this paper, we aim to fulfill this task by reporting the performance of eleven selected FR IQA algorithms on all seven public IQA image datasets. Our evaluation results and the associated discussions will be very helpful for relevant researchers to gain a clearer understanding of the status of modern FR IQA indices. Evaluation results presented in this paper are also available online at http://sse.tongji.edu.cn/linzhang/IQA/IQA.htm.
Article
In lossy image/video encoding, there is a compromise between the number of bits (rate) and the extent of distortion. Bits need to be properly allocated to different sources, such as frames and macroblocks (MBs). Since the human eye is more sensitive to differences than to the absolute values of signals, the MINMAX criterion suggests minimizing the maximum distortion of the sources to limit quality fluctuation. Many works aim at such constant-quality encoding; however, almost all of them focus on frame-layer bit allocation and use PSNR as the quality index. We suggest that the bit allocation for MBs should also be constrained to constant quality, and furthermore, that perceptual quality indices should be used instead of PSNR. Based on this idea, we propose a multi-pass block-layer bit allocation scheme for quality-constrained encoding. The experimental results show that the proposed method can achieve much better encoding performance. Keywords: Bit allocation, block-layer, perceptual quality, constant quality, quality constrained
Article
We introduce a new video quality database that models video distortions in heavily-trafficked wireless networks and that contains measurements of human subjective impressions of the quality of videos. The new LIVE Mobile Video Quality Assessment (VQA) database consists of 200 distorted videos created from 10 RAW HD reference videos, obtained using a RED ONE digital cinematographic camera. While the LIVE Mobile VQA database includes distortions that have been previously studied such as compression and wireless packet-loss, it also incorporates dynamically varying distortions that change as a function of time, such as frame-freezes and temporally varying compression rates. In this article, we describe the construction of the database and detail the human study that was performed on mobile phones and tablets in order to gauge the human perception of quality on mobile devices. The subjective study portion of the database includes both the differential mean opinion scores (DMOS) computed from the ratings that the subjects provided at the end of each video clip, as well as the continuous temporal scores that the subjects recorded as they viewed the video. The study involved over 50 subjects and resulted in 5,300 summary subjective scores and time-sampled subjective traces of quality. In the behavioral portion of the article we analyze human opinion using statistical techniques, and also study a variety of models of temporal pooling that may reflect strategies that the subjects used to make the final decision on video quality. Further, we compare the quality ratings obtained from the tablet and the mobile phone studies in order to study the impact of these different display modes on quality. We also evaluate several objective image and video quality assessment (IQA/VQA) algorithms with regard to their efficacy in predicting visual quality. A detailed correlation analysis and statistical hypothesis testing are carried out. Our general conclusion is that existing VQA algorithms are not well-equipped to handle distortions that vary over time. The LIVE Mobile VQA database, along with the subject DMOS and the continuous temporal scores, is being made available to researchers in the field of VQA at no cost in order to further research in the area of video quality assessment.
Article
Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Article
The visual contrast sensitivity function (CSF) has found increasing use in image compression as new algorithms optimize the display-observer interface in order to reduce the bit rate and increase the perceived image quality. In most compression algorithms, increasing the quantization intervals reduces the bit rate at the expense of introducing more quantization error, a potential image quality degradation. The CSF can be used to distribute this error as a function of spatial frequency such that it is undetectable by the human observer. Thus, instead of being mathematically lossless, the compression algorithm can be designed to be visually lossless, with the advantage of a significantly reduced bit rate. However, the CSF is strongly affected by image noise, changing in both shape and peak sensitivity. This work describes a model of the CSF that includes these changes as a function of image noise level by using the concepts of internal visual noise, and tests this model in the context of image compression with an observer study.