684 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 2, FEBRUARY 2014
Gradient Magnitude Similarity Deviation: A Highly
Efficient Perceptual Image Quality Index
Wufeng Xue, Lei Zhang, Member, IEEE, Xuanqin Mou, Member, IEEE, and Alan C. Bovik, Fellow, IEEE
Abstract—It is an important task to faithfully evaluate the
perceptual quality of output images in many applications, such
as image compression, image restoration, and multimedia stream-
ing. A good image quality assessment (IQA) model should not
only deliver high quality prediction accuracy, but also be com-
putationally efficient. The efficiency of IQA metrics is becoming
particularly important due to the increasing proliferation of high-
volume visual data in high-speed networks. We present a new
effective and efficient IQA model, called gradient magnitude
similarity deviation (GMSD). The image gradients are sensitive to
image distortions, while different local structures in a distorted
image suffer different degrees of degradations. This motivates
us to explore the use of the global variation of a gradient-based
local quality map for overall image quality prediction. We find
that the pixel-wise gradient magnitude similarity (GMS) between
the reference and distorted images combined with a novel
pooling strategy—the standard deviation of the GMS map—can
accurately predict perceptual image quality. The resulting GMSD
algorithm is much faster than most state-of-the-art IQA methods,
and delivers highly competitive prediction accuracy. MATLAB
source code of GMSD can be downloaded at http://www4.comp.
polyu.edu.hk/~cslzhang/IQA/GMSD/GMSD.htm.
Index Terms—Gradient magnitude similarity, image quality
assessment, standard deviation pooling, full reference.
I. INTRODUCTION
IT IS an indispensable step to evaluate the quality of
output images in many image processing applications such
as image acquisition, compression, restoration, transmission,
etc. Since human beings are the ultimate observers of the
processed images and thus the judges of image quality, it
is highly desired to develop automatic approaches that can
predict perceptual image quality consistently with human
subjective evaluation.

[Manuscript received February 28, 2013; revised August 14, 2013 and November 13, 2013; accepted November 14, 2013. Date of publication December 3, 2013; date of current version December 24, 2013. This work was supported in part by the Natural Science Foundation of China under Grants 90920003 and 61172163, and in part by the HK RGC General Research Fund under Grant PolyU 5315/12E. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Damon M. Chandler. W. Xue is with the Institute of Image Processing and Pattern Recognition, Xi'an Jiaotong University, Xi'an 710049, China, and also with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong (e-mail: xwolfs@hotmail.com). L. Zhang is with the Department of Computing, The Hong Kong Polytechnic University, Hong Kong (e-mail: cslzhang@comp.polyu.edu.hk). X. Mou is with the Institute of Image Processing and Pattern Recognition, Xi'an Jiaotong University, Xi'an 710049, China (e-mail: xqmou@mail.xjtu.edu.cn). A. C. Bovik is with the Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712 USA (e-mail: bovik@ece.utexas.edu). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2013.2293423]

The traditional mean square error (MSE)
or peak signal to noise ratio (PSNR) correlates poorly with
human perception, and hence researchers have been devoting
much effort in developing advanced perception-driven image
quality assessment (IQA) models [2], [25]. IQA models can be
classified [3] into full reference (FR) ones, where the pristine
reference image is available, no reference ones, where the
reference image is not available, and reduced reference ones,
where partial information of the reference image is available.
This paper focuses on FR-IQA models, which are widely
used to evaluate image processing algorithms by measuring
the quality of their output images. A good FR-IQA model
can shape many image processing algorithms, as well as their
implementations and optimization procedures [1]. Generally
speaking, there are two strategies for FR-IQA model design.
The first strategy follows a bottom-up framework [3], [30],
which simulates the various processing stages in the visual
pathway of human visual system (HVS), including visual
masking effect [32], contrast sensitivity [33], just noticeable
differences [34], etc. However, HVS is too complex and
our current knowledge about it is far from enough to con-
struct an accurate bottom-up IQA framework. The second
strategy adopts a top-down framework [3], [30], [4]–[8],
which aims to model the overall function of HVS based
on some global assumptions on it. Many FR-IQA models
follow this framework. The well-known Structure SIMilarity
(SSIM) index [8] and its variants, Multi-Scale SSIM
(MS-SSIM) [17] and Information Weighted SSIM (IW-SSIM)
[16], assume that HVS tends to perceive the local structures in
an image when evaluating its quality. The Visual Information
Fidelity (VIF) [23] and Information Fidelity Criteria (IFC)
[22] treat HVS as a communication channel and they predict
the subjective image quality by computing how much the
information within the perceived reference image is preserved
in the perceived distorted one. Other state-of-the-art FR-IQA
models that follow the top-down framework include Ratio of
Non-shift Edges (rNSE) [18], [24], Feature SIMilarity (FSIM)
[7], etc. A comprehensive survey and comparison of state-of-
the-art IQA models can be found in [14] and [30].
Aside from the two different strategies for FR-IQA model
design, many IQA models share a common two-step frame-
work [4]–[8], [16] as illustrated in Fig. 1. First, a local quality
map (LQM) is computed by locally comparing the distorted
image with the reference image via some similarity function.
Then a single overall quality score is computed from the
LQM via some pooling strategy. The simplest and widely used
pooling strategy is average pooling, i.e., taking the average
1057-7149 © 2013 IEEE
Fig. 1. The flowchart of a class of two-step FR-IQA models.
of local quality values as the overall quality prediction score.
Since different regions may contribute differently to the overall
perception of an image’s quality, the local quality values
can be weighted to produce the final quality score. Example
weighting strategies include local measures of information
content [9], [16], content-based partitioning [19], assumed
visual fixation [20], visual attention [10] and distortion based
weighting [9], [10], [29]. Compared with average pooling,
weighted pooling can improve the IQA accuracy to some
extent; however, it may be costly to compute the weights.
Moreover, weighted pooling complicates the pooling process
and can make the predicted quality scores more nonlinear w.r.t.
the subjective quality scores (as shown in Fig. 5).
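The two-step framework just described can be sketched in a few lines. This is an illustrative outline only, not code from the paper; the pixel-wise similarity ratio below is a placeholder assumption standing in for whatever similarity function a concrete FR-IQA model defines:

```python
import numpy as np

def local_quality_map(ref, dist, c=1e-4):
    # Step 1: compare the distorted image against the reference through
    # a similarity function to obtain a local quality map (LQM).
    # This pixel-wise ratio is a placeholder, not a specific published model.
    return (2 * ref * dist + c) / (ref**2 + dist**2 + c)

def pool_average(lqm):
    # Step 2: collapse the LQM into a single overall quality score.
    # Average pooling is the simplest strategy discussed in the text.
    return float(lqm.mean())

ref = np.ones((4, 4))
dist = np.ones((4, 4))
score = pool_average(local_quality_map(ref, dist))
# identical images give similarity 1 at every pixel, so the score is 1.0
```

Weighted pooling would replace `pool_average` with a weighted mean, at the cost of computing the weights.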
In practice, an IQA model should be not only effective
(i.e., having high quality prediction accuracy) but also effi-
cient (i.e., having low computational complexity). With the
increasing ubiquity of digital imaging and communication
technologies in our daily life, there is a vast and growing
amount of visual data to be evaluated. Therefore, efficiency
has become a critical issue of IQA algorithms. Unfortunately,
effectiveness and efficiency are hard to achieve simultaneously,
and most previous IQA algorithms can reach only one of the
two goals. To help fill this need, in this
paper we develop an efficient FR-IQA model, called gradient
magnitude similarity deviation (GMSD). GMSD computes
the LQM by comparing the gradient magnitude maps of the
reference and distorted images, and uses standard deviation
as the pooling strategy to compute the final quality score.
The proposed GMSD is much faster than most state-of-the-art
FR-IQA methods, but supplies surprisingly competitive quality
prediction performance.
Using image gradient to design IQA models is not new. The
image gradient is a popular feature in IQA [4]–[7], [15], [19]
since it can effectively capture image local structures, to
which the HVS is highly sensitive. The most commonly
encountered image distortions, including noise corruption,
blur and compression artifacts, will lead to highly visible
structural changes that “pop out” of the gradient domain. Most
gradient based FR-IQA models [5]–[7], [15] were inspired
by SSIM [8]. They first compute the similarity between
the gradients of reference and distorted images, and then
compute some additional information, such as the difference
of gradient orientation, luminance similarity and phase con-
gruency similarity, to combine with the gradient similarity for
pooling. However, the computation of such additional infor-
mation can be expensive and often yields small performance
improvement.
Without using any additional information, we find that using
the image gradient magnitude alone can still yield highly
accurate quality prediction. The image gradient magnitude
is responsive to artifacts introduced by compression, blur or
additive noise, etc. (Please refer to Fig. 2 for some exam-
ples.) In the proposed GMSD model, the pixel-wise similarity
between the gradient magnitude maps of reference and dis-
torted images is computed as the LQM of the distorted image.
Natural images usually have diverse local structures, and
different structures suffer different degradations in gradient
magnitude. Based on the idea that the global variation of local
quality degradation can reflect the image quality, we propose
to compute the standard deviation of the gradient magnitude
similarity induced LQM to predict the overall image quality
score. The proposed standard deviation pooling based GMSD
model leads to higher accuracy than all state-of-the-art IQA
metrics we can find, and it is very efficient, making large scale
real time IQA possible.
The rest of the paper is organized as follows. Section II
presents the development of GMSD in detail. Section III
presents extensive experimental results, discussions and com-
putational complexity analysis of the proposed GMSD model.
Finally, Section IV concludes the paper.
II. GRADIENT MAGNITUDE SIMILARITY DEVIATION
A. Gradient Magnitude Similarity
The image gradient has been employed for FR-IQA in
different ways [3]–[7], [15]. Most gradient based FR-IQA
methods adopt a similarity function which is similar to that in
SSIM [8] to compute gradient similarity. In SSIM, three types
of similarities are computed: luminance similarity (LS), con-
trast similarity (CS) and structural similarity (SS). The product
of the three similarities is used to predict the image local qual-
ity at a position. Inspired by SSIM, Chen et al. proposed gra-
dient SSIM (G-SSIM) [6]. They retained the LS term of SSIM
but applied the CS and SS similarities to the gradient mag-
nitude maps of reference image (denoted by r) and distorted
image (denoted by d). As in SSIM, average pooling is used in
G-SSIM to yield the final quality score. Cheng et al. [5]
proposed a geometric structure distortion (GSD) metric to
predict image quality, which computes the similarity between
the gradient magnitude maps, the gradient orientation maps
and contrasts of r and d. Average pooling is also used in
GSD. Liu et al. [15] also followed the framework of SSIM.
They predicted the image quality using a weighted summation
(i.e., a weighted pooling strategy is used) of the squared lumi-
nance difference and the gradient similarity. Zhang et al. [7]
combined the similarities of phase congruency maps and gra-
dient magnitude maps between r and d. A phase congruency
based weighted pooling method is used to produce the final
quality score. The resulting Feature SIMilarity (FSIM) model
is among the leading FR-IQA models in terms of prediction
accuracy. However, the computation of phase congruency
features is very costly.
For digital images, the gradient magnitude is defined as the
root mean square of image directional gradients along two
orthogonal directions. The gradient is usually computed by
convolving an image with a linear filter such as the classic
Roberts, Sobel, Scharr and Prewitt filters or some task-specific
Fig. 2. Examples of reference (r) and distorted (d) images, their gradient magnitude images (m_r and m_d), and the associated gradient magnitude similarity (GMS) maps, where a brighter gray level means higher similarity. The highlighted regions (marked by red curves) show clear structural degradations in the gradient magnitude domain. From top to bottom, the four types of distortions are additive white noise (AWN), JPEG compression, JPEG2000 compression, and Gaussian blur (GB). For each type of distortion, two images with different contents are selected from the LIVE database [11]. For each distorted image, its subjective quality score (DMOS) and GMSD index are listed. Note that distorted images with similar DMOS scores have similar GMSD indices, though their contents are totally different.
ones [26]–[28]. For simplicity of computation and to introduce
a modicum of noise-insensitivity, we utilize the Prewitt filter
to calculate the gradient because it is the simplest one among
the 3 × 3 template gradient filters. By using other filters such
as the Sobel and Scharr filters, the proposed method will have
similar IQA results.

Fig. 3. Comparison between GMSM and GMSD as a subjective quality indicator. Note that like DMOS, GMSD is a distortion index (a lower DMOS/GMSD value means higher quality), while GMSM is a quality index (a higher GMSM value means higher quality). (a) Original image Fishing, its Gaussian noise contaminated version (DMOS=0.4403; GMSM=0.8853; GMSD=0.1420), and their gradient similarity map. (b) Original image Flower, its blurred version (DMOS=0.7785; GMSM=0.8745; GMSD=0.1946), and their gradient similarity map. Based on the subjective DMOS, image Fishing has much higher quality than image Flower. GMSD gives the correct judgment but GMSM fails.

The Prewitt filters along the horizontal (x) and vertical (y) directions are defined as:

$$h_x = \begin{bmatrix} 1/3 & 0 & -1/3 \\ 1/3 & 0 & -1/3 \\ 1/3 & 0 & -1/3 \end{bmatrix}, \quad h_y = \begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 0 & 0 & 0 \\ -1/3 & -1/3 & -1/3 \end{bmatrix} \tag{1}$$
Convolving h_x and h_y with the reference and distorted images yields the horizontal and vertical gradient images of r and d. The gradient magnitudes of r and d at location i, denoted by m_r(i) and m_d(i), are computed as follows:

$$m_r(i) = \sqrt{(r \otimes h_x)^2(i) + (r \otimes h_y)^2(i)} \tag{2}$$

$$m_d(i) = \sqrt{(d \otimes h_x)^2(i) + (d \otimes h_y)^2(i)} \tag{3}$$

where the symbol $\otimes$ denotes the convolution operation.
With the gradient magnitude images m_r and m_d in hand, the gradient magnitude similarity (GMS) map is computed as follows:

$$\mathrm{GMS}(i) = \frac{2\, m_r(i)\, m_d(i) + c}{m_r^2(i) + m_d^2(i) + c} \tag{4}$$

where c is a positive constant that supplies numerical stability. (The selection of c will be discussed in Section III-B.) The GMS map is computed in a pixel-wise manner; nonetheless, please note that a value m_r(i) or m_d(i) in the gradient magnitude image is computed from a small local patch in the original image r or d.
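As a concrete illustration, Eqs. (1)-(4) can be sketched in a few lines of NumPy. This is our own sketch, not the authors' MATLAB code: zero padding at the image border is an assumption (the paper does not specify the border treatment), and correlation is used in place of true convolution, which only flips the sign of the directional gradients and leaves the magnitudes unchanged:

```python
import numpy as np

# Prewitt filters of Eq. (1): h_x responds to horizontal intensity
# changes, h_y to vertical ones.
H_X = np.array([[1, 0, -1],
                [1, 0, -1],
                [1, 0, -1]]) / 3.0
H_Y = np.array([[ 1,  1,  1],
                [ 0,  0,  0],
                [-1, -1, -1]]) / 3.0

def filter3x3(img, k):
    # Same-size 3x3 filtering with zero padding (border handling is an
    # assumption; the paper does not specify it).
    p = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def gradient_magnitude(img):
    # Eqs. (2)-(3): root of the summed squares of directional gradients.
    return np.sqrt(filter3x3(img, H_X)**2 + filter3x3(img, H_Y)**2)

def gms_map(ref, dist, c=0.0026):
    # Eq. (4): pixel-wise gradient magnitude similarity, in [0, 1],
    # equal to 1 wherever the two gradient magnitudes coincide.
    mr, md = gradient_magnitude(ref), gradient_magnitude(dist)
    return (2 * mr * md + c) / (mr**2 + md**2 + c)
```

Since 2ab ≤ a² + b² for any nonnegative a and b, every GMS value lies in [0, 1], with 1 attained exactly where the two gradient magnitudes agree.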
The GMS map serves as the local quality map (LQM) of the
distorted image d. Clearly, if m_r(i) and m_d(i) are the same,
GMS(i) will achieve the maximal value 1. Let’s use some
examples to analyze the GMS induced LQM. The most com-
monly encountered distortions in many real image processing
systems are JPEG compression, JPEG2000 compression, addi-
tive white noise (AWN) and Gaussian blur (GB). In Fig. 2, for
each of the four types of distortions, two reference images with
different contents and their corresponding distorted images
are shown (the images are selected from the LIVE database
[11]). Their gradient magnitude images (m_r and m_d) and the
corresponding GMS maps are also shown. In the GMS map,
the brighter the gray level, the higher the similarity, and thus
the higher the predicted local quality. These images contain
a variety of important structures such as large scale edges,
smooth areas and fine textures, etc. A good IQA model should
be adaptable to the broad array of possible natural scenes and
local structures.
In Fig. 2, examples of structure degradation are shown in
the gradient magnitude domain. Typical areas are highlighted
with red curves. From the first group, it can be seen that the
artifacts caused by AWN are masked in the large structure
and texture areas, while the artifacts are more visible in flat
areas. This is broadly consistent with human perception. In the
second group, the degradations caused by JPEG compression
are mainly blocking effects (see the background area of
image parrots and the wall area of image house) and loss
of fine details. Clearly, the GMS map is highly responsive to
these distortions. Regarding JPEG2000 compression, artifacts
are introduced in the vicinity of edge structures and in the
textured areas. Regarding GB, the whole GMS map is clearly
changed after image distortion. All these observations imply
that the image gradient magnitude is a highly relevant feature
for the task of IQA.
B. Pooling With Standard Deviation
The LQM reflects the local quality of each small patch
in the distorted image. The image overall quality score can
then be estimated from the LQM via a pooling stage. The
most commonly used pooling strategy is average pooling, i.e.,
simply averaging the LQM values as the final IQA score. We
refer to the IQA model by applying average pooling to the
GMS map as Gradient Magnitude Similarity Mean (GMSM):
$$\mathrm{GMSM} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{GMS}(i) \tag{5}$$
where N is the total number of pixels in the image. Clearly,
a higher GMSM score means higher image quality. Average
pooling assumes that each pixel has the same importance
in estimating the overall image quality. As introduced in
Section I, researchers have devoted much effort to design
weighted pooling methods ([9], [10], [16], [19], [20], and
[29]); however, the improvement brought by weighted pooling
over average pooling is not always significant [31] and the
computation of weights can be costly.
We propose a new pooling strategy with the GMS map.
A natural image generally has a variety of local structures
in its scene. When an image is distorted, the different local
structures will suffer different degradations in gradient mag-
nitude. This is an inherent property of natural images. For
example, the distortions introduced by JPEG2000 compres-
sion include blocking, ringing, blurring, etc. Blurring will
cause less quality degradation in flat areas than in textured
areas, while blocking will cause higher quality degradation
in flat areas than in textured areas. However, the average
pooling strategy ignores this fact and it cannot reflect how
the local quality degradation varies. Based on the idea that
the global variation of image local quality degradation can
reflect its overall quality, we propose to compute the stan-
dard deviation of the GMS map and take it as the final
IQA index, namely Gradient Magnitude Similarity Deviation
(GMSD):
$$\mathrm{GMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( \mathrm{GMS}(i) - \mathrm{GMSM} \right)^2} \tag{6}$$
Note that the value of GMSD reflects the range of distortion
severities in an image. The higher the GMSD score, the larger
the distortion range, and thus the lower the image perceptual
quality.
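With a GMS map in hand, the two pooling strategies of Eqs. (5) and (6) each reduce to a single NumPy call. The two contrived maps below are illustrative only: they share the same mean, so average pooling (GMSM) cannot distinguish them, while deviation pooling (GMSD) can:

```python
import numpy as np

def gmsm(gms):
    # Eq. (5): average pooling of the GMS map (higher = better quality).
    return float(gms.mean())

def gmsd(gms):
    # Eq. (6): standard deviation pooling (higher = wider spread of local
    # degradation, hence lower predicted quality). np.std divides by N,
    # matching the 1/N in Eq. (6).
    return float(gms.std())

# Uniformly degraded map vs. a map with the same mean but varying
# local quality.
uniform = np.full(100, 0.9)
varying = np.concatenate([np.full(50, 1.0), np.full(50, 0.8)])
assert np.isclose(gmsm(uniform), gmsm(varying))  # GMSM: indistinguishable
# gmsd separates them: 0.0 for the uniform map, 0.1 for the varying one
```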
In Fig. 3, we show two reference images from the CSIQ
database [12], their distorted images and the corresponding
GMS maps. The first image Fishing is corrupted by additive
white noise, and the second image Flower is Gaussian blurred.
From the GMS map of distorted image Fishing, one can see
that its local quality is more homogenous, while from the
GMS map of distorted image Flower, one can see that its
local quality in the center area is much worse than at other
areas. The human subjective DMOS scores of the two distorted
images are 0.4403 and 0.7785, respectively, indicating that the
quality of the first image is obviously better than the second
one. (Note that like GMSD, DMOS also measures distortion;
the lower it is, the better the image quality.) By using GMSM,
however, the predicted quality scores of the two images are
0.8853 and 0.8745, respectively, indicating that the perceptual
quality of the first image is similar to the second one, which
is inconsistent with the subjective DMOS scores.
By using GMSD, the predicted quality scores of the two
images are 0.1420 and 0.1946, respectively, which is a con-
sistent judgment relative to the subjective DMOS scores, i.e.,
the first distorted image has better quality than the second
one. More examples of the consistency between GMSD and
DMOS can be found in Fig. 2. For each distortion type, the
two images of different contents have similar DMOS scores,
while their GMSD indices are also very close. These examples
validate that the deviation pooling strategy coupled with the
GMS quality map can accurately predict the perceptual image
quality.
III. EXPERIMENTAL RESULTS AND ANALYSIS
A. Databases and Evaluation Protocols
The performance of an IQA model is typically evaluated
from three aspects regarding its prediction power [21]: predic-
tion accuracy, prediction monotonicity, and prediction consis-
tency. The computation of these indices requires a regression
procedure to reduce the nonlinearity of predicted scores. We
denote by Q, Q_p and S the vectors of the original IQA scores, the IQA scores after regression and the subjective scores, respectively. The logistic regression function is employed for the nonlinear regression [21]:

$$Q_p = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + \exp\left(\beta_2 (Q - \beta_3)\right)} \right) + \beta_4 Q + \beta_5 \tag{7}$$

where β_1, β_2, β_3, β_4 and β_5 are regression model parameters.
After the regression, three correspondence indices can be computed for performance evaluation [21]. The first one is the Pearson linear Correlation Coefficient (PCC) between Q_p and S, which evaluates the prediction accuracy:

$$\mathrm{PCC}(Q_p, S) = \frac{\bar{Q}_p^T \bar{S}}{\sqrt{\bar{Q}_p^T \bar{Q}_p}\, \sqrt{\bar{S}^T \bar{S}}} \tag{8}$$

where Q̄_p and S̄ are the mean-removed vectors of Q_p and S, respectively, and the superscript T denotes transpose. The second index is the Spearman Rank order Correlation coefficient (SRC) between Q and S, which evaluates the prediction monotonicity:

$$\mathrm{SRC}(Q, S) = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \tag{9}$$

where d_i is the difference between the ranks of each pair of samples in Q and S, and n is the total number of samples. Note that the logistic regression does not affect the SRC index, and we can compute it before regression. The third index is the root mean square error (RMSE) between Q_p and S, which evaluates the prediction consistency:

$$\mathrm{RMSE}(Q_p, S) = \sqrt{(Q_p - S)^T (Q_p - S)/n} \tag{10}$$
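The three correspondence indices of Eqs. (8)-(10) can be sketched directly from their definitions. This sketch omits the logistic regression of Eq. (7), and its naive ranking assumes no tied scores (production code would use average ranks for ties, e.g. scipy.stats.spearmanr):

```python
import numpy as np

def pcc(qp, s):
    # Eq. (8): Pearson correlation between the mean-removed vectors.
    qb, sb = qp - qp.mean(), s - s.mean()
    return float((qb @ sb) / np.sqrt((qb @ qb) * (sb @ sb)))

def src(q, s):
    # Eq. (9): Spearman rank order correlation. The double argsort gives
    # each element's rank; this simple form assumes no tied values.
    rank = lambda x: np.argsort(np.argsort(x)).astype(float)
    d = rank(q) - rank(s)
    n = len(q)
    return float(1 - 6 * np.sum(d**2) / (n * (n**2 - 1)))

def rmse(qp, s):
    # Eq. (10): root mean square error of the regressed scores.
    return float(np.sqrt(np.mean((qp - s)**2)))
```

For perfectly monotonic predictions `src` returns 1, and reversing the order returns -1, regardless of any monotonic nonlinearity in the raw scores.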
With the SRC, PCC and RMSE indices, we evaluate the
IQA models on three large scale and publicly accessible IQA
databases: LIVE [11], CSIQ [12], and TID2008 [13]. The
LIVE database consists of 779 distorted images generated
from 29 reference images. Five types of distortions are applied
to the reference images at various levels: JPEG2000 com-
pression, JPEG compression, additive white noise (AWN),
Gaussian blur (GB) and simulated fast fading Rayleigh chan-
nel (FF). These distortions reflect a broad range of image
impairments, for example, edge smoothing, block artifacts and
random noise. The CSIQ database consists of 30 reference
images and their distorted counterparts with six types of
distortions at five different distortion levels. The six types
of distortions include JPEG2000, JPEG, AWN, GB, global
contrast decrements (CTD), and additive pink Gaussian noise
(PGN). There are a total of 886 distorted images in it. The
TID2008 database is the largest IQA database to date. It has
1,700 distorted images, generated from 25 reference images
with 17 types of distortions at 4 levels. Please refer to [13]
for details of the distortions. Each image in these databases has
been evaluated by human subjects under controlled conditions,
and then assigned a quantitative subjective quality score: Mean
Opinion Score (MOS) or Difference MOS (DMOS).
To demonstrate the performance of GMSD, we com-
pare it with 11 state-of-the-art and representative FR-IQA
models, including PSNR, IFC [22], VIF [23], SSIM [8],
MS-SSIM [17], MAD [12], FSIM [7], IW-SSIM [16],
G-SSIM [6], GSD [5] and GS [15]. Among them, FSIM,
G-SSIM, GSD and GS explicitly exploit gradient information.
Except for G-SSIM and GSD, which are implemented by us,
the source codes of all the other models were obtained from the
original authors. To more clearly demonstrate the effectiveness
of the proposed deviation pooling strategy, we also present the
results of GMSM which uses average pooling. As in most of
the previous literature [7], [8], [16], [17], all of the competing
algorithms are applied to the luminance channel of the test
images.
B. Implementation of GMSD
The only parameter in the proposed GMSM and GMSD
models is the constant c in Eq. (4). Apart from ensuring the
numerical stability, the constant c also plays a role in mediat-
ing the contrast response in low gradient areas. We normalize
the pixel values of 8-bit luminance image into range [0, 1].
Fig. 4 plots the SRC curves against c by applying GMSD to the
LIVE, CSIQ and TID2008 databases. One can see that for all
the databases, GMSD shows similar preference to the value
of c. In our implementation, we set c=0.0026. In addition,
as in the implementations of SSIM [8] and FSIM [7], the
images r and d are first filtered by a 2 × 2 average filter, and
then down-sampled by a factor of 2. MATLAB source code
that implements GMSD can be downloaded at http://www4.
comp.polyu.edu.hk/~cslzhang/IQA/GMSD/GMSD.htm.
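Putting Section II together with the implementation details above gives a compact end-to-end sketch. This is a NumPy approximation of the authors' MATLAB implementation, with zero border padding as our own assumption, so its scores may differ slightly from the reference code:

```python
import numpy as np

def prewitt_magnitude(img):
    # Gradient magnitude via the 3x3 Prewitt filters of Eq. (1) and
    # Eqs. (2)-(3). Zero padding at the border is an assumption.
    hx = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]]) / 3.0
    hy = hx.T  # transposing h_x gives the vertical Prewitt filter
    p = np.pad(img, 1)
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(3):
        for dx in range(3):
            win = p[dy:dy + h, dx:dx + w]
            gx += hx[dy, dx] * win
            gy += hy[dy, dx] * win
    return np.sqrt(gx**2 + gy**2)

def gmsd(ref, dist, c=0.0026):
    # Section III-B choices: luminance in [0, 1], a 2x2 average filter,
    # then downsampling by a factor of 2, then Eq. (4) and Eq. (6).
    def shrink(x):
        h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
        x = x[:h, :w]
        return (x[0::2, 0::2] + x[0::2, 1::2] +
                x[1::2, 0::2] + x[1::2, 1::2]) / 4.0
    mr = prewitt_magnitude(shrink(ref))
    md = prewitt_magnitude(shrink(dist))
    gms = (2 * mr * md + c) / (mr**2 + md**2 + c)
    return float(gms.std())  # lower GMSD predicts higher quality
```

Identical inputs give a GMS map of all ones and hence a GMSD of exactly zero; any distortion that perturbs the gradient field unevenly raises the score.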
C. Performance Comparison
In Table I, we compare the competing IQA models’ perfor-
mance on each of the three IQA databases in terms of SRC,
Fig. 4. The performance of GMSD in terms of SRC vs. constant c on the
three databases.
PCC and RMSE. The top three models for each evaluation
criterion are shown in boldface. We can see that the top
models are mostly GMSD (8 times), MAD (6 times), FSIM
(5 times) and VIF (5 times). In terms of all the three criteria
(SRC, PCC and RMSE), the proposed GMSD outperforms
all the other models on the TID2008 and CSIQ databases.
On the LIVE database, MAD performs the best, and VIF,
FSIM and GMSD perform almost the same. Compared with
gradient based models such as GSD, G-SSIM and GS, GMSD
outperforms them by a large margin. Compared with GMSM,
the superiority of GMSD is obvious, demonstrating that the
proposed deviation pooling strategy works much better than
the average pooling strategy on the GMS induced LQM. The
FSIM algorithm also employs gradient similarity. It has similar
results to GMSD on the LIVE and TID2008 databases, but
lags GMSD on the CSIQ database with a lower SRC/PCC
and larger RMSE.
In Fig. 5, we show the scatter plots of predicted quality
scores against subjective DMOS scores for some representative
models (PSNR, VIF, GS, IW-SSIM, MS-SSIM, MAD, FSIM,
GMSM and GMSD) on the CSIQ database, which has six
types of distortions (AWN, JPEG, JPEG2000, PGN, GB and
CTD). One can observe that for FSIM, MAD, MS-SSIM,
GMSM, IW-SSIM and GS, the distribution of predicted scores
on the CTD distortion deviates much from the distributions on
other types of distortions, degrading their overall performance.
When the distortion is severe (i.e., large DMOS values), GS,
GMSM and PSNR yield less accurate quality predictions. The
information fidelity based VIF performs very well on the
LIVE database but not very well on the CSIQ and TID2008
databases. This is mainly because VIF does not predict the
images’ quality consistently across different distortion types
on these two databases, as can be observed from the scatter
plots with CSIQ database in Fig. 5.
In Table I, we also show the weighted average of SRC
and PCC scores by the competing FR-IQA models over
the three databases, where the weights were determined by
the sizes (i.e., number of images) of the three databases.
According to this, the top 3 models are GMSD, FSIM and
IW-SSIM. Overall, the proposed GMSD achieves outstanding
and consistent performance across the three databases.
In order to make statistically meaningful conclusions on
the models’ performance, we further conducted a series
of hypothesis tests based on the prediction residuals of
TABLE I
PERFORMANCE OF THE PROPOSED GMSD AND THE OTHER ELEVEN COMPETING FR-IQA MODELS IN TERMS OF SRC, PCC, AND RMSE ON THE 3 DATABASES. THE TOP THREE MODELS FOR EACH CRITERION ARE SHOWN IN BOLDFACE
Fig. 5. Scatter plots of predicted quality scores against the subjective quality scores (DMOS) by representative FR-IQA models on the CSIQ database.
The six types of distortions are represented by markers of different shapes and colors.
each model after nonlinear regression. The results of sig-
nificance tests are shown in Fig. 6. By assuming that the
model’s prediction residuals follow the Gaussian distribution
(the Jarque-Bera test [35] shows that only 3 models on LIVE
and 4 models on CSIQ violate this assumption), we apply
the left-tailed F-test to the residuals of every two models
to be compared. A value of H=1 for the left-tailed F-test
at a significance level of 0.05 means that the first model
Fig. 6. The results of statistical significance tests of the competing IQA models on the (a) LIVE, (b) CSIQ and (c) TID2008 databases. A value of ‘1’
(highlighted in green) indicates that the model in the row is significantly better than the model in the column, while a value of ‘0’ (highlighted in red) indicates
that the first model is not significantly better than the second one. Note that the proposed GMSD is significantly better than most of the competitors on all
the three databases, while no IQA model is significantly better than GMSD.
(indicated by the row in Fig. 6) has better IQA performance
than the second model (indicated by the column in Fig. 6)
with a confidence greater than 95%. A value of H=0 means
that the first model is not significantly better than the second
one. If H=0 always holds no matter which one of the two
models is taken as the first one, then the two models have
no significant difference in performance. Fig. 6(a)–(c) show
the significance test results on the LIVE, CSIQ and TID2008
databases, respectively. We see that on the LIVE database,
GMSD, VIF, GMSM and FSIM all perform very well and they
have no significant difference, while MAD performs the best
on this database. On the CSIQ database, GMSD is significantly
better than all the other models except for MAD. On the
TID2008 database, GMSD is significantly better than all the
other IQA models except for FSIM. Note that on all the three
databases, no IQA model performs significantly better than
GMSD except that MAD is significantly better than GMSD
on LIVE.
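For illustration, the left-tailed F-test described above can be sketched in Python with SciPy (this is an illustrative sketch, not the code used in the paper; the significance level of 0.05 matches the text, and the synthetic residuals are made up):

```python
import numpy as np
from scipy.stats import f

def left_tailed_f_test(res_first, res_second, alpha=0.05):
    """Return H = 1 if the first model's residual variance is
    significantly smaller than the second's (i.e., the first model
    predicts perceptual quality better), else H = 0."""
    var1 = np.var(res_first, ddof=1)
    var2 = np.var(res_second, ddof=1)
    F = var1 / var2
    df1, df2 = len(res_first) - 1, len(res_second) - 1
    p = f.cdf(F, df1, df2)  # left-tail probability of the F statistic
    return int(p < alpha)

rng = np.random.default_rng(0)
res_a = rng.normal(0.0, 1.0, 500)  # tighter residuals: better model
res_b = rng.normal(0.0, 2.0, 500)  # looser residuals: worse model
print(left_tailed_f_test(res_a, res_b))  # 1: A significantly better
print(left_tailed_f_test(res_b, res_a))  # 0
```

Note that, as in the paper, H = 0 in both directions would indicate no significant difference between the two models.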
D. Performance Comparison on Individual Distortion Types
To more comprehensively evaluate an IQA model’s ability
to predict image quality degradations caused by specific types
of distortions, we compare the performance of competing
methods on each type of distortion. The results are listed
in Table II. To save space, only the SRC scores are shown.
There are a total of 28 groups of distorted images in the three
databases. In Table II, we use boldface font to highlight the
top 3 models in each group. One can see that GMSD is among
the top 3 models 14 times, followed by VIF and GS, which are
among the top 3 models 13 times and 11 times, respectively.
However, neither GS nor VIF ranks among the top 3 in terms
of overall performance on the 3 databases. The classical PSNR
also ranks among the top 3 for 8 groups, all of which consist of noise-contaminated images.
PSNR is, indeed, an effective measure of perceptual quality of
noisy images. However, PSNR is not able to faithfully measure
the quality of images impaired by other types of distortions.
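As a reminder of the criterion used in Table II, SRC is the Spearman rank-order correlation between objective scores and subjective DMOS values; a minimal sketch with made-up scores (not data from the table):

```python
import numpy as np
from scipy.stats import spearmanr

# Illustrative values only; for GMSD a larger score and a larger DMOS
# both indicate worse quality, so a good metric yields high positive SRC.
pred_gmsd = np.array([0.10, 0.25, 0.12, 0.40, 0.33])
dmos      = np.array([12.0, 55.0, 20.0, 80.0, 61.0])

src, _ = spearmanr(pred_gmsd, dmos)
print(src)  # 1.0: the two rankings agree perfectly in this toy example
```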
Generally speaking, performing well on specific types of
distortions does not guarantee that an IQA model will perform
well on the whole database with a broad spectrum of distortion
types. A good IQA model should also predict the image quality
consistently across different types of distortions. Referring to
the scatter plots in Fig. 5, it can be seen that the scatter
plot of GMSD is more concentrated across different groups
of distortion types. For example, its points corresponding to
JPEG2000 and PGN distortions are very close to each other.
However, the points corresponding to JPEG2000 and PGN for
VIF are relatively far from each other. We can have similar
observations for GS on the distortion types of PGN and CTD.
This explains why some IQA models perform well on many
individual types of distortions but not on the entire
databases; that is, these IQA models behave
rather differently on different types of distortions, which can
be attributed to the different ranges of quality scores for those
distortion types [43].
The gradient-based models G-SSIM and GSD show good
performance neither on many individual types of
distortions nor on the entire databases. G-SSIM computes the
local variance and covariance of gradient magnitude to gauge
contrast and structure similarities. This may not be an effective
use of gradient information. The gradient magnitude describes
the local contrast of image intensity; however, the image
local structures with different distortions may have similar
variance of gradient magnitude, making G-SSIM less effective
at distinguishing between those distortions. GSD combines the orientation
differences of gradient, the contrast similarity and the gradient
similarity; however, these kinds of information overlap with
each other, making GSD less discriminative of image
quality. GMSD only uses the gradient magnitude information
but achieves highly competitive results against the competing
methods. This validates that gradient magnitude, coupled with
the deviation pooling strategy, can serve as an excellent
predictive image quality feature.
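The GMS-plus-deviation-pooling idea can be sketched in a few lines of Python (an illustrative sketch, not the released MATLAB implementation: we use the 3 × 3 Prewitt templates, take images normalized to [0, 1], set the stability constant c = 0.0026 accordingly, and omit the 2× downsampling preprocessing, all of which should be treated as assumptions here):

```python
import numpy as np
from scipy.ndimage import convolve

def gmsd(ref, dst, c=0.0026):
    """Gradient magnitude similarity deviation (sketch).
    ref, dst: float arrays in [0, 1]; c stabilizes the GMS ratio."""
    hx = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float) / 3.0  # Prewitt, horizontal
    hy = hx.T                                       # Prewitt, vertical

    def grad_mag(img):
        gx = convolve(img, hx, mode='nearest')
        gy = convolve(img, hy, mode='nearest')
        return np.sqrt(gx ** 2 + gy ** 2)

    m_r, m_d = grad_mag(ref), grad_mag(dst)
    gms = (2.0 * m_r * m_d + c) / (m_r ** 2 + m_d ** 2 + c)  # GMS map
    return gms.std()  # deviation pooling: larger value = worse quality

rng = np.random.default_rng(0)
ref = rng.random((64, 64))
dst = np.clip(ref + rng.normal(0, 0.1, ref.shape), 0, 1)
print(gmsd(ref, ref))       # 0.0 for identical images
print(gmsd(ref, dst) > 0)   # True for a distorted image
```

For identical images the GMS map is 1 everywhere, so its standard deviation, and hence GMSD, is exactly zero.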
E. Standard Deviation Pooling on Other IQA Models
As shown in previous sections, the method of standard
deviation (SD) pooling applied to the GMS map leads to
significantly elevated performance of image quality prediction.
TABLE II: PERFORMANCE COMPARISON OF THE IQA MODELS ON EACH INDIVIDUAL DISTORTION TYPE IN TERMS OF SRC
It is therefore natural to wonder whether the SD pooling
strategy can deliver similar performance improvement on other
IQA models. To explore this, we modified six representative
FR-IQA methods, all of which are able to generate an LQM
of the test image: MSE (which is equivalent to PSNR but
can produce an LQM), SSIM [8], MS-SSIM [17], FSIM [7],
G-SSIM [6] and GSD [5]. The original pooling strategies of
these methods are either average pooling or weighted pooling.
For MSE, SSIM, G-SSIM, GSD and FSIM, we directly applied
the SD pooling to their LQMs to yield the predicted quality
scores. For MS-SSIM, we applied SD pooling to its LQM on
each scale, and then computed the product of the predicted
scores on all scales as the final score. In Table III, the
SRC results of these methods by using their nominal pooling
strategies and the SD pooling strategy are listed.
Table III makes it clear that except for MSE, all the other
IQA methods fail to gain in performance by using SD pooling
instead of their nominal pooling strategies. The reason may be
that in these methods, the LQM is generated using multiple,
diverse types of features. The interaction between these features
may complicate the estimation of local image quality, so
that SD pooling is no longer effective. By contrast, MSE and GMSD
use only one type of feature, the original intensity and the gradient
magnitude, respectively, to calculate the LQM.
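The experiment above amounts to swapping only the pooling stage. For example, for MSE, whose LQM is simply the pixel-wise squared-error map, the comparison of nominal (average) pooling against SD pooling can be sketched as follows (illustrative code, not the paper's evaluation scripts):

```python
import numpy as np

def mean_pooling(lqm):
    return lqm.mean()   # nominal pooling: average local quality

def sd_pooling(lqm):
    return lqm.std()    # pool the variation of local quality instead

rng = np.random.default_rng(1)
ref = rng.random((64, 64))
dst = np.clip(ref + rng.normal(0, 0.05, ref.shape), 0, 1)

lqm_mse = (ref - dst) ** 2  # MSE's local quality map
print(mean_pooling(lqm_mse), sd_pooling(lqm_mse))
```

For MS-SSIM, as described above, the same SD pooling would be applied per scale and the per-scale scores multiplied to give the final index.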
F. Complexity
In applications such as real-time image/video quality monitoring and prediction, the complexity of implemented IQA
models becomes crucial. We thus analyze the computational
complexity of GMSD, and then compare the competing IQA
models in terms of running time.
Suppose that an image has N pixels. The classical PSNR has
the lowest complexity, and it only requires N multiplications
and 2N additions. The main operations in the proposed GMSD
model include calculating image gradients (by convolving
the image with two 3 × 3 template integer filters), thereby
producing gradient magnitude maps, generating the GMS map,
TABLE III: SRC RESULTS OF SD POOLING ON SOME REPRESENTATIVE IQA MODELS
TABLE IV: RUNNING TIME OF THE COMPETING IQA MODELS
and deviation pooling. Overall, it requires 19N multiplications
and 16N additions to yield the final quality score. Meanwhile,
it only needs to store at most 4 directional gradient images
(each of size N) in memory (at the gradient calculation
stage). Therefore, both the time and memory complexities
of GMSD are O(N). In other words, the time and memory
costs of GMSD scale linearly with image size. This is a
very attractive property since image resolutions have been
rapidly increasing with the development of digital imaging
technologies. In addition, the computation of image gradients
and GMS map can be parallelized by partitioning the reference
and distorted images into blocks if the image size is very large.
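A quick, non-rigorous way to probe this linear scaling is to time a gradient-magnitude pipeline at growing resolutions; the snippet below uses a simple stand-in metric (not the released GMSD code), so the absolute timings are meaningless but the growth with pixel count should be roughly linear:

```python
import time
import numpy as np

def toy_gradient_metric(img):
    # Stand-in O(N) pipeline: gradients -> magnitude -> deviation pooling
    gx, gy = np.gradient(img)
    return np.sqrt(gx ** 2 + gy ** 2).std()

for n in (256, 512, 1024):
    img = np.random.default_rng(0).random((n, n))
    t0 = time.perf_counter()
    score = toy_gradient_metric(img)
    dt = time.perf_counter() - t0
    print(f"{n}x{n}: {dt:.4f} s, score={score:.4f}")
```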
Table IV shows the running time of the 13 IQA models
on an image of size 512 × 512. All algorithms were run
on a ThinkPad T420S notebook with an Intel Core i7-2600M
CPU @ 2.7 GHz and 4 GB RAM. The software platform used
to run all algorithms was MATLAB R2010a (7.10). Apart
from G-SSIM and GSD, the MATLAB source codes of all
the other methods were obtained from the original authors.
(It should be noted that whether the code is optimized may
affect the running time of an algorithm.) Clearly, PSNR is the
fastest, followed by GMSM and GMSD. Specifically, GMSD takes
only 0.0110 seconds to process an image of size
512 × 512, which is 3.5 times faster than SSIM, 47.9 times
faster than FSIM, and 106.7 times faster than VIF.
G. Discussions
Apart from pure quality assessment tasks, an IQA algorithm
is expected to find broader use in many other
applications. According to [1], the most
common applications of IQA algorithms can be categorized
as follows: 1) quality monitoring; 2) performance evaluation;
3) system optimization; and 4) perceptual fidelity criteria on
visual signals. Quality monitoring is usually conducted by
using no reference IQA models, while FR-IQA models can be
applied to the other three categories. Certainly, SSIM proved
to be a milestone in the development of FR-IQA models. It
has been widely and successfully used in the performance
evaluation of many image processing systems and algorithms,
such as image compression, restoration and communication,
etc. Apart from performance evaluation, thus far, SSIM is not
yet pervasively used in other applications. The reason may
be two-fold, as discussed below. The proposed GMSD model
might alleviate these problems associated with SSIM, and has
the potential to be used in a wider variety of
image processing applications.
First, SSIM is difficult to optimize when it is used as a
fidelity criterion on visual signals. This largely restricts its
applications in designing image processing algorithms such
as image compression and restoration. Recently, some works
[36]–[38] have been reported to adopt SSIM for image/video
perceptual compression. However, these methods are not “one-
pass” and they have high complexity. Compared with SSIM,
the formulation of GMSD is much simpler. The calculation
mainly involves the gradient magnitude maps of the reference and
distorted images, and the correlation between the two maps. GMSD
can be more easily optimized than SSIM, and it has greater
potential to be adopted as a fidelity criterion for designing
perceptual image compression and restoration algorithms, as
well as for optimizing network coding and resource allocation
problems.
Second, the time and memory complexity of SSIM is
relatively high, restricting its use in applications where low-
cost and real-time implementation is required. GMSD is much
faster and more scalable than SSIM, and it can be easily
adopted for tasks such as real time performance evaluation,
system optimization, etc. Considering that mobile and portable
devices are becoming much more popular, the merits of
simplicity, low complexity and high accuracy of GMSD make
it very attractive and competitive for mobile applications.
In addition, it should be noted that with the rapid development of digital image acquisition and display technologies,
and the increasing popularity of mobile devices and websites
such as YouTube and Facebook, current IQA databases may
not fully represent the way that human subjects view digital
images. On the other hand, the current databases, including the
three largest ones TID2008, LIVE and CSIQ, mainly focus on
a few classical distortion types, and the images therein undergo
only a single type of distortion. Therefore, there is a demand
to establish new IQA databases, which should contain images
with multiple types of distortions [40], images collected from
mobile devices [41], and images of high definition.
IV. CONCLUSION
The usefulness and effectiveness of image gradient for full
reference image quality assessment (FR-IQA) were studied in
this paper. We devised a simple FR-IQA model called gradient
magnitude similarity deviation (GMSD), where the pixel-wise
gradient magnitude similarity (GMS) is used to capture image
local quality, and the standard deviation of the overall GMS
map is computed as the final image quality index. Such a
standard deviation based pooling strategy is based on the
consideration that the variation of local quality, which arises
from the diversity of image local structures, is highly relevant
to subjective image quality. Compared with state-of-the-art
FR-IQA models, the proposed GMSD model performs better
in terms of both accuracy and efficiency, making GMSD an
ideal choice for high performance IQA applications.
REFERENCES
[1] Z. Wang, Applications of objective image quality assessment methods
[applications corner], IEEE Signal Process. Mag., vol. 28, no. 6,
pp. 137–142, Nov. 2011.
[2] B. Girod, “What’s wrong with mean-squared error?” in Digital
Images and Human Vision. Cambridge, MA, USA: MIT Press, 1993,
pp. 207–220.
[3] Z. Wang and A. C. Bovik, “Modern image quality assessment, Synth.
Lect. Image, Video, Multimedia Process., vol. 2, no. 1, pp. 1–156, 2006.
[4] D. O. Kim, H. S. Han, and R. H. Park, “Gradient information-based
image quality metric,” IEEE Trans. Consum. Electron., vol. 56, no. 2,
pp. 930–936, May 2010.
[5] G. Q. Cheng, J. C. Huang, C. Zhu, Z. Liu, and L. Z. Cheng, “Perceptual
image quality assessment using a geometric structural distortion model,
in Proc. 17th IEEE ICIP, Sep. 2010, pp. 325–328.
[6] G. H. Chen, C. L. Yang, and S. L. Xie, “Gradient-based structural
similarity for image quality assessment, in Proc. 13th IEEE Int. Conf.
Image Process., Oct. 2006, pp. 2929–2932.
[7] L. Zhang, L. Zhang, X. Mou, and D. Zhang, “FSIM: A feature similarity
index for image quality assessment, IEEE Trans. Image Process.,
vol. 20, no. 8, pp. 2378–2386, Aug. 2011.
[8] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image
quality assessment: From error visibility to structural similarity, IEEE
Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[9] Z. Wang and X. Shang, “Spatial pooling strategies for perceptual image
quality assessment, in Proc. IEEE Int. Conf. Image Process., Sep. 2006,
pp. 2945–2948.
[10] A. K. Moorthy and A. C. Bovik, “Visual importance pooling for image
quality assessment, IEEE J. Sel. Topics Signal Process., vol. 3, no. 2,
pp. 193–201, Apr. 2009.
[11] H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik. (2005). LIVE
Image Quality Assessment Database Release 2 [Online]. Available:
http://live.ece.utexas.edu/research/quality
[12] E. C. Larson and D. M. Chandler, “Most apparent distortion:
Full-reference image quality assessment and the role of strategy, J.
Electron. Imaging, vol. 19, no. 1, pp. 011006-1–011006-21, Jan. 2010.
[13] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, and
F. Battisti, “TID2008—A database for evaluation of full-reference visual
quality assessment metrics, Adv. Modern Radio Electron., vol. 10, no. 4,
pp. 30–45, 2009.
[14] L. Zhang, L. Zhang, X. Mou, and D. Zhang, A comprehensive eval-
uation of full reference image quality assessment algorithms, in Proc.
19th IEEE ICIP, Oct. 2012, pp. 1477–1480.
[15] A. Liu, W. Lin, and M. Narwaria, “Image quality assessment based
on gradient similarity, IEEE Trans. Image Process., vol. 21, no. 4,
pp. 1500–1512, Apr. 2012.
[16] Z. Wang and Q. Li, “Information content weighting for perceptual
image quality assessment, IEEE Trans. Image Process., vol. 20, no. 5,
pp. 1185–1198, May 2011.
[17] Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale struc-
tural similarity for image quality assessment, in Proc. IEEE 37th
Conf. Rec. Asilomar Conf. Signals, Syst. Comput., vol. 2. Nov. 2003,
pp. 1398–1402.
[18] M. Zhang, X. Mou, and L. Zhang, “Non-shift edge based ratio
(NSER): An image quality assessment metric based on early vision
features, IEEE Signal Process. Lett., vol. 18, no. 5, pp. 315–318,
May 2011.
[19] C. F. Li and A. C. Bovik, “Content-partitioned structural similarity index
for image quality assessment,” Signal Process., Image Commun., vol. 25,
no. 7, pp. 517–526, Aug. 2010.
[20] Y. Tong, H. Konik, F. A. Cheikh, and A. Tremeau, “Full reference image
quality assessment based on saliency map analysis, J. Imaging Sci.,
vol. 54, no. 3, pp. 30503-1–30503-14, May 2010.
[21] (2003, Aug.). Final Report From the Video Quality Experts Group on
the Validation of Objective Models of Video Quality Assessment—Phase
II [Online]. Available: http://www.vqeg.org/
[22] H. R. Sheikh, A. C. Bovik, and G. de Veciana, An information fidelity
criterion for image quality assessment using natural scene statistics,
IEEE Trans. Image Process., vol. 14, no. 12, pp. 2117–2128, Dec. 2005.
[23] H. R. Sheikh and A. C. Bovik, “Image information and visual quality,
IEEE Trans. Image Process., vol. 15, no. 2, pp. 430–444, Feb. 2006.
[24] W. Xue and X. Mou, An image quality assessment metric based on
non-shift edge, in Proc. 18th IEEE ICIP, Sep. 2011, pp. 3309–3312.
[25] Z. Wang and A. C. Bovik, “Mean squared error: Love it or leave it?—
A new look at signal fidelity measures, IEEE Signal Process. Mag.,
vol. 26, no. 1, pp. 98–117, Jan. 2009.
[26] A. L. Neuenschwander, M. M. Crawford, L. A. Magruder, C. A. Weed,
R. Cannata, D. Fried, et al., “Terrain classification of LADAR data
over Haitian urban environments using a lower envelope follower and
adaptive gradient operator, Proc. SPIE 7684, Laser Radar Technology
and Applications XV, 768408, May 2010.
[27] S. A. Coleman, B. W. Scotney, and S. Suganthan, “Multi-scale edge
detection on range and intensity images, Pattern Recognit., vol. 44,
no. 4, pp. 821–838, Apr. 2011.
[28] N. Ehsan and R. K. Ward, An efficient method for robust gradient
estimation of RGB color images, in Proc. 16th IEEE ICIP, Nov. 2009,
pp. 701–704.
[29] J. Park, K. Seshadrinathan, S. Lee, and A. C. Bovik, “VQpooling: Video
quality pooling adaptive to perceptual distortion severity, IEEE Trans.
Image Process., vol. 22, no. 2, pp. 610–620, Feb. 2013.
[30] W. Lin and C.-C. Jay Kuo, “Perceptual visual quality metrics:
A survey,” J. Vis. Commun. Image Represent., vol. 22, no. 4,
pp. 297–312, May 2011.
[31] A. Ninassi, O. Le Meur, P. Le Callet, and D. Barba, “Does where you
gaze on an image affect your perception of quality? Applying visual
attention to image quality metric, in Proc. IEEE ICIP, vol. 2. Oct. 2007,
pp. 169–172.
[32] J. Ross and H. D. Speed, “Contrast adaptation and contrast masking in
human vision,” Proc. Biol. Sci., vol. 246, no. 1315, pp. 61–69,
Oct. 1991.
[33] S. J. Daly, Application of a noise-adaptive contrast sensitivity function
to image data compression, Opt. Eng., vol. 29, no. 8, pp. 977–987,
Aug. 1990.
[34] J. Lubin, A human vision system model for objective picture quality
measurements, in Proc. IBC, Jan. 1997, pp. 498–503.
[35] C. M. Jarque and A. K. Bera, “Efficient tests for normality, homoscedas-
ticity and serial independence of regression residuals,” Econ. Lett., vol. 6,
no. 3, pp. 255–259, 1980.
[36] C. Wang, X. Mou, W. Hong, and L. Zhang, “Block-layer bit allocation
for quality constrained video encoding based on constant perceptual
quality, Proc. SPIE 8666, Visual Information Processing and Commu-
nication IV, 86660J, Feb. 2013.
[37] T.-S. Ou, Y.-H. Huang, and H. H. Chen, “SSIM-based perceptual rate
control for video coding, IEEE Trans. Circuits Syst. Video Technol.,
vol. 21, no. 5, pp. 682–691, May 2011.
[38] S. Wang, A. Rehman, Z. Wang, S. Ma, and W. Gao, “SSIM-motivated
rate-distortion optimization for video coding,” IEEE Trans. Circuits Syst.
Video Technol., vol. 22, no. 4, pp. 516–529, Apr. 2012.
[39] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. (2009).
The SSIM Index for Image Quality Assessment [Online]. Available:
http://www.cns.nyu.edu/lcv/ssim/ssim.m
[40] D. Jayaraman, A. Mittal, A. K. Moorthy, and A. C. Bovik.
(2012, Feb. 13). LIVE Multiply Distorted Image Quality
Database [Online]. Available: http://live.ece.utexas.edu/research/
quality/live_multidistortedimage.html
[41] A. K. Moorthy, L. K. Choi, A. C. Bovik, and G. de Veciana, “Video
quality assessment on mobile devices: Subjective, behavioral and objective
studies, IEEE J. Sel. Topics Signal Process., vol. 6, no. 6, pp. 652–671,
Oct. 2012.
[42] M.-J. Chen and A. C. Bovik, “Fast structural similarity index algorithm,
J. Real-Time Image Process., vol. 6, no. 4, pp. 281–287, 2011.
[43] R. Soundararajan and A. C. Bovik, “RRED indices: Reduced reference
entropic differencing for image quality assessment, IEEE Trans. Image
Process., vol. 21, no. 2, pp. 517–526, Feb. 2012.
Wufeng Xue received the B.Sc. degree in automatic
engineering from the School of Electronic and Infor-
mation Engineering, Xi’an Jiaotong University, Xi’an,
China, in 2009. He is currently pursuing the Ph.D.
degree with the Institute of Image Processing and
Pattern Recognition, Xi’an Jiaotong University. His
research interest focuses on perceptual quality of
visual signals.
Lei Zhang (M’04) received the B.Sc. degree from
the Shenyang Institute of Aeronautical Engineer-
ing, Shenyang, China, in 1995, and the M.Sc. and
Ph.D. degrees in control theory and engineering
from Northwestern Polytechnical University, Xi’an,
China, in 1998 and 2001, respectively. From 2001 to
2002, he was a Research Associate with the Depart-
ment of Computing, The Hong Kong Polytechnic
University. From 2003 to 2006, he was a Post-
Doctoral Fellow with the Department of Electrical
and Computer Engineering, McMaster University,
Canada. In 2006, he joined the Department of Computing, The Hong Kong
Polytechnic University, as an Assistant Professor. Since 2010, he has been
an Associate Professor with the same department. His research interests
include image and video processing, computer vision, pattern recognition, and
biometrics. He is an Associate Editor of the IEEE TRANSACTIONS ON
CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, the IEEE TRANSACTIONS
ON SYSTEMS, MAN, AND CYBERNETICS, PART C, and Image and Vision
Computing, and the Guest Editor of several special issues in international
journals. He received the 2013 Outstanding Award in Research and Scholarly
Activities, Faculty of Engineering, PolyU.
Xuanqin Mou (M’08) has been with the Insti-
tute of Image Processing and Pattern Recogni-
tion (IPPR), Electronic and Information Engineering
School, Xi’an Jiaotong University, since 1987. He
has been an Associate Professor since 1997, and a
Professor since 2002. He is currently the Director of
IPPR, and served as the member of the 12th Expert
Evaluation Committee for the National Natural Sci-
ence Foundation of China, the Member of the 5th
and 6th Executive Committee of China Society of
Image and Graphics, the Vice President of Shaanxi
Image and Graphics Association. He has authored or co-authored more than
200 peer-reviewed journal or conference papers. He has supervised more than
70 master and doctoral students. He has been granted as the Yung Wing Award
for Excellence in Education, the KC Wong Education Award, the Technology
Academy Award for Invention by the Ministry of Education of China, and
the Technology Academy Awards from the Government of Shaanxi Province,
China.
Alan C. Bovik (S’80–M’81–SM’89–F’96) is the
Curry/Cullen Trust Endowed Chair Professor with
the University of Texas at Austin, where he is the
Director of the Laboratory for Image and Video
Engineering, Department of Electrical and Computer
Engineering and the Center for Perceptual Systems,
Institute for Neuroscience. His research interests
include image and video processing and visual per-
ception. He has published more than 650 technical
articles and holds four U.S. patents. His several
books include the recent companion volumes The
Essential Guides to Image and Video Processing (Academic Press, 2009).
He has received numerous awards, including the IEEE Signal Processing
Society Best Paper Award in 2009, Education Award in 2007, Technical
Achievement Award in 2005, Meritorious Service Award in 1998, Honorary
Membership in the Society for Imaging Science and Technology in 2013, the
SPIE Technology Achievement Award in 2012, and the IS&T/SPIE Imaging
Scientist of the Year in 2011. He is a fellow of the Optical Society of
America and the Society of Photo-Optical and Instrumentation Engineers. He
co-founded and served as an Editor-in-Chief of the IEEE TRANSACTIONS
ON IMAGE PROCESSING from 1996 to 2002 and founded and served as the first
General Chairman of the IEEE International Conference on Image Processing,
Austin, TX, USA, in 1994.
... La similarité des images consiste à rechercher des images similaires à partir d'une image de référence. Ainsi, la notion de similarité d'image a été utilisée, dans les dernières années, dans plusieurs domaines : évaluation de qualité [6,7,8,9], analyse et recherche d'images [10,11], recalage [12,13], classication [14,15,16], détection [17,18], etc. Dans diverses applications où les informations partielles de l'image de référence sont uniquement accessibles, l' évaluation de la qualité des images à référence réduite constitue une solution pratique. Pour trouver une mesure de qualité adéquate, les auteurs de [6] ont estimé l'indice de similarité structurelle, qui est une mesure de référence complète largement utilisée dans la littérature. ...
... Ils développent ainsi une mesure de distorsion en suivant la philosophie de la construction de l'indice de similarité structurelle. En raison de la sensibilité des gradients de l'image aux distortions et que diérentes structures locales dans une image déformée subissent diérents degrés de dégradation Xue et al. [7] ont proposé un modèle appelé la déviation de similarité de l'amplitude du gradient pour évaluer la qualité globale de l'image. La similarité de l'amplitude du gradient pixel par pixel entre l'image de référence et l'image déformée, combinée avec la déviation standard de la carte de similarité de l'amplitude du gradient prédit avec précision la qualité perceptive de l'image. ...
Thesis
Full-text available
En raison de l’augmentation considérable des images dans la vie quotidienne, de nombreuses applications nécessitent une étude sur leur similarité. La Carte des Dissimilarités Locales (CDL) est une mesure, construite autour de la distance de Hausdorff, qui est très efficace pour localiser et quantifier les différences de structures entre les images. Cette mesure a été proposée par Baudrier et al. [1]. Avant cela, aucune solution spécifiquement locale n’a été proposée par la communauté scientifique. À partir d’une CDL, il est cependant difficile d’interpréter et de prendre une décision sur la similarité entre deux images. De plus, la mesure est mise en échec sur des images contenant à la fois des structures et des textures et le comportement statistique des valeurs de la CDL n’a jamais été étudié. Tout cela limitait ses domaines d’application. Cette thèse propose d’abord une distribution statistique pour modéliser les valeurs des niveaux de gris des CDL des images structurelles. Les deux paramètres de la distribution sont pertinents pour discriminer les paires d’images en classes similaires et dissimilaires. Des modèles d’apprentissage automatique et des tests statistiques sont utilisés pour classer les paires d’images. Mais, avant d’aborder les tests, une extension de l’approche au problème de classification d’images multi-classes est proposée. Ensuite, les mesures d’informations telles que l’Information Mutuelle (IM) et l’Information Disjointe (ID) sont utilisées pour adapter la CDL sur des images avec un mélange de structures et de textures. Nous proposons, enfin, d’appliquer la mesure au problème de détection de changements sur des séries d’images. Nous savons aussi que, de nos jours, de nombreuses images numériques sont falsifiées pour de la propagande ou pour cacher des informations importantes. La détection de ces falsifications intéresse donc de nombreux acteurs majeurs de la sécurité. 
Dans cette thèse, nous nous intéressons uniquement à la détection de falsifications par copier-coller. Toutes nos approches sont basées uniquement sur la CDL et essentiellement sur les deux paramètres de la distribution proposée. Elles sont pertinentes et certaines méthodes sont même comparées avec des approches d’apprentissage profond de l’état de l’art.
... Three commonly used objective image quality assessment metrics are used for quantitative evaluation of reconstruction performance: root mean square error (RMSE), peak signalto-noise ratio (PSNR) and structural similarity index (SSIM) [44]. Additionally, we have introduced two assessment metrics in our experiments that have been demonstrated to be consistent with subjective evaluations [45], namely Visual Information Fidelity (VIF) [46] and Gradient Magnitude Similarity Deviation (GMSD) [47]. Higher values of PSNR, SSIM and VIF, and lower values of RMSE and GMSD indicate better performance. ...
Article
Full-text available
X-ray computed tomography (CT) imaging technology has become an indispensable diagnostic tool in clinical examination. However, it poses a risk of ionizing radiation, making the reduction of radiation dose one of the current research hotspots in CT imaging. Sparse-view imaging, as one of the main methods for reducing radiation dose, has made significant progress in recent years. In particular, sparse-view reconstruction methods based on deep learning have shown promising results. Nevertheless, efficiently recovering image details under ultra-sparse conditions remains a challenge. To address this challenge, this paper proposes a high-frequency enhanced and attention-guided learning Network (HEAL). HEAL includes three optimization strategies to achieve detail enhancement: Firstly, we introduce a dual-domain progressive enhancement module, which leverages fidelity constraints within each domain and consistency constraints across domains to effectively narrow the solution space. Secondly, we incorporate both channel and spatial attention mechanisms to improve the network’s feature-scaling process. Finally, we propose a high-frequency component enhancement regularization term that integrates residual learning with direction-weighted total variation, utilizing directional cues to effectively distinguish between noise and textures. The HEAL network is trained, validated and tested under different ultra-sparse configurations of 60 views and 30 views, demonstrating its advantages in reconstruction accuracy and detail enhancement.
... This motivates us to design a framework to allocate more computational resources to non-flat regions and fewer resources for flat regions to provide adequate effort to each region. 1 Structural similarity index (SSIM) globally assesses the image quality by comparing luminance, contrast and structure. 2 Gradient Magnitude Similarity Deviation (GMSD) [28] and feature similarity (FSIM) [29] can measure the significance of local structure. To address the abovementioned challenges, we present a framework named Spatial-Temporal hierARchical Reinforcement Learning (STAR-RL) for interpretable pathology image super-resolution, which reformulates image SR problem as the Markov decision process and attempts to tackle it with hierarchical reinforcement learning. ...
Preprint
Full-text available
Pathology image are essential for accurately interpreting lesion cells in cytopathology screening, but acquiring high-resolution digital slides requires specialized equipment and long scanning times. Though super-resolution (SR) techniques can alleviate this problem, existing deep learning models recover pathology image in a black-box manner, which can lead to untruthful biological details and misdiagnosis. Additionally, current methods allocate the same computational resources to recover each pixel of pathology image, leading to the sub-optimal recovery issue due to the large variation of pathology image. In this paper, we propose the first hierarchical reinforcement learning framework named Spatial-Temporal hierARchical Reinforcement Learning (STAR-RL), mainly for addressing the aforementioned issues in pathology image super-resolution problem. We reformulate the SR problem as a Markov decision process of interpretable operations and adopt the hierarchical recovery mechanism in patch level, to avoid sub-optimal recovery. Specifically, the higher-level spatial manager is proposed to pick out the most corrupted patch for the lower-level patch worker. Moreover, the higher-level temporal manager is advanced to evaluate the selected patch and determine whether the optimization should be stopped earlier, thereby avoiding the over-processed problem. Under the guidance of spatial-temporal managers, the lower-level patch worker processes the selected patch with pixel-wise interpretable actions at each time step. Experimental results on medical images degraded by different kernels show the effectiveness of STAR-RL. Furthermore, STAR-RL validates the promotion in tumor diagnosis with a large margin and shows generalizability under various degradations. The source code is available at https://github.com/CUHK-AIM-Group/STAR-RL.
... The former includes the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) [60]: PSNR measures the ratio between the maximum possible power of a signal and the power of the noise, while SSIM accounts for differences in luminance, contrast, and structural information between two images. The latter includes the gradient magnitude similarity deviation (GMSD) [61] and the discrete cosine transform-based sub-bands similarity index (DSS) [62], which derive from image gradients and structural information, respectively, to approximate the human visual system. For ISICDM-20, the absolute error (AE) between the average CT value of reconstructed images and reference NDCT images is computed. ...
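The PSNR measure described here reduces to a few lines: the ratio of the peak signal power to the mean squared error, in decibels. A minimal sketch, where the `peak` default of 255 assumes 8-bit-range images (adjust for other bit depths):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means the test image is closer to the reference."""
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: noise power is zero
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, a uniform error equal to one tenth of the peak gives a power ratio of 100 and hence 20 dB.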
Article
Full-text available
Low-dose computed tomography (LDCT) image reconstruction techniques can reduce patient radiation exposure while maintaining acceptable imaging quality. Deep learning is widely used in this problem, but the performance of testing data (a.k.a. target domain) is often degraded in clinical scenarios due to the variations that were not encountered in training data (a.k.a. source domain). Unsupervised domain adaptation (UDA) of LDCT reconstruction has been proposed to solve this problem through distribution alignment. However, existing UDA methods fail to explore the usage of uncertainty quantification, which is crucial for reliable intelligent medical systems in clinical scenarios with unexpected variations. Moreover, existing direct alignment for different patients would lead to content mismatch issues. To address these issues, we propose to leverage a probabilistic reconstruction framework to conduct a joint discrepancy minimization between source and target domains in both the latent and image spaces. In the latent space, we devise a Bayesian uncertainty alignment to reduce the epistemic gap between the two domains. This approach reduces the uncertainty level of target domain data, making it more likely to render well-reconstructed results on target domains. In the image space, we propose a sharpness-aware distribution alignment to achieve a match of second-order information, which can ensure that the reconstructed images from the target domain have similar sharpness to normal-dose CT images from the source domain. Experimental results on two simulated datasets and one clinical low-dose imaging dataset show that our proposed method outperforms other methods in quantitative and visualized performance.
... Two categories of metrics are adopted to evaluate the performance of the algorithms, i.e., pixel-based and perception-based metrics. Specifically, pixel-based metrics comprise PSNR, SSIM, the feature similarity index measure (FSIM), and the gradient magnitude similarity deviation (GMSD) [67], which focus on the per-pixel similarity between the restored image and the ground-truth image. The perception-based metrics primarily measure image quality according to human perceptual preference. ...
Preprint
Full-text available
Atmospheric turbulence is a major factor in image degradation issues such as blurring, distortion and intensity fluctuations when monitoring long-range targets. The randomness, spatiotemporal variation and perturbations of turbulence make it challenging to restore vision-friendly and credible images from degraded image sequences. In this work, we address the problem by proposing a deformation-aware image restoration algorithm based on quasiconformal geometry and pulse-coupled neural network (PCNN). To accurately measure the magnitude of geometric deformation caused by turbulence, the deformation within degraded images is specified in a non-conformal distortion that disrupts local geometry. The Beltrami coefficient uniquely associated with the quasiconformal maps is applied to quantify the average distortion degree. The deformation-aware measurement minimizes registration errors in aligning degraded images by more reliable reconstruction of reference frames. Additionally, an improved PCNN model inspired by the primary visual cortex is developed to boost the perceptual quality of the restored image with lucky image fusion. The absence of manual parameter tuning and the ability to simultaneously process image sequences in the PCNN model enhance the robustness of the restoration algorithm. The performance of our algorithm is validated by experiments on physically simulated and real data, which contain 220 sequences with 22928 frames. The results show that our algorithm can yield a superior restoration through atmospheric turbulence compared with several state-of-the-art methods. The code is available at https://github.com/whuluojia/ImTurb.
Article
In recent years, many standardized algorithms for point cloud compression (PCC) have been developed, achieving remarkable compression ratios. To provide guidance for rate-distortion optimization and codec evaluation, point cloud quality assessment (PCQA) has become a critical problem for PCC. Therefore, in order to achieve a more consistent correlation with human visual perception of a compressed point cloud, we propose a full-reference PCQA algorithm tailored for static point clouds in this paper, which can jointly measure geometry and attribute deformations. Specifically, we assume that the quality decision of compressed point clouds is determined by both global appearance (e.g., density, contrast, complexity) and local details (e.g., gradient, hole). Motivated by the nature of compression distortions and the properties of the human visual system, we derive perceptually effective features for the above two categories, such as content complexity, luminance/geometry gradient, and hole probability. By systematically incorporating measurements of variations in the local and global characteristics, we derive an effective quality index for the input compressed point clouds. Extensive experiments and analyses conducted on popular PCQA databases show the superiority of the proposed method in evaluating compression distortions. Subsequent investigations validate the efficacy of different components within the model design.
Article
Full-text available
In response to the 2010 Haiti earthquake, the ALIRT ladar system was tasked with collecting surveys to support disaster relief efforts. Standard methodologies for classifying the ladar data as ground, vegetation, or man-made features failed to produce an accurate representation of the underlying terrain surface. The majority of these methods rely primarily on gradient-based operations that often perform well in areas of low topographic relief, but frequently fail in areas of high topographic relief or dense urban environments. An alternative approach based on an adaptive lower envelope follower (ALEF), with an adaptive gradient operation to accommodate local slope and roughness, was investigated for recovering the ground surface from the ladar data. This technique was successful in classifying terrain in the urban and rural areas of Haiti over which the ALIRT data had been acquired.
Article
Objective methods for assessing perceptual image quality have traditionally attempted to quantify the visibility of errors between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MatLab implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
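In its simplest global form, the structural similarity idea described in this abstract folds luminance, contrast, and structure comparisons into a single formula over whole-image statistics. The sketch below is that single-window reduction only (an assumption for illustration): the published index computes the same quantity locally in a sliding window and averages the resulting map; `L`, `k1`, and `k2` follow the paper's conventions for 8-bit images.

```python
import numpy as np

def ssim_global(x, y, L=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over whole-image statistics (the paper averages local windows)."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2  # stabilizers for near-zero denominators
    mx, my = x.mean(), y.mean()            # luminance terms
    vx, vy = x.var(), y.var()              # contrast terms
    cov = ((x - mx) * (y - my)).mean()     # structure (covariance) term
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

An identical pair scores exactly 1; any luminance, contrast, or structural degradation pulls the score below 1.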
Conference Paper
Measurement of image quality is crucial for many image-processing algorithms. Traditionally, image quality assessment algorithms predict visual quality by comparing a distorted image against a reference image, typically by modeling the human visual system (HVS), or by using arbitrary signal fidelity criteria. We adopt a new paradigm for image quality assessment. We propose an information fidelity criterion that quantifies the Shannon information that is shared between the reference and distorted images relative to the information contained in the reference image itself. We use natural scene statistics (NSS) modeling in concert with an image degradation model and an HVS model. We demonstrate the performance of our algorithm by testing it on a data set of 779 images, and show that our method is competitive with state of the art quality assessment methods, and outperforms them in our simulations.
Conference Paper
Recent years have witnessed a growing interest in developing objective image quality assessment (IQA) algorithms that can measure image quality consistently with subjective evaluations. For the full reference (FR) IQA problem, great progress has been made in the past decade. On the other hand, several new large-scale image datasets have been released in recent years for evaluating FR IQA methods. Meanwhile, no work has been reported that evaluates and compares the performance of state-of-the-art and representative FR IQA methods on all the available datasets. In this paper, we aim to fulfill this task by reporting the performance of eleven selected FR IQA algorithms on all seven public IQA image datasets. Our evaluation results and the associated discussions will be very helpful for relevant researchers to gain a clearer understanding of the status of modern FR IQA indices. Evaluation results presented in this paper are also available online at http://sse.tongji.edu.cn/linzhang/IQA/IQA.htm.
Article
In lossy image/video encoding, there is a compromise between the number of bits (rate) and the extent of distortion. Bits need to be properly allocated to different sources, such as frames and macroblocks (MBs). Since the human eye is more sensitive to differences than to the absolute values of signals, the MINMAX criterion suggests minimizing the maximum distortion of the sources to limit quality fluctuation. Many works aim at such constant-quality encoding; however, almost all of them focus on frame-layer bit allocation and use PSNR as the quality index. We suggest that the bit allocation for MBs should also be constrained to constant quality, and furthermore, that perceptual quality indices should be used instead of PSNR. Based on this idea, we propose a multi-pass block-layer bit allocation scheme for quality-constrained encoding. The experimental results show that the proposed method can achieve much better encoding performance. Keywords: Bit allocation, block-layer, perceptual quality, constant quality, quality constrained
Article
We introduce a new video quality database that models video distortions in heavily-trafficked wireless networks and that contains measurements of human subjective impressions of the quality of videos. The new LIVE Mobile Video Quality Assessment (VQA) database consists of 200 distorted videos created from 10 RAW HD reference videos, obtained using a RED ONE digital cinematographic camera. While the LIVE Mobile VQA database includes distortions that have been previously studied such as compression and wireless packet-loss, it also incorporates dynamically varying distortions that change as a function of time, such as frame-freezes and temporally varying compression rates. In this article, we describe the construction of the database and detail the human study that was performed on mobile phones and tablets in order to gauge the human perception of quality on mobile devices. The subjective study portion of the database includes both the differential mean opinion scores (DMOS) computed from the ratings that the subjects provided at the end of each video clip, as well as the continuous temporal scores that the subjects recorded as they viewed the video. The study involved over 50 subjects and resulted in 5,300 summary subjective scores and time-sampled subjective traces of quality. In the behavioral portion of the article we analyze human opinion using statistical techniques, and also study a variety of models of temporal pooling that may reflect strategies that the subjects used to make the final decision on video quality. Further, we compare the quality ratings obtained from the tablet and the mobile phone studies in order to study the impact of these different display modes on quality. We also evaluate several objective image and video quality assessment (IQA/VQA) algorithms with regard to their efficacy in predicting visual quality. A detailed correlation analysis and statistical hypothesis testing are carried out. Our general conclusion is that existing VQA algorithms are not well-equipped to handle distortions that vary over time. The LIVE Mobile VQA database, along with the subject DMOS and the continuous temporal scores, is being made available to researchers in the field of VQA at no cost in order to further research in the area of video quality assessment.
Article
Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Article
The visual contrast sensitivity function (CSF) has found increasing use in image compression as new algorithms optimize the display-observer interface in order to reduce the bit rate and increase the perceived image quality. In most compression algorithms, increasing the quantization intervals reduces the bit rate at the expense of introducing more quantization error, a potential image quality degradation. The CSF can be used to distribute this error as a function of spatial frequency such that it is undetectable by the human observer. Thus, instead of being mathematically lossless, the compression algorithm can be designed to be visually lossless, with the advantage of a significantly reduced bit rate. However, the CSF is strongly affected by image noise, changing in both shape and peak sensitivity. This work describes a model of the CSF that includes these changes as a function of image noise level by using the concepts of internal visual noise, and tests this model in the context of image compression with an observer study.