Steganalysis of QIM Steganography Using Irregularity Measure
Hafiz Malik
Electrical and Computer Engineering Department
University of Michigan-Dearborn, Dearborn, MI 48128
hafiz@umd.umich.edu
ABSTRACT
This paper presents a nonparametric steganalysis technique to attack quantization index modulation (QIM) steganography and the JSteg steganographic tool. The proposed scheme is based on the observation that message embedding using QIM introduces local irregularity (or randomness) in the cover-object. The presented technique exploits the rich spatial/temporal correlation in multimedia objects to estimate local irregularity in the test-object. The underlying density function of this local irregularity is estimated in a systematic manner using a kernel density estimate (KDE). The Tsallis divergence, a parametric divergence measure, is used to quantify irregularity in the test-object: the Tsallis divergence between the density function estimated from the test-object and that of its doubly-quantized version is used to distinguish between the cover and the stego. The impact of message embedding parameters, such as the quantization step-size and quality factor, on detection accuracy for gray-scale images is also evaluated. Simulation results for these parameters show that the proposed method can successfully distinguish between the quantized-cover and the QIM-stego with low false-alarm rates. The detection performance of the proposed scheme is also evaluated for the JSteg steganographic tool.
Categories and Subject Descriptors
I.4 [Image Processing]: Miscellaneous
General Terms
Security
Keywords
Steganography, Steganalysis, Quantization Index Modulation, Kernel Density Estimation, Higher-Order Statistics, Divergence, Tsallis Entropy, Kullback-Leibler Divergence, Tsallis Divergence
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
MM&Sec’08, September 22–23, 2008, Oxford, United Kingdom.
Copyright 2008 ACM 978-1-60558-058-6/08/09 ...$5.00.
1. INTRODUCTION
A steganographic embedding process encodes a message into a cover-object so that the resulting stego-object is perceptually as well as statistically similar to the cover-object. The rapid proliferation of digital media and the high degree of redundancy in digital representations (despite compression) are among the motivations for using multimedia data as cover-objects in steganographic applications. More than 200 steganographic software tools are available on the Internet, ranging from freeware to sophisticated commercial products, and many of them use least significant bit (LSB) steganography for message embedding. Researchers in the steganographic community have also developed more sophisticated techniques that are robust to active-warden and/or statistical attacks. For example, Sallee et al. in [17] and Fridrich et al. in [12] have proposed model-based steganographic techniques that insert secret messages into cover-images without significantly perturbing the statistical features (such as the density function) of the cover-image.
Quantization index modulation (QIM) based data hiding [8], on the other hand, provides a flexible trade-off among robustness, capacity, and security of the hidden message. Costa's seminal work [9] provides the theoretical basis of QIM data hiding: it derives the capacity of communication with side information over a Gaussian channel. The ideal Costa scheme (ICS) gives a theoretical upper bound on the data hiding capacity under an additive white Gaussian noise (AWGN) attack. However, its requirement of an infinite-length random codebook makes ICS impractical [11]. A few practical realizations of ICS include QIM, the scalar Costa scheme (SCS), dither modulation (DM) [11], and quantization projection (QP) [16].
Steganalysis refers to the analysis of given multimedia data (e.g., image, video, audio) for the presence of a hidden message with limited or no information about the embedding algorithm used. Steganalysis techniques may be classified as passive or active, depending on whether the steganalyst aims only to detect the presence or absence of a hidden message or to extract the hidden message itself. To date, there appears to have been limited investigation of the steganalysis of QIM steganography. Guillon et al. [13] proposed a framework for steganalysis of SCS by modeling QIM steganography as an additive noise channel. Sullivan et al. [19] proposed a steganalysis scheme for QIM steganography using supervised learning techniques. The detection performance of learning-based steganalysis schemes, however, is limited by several factors: 1) detecting a zero-day attack [1], i.e., a stego algorithm not used during the training phase, is impossible; 2) learning-based techniques require separate classifier training for each steganographic algorithm; 3) the detection performance depends on the features selected to train the classifier; and 4) there is no systematic rule for feature selection that achieves a desired detection performance [7]. Therefore, the steganalyst has limited control over the achievable detector performance. To address these limitations of learning-based techniques, Malik et al. [14, 15] recently proposed non-learning-based steganalysis schemes to attack QIM steganography. However, the scheme in [14] has a relatively high false positive rate, and the scheme in [15] cannot detect stego-objects produced by steganographic tools such as JSteg [4], JP-Hide-and-Seek [2], and MP3Stego [3].
In this paper we propose a nonparametric steganalysis scheme for QIM steganography. We assume the stego-only attack (SOA) model, that is, the steganalyst has access only to the test-object when making a decision. A specific example of image steganalysis is considered here, though the proposed method is applicable to other types of data. For QIM-stego detection, the proposed nonparametric scheme exploits the following facts:
• natural images exhibit strong local similarity, and quantization further increases local similarity in the resulting quantized-image;
• a QIM-stego image exhibits a higher level of randomness than the corresponding quantized-cover image (obtained by plain quantization, that is, quantization without message embedding); and
• embedding an arbitrary message, $M'$, using QIM in an already quantized-image disturbs the local similarity in the resulting doubly-quantized image, $x_{QIM}^{(2)}$. The amount of disturbance due to re-embedding depends on whether the quantized-image is a quantized-cover or a QIM-stego.
We show in Section 2 that, when the test-image is a quantized-cover, re-embedding introduces a larger disturbance in the underlying local-randomness-based density function of the resulting doubly-quantized image than when the test-image is the corresponding QIM-stego.
To capture traces of message embedding in the QIM-stego, a local-similarity-based randomness mask estimation method is proposed. A kernel density estimate (KDE) method is used to estimate the underlying density function of the local-randomness mask estimated from the test-image. The Tsallis q-divergence [20] is used to quantify the statistical disturbance due to message embedding between the test-image and the doubly-quantized image. The impact of the choice of message embedding parameters on the steganalysis detection accuracy is also evaluated. Simulation results on a large dataset consisting of 12000 test-images show that the proposed method can detect QIM-stego with low false rates. The proposed scheme is also applied to attack the JSteg steganographic tool [4]; the detection results presented here indicate that it can reliably detect stego-images generated using JSteg [4].
The rest of the paper is organized as follows. Section 2 highlights the irregularities that QIM steganography introduces in the resulting QIM-stego image. Details of the proposed steganalysis framework are provided in Section 3: the local-similarity-based randomness mask estimation algorithm is discussed in Section 3.1, nonparametric density estimation from the estimated randomness mask using KDE in Section 3.2, and a brief overview of Tsallis statistics and its use for stego detection in Section 3.3. The detection performance of the proposed scheme for DCT-domain embedding in digital images is given in Section 4.1, and details of attacking the JSteg steganographic tool [4] are given in Section 4.2. Future directions and concluding remarks are given in Section 5.
2. FOOTPRINTS OF QIM STEGANOGRAPHY
The objective of a steganalyst attacking QIM steganography is to decide whether a given test-image is a quantized-cover ($x_q$) or was quantized with message embedding (i.e., a QIM-stego, $x_{QIM}$). Malik et al. [14] recently made the following observations on the difference between the QIM-stego and the quantized-cover:
• First, we noted in [14] that quantization (with and without message embedding) introduces smoothness in the probability mass function (pmf) of the cover-image. To illustrate this claim, the empirical pmfs of the DCT coefficients of the cover and of the corresponding QIM-stego, obtained using quantization step-size ∆ = {0.5, 4, 8}, are plotted in Fig. 1. It can be observed from Fig. 1 that as ∆ increases, the empirical pmf of the resulting QIM-stego changes from a super-Gaussian-like pmf (e.g., a Laplacian pmf) to a more Gaussian-like pmf.
[Figure 1: four panels of empirical pmf versus coefficient value (−50 to 50): Cover Image; QIM-Stego (∆ = 0.5); QIM-Stego (∆ = 4); QIM-Stego (∆ = 8)]
Figure 1: Empirical pmf based on the histogram of DCT coefficients of the cover (top-left) and of the quantized DCT coefficients of the QIM-stego obtained with ∆ = {0.5, 4, 8} (top-right, bottom-left, and bottom-right, respectively)
• Second, the quantization step-size, ∆, controls the level of smoothness introduced in the pmf of the resulting quantized-image.
• Finally, quantization with message embedding (i.e., QIM) introduces more smoothness than plain quantization. To investigate the smoothing effect of quantization on the cover pmf further, the empirical pmfs of the quantized-cover and the QIM-stego are plotted in Fig. 2.
[Figure 2: four panels of empirical pmf versus coefficient value (−50 to 50): Quantized-Cover (∆ = 0.5); QIM-Stego (∆ = 0.5); Quantized-Cover (∆ = 4); QIM-Stego (∆ = 4)]
Figure 2: Empirical pmf of the quantized-cover (left) and the corresponding QIM-stego (right), both obtained with ∆ = {0.5, 4}
It can be observed from Fig. 2 that, for the same ∆, the QIM-stego image exhibits a smoother pmf than the corresponding quantized-cover. Moreover, for large ∆ (e.g., ∆ ≥ 4), message embedding using QIM splits the peak of the cover pmf around zero into three peaks (say $p_{-\Delta}$, $p_0$, and $p_{\Delta}$, centered at $-\Delta$, 0, and $\Delta$, respectively), which can be used to distinguish between the quantized-cover and the QIM-stego. However, such a visual attack does not guarantee successful stego detection, especially when a smaller ∆ is used for message embedding and/or the cover-image has either a uniform or a Gaussian-like pmf.
Learning-based steganalysis techniques have been proposed in the past [19] to distinguish between the quantized-cover and the QIM-stego, but, as noted earlier, these schemes have some inherent disadvantages. Malik et al. [14, 15] recently proposed non-learning-based steganalysis schemes to attack QIM steganography; however, the scheme in [14] has a relatively high false positive rate, and the scheme in [15] cannot detect stego-images produced by steganographic tools such as JSteg [4], JP-Hide-and-Seek [2], and MP3Stego [3]. The steganalysis scheme proposed here is intended to address these limitations of the QIM steganalysis schemes in [14, 15].
2.1 Embedding Messages in the Quantized-Cover and the QIM-Stego
As discussed in Section 1, quantization of natural images reduces randomness in the resulting quantized-images, and a QIM-stego image exhibits a higher level of randomness than the corresponding quantized-cover. In addition, embedding an arbitrary message, $M'$, in the quantized test-image, $x_t$, using QIM with parameter ∆ disturbs the statistics of the local-randomness-based density function of the resulting doubly-quantized image, $x_{QIM}^{(2)}$. The amount of this disturbance depends on whether $x_t = x_q$ or $x_t = x_{QIM}$. We have observed that if $x_t = x_q$, embedding $M'$ (using QIM) introduces a larger disturbance in the first- and higher-order statistics of the local-randomness-based density function, $f_{x_{QIM}^{(2)}}(x)$, estimated from $x_{QIM}^{(2)}$, than embedding the same message in the corresponding QIM-stego. To verify this claim, the local-randomness-based density functions estimated from the quantized-cover ($\hat{f}_{x_q}$), the QIM-stego ($\hat{f}_{x_{QIM}}$), and their corresponding doubly-quantized versions are plotted in Fig. 3. Here the density functions $\hat{f}_{x_q}$, $\hat{f}_{x_{QIM}}$, and $\hat{f}_{x_{QIM}^{(2)}}$ are estimated using the methods discussed in Sections 3.1 and 3.2.
[Figure 3: two panels of estimated density $f_x(x)$ versus $Rc_x$ (−0.4 to 1.4); top: $\hat{f}_{x_q}$ and $\hat{f}_{x_{QIM}^{(2)}}$; bottom: $\hat{f}_{x_{QIM}}$ and $\hat{f}_{x_{QIM}^{(2)}}$]
Figure 3: Estimated density functions from $x_q$, $x_{QIM}$, and the corresponding doubly-quantized images, $x_{QIM}^{(2)}$
It can be observed from Fig. 3 (top) that embedding $M'$ in a quantized-cover, $x_q$, increases local randomness in the resulting $x_{QIM}^{(2)}$, whereas embedding the same message $M'$ (independent of $M$) in a QIM-stego image introduces a relatively small disturbance in the local randomness of the resulting $x_{QIM}^{(2)}$ (Fig. 3, bottom). The statistical disturbance due to embedding $M'$ (using QIM) in the quantized-cover and in the QIM-stego can be explained as follows.
Without loss of generality, consider an $N$-point cover-signal, $s = \{s_i\}_{i=1}^{N}$, $s_i \in \mathbb{R}$, uniformly distributed over $(-\Delta/2, \Delta/2)$, where $\Delta > 0$. Let $x_q$ be generated by quantizing $s$ using a uniform quantizer with parameter ∆, and let $x_{QIM}$ be obtained by embedding $M$ (with $\Pr[m=0] = \Pr[m=1] = 1/2$) using QIM with parameter ∆. Then the quantized sequence $x_q$ consists of all zeros, whereas the quantized sequence $x_{QIM}$ consists of approximately $N/2$ nonzero and $N/2$ zero points. Hence $x_{QIM}$ has a higher level of randomness than $x_q$, even though both are quantized using the same parameter ∆.
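The toy argument above can be checked numerically. The sketch below is a minimal illustration, not the paper's implementation: it assumes a binary dither-modulation form of QIM (bit 0 quantizes to the lattice ∆Z, bit 1 to the shifted lattice ∆Z + ∆/2), and the function names `uniform_quantize` and `qim_embed` are hypothetical.

```python
import numpy as np

def uniform_quantize(x, delta):
    """Plain quantization: map each sample to the nearest multiple of delta."""
    return delta * np.round(x / delta)

def qim_embed(x, bits, delta):
    """Assumed binary dither-modulation QIM: bit 0 -> lattice delta*Z,
    bit 1 -> shifted lattice delta*Z + delta/2."""
    d = bits * (delta / 2.0)                    # dither selected by each message bit
    return delta * np.round((x - d) / delta) + d

rng = np.random.default_rng(0)
N, delta = 10_000, 4.0
s = rng.uniform(-delta / 2, delta / 2, size=N)  # cover samples in (-delta/2, delta/2)
m = rng.integers(0, 2, size=N)                  # equiprobable message bits M

x_q = uniform_quantize(s, delta)                # quantized-cover
x_qim = qim_embed(s, m, delta)                  # QIM-stego

print("nonzero in x_q:  ", np.count_nonzero(x_q))
print("nonzero in x_QIM:", np.count_nonzero(x_qim), "of", N)
```

Under this assumed variant, every sample carrying a 1 bit lands at ±∆/2, so the number of nonzero points in $x_{QIM}$ equals the number of 1 bits, about $N/2$, while $x_q$ is identically zero.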
Now, to show the effect of re-embedding, let $x_{q}^{QIM(2)}$ be obtained by embedding $M'$ (with $\Pr[m'=0] = \Pr[m'=1] = 1/2$) in $x_q$ using QIM with parameter ∆. Here $x_{q}^{QIM(2)}$ contains about $N/2$ nonzero points, so it has a higher level of randomness than $x_q$. Fig. 3 (top) supports this claim: the mode of the local-randomness-based density function $\hat{f}_{x_{QIM}^{(2)}}$ is higher than that of $\hat{f}_{x_q}$. This higher level of randomness in $\hat{f}_{x_{QIM}^{(2)}}$ can be attributed to the randomness in $M'$.
Similarly, let $x_{QIM}^{QIM(2)}$ be obtained by embedding $M'$ (with $\Pr[m'=0] = \Pr[m'=1] = 1/2$) in $x_{QIM}$ using QIM with parameter ∆. Embedding $M'$ in $x_{QIM}$ does not significantly disturb the number of points at zero, but it does change the distribution of the nonzero points while keeping their total number the same; that is, $x_{QIM}$ and $x_{QIM}^{QIM(2)}$ both have approximately $N/2$ nonzero quantized points. Therefore, re-embedding an independent message in a QIM-stego spreads the underlying local-randomness-based density function without changing its mode. Fig. 3 (bottom) supports this claim as well: the estimated density functions $\hat{f}_{x_{QIM}}$ and $\hat{f}_{x_{QIM}^{(2)}}$ are reasonably close, with approximately the same mode.
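The re-embedding argument can be sketched the same way, under the same assumed dither-modulation QIM (the helper `qim_embed` is hypothetical). This toy only tracks the number of nonzero points, not the full local-randomness density: re-embedding an independent message $M'$ lifts the nonzero fraction of $x_q$ from 0 to about 1/2, while the nonzero fraction of $x_{QIM}$ stays at about 1/2.

```python
import numpy as np

def qim_embed(x, bits, delta):
    """Assumed binary dither-modulation QIM (bit 0 -> delta*Z, bit 1 -> delta*Z + delta/2).
    Exact ties (e.g. an input of +/-delta/2 with bit 0) are broken by np.round's
    round-half-to-even rule, which sends them to 0 here."""
    d = bits * (delta / 2.0)
    return delta * np.round((x - d) / delta) + d

rng = np.random.default_rng(1)
N, delta = 10_000, 4.0
s = rng.uniform(-delta / 2, delta / 2, size=N)
m = rng.integers(0, 2, size=N)        # original message M
m2 = rng.integers(0, 2, size=N)       # independent arbitrary message M'

x_q = delta * np.round(s / delta)     # quantized-cover: all zeros
x_qim = qim_embed(s, m, delta)        # QIM-stego: about N/2 nonzero points

x_q2 = qim_embed(x_q, m2, delta)      # x_q^{QIM(2)}: M' re-embedded in the quantized-cover
x_qim2 = qim_embed(x_qim, m2, delta)  # x_QIM^{QIM(2)}: M' re-embedded in the QIM-stego

for name, a in [("x_q", x_q), ("x_q^QIM(2)", x_q2), ("x_QIM", x_qim), ("x_QIM^QIM(2)", x_qim2)]:
    print(f"{name:13s} nonzero fraction = {np.count_nonzero(a) / N:.3f}")
```

In this variant the nonzero points of both doubly-quantized sequences are exactly the positions where $m' = 1$, so re-embedding changes the count only for the quantized-cover.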
3. PROPOSED STEGANALYSIS MODEL
The objective of a nonparametric steganalysis system is to detect the stego-image reliably without learning the characteristics of the stego-image or using the parameters of the steganographic algorithm used for message embedding. Nonparametric steganalysis is performed by first estimating, from the test-image, a feature vector that captures traces of the message embedding process and then applying a binary hypothesis test controlled by a threshold. The performance of a steganalysis system is measured in terms of the probability of false positive, $P_{fp}$, and the probability of false negative, $P_{fn}$. The proposed steganalysis system operates under the additional constraint that only the test-image is available to the steganalyst to make a decision.
The proposed steganalysis scheme exploits the fact that message embedding using QIM introduces randomness in the resulting stego-image, and that embedding an arbitrary message in an already quantized-image ($x_q$ or $x_{QIM}$) also introduces randomness in the resulting doubly-quantized image, $x_{QIM}^{(2)}$, by an amount that depends on the test-image. The underlying density function that captures traces of the embedded message is estimated using KDE, and the Tsallis q-divergence between the density functions estimated from the test-image and from the corresponding doubly-quantized image is used for stego detection. The proposed steganalysis system can be divided into the following four processing stages:
1. Local Randomness Mask Estimation (Data Generation Stage)
2. Density Estimation (Data Reduction Stage)
3. Tsallis Divergence Calculation (Feature Extraction Stage)
4. Stego Detection Using q-Divergence (Stego Detection Stage)
A schematic diagram of the proposed steganalysis scheme is given in Fig. 4. Details of each processing stage are provided in the following sections.
3.1 Local Randomness Mask Estimation
This section describes how the local-similarity-based randomness mask is estimated from the quantized-image. To estimate the local-randomness mask, $Rc_x$, the test-image is segmented into non-overlapping blocks of 8 × 8 pixels (here we assume that the message is embedded in the DCT domain using 8 × 8 non-overlapping blocks). Each block, with pixel values $s_{n_1,n_2}$, is transformed into the DCT domain using the following 2D forward discrete cosine transform,
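$$x_{k_1,k_2} = \frac{1}{4} G_{k_1} G_{k_2} \sum_{n_1=0}^{7} \sum_{n_2=0}^{7} s_{n_1,n_2} \cos\left(\frac{\pi k_1 (2n_1+1)}{16}\right) \cos\left(\frac{\pi k_2 (2n_2+1)}{16}\right), \qquad k_1, k_2 = 0, \dots, 7$$
where
$$G_{k} = \begin{cases} \frac{1}{\sqrt{2}} & \text{if } k = 0 \\ 1 & \text{otherwise} \end{cases}, \qquad k \in \{k_1, k_2\}$$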
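The transform above is the orthonormal 8 × 8 DCT-II, so it can be written as a matrix product $X = C S C^{T}$ with $C_{k,n} = \frac{G_k}{2}\cos\frac{\pi k (2n+1)}{16}$. The sketch below (NumPy; the helper names are hypothetical) builds $C$, applies it to one block, and applies it to every non-overlapping 8 × 8 block of an image.

```python
import numpy as np

def dct_matrix():
    """C[k, n] = (G_k / 2) * cos(pi * k * (2n + 1) / 16), with G_0 = 1/sqrt(2) and G_k = 1 otherwise."""
    k = np.arange(8)[:, None]
    n = np.arange(8)[None, :]
    g = np.where(k == 0, 1.0 / np.sqrt(2.0), 1.0)
    return (g / 2.0) * np.cos(np.pi * k * (2 * n + 1) / 16.0)

C = dct_matrix()

def dct2_block(s):
    """Forward 2D DCT of one 8x8 block: x[k1, k2] = sum_{n1,n2} C[k1,n1] s[n1,n2] C[k2,n2]."""
    return C @ s @ C.T

def blockwise_dct(img):
    """Split img into non-overlapping 8x8 blocks (dropping any partial border)
    and transform each; the result has shape (B1, B2, 8, 8)."""
    h, w = (img.shape[0] // 8) * 8, (img.shape[1] // 8) * 8
    blocks = img[:h, :w].reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    return C @ blocks @ C.T   # matmul broadcasts over the leading (B1, B2) axes

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(256, 256)).astype(float)
coeffs = blockwise_dct(img)
print(coeffs.shape)           # (32, 32, 8, 8)
```

Because $C$ is orthonormal, the transform preserves block energy, and a constant block of ones maps to a single DC coefficient of 8.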
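The local-randomness mask value of the coefficient $x_{k_1,k_2}$ in the $j$th block (i.e., $x^{(j)}_{k_1,k_2}$), denoted $Rc^{(j)}_{k_1,k_2}$, is calculated from the similarity of $x^{(j)}_{k_1,k_2}$ to the corresponding coefficients in the ν neighboring blocks. Let $x^{(j)}_{NH(k_1,k_2,l)}$, $l = 1, \dots, \nu$, denote the coefficients corresponding to $x^{(j)}_{k_1,k_2}$ in the ν neighboring blocks. The similarity value, $C^{(j)}_{k_1,k_2}$, for coefficient $x^{(j)}_{k_1,k_2}$ is calculated as
$$C^{(j)}_{k_1,k_2} = \frac{1}{k} \sum_{l=1}^{\nu} \mathbb{1}_{[x^{(j)}_{k_1,k_2}]}\left(x^{(j)}_{NH(k_1,k_2,l)}\right), \qquad l = 1, \dots, \nu, \quad j = 1, \dots, n \qquad (1)$$
and the corresponding local-randomness mask value, $Rc^{(j)}_{k_1,k_2}$, is calculated as
$$Rc^{(j)}_{k_1,k_2} = 1 - C^{(j)}_{k_1,k_2} \qquad (2)$$
where $\mathbb{1}_{[x^{(j)}_{k_1,k_2}]}(\cdot)$ is an indicator function that equals 1 when its argument equals $x^{(j)}_{k_1,k_2}$ and 0 otherwise, $n = \lfloor n_1/8 \rfloor \times \lfloor n_2/8 \rfloor$ is the number of blocks in an $n_1 \times n_2$ image, and $\lfloor x \rfloor$ denotes the largest integer not exceeding $x$.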
$Rc^{(j)}_{k_1,k_2}$ is a nonnegative real-valued random variable with $0 \le Rc^{(j)}_{k_1,k_2} \le 1$. $Rc^{(j)}_{k_1,k_2} = 0$ corresponds to maximum local similarity for $x^{(j)}_{k_1,k_2}$, i.e., all the neighboring coefficients are quantized to the same value as the current coefficient. Similarly, $Rc^{(j)}_{k_1,k_2} = 1$ corresponds to minimum similarity, i.e., none of the ν neighboring coefficients is quantized to the same value as the current coefficient, where ν ∈ {2, 4, 8} is the neighborhood size. To illustrate randomness mask estimation based on a ν-neighborhood using Eq. (2), the randomness mask estimation for a selected block (the block of interest, BOI) using an 8-neighborhood is shown in Fig. 5.
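Eqs. (1) and (2) can be sketched in vectorized form as below. Several details are our assumptions rather than the paper's: quantized coefficients are arranged as a grid of blocks of shape (B1, B2, 8, 8); only interior blocks (which have all 8 neighbors) are scored; the 4-neighborhood is taken as the blocks above, below, left, and right (ν = 2 is not sketched); and the normalizing constant $k$ in Eq. (1) defaults to ν, which makes $Rc$ span [0, 1] as stated in the text (the values in Fig. 5 are multiples of 1/9, so the original may normalize differently). The name `randomness_mask` is hypothetical.

```python
import numpy as np

# Block offsets (row, col) of the neighbors of a block of interest in the block grid.
# The 8-neighborhood is the ring of blocks around the BOI (as in Fig. 5); the
# 4-neighborhood below is an assumption. nu = 2 is not sketched here.
NEIGHBOR_OFFSETS = {
    8: [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)],
    4: [(-1, 0), (1, 0), (0, -1), (0, 1)],
}

def randomness_mask(coeffs, nu=8, k=None):
    """Local-randomness mask Rc (Eqs. (1)-(2)) for quantized coefficients of shape (B1, B2, 8, 8).

    Only interior blocks are scored, so the result has shape (B1 - 2, B2 - 2, 8, 8).
    k is the normalizing constant of Eq. (1); it defaults to nu (our assumption).
    """
    k = nu if k is None else k
    B1, B2 = coeffs.shape[:2]
    center = coeffs[1:B1 - 1, 1:B2 - 1]
    matches = np.zeros(center.shape)
    for di, dj in NEIGHBOR_OFFSETS[nu]:
        neighbor = coeffs[1 + di:B1 - 1 + di, 1 + dj:B2 - 1 + dj]
        matches += (neighbor == center)     # indicator: neighbor quantized to the same value
    return 1.0 - matches / k                # Eq. (2): Rc = 1 - C

# Usage: quantize a synthetic (hypothetical) grid of coefficients with step delta, then build Rc.
rng = np.random.default_rng(0)
delta = 4.0
coeffs = delta * np.round(rng.normal(0.0, 3.0, size=(10, 10, 8, 8)) / delta)
rc = randomness_mask(coeffs, nu=8)
print(rc.shape, rc.min() >= 0.0, rc.max() <= 1.0)
```

Exact equality is the right comparison here because the inputs are already quantized; applying the same function to unquantized coefficients would make almost every value of $Rc$ equal to 1.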
It has been observed that the estimated local-randomness mask, $Rc_x$, depends on the test-image characteristics, the quantization step-size used to quantize it, and the distribution of the hidden message, $M$ (in the case of a QIM-stego). According to Eq. (2), a richly textured image yields higher randomness mask values than a low-texture image for the same quantization parameters. Similarly, a quantized-image generated using a smaller ∆ yields a randomness mask with a higher mean than the corresponding quantized-image generated using a larger ∆, because a larger ∆ tends to map neighboring coefficients to fewer distinct quantized values than a smaller ∆. As discussed earlier, a QIM-stego exhibits a higher level of randomness than the corresponding quantized-cover. Therefore, it is reasonable to expect that $Rc_x$ estimated from a QIM-stego image has a higher mean than the randomness mask estimated from the corresponding quantized-cover, when both quantized-images are obtained using the same ∆ (see Fig. 3).

Figure 4: Schematic diagram of the steganalysis scheme used to attack QIM steganography

[Figure 5: a block of interest (BOI) and its 8-neighborhood blocks (3 × 3 excerpts of quantized coefficients), and the resulting $Rc_x$ values for the BOI, estimated using 8-neighborhood correlation]
Figure 5: Randomness mask estimation for the selected block based on 8-neighborhood
3.2 Density Estimation
This is the data reduction stage: the large dataset, i.e., the estimated local-randomness mask $Rc_x$, is processed to extract a feature vector that can be used for stego detection. The kernel density estimation (KDE) technique is used for this data reduction, i.e., to estimate the underlying density function from $Rc_x$. To this end, $Rc_x$ is first mapped to a one-dimensional (1D) sequence, which is then used to estimate the underlying density, $\hat{f}_x(x)$. The KDE package downloaded from [6] was used; it supports a range of kernels, such as Gaussian, logistic, Laplacian, and Epanechnikov. The proposed scheme, however, uses the Gaussian kernel, $K_g(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}u^2\right)$, for the simulation results presented in this paper. The choice of $K_g$ for density estimation is motivated by the fact that, for a given bandwidth $h$, the Gaussian kernel yields an optimally smooth density estimate, $\hat{f}_x(x)$.
This is illustrated in Fig. 7, which plots the densities estimated for the quantized Girl image (see Fig. 6) using Gaussian, logistic, Laplacian, and Epanechnikov kernels, each with bandwidth $h = 0.1$. It can be observed from Fig. 7 that the Gaussian kernel yields an optimally smooth density estimate.
Figure 6: Girl image
It is important to mention that the density estimated from $Rc_x$, $\hat{f}_{Rc_x}(x; h)$, plotted in Fig. 7, has nonzero values for $x < 0$ even though $0 \le x \le 1$. Such nonzero estimated density values outside the support of the data (here $[0, 1]$) can be attributed to the kernel, $K(x)$, and the bandwidth, $h$, used for density estimation. The Gaussian, logistic, and Laplacian kernels plotted in Fig. 7 are continuous and nonzero over $(-\infty, \infty)$, and even the compactly supported Epanechnikov kernel spreads each data point a distance $h$ on either side; the spread of each kernel is controlled by the bandwidth $h$. During density estimation, kernel estimators smooth out the contribution of each observed data point over a local neighborhood of that point. Therefore, although the underlying data has support over $[-R_1, R_2]$, the density estimated using such kernels may have a larger support than the data. The amount of this density leakage depends on the frequency of data points at the boundaries of the data range, the shape of the kernel, and the bandwidth. To reduce this leakage, one needs a smaller bandwidth $h$ and a sharply decaying kernel, such as the uniform kernel. However, with such kernels the resulting density function would not be optimally smooth. We have observed that this leakage of the estimated density outside $[0, 1]$ does not degrade the detection performance of the proposed steganalysis scheme. Therefore, to obtain an optimally smooth density estimate from the underlying data, $Rc_x$, we assume a Gaussian kernel with bandwidth $h = 0.1$ for the rest of the paper.

[Figure 7: estimated density versus $x$ (−0.2 to 1) for coefficient #30, position (3,6), using various kernels; legend: Gaussian, Logistic, Laplacian, Epanechnikov]
Figure 7: Estimated density from the sequence $x^{(30)}_n$ of $Rc_x$ of the quantized Girl image using Gaussian, logistic, Laplacian, and Epanechnikov kernels with $h = 0.1$
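A minimal Gaussian KDE with $h = 0.1$ is sketched below. It is plain NumPy rather than the KDE package of [6], and the function name and the synthetic $Rc$ values are hypothetical (drawn uniformly from {0, 1/8, ..., 1}, the values Eq. (2) can take with ν = 8 under the assumption $k = \nu$). The sketch also measures how much of the estimated density leaks outside [0, 1], the effect described above.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, h=0.1):
    """f_hat(x) = 1/(n h) * sum_i K_g((x - x_i) / h) with the Gaussian kernel
    K_g(u) = exp(-u^2 / 2) / sqrt(2 pi), evaluated at every point of `grid`."""
    samples = np.asarray(samples, dtype=float).ravel()   # map Rc_x to a 1D sequence
    u = (grid[:, None] - samples[None, :]) / h
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (samples.size * h)

# Hypothetical Rc values on {0, 1/8, ..., 1}.
rng = np.random.default_rng(0)
rc = rng.choice(np.arange(9) / 8.0, size=2000)
grid = np.linspace(-0.5, 1.5, 1001)
dx = grid[1] - grid[0]

f_hat = gaussian_kde_1d(rc, grid, h=0.1)
total = f_hat.sum() * dx                               # Riemann-sum check: close to 1
leak = f_hat[(grid < 0.0) | (grid > 1.0)].sum() * dx   # mass estimated outside [0, 1]
print(f"total mass {total:.3f}, mass outside [0, 1] {leak:.3f}")
```

Because the boundary values 0 and 1 are common in such data, roughly half of each boundary sample's kernel mass falls outside [0, 1]; the paper reports that this leakage does not hurt detection.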
3.3 Tsallis Divergence Calculation
With some exceptions, almost every steganographic technique introduces statistical and/or perceptual irregularities in the resulting stego-image. In QIM data hiding, these irregularities manifest themselves as spatial randomness in the quantized coefficients. To capture traces of message embedding, the proposed scheme uses the local-randomness-based density functions estimated from both the test-image and its doubly-quantized version, $x_{QIM}^{(2)}$. It then uses the relative statistical variation between the density functions estimated from the test-image, $x_t$ (which is either $x_q$ or $x_{QIM}$), and the corresponding doubly-quantized image, $x_{QIM}^{(2)}$, for QIM-stego detection. To capture these statistical variations effectively, the Tsallis divergence [20] of $\hat{f}_{x_t}(x)$ from $\hat{f}_{x_{QIM}^{(2)}}(x)$ is used. The Tsallis divergence (also known as the q-divergence) is a one-parameter generalization of the Kullback-Leibler (KL) divergence [10]. A brief overview of Tsallis statistics is provided in the following section.
3.3.1 Tsallis Statistics
Consider the probability distribution $p$ of a discrete random variable $x$; the Tsallis entropy is defined as
$$S_q(p) = (q-1)^{-1} \sum_{i=1}^{N} p_i \left(1 - p_i^{\,q-1}\right) \qquad (3)$$
where $q$ is called the nonextensive index.
The Tsallis entropy is a one-parameter generalization of the Shannon entropy [10], that is,
$$\lim_{q \to 1} S_q(p) = -\sum_{i=1}^{N} p_i \ln(p_i) = S_1(p) \qquad (4)$$
$S_q(p)$ is concave for $q > 0$, takes the value zero for absolute certainty, and increases monotonically with increasing complexity of a time series. The Tsallis entropy can therefore be used, through its nonextensivity, as a measure of time-series complexity.
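Eqs. (3) and (4) translate directly into code. The sketch below (hypothetical function name) computes $S_q$ for a discrete distribution and returns the Shannon limit at $q = 1$; terms with $p_i = 0$ are dropped since they contribute nothing for $q > 0$.

```python
import numpy as np

def tsallis_entropy(p, q):
    """Eq. (3): S_q(p) = (q - 1)^{-1} * sum_i p_i (1 - p_i^(q-1)).
    At q = 1 the Shannon limit of Eq. (4) is returned instead."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                              # zero-probability outcomes contribute nothing
    if abs(q - 1.0) < 1e-12:
        return float(-np.sum(p * np.log(p)))  # S_1(p): Shannon entropy in nats
    return float(np.sum(p * (1.0 - p ** (q - 1.0))) / (q - 1.0))

uniform4 = np.full(4, 0.25)
print(tsallis_entropy(uniform4, 2.0))         # 0.75
print(tsallis_entropy(uniform4, 1.0))         # ln(4), about 1.386
```

For the uniform distribution over 4 outcomes, $S_2 = \sum_i p_i(1 - p_i) = 0.75$, and $S_q$ approaches $\ln 4$ as $q \to 1$, as Eq. (4) states.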
The Tsallis relative entropy (or q-divergence) between two probability distributions $p$ and $r$ over a discrete random variable $x$ is defined as
$$D_q(p \,\|\, r) = (1-q)^{-1} \sum_{i=1}^{N} p_i \left(1 - \left(\frac{p_i}{r_i}\right)^{q-1}\right) \qquad (5)$$
The q-divergence is a one-parameter generalization of the KL divergence [10], that is,
$$\lim_{q \to 1} D_q(p \,\|\, r) = \sum_{i=1}^{N} p_i \ln\left(\frac{p_i}{r_i}\right) = D_{q=1}(p \,\|\, r) \qquad (6)$$
For $q > 0$, $D_q(p \,\|\, r)$ is convex and nonnegative.
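A sketch of Eqs. (5) and (6) follows, together with the divergence vector over a grid of $q$ values. Assumptions: the two densities have already been discretized on a common grid and normalized to sum to 1 (for example, $\hat{f}(x)\,\Delta x$ from a KDE), $r_i > 0$ wherever $p_i > 0$, and the $q$ grid is read off the horizontal axis of Fig. 8 (the text only says $q = \{1.0, \dots, 5.0\}$). Function names are hypothetical.

```python
import numpy as np

def tsallis_divergence(p, r, q):
    """Eq. (5): D_q(p||r) = (1 - q)^{-1} sum_i p_i (1 - (p_i / r_i)^(q-1)).
    At q = 1 the KL limit of Eq. (6) is returned. Requires r_i > 0 wherever p_i > 0."""
    p = np.asarray(p, dtype=float)
    r = np.asarray(r, dtype=float)
    keep = p > 0                                   # terms with p_i = 0 vanish
    p, r = p[keep], r[keep]
    if abs(q - 1.0) < 1e-12:
        return float(np.sum(p * np.log(p / r)))    # D_{q=1} = D_KL
    return float(np.sum(p * (1.0 - (p / r) ** (q - 1.0))) / (1.0 - q))

# q values read off the horizontal axis of Fig. 8 (an assumption about the exact grid).
Q_GRID = [1.0, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

def divergence_vector(p, r, qs=Q_GRID):
    """Feature vector {D_q} for one pair of discretized densities."""
    return np.array([tsallis_divergence(p, r, q) for q in qs])

p = np.array([0.5, 0.5])
r = np.array([0.25, 0.75])
print(tsallis_divergence(p, r, 2.0))   # about 0.3333 (= sum p^2 / r - 1)
print(divergence_vector(p, r))
```

At $q = 2$ the q-divergence reduces to $\sum_i p_i^2 / r_i - 1$, which gives a quick hand check; identical inputs give a zero vector.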
The proposed scheme uses the q-divergence between the estimated density functions $\hat{f}_{x_{QIM}^{(2)}}(x)$ and $\hat{f}_{x_t}(x)$ to distinguish between the cover and the stego. The motivation for using the q-divergence instead of the KL divergence is that the q-divergence is a parametric divergence: it allows the user to select a particular value of $q$ to calculate the corresponding distance between a given pair of distributions. Moreover, the KL divergence is a special case of the q-divergence, i.e., $D_{q=1} = D_{KL}$. Therefore, to analyze the divergence between a given pair of distributions thoroughly, a set of different values of $q$ can be used. To illustrate this, the q-divergence between 1) $\hat{f}_{x_q}$ and $\hat{f}_{x_{QIM}^{(2)}}$, and 2) $\hat{f}_{x_{QIM}}$ and $\hat{f}_{x_{QIM}^{(2)}}$, for $q = \{1.0, \dots, 5.0\}$ is given in Fig. 8.
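[Figure 8: $D_q$ (log scale, $10^{-3}$ to $10^{2}$) versus $q \in \{1, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0\}$; curves $D_q(\hat{f}_{x_q} \,\|\, \hat{f}_{x_{QIM}^{(2)}})$ and $D_q(\hat{f}_{x_{QIM}} \,\|\, \hat{f}_{x_{QIM}^{(2)}})$]
Figure 8: Calculated Tsallis divergence, $D_q$, between 1) $\hat{f}_{x_q}$ and $\hat{f}_{x_{QIM}^{(2)}}$ (thin line), and 2) $\hat{f}_{x_{QIM}}$ and $\hat{f}_{x_{QIM}^{(2)}}$ (thick line), from the gray-scale Lenna image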
It can be observed from Fig. 8 that the divergence vector $\{D_q\}_{q=1.0}^{5.0}$ behaves differently over the same range of $q$ for the two pairs: $D_q(\hat{f}_{x_q} \,\|\, \hat{f}_{x_{QIM}^{(2)}})$ increases as $q$ increases, whereas $D_q(\hat{f}_{x_{QIM}} \,\|\, \hat{f}_{x_{QIM}^{(2)}})$ decreases with increasing $q$. The KL divergence, in contrast, would yield only a single point for each pair, i.e., $D_{q=1}(\hat{f}_{x_q} \,\|\, \hat{f}_{x_{QIM}^{(2)}})$ and $D_{q=1}(\hat{f}_{x_{QIM}} \,\|\, \hat{f}_{x_{QIM}^{(2)}})$. Therefore, classification based on the KL divergence would be highly sensitive to the predefined threshold.
The q-divergence-based feature vector, on the other hand, consists of $|q|$ points, where $|\cdot|$ denotes the cardinality of the set of $q$ values, which improves classification performance. The proposed scheme uses the gradient of the vector $\{D_q\}_{q=1.0}^{5.0}$ for QIM-stego detection.
3.4 Stego Detection using q-Divergence
The stego-detection stage uses the q-divergence vector $\{D_q\}_{q=1.0}^{5.0}$ between $\hat{f}_{x_{QIM}^{(2)}}(x)$ and $\hat{f}_{x_t}(x)$ to distinguish between the cover and the stego. Plots of the calculated q-divergence vector between $\hat{f}_{x_t}(x)$ and $\hat{f}_{x_{QIM}^{(2)}}(x)$ are given in Fig. 8, where the thick line plots $\{D_q\}_{q=1.0}^{5.0} = D_q(\hat{f}_{x_{QIM}} \,\|\, \hat{f}_{x_{QIM}^{(2)}})$ and the thin line plots $\{D_q\}_{q=1.0}^{5.0} = D_q(\hat{f}_{x_q} \,\|\, \hat{f}_{x_{QIM}^{(2)}})$.
It can be observed from Fig. 8 that, when $x_t = x_{QIM}$, $\{D_q\}_{q=1.0}^{5.0}$ between $\hat{f}_{x_{QIM}^{(2)}}(x)$ and $\hat{f}_{x_t}(x)$ decreases monotonically with increasing $q$ and approaches zero, whereas when $x_t = x_q$, $\{D_q\}_{q=1.0}^{5.0}$ first increases slowly and then increases at a higher rate. The classification stage of the proposed scheme uses the gradient of $\{D_q\}_{q=1.0}^{5.0} = D_q(\hat{f}_{x_t} \,\|\, \hat{f}_{x_{QIM}^{(2)}})$ to distinguish between the quantized-cover and the QIM-stego.
3.5 Summary
Here we summarize the proposed QIM-stego detection method.
1. Generate the doubly-quantized image, $x_{QIM}^{(2)}$, by embedding an arbitrary message, $M'$, in the test-image, $x_t$, using QIM with parameter ∆ estimated from the test-image.
2. Estimate the local-randomness masks $Rc_{x_t}$ and $Rc_{x_{QIM}^{(2)}}$ from $x_t$ and $x_{QIM}^{(2)}$, using the method discussed in Section 3.1.
3. Estimate the density functions of $Rc_{x_t}$ and $Rc_{x_{QIM}^{(2)}}$ using KDE.
4. Calculate the q-divergence between $\hat{f}_{x_t}$ and $\hat{f}_{x_{QIM}^{(2)}}$,
$$\{D_q\}_{q=1.0}^{5.0} = D_q\left(\hat{f}_{x_t} \,\|\, \hat{f}_{x_{QIM}^{(2)}}\right), \qquad q = \{1.0, \dots, 5.0\}$$
5. Use the gradient of $\{D_q\}_{q=1.0}^{5.0}$ to distinguish between the quantized-cover and the QIM-stego with the following binary hypothesis test:
$$x_t = \begin{cases} x_{QIM} & \text{if } D_i \ge D_{i+1}\ \ \forall i \\ x_q & \text{otherwise} \end{cases} \qquad (7)$$
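The decision rule of Eq. (7) can be sketched as a check on the discrete gradient of the divergence vector. The function name is hypothetical, and the two input vectors below are made up to mimic the shapes of the curves in Fig. 8; they are not data from the paper. In practice the input would be the $\{D_q\}$ vector computed between $\hat{f}_{x_t}$ and $\hat{f}_{x_{QIM}^{(2)}}$ over the chosen $q$ grid.

```python
import numpy as np

def detect_qim_stego(d_vec):
    """Eq. (7): declare QIM-stego when D_i >= D_{i+1} for every i, i.e. the
    q-divergence vector is non-increasing in q (no positive gradient step)."""
    grad = np.diff(np.asarray(d_vec, dtype=float))   # D_{i+1} - D_i
    return bool(np.all(grad <= 0.0))

# Illustrative, made-up divergence vectors shaped like the two curves in Fig. 8.
print(detect_qim_stego([0.8, 0.3, 0.1, 0.02, 0.005]))  # True  -> QIM-stego
print(detect_qim_stego([0.4, 0.5, 0.9, 3.0, 20.0]))    # False -> quantized-cover
```

Because the rule as written requires every step to be non-increasing, a single small upward fluctuation (for example, from KDE noise) flips the decision to quantized-cover; the text does not say whether a tolerance is applied.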
4. EXPERIMENTAL RESULTS
This section reports detection performance in terms of the probability of false positive, $P_{fp}$, and the probability of false negative, $P_{fn}$. The following sections evaluate the proposed scheme for detecting QIM steganography in the DCT domain for gray-scale images and for detecting JPEG stego-images produced by the JSteg steganographic tool [4].
4.1 QIM Steganography in DCT Domain
The performance of the proposed steganalysis scheme is evaluated on the uncompressed colour image database (UCID) [5]. The UCID, downloaded from [5], contains around 1383 natural images. Simulation results presented here are based on the first 1000 images of the UCID [5]. These 1000 images were resized to 256 × 256 pixels and converted to gray scale for message embedding using QIM in the DCT domain. Eight thousand quantized images (4000 quantized-cover and 4000 QIM-stego) were generated by quantizing the 1000 natural images using a uniform quantizer with step size $\Delta \in \{1.0, 2.0, 4.0, 5.0\}$. Each QIM-stego image was generated by embedding a 7 KB (kilobyte) random binary message with equally probable message symbols. These eight thousand quantized images were then tested using the proposed steganalysis scheme.
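To make the cover/stego distinction concrete, the sketch below contrasts uniform quantization (quantized-cover) with dither-free binary QIM in the style of Chen and Wornell [8] (QIM-stego), applied to a list of coefficients. It assumes the block DCT has already been computed and that one message bit is carried per coefficient; the paper's exact QIM variant, coefficient selection, and any dithering are not specified in this section, and all function names are hypothetical.

```python
from typing import List, Sequence


def uniform_quantize(coeffs: Sequence[float], delta: float) -> List[float]:
    """Quantized-cover: map every coefficient to the nearest multiple of delta."""
    return [delta * round(c / delta) for c in coeffs]


def qim_embed(coeffs: Sequence[float], bits: Sequence[int], delta: float) -> List[float]:
    """Binary QIM, one bit per coefficient.

    Bit 0 quantizes onto the lattice delta*Z; bit 1 onto the same lattice
    shifted by delta/2. Coefficients beyond the message are quantized as in
    the cover, so an empty message reproduces the quantized-cover.
    """
    if len(bits) > len(coeffs):
        raise ValueError("message longer than the number of coefficients")
    out = uniform_quantize(coeffs, delta)
    for i, b in enumerate(bits):
        offset = 0.0 if b == 0 else delta / 2.0
        out[i] = delta * round((coeffs[i] - offset) / delta) + offset
    return out


def qim_extract(coeffs: Sequence[float], n_bits: int, delta: float) -> List[int]:
    """Minimum-distance decoding: choose the lattice closest to each coefficient."""
    bits = []
    for c in coeffs[:n_bits]:
        d0 = abs(c - delta * round(c / delta))
        d1 = abs(c - (delta * round((c - delta / 2.0) / delta) + delta / 2.0))
        bits.append(0 if d0 <= d1 else 1)
    return bits


coeffs = [3.2, -1.7, 0.4, 5.9]             # stand-in DCT coefficients
print(uniform_quantize(coeffs, 2.0))        # [4.0, -2.0, 0.0, 6.0]
stego = qim_embed(coeffs, [1, 0, 1], 2.0)
print(stego)                                # [3.0, -2.0, 1.0, 6.0]
print(qim_extract(stego, 3, 2.0))           # [1, 0, 1]
```

The example shows the intuition behind the detector: a quantized-cover sits on a single lattice, while message bits scatter stego coefficients between two interleaved lattices, which is the kind of local irregularity the scheme measures.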
During the detection phase, for each test image, $x_t$, a doubly-quantized image, $x_{QIM(2)}$, was first generated by embedding an arbitrary message $M_0$ using QIM with step size $\Delta$. Local-randomness masks were estimated from both $x_t$ and $x_{QIM(2)}$, and the estimated masks were used to estimate the underlying density functions, $\hat{f}_{x_t}(x)$ and $\hat{f}_{x_{QIM(2)}}(x)$, using KDE. The q-divergence between $\hat{f}_{x_t}(x)$ and $\hat{f}_{x_{QIM(2)}}(x)$ was calculated using Eq. (5) for $q \in \{1.0, \ldots, 5.0\}$. The resulting q-divergence vector, $\{D_q\}_{q=1.0}^{5.0}$, is used to determine whether the test image is a quantized-cover or a QIM-stego, using the detection rule given in Eq. (7). Simulation results for this experimental setting are given in Table 1.
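A density estimate of the kind used in this step can be sketched with a Gaussian kernel. The kernel and bandwidth actually used in the paper are not given in this section, so the Gaussian kernel and the normal-reference (rule-of-thumb) bandwidth below are assumptions, and `mask_values` is a hypothetical stand-in for a flattened local-randomness mask.

```python
import math
from typing import List, Optional, Sequence


def normal_reference_bandwidth(samples: Sequence[float]) -> float:
    """Rule-of-thumb bandwidth h = 1.06 * sigma * n^(-1/5) for a Gaussian kernel."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    sigma = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return 1.06 * sigma * n ** (-1.0 / 5.0)


def gaussian_kde(samples: Sequence[float], grid: Sequence[float],
                 bandwidth: Optional[float] = None) -> List[float]:
    """Evaluate a Gaussian kernel density estimate of `samples` at each grid point."""
    h = bandwidth if bandwidth is not None else normal_reference_bandwidth(samples)
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
            for x in grid]


# Hypothetical local-randomness values standing in for a flattened mask R_c^x.
mask_values = [0.0, 0.1, 0.3, 0.2, -0.1]
dx = 0.01
grid = [-3.0 + k * dx for k in range(601)]  # uniform grid on [-3, 3]
density = gaussian_kde(mask_values, grid)
print(round(sum(density) * dx, 3))  # 1.0 (Riemann sum of the estimated density)
```

Evaluating both estimates on the same uniform grid keeps them directly comparable, which is what a divergence between $\hat{f}_{x_t}$ and $\hat{f}_{x_{QIM(2)}}$ requires.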
Table 1: Detection performance as a function of quantization step-size $\Delta$
Quantization step-size $\Delta$    1        2        4        5
$P_{fp}$                           0.079    0.061    0.052    0.048
$P_{fn}$                           0.098    0.062    0.05     0.042
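Simulation results listed in Table 1 show that the proposed steganalysis scheme has low false rates for quantization step-size $\Delta \in \{1.0, 2.0, 4.0, 5.0\}$: every $P_{fp}$ and $P_{fn}$ is below 0.1, the averages over the four step sizes are $P_{fp} = 0.060$ and $P_{fn} = 0.063$, and both rates decrease as $\Delta$ increases.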
4.2 Attacking JSteg
The proposed steganalysis scheme was also applied to attack JSteg [4], a freeware steganographic tool that hides messages in baseline JPEG compressed images. During JPEG compression, JSteg embeds the secret message in the JPEG image by replacing the LSBs of the quantized DCT coefficients, which are subsequently run-length coded. To attack JSteg, the quantized AC coefficients (after run-length coding) of the JPEG test image are used to estimate the local-randomness mask, $R_c^x$. Since for most JPEG compressed images only the low- and mid-frequency coefficients survive quantization and reach the entropy coding stage, the local-randomness mask is estimated from these run-length coded coefficients. The underlying density estimated from $R_c^x$ is used to distinguish between the JPEG-cover (generated using baseline JPEG compression [18]) and the JPEG-stego (generated using JSteg).
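The embedding rule can be illustrated on a short list of quantized AC coefficients. The sketch assumes the behavior commonly attributed to JSteg, namely sequential LSB replacement that skips coefficients equal to 0 or 1; it omits the JPEG codec entirely, and the function names are hypothetical rather than part of the JSteg tool [4].

```python
from typing import List, Sequence


def jsteg_capacity(ac_coeffs: Sequence[int]) -> int:
    """Bits that can be hidden: one per coefficient whose value is not 0 or 1."""
    return sum(1 for c in ac_coeffs if c not in (0, 1))


def jsteg_embed(ac_coeffs: Sequence[int], bits: Sequence[int]) -> List[int]:
    """Sequentially overwrite the LSB of each usable coefficient with a message bit."""
    out = list(ac_coeffs)
    k = 0
    for i, c in enumerate(out):
        if k == len(bits):
            break
        if c in (0, 1):
            continue  # {0, 1} is one LSB pair, so usable values never become 0 or 1
        out[i] = (c & ~1) | bits[k]  # two's-complement LSB, also for negative values
        k += 1
    if k < len(bits):
        raise ValueError("message exceeds the embedding capacity")
    return out


def jsteg_extract(ac_coeffs: Sequence[int], n_bits: int) -> List[int]:
    """Read back the LSBs of the usable coefficients in the same order."""
    return [c & 1 for c in ac_coeffs if c not in (0, 1)][:n_bits]


quantized_ac = [5, 0, -3, 1, 2, -1, 0, 4]   # hypothetical quantized AC coefficients
print(jsteg_capacity(quantized_ac))         # 5
stego_ac = jsteg_embed(quantized_ac, [0, 1, 1, 0])
print(stego_ac)                             # [4, 0, -3, 1, 3, -2, 0, 4]
print(jsteg_extract(stego_ac, 4))           # [0, 1, 1, 0]
```

The capacity function also suggests why small, low-texture images compressed at low quality factors carry only short messages: coarser quantization at lower Q drives more AC coefficients to 0, leaving fewer usable coefficients.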
To illustrate the effect of message embedding on the local-randomness mask of a JPEG-stego image, a random 7 KB message was embedded in the 512 × 512 pixel gray-scale Lenna image using JSteg [4] with quality factor $Q = 100$. The density functions $\hat{f}_{x_{JPEG}}$ and $\hat{f}_{x_{Jsteg}}$, estimated from the JPEG-cover and the JPEG-stego respectively, are shown in Fig. 9.
[Figure 9 plot: estimated density $\hat{f}_x(x)$ versus $R_c^x$; curves $\hat{f}_{x_{JPEG}}(x)$ and $\hat{f}_{x_{Jsteg}}(x)$.]
Figure 9: Estimated density plots from the JPEG-stego image (dotted line) obtained using the JSteg tool and the JPEG-cover image (solid line)
It can be observed from Fig. 9 that $\hat{f}_{x_{JPEG}}$ peaks around zero, whereas $\hat{f}_{x_{Jsteg}}$ peaks between 0.2 and 0.5. This change in the first-order statistics of the local-randomness-based density function can be attributed to the randomness of the hidden message $M$. Note that the statistical variations in the underlying local randomness due to message embedding are a function of the hidden message length, which depends on the cover-image characteristics, the quality factor used for compression, and the size of the cover image.
We have observed through extensive simulations that $\hat{f}_{x_{JPEG}}(x)$ and $\hat{f}_{x_{Jsteg}}(x)$ are hard to differentiate, especially when the JPEG-stego image carries a small hidden message. For example, low-texture images of 128 × 128 pixels or smaller, compressed with quality factor $Q \le 75$, can carry only very small messages, e.g., less than 4 KB. This is illustrated in Fig. 11, where $\hat{f}_{x_{JPEG}}(x)$ is estimated from the 128 × 128 pixel Fruits JPEG image (see Fig. 10), obtained using $Q = 50$, and $\hat{f}_{x_{Jsteg}}(x)$ from the corresponding JPEG-stego image obtained using JSteg [4]. It can be observed from Fig. 11 that $\hat{f}_{x_{JPEG}}(x)$ and $\hat{f}_{x_{Jsteg}}(x)$ are very close.
The proposed steganalysis scheme discussed in Section 3 can also be used to detect JPEG-stego images carrying small hidden messages, obtained using JSteg with $50 \le Q \le 75$. The following processing steps were used to detect JPEG-stego images obtained using JSteg [4]:
1. The test image (in JPEG format), $x_{JPEG(t)}$, is recompressed to generate a JSteg(2) image, $x_{Jsteg(2)}$, by embedding an arbitrary message $M_0$ using JSteg. The recompression stage involves decompressing the JPEG test image and then compressing it with JSteg [4].
2. The local-randomness mask, $R_c^x$, is then estimated for both the test image, $x_{JPEG(t)}$, and the corresponding JSteg(2) image, $x_{Jsteg(2)}$.
3. The estimated randomness masks of $x_{JPEG(t)}$ and $x_{Jsteg(2)}$, i.e., $R_c^{x_{JPEG(t)}}$ and $R_c^{x_{Jsteg(2)}}$, are used to estimate the underlying density functions using KDE.
4. The q-divergence between $\hat{f}_{x_{JPEG(t)}}(x)$ and $\hat{f}_{x_{Jsteg(2)}}(x)$, i.e., $D_q\big(\hat{f}_{x_{JPEG(t)}} \,\|\, \hat{f}_{x_{Jsteg(2)}}\big)$, is used to distinguish between the JPEG-cover and the JPEG-stego image generated using JSteg [4].

Figure 10: Fruits image

[Figure 11 plot: estimated density $\hat{f}_x(x)$ versus $R_c^x$; curves $\hat{f}_{x_{JPEG}}$ and $\hat{f}_{x_{Jsteg}}$.]
Figure 11: Estimated density plots from the JPEG-cover image obtained with $Q = 50$ and the JPEG-stego image obtained by embedding a 3 KB message using the JSteg tool with $Q = 50$
A schematic diagram of the steganalysis scheme used to distinguish between the JPEG-cover and the JPEG-stego image is given in Fig. 12.
Re-embedding with JSteg in the test image can be justified using the same arguments given in Section 2.1: embedding $M_0$ in the test image, $x_{JPEG(t)}$, using JSteg [4] introduces a smaller statistical disturbance in the estimated density, $\hat{f}_{x_{Jsteg(2)}}(x)$, if $x_{JPEG(t)} = x_{Jsteg}$, whereas embedding the same $M_0$ in $x_{JPEG(t)}$ (using JSteg) introduces a larger statistical disturbance in $\hat{f}_{x_{Jsteg(2)}}(x)$ when $x_{JPEG(t)} = x_{JPEG}$. Therefore, the inequality
$$D_{q=1}\big(\hat{f}_{x_{Jsteg}} \,\|\, \hat{f}_{x_{Jsteg(2)}}\big) < D_{q=1}\big(\hat{f}_{x_{JPEG}} \,\|\, \hat{f}_{x_{Jsteg(2)}}\big)$$
can be used to distinguish between the JPEG-cover and the JPEG-stego. This claim is supported by Fig. 13, which plots the density functions estimated from the JPEG-cover, the corresponding JPEG-stego, and the JSteg(2) images.

Figure 12: Block diagram of the steganalysis scheme used to attack the JSteg steganographic tool
[Figure 13 plot: estimated density $\hat{f}_x(x)$ versus $R_c^x$; curves $\hat{f}_{x_{JPEG}}$, $\hat{f}_{x_{Jsteg}}$, and $\hat{f}_{x_{Jsteg(2)}}$.]
Figure 13: Estimated density plots from the JPEG-cover (obtained with $Q = 50$) and from the JPEG-stego and JPEG-stego(2) images, both obtained by embedding a 3 KB message using the JSteg tool with $Q = 50$
It can be observed from Fig. 13 that the pair $\hat{f}_{x_{Jsteg}}$ and $\hat{f}_{x_{Jsteg(2)}}$ has a smaller relative entropy than the pair $\hat{f}_{x_{JPEG}}$ and $\hat{f}_{x_{Jsteg(2)}}$. More specifically, $D_{q=1}\big(\hat{f}_{x_{Jsteg}} \,\|\, \hat{f}_{x_{Jsteg(2)}}\big) = 0.17$ bits/sample (bps), whereas $D_{q=1}\big(\hat{f}_{x_{JPEG}} \,\|\, \hat{f}_{x_{Jsteg(2)}}\big) = 0.85$ bps. To further improve detection performance based on the divergence between a pair of distributions, a parameterized version of the KL-divergence, i.e., the q-divergence, can be used. For example, the q-divergences between 1) $\hat{f}_{x_{JPEG}}$ and $\hat{f}_{x_{Jsteg(2)}}$, and 2) $\hat{f}_{x_{Jsteg}}$ and $\hat{f}_{x_{Jsteg(2)}}$, for $q \in \{1.0, \ldots, 5.0\}$, are given in Fig. 14.
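Eq. (5) is not reproduced in this section, so the sketch below assumes the commonly used Tsallis relative entropy, built on the Tsallis entropy [20], which reduces to the KL divergence as q approaches 1. Its normalization or convention may differ from Eq. (5), so its values need not reproduce the curves in Fig. 14. It operates on two densities sampled on a common uniform grid (for example, KDE outputs) and assumes q > 0; `p` and `r` in the usage lines are hypothetical distributions.

```python
import math
from typing import List, Sequence


def tsallis_divergence(p: Sequence[float], r: Sequence[float], dx: float,
                       q: float, eps: float = 1e-12) -> float:
    """Tsallis relative entropy D_q(p || r) for densities sampled on a shared
    uniform grid with spacing dx, approximated by a Riemann sum (assumes q > 0).

    For q != 1: D_q = (sum p^q r^(1-q) dx - 1) / (q - 1).
    For q == 1: the Kullback-Leibler divergence sum p ln(p / r) dx, in nats.
    Values of r are floored at eps to avoid division by zero.
    """
    if len(p) != len(r):
        raise ValueError("densities must be sampled on the same grid")
    # Grid points where p == 0 contribute nothing for q > 0.
    pairs = [(pi, max(ri, eps)) for pi, ri in zip(p, r) if pi > 0.0]
    if abs(q - 1.0) < 1e-9:
        return dx * sum(pi * math.log(pi / ri) for pi, ri in pairs)
    integral = dx * sum(pi ** q * ri ** (1.0 - q) for pi, ri in pairs)
    return (integral - 1.0) / (q - 1.0)


def divergence_vector(p: Sequence[float], r: Sequence[float], dx: float,
                      qs: Sequence[float]) -> List[float]:
    """The vector {D_q} over the requested q values."""
    return [tsallis_divergence(p, r, dx, q) for q in qs]


# Two hypothetical discrete distributions (dx = 1) for illustration.
p = [0.5, 0.5]
r = [0.25, 0.75]
qs = [1.0, 2.0, 3.0, 4.0, 5.0]
print([round(d, 4) for d in divergence_vector(p, r, 1.0, qs)])
```

The q = 1 value is in nats; dividing it by ln 2 converts it to bits per sample, the unit used for the KL values quoted above.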
It can be observed from Fig. 14 that, for quality factor $Q \in \{50, 75, 100\}$, the q-divergence $\{D_q\}_{q=1.0}^{5.0}$ between the density functions estimated from $x_{JPEG}$ and $x_{Jsteg(2)}$ decreases monotonically as $q$ increases, whereas $\{D_q\}_{q=1.0}^{5.0}$ between the density functions estimated from $x_{Jsteg}$ and $x_{Jsteg(2)}$ also decreases monotonically, but at a faster rate. Therefore, the gradient of the q-divergence between the density functions estimated from the test image and the corresponding JSteg(2) image can be used to distinguish between the JPEG-cover and the JPEG-stego image.
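The text does not specify how the gradients are compared, so the following is only one hypothetical way to operationalize the rule: fit the slope of $\log_{10} D_q$ against $q$ (Fig. 14 uses a logarithmic $D_q$ axis) and declare a JPEG-stego when the decay is steeper than a threshold. The function names, the threshold value, and the example vectors are hypothetical.

```python
import math
from typing import Sequence


def log_divergence_slope(qs: Sequence[float], divergences: Sequence[float]) -> float:
    """Least-squares slope of log10(D_q) versus q; more negative means faster decay."""
    if len(qs) != len(divergences) or len(qs) < 2:
        raise ValueError("need matching q and D_q sequences of length >= 2")
    ys = [math.log10(d) for d in divergences]  # requires every D_q > 0
    mean_q = sum(qs) / len(qs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((q - mean_q) ** 2 for q in qs)
    sxy = sum((q - mean_q) * (y - mean_y) for q, y in zip(qs, ys))
    return sxy / sxx


def looks_like_jsteg(qs: Sequence[float], divergences: Sequence[float],
                     slope_threshold: float) -> bool:
    """Declare JPEG-stego when D_q decays faster than the (hypothetical) threshold."""
    return log_divergence_slope(qs, divergences) < slope_threshold


qs = [1.0, 2.0, 3.0, 4.0, 5.0]
slow = [1e-1, 5e-2, 2.5e-2, 1.25e-2, 6.25e-3]   # hypothetical JPEG-cover-like decay
fast = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]           # hypothetical JPEG-stego-like decay
print(round(log_divergence_slope(qs, fast), 3))  # -1.0
print(looks_like_jsteg(qs, slow, -0.5), looks_like_jsteg(qs, fast, -0.5))  # False True
```

Fitting a single slope over all q values, rather than comparing adjacent differences, makes the decision less sensitive to one noisy divergence estimate; how the threshold would be chosen is left open here.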
4.2.1 Experimental Results
Detection performance of the proposed steganalysis scheme against JSteg [4] is evaluated on a dataset of 3000 JPEG images, consisting of 1500 JPEG-cover images and 1500 JPEG-stego images generated from the first 500 images of the UCID [5]. The JPEG-cover images were generated by compressing the 500 uncompressed images using baseline JPEG [18] with quality factor $Q \in \{100, 75, 50\}$, whereas the 1500 JPEG-stego images were generated by embedding 1500 random messages using JSteg [4] with $Q \in \{100, 75, 50\}$.
During the detection phase, each test image, $x_t$ (either $x_{JPEG}$ or $x_{Jsteg}$), was processed to generate the corresponding doubly-compressed JSteg(2) image using JSteg [4]. The images $x_t$ and $x_{Jsteg(2)}$ were used to estimate the local-randomness-based density functions $\hat{f}_{x_t}$ and $\hat{f}_{x_{Jsteg(2)}}$, and the q-divergence between them, $\{D_q\}_{q=1.0}^{5.0} = D_q\big(\hat{f}_{x_t} \,\|\, \hat{f}_{x_{Jsteg(2)}}\big)$, was calculated. The gradient of this q-divergence vector was then used to distinguish between the JPEG-cover and the JPEG-stego. Detection performance of the proposed steganalysis scheme against JSteg is given in Table 2.
Table 2: Detection performance against JSteg as a function of quality factor $Q$
Quality factor $Q$    50       75       100
$P_{fp}$              0.07     0.05     0.01
$P_{fn}$              0.021    0.012    0.01
Experimental results presented in Table 2 show that the proposed framework can distinguish between the JPEG-cover and the JPEG-stego (generated using JSteg [4]) with low false rates. Although $P_{fp}$ rises from 0.01 at $Q = 100$ to 0.07 at $Q = 50$, all false rates remain at or below 0.07, which implies that detection performance does not deteriorate significantly as the quality factor, and hence the hidden message length, decreases.
[Figure 14 panels: q-divergence for quality factor Q = 50, Q = 75, and Q = 100; each plots $D_q$ (log scale) against $q$, with curves $D_q(\hat{f}_{x_{JPEG}} \| \hat{f}_{x_{Jsteg(2)}})$ and $D_q(\hat{f}_{x_{Jsteg}} \| \hat{f}_{x_{Jsteg(2)}})$.]
Figure 14: Calculated Tsallis divergence, $\{D_q\}_{q=1.0}^{5.0}$, between 1) $\hat{f}_{x_{JPEG}}$ and $\hat{f}_{x_{Jsteg(2)}}$ (thin line), and 2) $\hat{f}_{x_{Jsteg}}$ and $\hat{f}_{x_{Jsteg(2)}}$ (thick line), for quality factor $Q \in \{50, 75, 100\}$
5. CONCLUSION
This paper presents a steganalysis scheme for QIM steganography and the JSteg steganographic tool [4]. The proposed steganalysis scheme is not learning-based and can therefore address the limitations of learning-based steganalysis schemes. We have shown that the QIM-stego exhibits a higher level of irregularity than the corresponding quantized-cover. In addition, re-embedding in the quantized image introduces a disturbance in the local-randomness-based density function of the resulting doubly-quantized (or doubly-compressed) image. The proposed steganalysis scheme uses the q-divergence between the density functions estimated from the test image and its doubly-quantized version to distinguish between the quantized-cover and the QIM-stego. Simulation results show that the proposed steganalysis scheme can detect the QIM-stego with a low false negative rate. The proposed steganalysis framework can also be used to attack JSteg [4]. Experimental results show that it can also successfully distinguish between the JPEG-cover and the JPEG-stego obtained with quality factors as low as 50, with low false rates.
6. REFERENCES
[1] http://en.wikipedia.org/wiki/Zero-Day Attack.
[2] A. Latham. JP Hide and Seek. Available at http://linux01.gwdg.de/~alatham/stego.html.
[3] F. Petitcolas. MP3Stego. Available at http://www.petitcolas.net/fabien/steganography/mp3stego/index.html.
[4] J. Korejwa. JSteg. Available at ftp://ftp.funet.fi/pub/crypt/steganography/.
[5] UCID: An uncompressed colour image database. Available at http://www-users.aston.ac.uk/~schaefeg/datasets/UCID/ucid.html.
[6] WAFO: Wave analysis for fatigue and oceanography. Available at http://www.maths.lth.se/matstat/wafo/.
[7] R. Chandramouli and K. Subbalakshmi. Current
trends in steganalysis: A critical survey. In IEEE Int.
Conf. on Control, Automation, Robotics and Vision,
ICARCV, volume 2, pages 964–967, December 2004.
[8] B. Chen and G. Wornell. Quantization index
modulation: A class of provably good methods for
digital watermarking and information embedding.
IEEE Trans. Information Theory, 47(4), May 2001.
[9] M. Costa. Writing on dirty paper. IEEE Transactions
on Information Theory, 29(3):439–441, May 1983.
[10] T. M. Cover and J. A. Thomas. Elements of
Information Theory. John Wiley & Sons, 1991.
[11] J. Eggers and B. Girod. Informed Watermarking.
Kluwer Academic Publishers, 2002.
[12] J. Fridrich and M. Goljan. Digital image
steganography using stochastic modeling. In
IS&T/SPIE: Security and Watermarking of
Multimedia Content V, volume 5020, pages 191–202,
San Jose, CA, January 2003.
[13] P. Guillon, T. Furon, and P. Duhamel. Applied
public-key steganography. In Proc. IS&T/SPIE, pages
38–49, 2002.
[14] H. Malik, K. P. Subbalakshmi, and R. Chandramouli.
Steganalysis of QIM-based data hiding using kernel
density estimation. In 9th Workshop on Multimedia &
Security 2007 (MM&Sec 2007), Dallas, TX,
September 2007.
[15] H. Malik, K. P. Subbalakshmi, and R. Chandramouli.
Nonparametric steganalysis of QIM-based
steganography using approximate entropy. In
IS&T/SPIE: Security, Steganography, and
Watermarking of Multimedia Content X, vol. 6819,
San Jose, CA, January 2008.
[16] F. Perez-Gonzalez, F. Balado, and J. R. Hernandez.
Performance analysis of existing and new methods for
data hiding with known-host information in additive
channels. IEEE Transactions on Signal Processing,
51(4), April 2003.
[17] P. Sallee. Model-based steganography. In 6th Int.
Workshop on Digital Watermarking, volume 3929 of
LNCS, pages 154–167. Springer Berlin / Heidelberg,
2003.
[18] K. Sayood. Introduction to Data Compression. Morgan
Kaufmann, 2nd edition, 2000.
[19] K. Sullivan, Z. Bi, U. Madhow, S. Chandrasekaran,
and B. Manjunath. Steganalysis of quantization index
modulation data hiding. In IEEE Int. Conf. Image
Processing (ICIP), volume 2, pages 1165–1168, 2004.
[20] C. Tsallis. Possible generalization of Boltzmann-Gibbs
statistics. Journal of Statistical Physics, 52(1-2), July
1988.