IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 54, NO. 12, DECEMBER 2006 4835
A Novel Synchronization Invariant Audio Watermarking
Scheme Based on DWT and DCT
Xiang-Yang Wang and Hong Zhao
Abstract—Synchronization attack is one of the key issues of digital audio
watermarking. In this correspondence, a blind digital audio watermarking
scheme against synchronization attack using adaptive quantization is
proposed. The features of the proposed scheme are as follows: 1) a more
robust synchronization code and a new embedding strategy are adopted to
resist the synchronization attack more effectively; 2) the multiresolution
characteristics of the discrete wavelet transform (DWT) and the
energy-compaction characteristics of the discrete cosine transform (DCT) are
combined to improve the transparency of the digital watermark; 3) the
watermark is embedded into the low-frequency components by adaptive
quantization according to human auditory masking; and 4) the scheme can
extract the watermark without the help of the original digital audio signal.
Experimental results show that the proposed watermarking scheme is inaudible
and robust against various signal processing operations such as noise adding,
resampling, requantization, random cropping, and MPEG-1 Layer III (MP3)
compression.
Index Terms—Audio watermarking, discrete cosine transform (DCT),
discrete wavelet transform (DWT), synchronization.
I. INTRODUCTION
With the rapid development of networks (especially the Internet) and
multimedia techniques, the protection of intellectual property rights has
become a key problem that must be solved. Against this background, digital
watermarking has received a great deal of attention recently and has become a
focus of network information security [1]. Digital watermarking can be
classified into image watermarking, video watermarking, and audio
watermarking according to the range of application. Current digital
watermarking schemes mainly focus on image and video copyright protection,
and only a few audio watermarking techniques have been reported [2]. In
particular, it is hard to find robust audio watermarking algorithms that can
resist the synchronization attack effectively [3]–[5].
Manuscript received January 11, 2006; revised January 26, 2006. This work
was supported in part by the Natural Science Foundation of Liaoning Province
of China under Grant 20032100 and by the Open Foundation of the State Key
Laboratory of Information Security of China under Grant 03-02. The associate
editor coordinating the review of this manuscript and approving it for
publication was Prof. Zixiang Xiong.
X.-Y. Wang is with the School of Computer and Information Technique,
Liaoning Normal University, Dalian 116029, China. He is also with the State
Key Laboratory of Information Security, Institute of Software of Chinese
Academy of Sciences, Beijing 100039, China (e-mail: wxy37@263.net).
H. Zhao is with the School of Computer and Information Technique, Liaoning
Normal University, Dalian 116029, China (e-mail: fhq_xa@126.com).
Digital Object Identifier 10.1109/TSP.2006.881258
A synchronization attack does not remove the watermark information from the
watermarked signal; instead, it shifts the embedding position so that the
detector cannot find the correct watermark. Up to now, four strategies have
been adopted to make audio watermarking robust against the synchronization
attack: all-list search [2]; the combination of spread spectrum and
spread-spectrum code [6], [7]; the use of important features of the original
digital audio [8], [9] (which we call the self-synchronization strategy); and
synchronization codes [10]–[12]. Among them, the all-list-search strategy
requires a large amount of computation and has a high false-positive rate,
and the second strategy cannot achieve blind detection. Kirovski et al. [7]
proposed several novel mechanisms for effective encoding and detection of
direct-sequence spread-spectrum watermarks in audio signals; their method
embeds a watermark shaped by the human auditory system (HAS) into modulated
complex lapped transform (MCLT) coefficients. Current self-synchronization
algorithms cannot extract feature points reliably, and they usually need a
large number of thresholds, which makes them more difficult to apply. By
contrast, the synchronization code strategy has more obvious technological
advantages. Kim et al. [10] proposed a robust audio watermarking strategy
using a common binary sequence as the synchronization code, but the
relatively poor periodic and aperiodic correlation of a common binary
sequence weakens its resistance to the synchronization attack. The Barker
code has better autocorrelation properties, so Wang et al. [11] and Huang et
al. [12] choose it as the synchronization mark, embed it in the time domain,
and then embed the watermark information in the DCT domain. This scheme
resists the synchronization attack effectively, but it has the following
defects: 1) it uses a 12-bit Barker code, which is so short that it easily
causes false synchronization; 2) it embeds the synchronization code by
modifying individual sample values only, which greatly reduces its robustness
(especially against resampling and MP3 compression); and 3) it does not make
full use of the human auditory masking effect.
Taking the aforementioned problems into consideration, we introduce a DWT-
and DCT-based blind digital audio watermarking algorithm that can resist the
synchronization attack effectively. We choose a 16-bit Barker code as the
synchronization mark and embed it by modifying the mean value of several
samples. In addition, to make full use of the auditory masking effect, we
embed the watermark information in the DWT and DCT domains.
This correspondence proceeds as follows. Section II describes the basic
principle of the proposed algorithm and the synchronization code. Section III
introduces the proposed embedding algorithm for the synchronization code and
the watermark information. Section IV provides the detection procedure. The
simulated experimental results are given in Section V, and Section VI
concludes.
II. FUNDAMENTAL THEORY AND SYNCHRONIZATION
A. Fundamental Theory
In our audio watermarking scheme, the watermark is embedded into the host
audio in three steps. First, the original digital audio is segmented, and
each segment is cut into two sections. Second, the synchronization code is
embedded into the first section with a spatial (time-domain) watermarking
technique. Finally, the DWT and DCT are performed on the second section, and
the watermark is embedded into the low-frequency components by quantization.
The structure of the embedded information is shown in Fig. 1, and a diagram
of our audio watermarking technique is shown in Fig. 2.
1053-587X/$20.00 © 2006 IEEE
Fig. 1. Construction of embedding information.
Fig. 2. Watermark embedding scheme.
B. Synchronization Code
Synchronization is one of the key issues of audio watermarking. Watermark
detection starts by aligning the watermarked block with the detector, and
losing synchronization causes false detection. Time-scale or frequency-scale
modification makes the detector lose synchronization, so we need an exact
synchronization algorithm based on a robust synchronization code.
Generally, false synchronization should be avoided when selecting the
synchronization code. Several factors contribute to false synchronization:
the style of the synchronization code, the length of the synchronization
code, and the probability of “0” and “1” in the synchronization code. Among
them, the length of the synchronization code is especially important: the
longer the code, the more robust it is.
The proposed scheme embeds a Barker code in front of the watermark to locate
the position where the watermark is embedded. Barker codes, which are subsets
of PN sequences, are commonly used for frame synchronization in digital
communication systems. Barker codes have low correlation sidelobes. A
correlation sidelobe is the correlation of a codeword with a time-shifted
version of itself. The correlation sidelobe $C_k$ for a $k$-symbol shift of
an $N$-bit code sequence $\{X_j\}$ is given by
$$C_k = \sum_{j=1}^{N-k} X_j X_{j+k}$$
where $X_j$ is an individual code symbol taking values $+1$ or $-1$ for
$1 \le j \le N$, and the adjacent symbols are assumed to be zero.
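The sidelobe definition above can be checked numerically. The following is a minimal sketch, assuming the bits "1" and "0" map to the symbols +1 and -1; the function name `correlation_sidelobes` is illustrative and not from the correspondence. It evaluates the well-known 13-symbol Barker code and the 16-bit sequence listed in Section V.

```python
def correlation_sidelobes(bits):
    """Return [C_1, ..., C_{N-1}] for a 0/1 string mapped to -1/+1 symbols."""
    x = [1 if b == "1" else -1 for b in bits]
    n = len(x)
    # C_k = sum_j x_j * x_{j+k}; symbols outside the codeword count as zero.
    return [sum(x[j] * x[j + k] for j in range(n - k)) for k in range(1, n)]


barker13 = "1111100110101"        # standard 13-symbol Barker code
paper_code = "1111100110101110"   # 16-bit synchronization code from Section V

for name, code in (("Barker-13", barker13), ("16-bit code", paper_code)):
    lobes = correlation_sidelobes(code)
    print(name, "main peak:", len(code),
          "largest |sidelobe|:", max(abs(c) for c in lobes))
```

A detector benefits from a large gap between the main peak (the code length) and the largest sidelobe magnitude, because partial overlaps with the code then rarely look like a full match.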
III. WATERMARK EMBEDDING SCHEME
In order to guarantee the robustness and transparency of the watermark, the
proposed scheme embeds the synchronization code in the mean value of several
samples.
Let $A = \{a(i),\ 0 \le i < Length\}$ represent a host digital audio signal
with $Length$ samples. $W = \{w(i,j),\ 0 \le i < M,\ 0 \le j < N\}$ is a
binary image to be embedded within the host audio signal, and
$w(i,j) \in \{0,1\}$ is the pixel value at $(i,j)$.
$F = \{f(i),\ 0 \le i < L_{syn}\}$ is a synchronization code with $L_{syn}$
bits, where $f(i) \in \{0,1\}$.
The main steps of the embedding procedure can be described as follows.
A. Preprocessing
To remove the spatial correlation between the pixels of the binary watermark
image and to improve the security of the whole digital watermarking system, a
watermark scrambling algorithm is applied first. In our watermark embedding
scheme, the binary watermark image is scrambled from $W$ to $W_1$ by using
the Arnold transform.
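The scrambling step can be sketched as follows. This is a minimal illustration, assuming the common form of the Arnold (cat) map, (x, y) -> (x + y, x + 2y) mod n, on a square image; the correspondence does not state the exact map or the number of iterations, so both are assumptions here, and the function names are hypothetical.

```python
import numpy as np


def arnold(img, iterations=1):
    """Scramble a square image with (x, y) -> (x + y, x + 2y) mod n (assumed form)."""
    n = img.shape[0]
    assert img.shape == (n, n), "this sketch only handles square images"
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    out = img
    for _ in range(iterations):
        nxt = np.empty_like(out)
        # The map is a bijection mod n, so every target pixel is written once.
        nxt[(x + y) % n, (x + 2 * y) % n] = out[x, y]
        out = nxt
    return out


def arnold_inverse(img, iterations=1):
    """Undo `arnold` by reading each pixel back from its scrambled position."""
    n = img.shape[0]
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    out = img
    for _ in range(iterations):
        nxt = np.empty_like(out)
        nxt[x, y] = out[(x + y) % n, (x + 2 * y) % n]
        out = nxt
    return out


rng = np.random.default_rng(0)
w = rng.integers(0, 2, size=(64, 64))   # stand-in for the binary watermark
w1 = arnold(w, iterations=5)            # iteration count acts like a key
print(np.array_equal(arnold_inverse(w1, iterations=5), w))
```

Scrambling keeps the number of ones and zeros unchanged; it only moves pixels, so a burst of extraction errors is spread over the image after descrambling.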
To improve the robustness of the proposed scheme against cropping and to let
the detector recover when it loses synchronization, the audio is first
segmented, and the synchronization code and the watermark are then embedded
into each segment. Let $A^0$ denote a segment. $A^0$ is cut into two sections
$A_1^0$ and $A_2^0$ with $L_1$ and $L_2$ samples, respectively. The
synchronization code and the watermark are embedded into $A_1^0$ and $A_2^0$,
respectively.
B. Synchronization Code Embedding
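The proposed synchronization code embedding method proceeds as follows.
1) The audio section $A_1^0$ is cut into $L_{syn}$ audio segments $P_m$, each
having $n$ samples, where
$$P_m = \{p_m(i) = a_1^0(i + m \times n),\ 0 \le i < n\}, \quad 0 \le m < L_{syn}.$$
2) The mean value of $P_m$ is calculated, that is,
$$\bar{p}_m = \frac{1}{n} \sum_{i=0}^{n-1} p_m(i), \quad 0 \le m < L_{syn}.$$
3) The synchronization code is embedded into each $P_m$ by quantizing the
mean value $\bar{p}_m$. The rule is given by
$$p'_m(i) = p_m(i) + (\bar{p}'_m - \bar{p}_m)$$
where $p_m(i)$ is the original sample, $p'_m(i)$ is the modified sample, and
$$\bar{p}'_m = \begin{cases} IQ(\bar{p}_m) \times S_1 + S_1/2, & \text{if } Q(\bar{p}_m) = f(m) \\ IQ(\bar{p}_m) \times S_1 - S_1/2, & \text{if } Q(\bar{p}_m) \ne f(m) \end{cases}$$
$$IQ(x) = \lfloor x / S_1 \rfloor, \quad Q(x) = \mathrm{mod}(IQ(x), 2)$$
where $\mathrm{mod}(x, y)$ returns the remainder of the division of $x$ by
$y$, and $S_1$ is the quantization step.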
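The mean-quantization idea described in steps 1) to 3) can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: a bit is read back as the parity of floor(mean / step), samples are kept as floats (real 16-bit audio would be rounded afterwards), and the names `embed_bit`, `extract_bit`, `embed_code`, and `extract_code` are hypothetical.

```python
import math
import random


def embed_bit(samples, bit, step):
    """Shift every sample equally so that floor(mean / step) has parity `bit`."""
    mean = sum(samples) / len(samples)
    iq = math.floor(mean / step)
    # Stay in the current bin if its parity matches, else move to the bin below.
    target = iq * step + (step / 2 if iq % 2 == bit else -step / 2)
    return [s + (target - mean) for s in samples]


def extract_bit(samples, step):
    """Read the bit back as the parity of the quantized mean."""
    return math.floor(sum(samples) / len(samples) / step) % 2


def embed_code(section, code, group, step):
    out = list(section)
    for m, bit in enumerate(code):
        lo, hi = m * group, (m + 1) * group
        out[lo:hi] = embed_bit(out[lo:hi], bit, step)
    return out


def extract_code(section, n_bits, group, step):
    return [extract_bit(section[m * group:(m + 1) * group], step)
            for m in range(n_bits)]


random.seed(0)
code = [int(b) for b in "1111100110101110"]   # code listed in Section V
host = [random.uniform(-1.0, 1.0) for _ in range(len(code) * 5)]
marked = embed_code(host, code, group=5, step=0.05)
print(extract_code(marked, len(code), group=5, step=0.05) == code)
```

Because all samples of a group move by the same amount, the waveform shape inside the group is preserved and only its local offset changes, by at most 1.5 quantization steps in this sketch.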
C. Watermark Embedding
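1) DWT: For each audio section $A_2^0$, an $H$-level DWT is performed, which
yields the wavelet coefficients $A_2^{0H}, D_2^{0H}, D_2^{0H-1}, \ldots,
D_2^{01}$, where $A_2^{0H}$ is the coarse signal and the detail signals are
$D_2^{0H}, D_2^{0H-1}, \ldots, D_2^{01}$.
2) DCT: To take advantage of the low-frequency coefficients, which have
higher energy and are robust against various signal processing operations,
the DCT is performed only on the low-frequency coefficients $A_2^{0H}$:
$$C = \mathrm{DCT}(A_2^{0H}) = \{c(t),\ 0 \le t < L_2 / 2^H\}.$$
3) Watermark Embedding: In order to guarantee the robustness and transparency
of the watermark, the proposed scheme embeds each watermark bit in the
magnitude of a DCT coefficient by quantization. The proposed method embeds
all watermark bits in every segment of the audio signal. The quantization
function is given as follows:
$$c'(t) = \begin{cases} IQ(c(t)) \times S_2 + S_2/2, & \text{if } Q(c(t)) = w_1(t) \\ IQ(c(t)) \times S_2 - S_2/2, & \text{if } Q(c(t)) \ne w_1(t) \end{cases}$$
where $w_1(t)$ is the $t$th bit of the scrambled watermark $W_1$,
$0 \le t < M \times N$, and $S_2$ is the quantization step, and
$$IQ(x) = \lfloor x / S_2 \rfloor, \quad Q(x) = \mathrm{mod}(IQ(x), 2).$$
4) Inverse DCT: The inverse DCT is performed on the modified low-frequency
coefficients $C' = \{c'(t)\}$ as follows:
$$A_2^{0H\prime} = \mathrm{IDCT}(C').$$
5) Inverse DWT: After substituting the coefficients $A_2^{0H}$ with
$A_2^{0H\prime}$, an $H$-level inverse DWT is performed, which gives the
watermarked digital audio signal $A'$.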
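The DWT, DCT, quantization, inverse DCT, and inverse DWT chain of steps 1) to 5) can be sketched end to end with NumPy only. Several choices here are assumptions for illustration: the Haar filter stands in for the Daubechies-1 basis named in Section V (they are the same wavelet), the DCT is the orthonormal type-II transform built as an explicit matrix (fine for short sections, too large for 65536-sample segments), bits go into the first DCT coefficients, and the step is fixed rather than adapted to auditory masking. All function names are hypothetical.

```python
import numpy as np


def haar_dwt(x):
    """One-level orthonormal Haar analysis: (coarse, detail)."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] + x[1::2]) / np.sqrt(2), (x[0::2] - x[1::2]) / np.sqrt(2)


def haar_idwt(a, d):
    """One-level Haar synthesis, the exact inverse of haar_dwt."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x


def dct_matrix(n):
    """Orthonormal DCT-II matrix; its transpose is the inverse transform."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2)
    return m


def quantize(c, bits, step):
    """Move each coefficient to a bin centre whose index parity equals its bit."""
    iq = np.floor(c / step)
    sign = np.where(iq % 2 == bits, 1.0, -1.0)
    return iq * step + sign * step / 2


def embed(section, bits, step, levels=3):
    a, details = np.asarray(section, dtype=float), []
    for _ in range(levels):
        a, d = haar_dwt(a)
        details.append(d)
    m = dct_matrix(len(a))
    c = m @ a
    c[:len(bits)] = quantize(c[:len(bits)], np.asarray(bits), step)
    a = m.T @ c
    for d in reversed(details):
        a = haar_idwt(a, d)
    return a


def extract(section, n_bits, step, levels=3):
    a = np.asarray(section, dtype=float)
    for _ in range(levels):
        a = haar_dwt(a)[0]
    c = dct_matrix(len(a)) @ a
    return (np.floor(c[:n_bits] / step) % 2).astype(int)


rng = np.random.default_rng(1)
host = rng.normal(0.0, 0.1, 512)      # 512 samples -> 64 coarse coefficients
bits = rng.integers(0, 2, 64)
marked = embed(host, bits, step=0.05)
print(np.array_equal(extract(marked, 64, step=0.05), bits))
```

Since every transform in the chain is orthonormal, the energy of the change in the time domain equals the energy of the change applied to the DCT coefficients, which makes the quantization step a direct handle on the embedding distortion.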
Fig. 3. The original digital audio signals and watermark.
D. Repeat Embedding
In order to improve the robustness against cropping, the proposed scheme
repeats the procedures of Sections III-B and III-C to embed the
synchronization code and the watermark into every audio segment.
In our experiments, the length of watermark embedding are fixed as
.
IV. WATERMARK DETECTING SCHEME
The watermark detecting procedure in the proposed method needs neither the
original audio signal nor any other side information. It proceeds as follows.
1) The beginning position B of the watermarked segment is located with the
frame synchronization technique of digital communications.
2) An $H$-level DWT is performed on each audio section $A_2^*$ after B, which
yields the coefficients $A_2^{*H}, D_2^{*H}, D_2^{*H-1}, \ldots, D_2^{*1}$.
3) The DCT is performed on the low-frequency DWT coefficients $A_2^{*H}$,
which gives $C^* = \{c^*(t)\}$.
4) The extraction rule is
$$w_1^*(t) = \mathrm{mod}(\lfloor c^*(t) / S_2 \rfloor, 2)$$
where $S_2$ is the quantization step used for watermark embedding.
5) Finally, the watermark image $W^*$ can be obtained by descrambling.
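Step 1) can be illustrated with a brute-force search over sample offsets. This is only a sketch under assumptions: synchronization bits are read as the parity of floor(mean / step) over groups of samples, a window counts as a hit when it decodes to the code with at most `max_errors` bit errors, and the names `read_bits` and `sync_candidates` are hypothetical. The correspondence does not describe its search procedure in detail.

```python
import math
import random

SYNC = [int(b) for b in "1111100110101110"]   # code listed in Section V


def read_bits(signal, start, n_bits, group, step):
    """Decode n_bits from the means of consecutive groups beginning at `start`."""
    bits = []
    for m in range(n_bits):
        chunk = signal[start + m * group:start + (m + 1) * group]
        bits.append(math.floor(sum(chunk) / group / step) % 2)
    return bits


def sync_candidates(signal, code, group, step, max_errors=0):
    """Return every position B that directly follows a window decoding to `code`."""
    span = len(code) * group
    hits = []
    for start in range(len(signal) - span + 1):
        decoded = read_bits(signal, start, len(code), group, step)
        if sum(a != b for a, b in zip(decoded, code)) <= max_errors:
            hits.append(start + span)
    return hits


# Synthetic demo: embed the code at a known offset, then search for it.
random.seed(1)
group, step, offset = 5, 0.05, 137
signal = [random.uniform(-1.0, 1.0) for _ in range(600)]
for m, bit in enumerate(SYNC):
    lo = offset + m * group
    chunk = signal[lo:lo + group]
    mean = sum(chunk) / group
    iq = math.floor(mean / step)
    target = iq * step + (step / 2 if iq % 2 == bit else -step / 2)
    signal[lo:lo + group] = [s + target - mean for s in chunk]

true_b = offset + len(SYNC) * group
print(true_b in sync_candidates(signal, SYNC, group, step))
```

The search returns a list because offsets next to the true one can also decode to the code when group means change slowly with the offset; a practical detector would need a tie-break rule, which is left open here.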
TABLE I
THE SYNCHRONIZATION CODE DETECTION RESULTS FOR VARIOUS ATTACKS
TABLE II
THE WATERMARK DETECTION RESULTS FOR VARIOUS ATTACKS (NC)
In addition, to eliminate the influence of subjective and objective factors,
such as the experience, health condition, and experimental conditions of the
observer, the normalized cross-correlation (NC) [13] is adopted to appraise
the similarity between the extracted watermark and the original one. It is
defined as
$$NC(W, W^*) = \frac{\sum_{i=0}^{M-1} \sum_{j=0}^{N-1} w(i,j)\, w^*(i,j)}{\sqrt{\sum_{i=0}^{M-1} \sum_{j=0}^{N-1} w(i,j)^2}\ \sqrt{\sum_{i=0}^{M-1} \sum_{j=0}^{N-1} w^*(i,j)^2}}$$
where $W$ and $W^*$ are the original watermark image and the extracted
watermark image, respectively. If the NC exceeds a certain threshold, we
conclude that the audio signal is protected; otherwise, it is not protected.
In this correspondence, reliability is measured as the bit error rate (BER)
of the extracted watermark, defined as
$$\mathrm{BER} = \frac{N_{err}}{M \times N}$$
where $N_{err}$ is the number of erroneously detected bits.
We also conduct a signal-to-noise ratio (SNR) test, which serves as an
objective measurement of audio signal quality. The SNR is measured by
comparing the watermarked signal against the original signal. The SNR (in
decibels) is defined as follows:
$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{i=0}^{Length-1} a(i)^2}{\sum_{i=0}^{Length-1} [a(i) - a'(i)]^2}$$
where $a(i)$ and $a'(i)$ are samples of the original and the watermarked
audio signals, respectively.
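The three measures can be computed in a few lines. The sketch below assumes the standard forms of normalized cross-correlation, bit error rate (as a fraction, not a percentage), and SNR in decibels; the function names `nc`, `ber`, and `snr_db` are illustrative.

```python
import numpy as np


def nc(w, w_ext):
    """Normalized cross-correlation between two watermark images."""
    w = np.asarray(w, dtype=float)
    w_ext = np.asarray(w_ext, dtype=float)
    # Undefined (division by zero) if either image is all zeros.
    return float(np.sum(w * w_ext)
                 / (np.sqrt(np.sum(w ** 2)) * np.sqrt(np.sum(w_ext ** 2))))


def ber(w, w_ext):
    """Fraction of watermark bits that differ."""
    w, w_ext = np.asarray(w), np.asarray(w_ext)
    return float(np.count_nonzero(w != w_ext) / w.size)


def snr_db(a, a_marked):
    """SNR of the watermarked signal relative to the original, in dB."""
    a = np.asarray(a, dtype=float)
    a_marked = np.asarray(a_marked, dtype=float)
    return float(10 * np.log10(np.sum(a ** 2) / np.sum((a - a_marked) ** 2)))


w = np.eye(4, dtype=int)
w_ext = w.copy()
w_ext[0, 0] = 0                      # one bit flipped out of 16
print(nc(w, w_ext), ber(w, w_ext), snr_db([1.0, 2.0], [1.0, 1.0]))
```

In this toy case one wrong bit out of 16 gives a BER of 1/16, while the NC stays fairly high, which shows why the two figures are usually reported together.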
V. EXPERIMENTAL RESULTS
To illustrate the inaudibility and robustness of our watermarking scheme, the
proposed watermarking algorithm is applied to two digital audio pieces. All
of the audio signals in the test are music, stored as 16-bit signed mono
audio sampled at 44.1 kHz. We use a 64 × 64 bit binary image as the watermark
for all audio signals and the 16-bit Barker code 1111100110101110 as the
synchronization code. The Daubechies-1 wavelet basis is used. A smaller DWT
level reduces the robustness of the watermark, whereas a larger one increases
the computational cost, so a 3-level DWT is performed in this test.
In our experiments, the length of each segment for watermark embedding is
fixed at 65536 samples. Each synchronization bit is embedded into the mean
value of five samples.
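The stated parameters can be cross-checked with a little arithmetic. The sketch below assumes that the 65536 samples are the watermark section of a segment and that the synchronization section is added in front of it; the correspondence does not spell this out, so the segment counts are indicative only. The clip durations are the 9.75 s and 6.83 s given for the two test signals.

```python
fs = 44100                 # sampling rate in Hz
segment = 65536            # samples per watermark section (assumed meaning)
sync_bits, group = 16, 5   # synchronization bits and samples per bit
levels = 3                 # DWT levels
wm_bits = 64 * 64          # bits in the binary watermark image

sync_samples = sync_bits * group            # samples used by the sync code
coarse_coeffs = segment // 2 ** levels      # low-frequency coefficients
print("sync samples:", sync_samples)
print("coarse coefficients:", coarse_coeffs, "watermark bits:", wm_bits)

segments_per_clip = []
for seconds in (9.75, 6.83):
    total = int(seconds * fs)
    segments_per_clip.append(total // (sync_samples + segment))
print("whole segments per clip:", segments_per_clip)
```

Under these assumptions a 3-level DWT leaves twice as many low-frequency coefficients as there are watermark bits, and each test clip holds several complete copies of the synchronization code and watermark.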
TABLE III
THE WATERMARK DETECTION RESULTS FOR VARIOUS ATTACKS (BER)
TABLE IV
THE WATERMARK DETECTION RESULTS FOR VARIOUS ATTACKS (SNR)
The two original audio signals used in the test are shown in Fig. 3(a) and
(b). Their lengths are 9.75 and 6.83 s, respectively. The original
watermark image is displayed in Fig. 3(c). The quantization steps of
the two test samples are
and ,
respectively.
To evaluate the robustness of our watermarking scheme, we apply attacks
including MPEG compression, resampling, requantization, noise adding, and
random cropping.
Table I compares the synchronization code detection results of the proposed
scheme with those of the scheme in [12]. One mark in the table indicates that
the synchronization code was found, and the other indicates that it was not.
Tables II–IV compare the watermark detection results of the proposed scheme
with those of the scheme in [12] under various attacks; the detection results
of DCT and DWT are also given. The NC and BER of the watermark image are
listed in Tables II and III, and the SNR of the digital audio signal is
listed in Table IV.
VI. CONCLUSION
In this correspondence, we propose a novel synchronization-invariant digital
audio watermarking algorithm based on the quantization of coefficients. To
improve the robustness of the audio watermark, the proposed algorithm selects
a robust Barker code as the synchronization code, embeds the synchronization
code into the mean value of several samples, and embeds the watermark into
DWT and DCT coefficients. The experimental results illustrate the robustness
of our synchronization embedding scheme and the inaudibility of our
watermarking scheme. In addition, the watermark can be extracted without the
help of the original digital audio signal, and the scheme is easy to
implement. Despite these strengths, the proposed method has a drawback: it is
not very robust against pitch-invariant time-scale modification. Further
research will focus on overcoming this problem.
REFERENCES
[1] I. J. Cox and M. L. Miller, “The first 50 years of electronic water-
marking,” J. Appl. Signal Process., vol. 56, no. 2, pp. 225–230, 2002.
[2] L. Wei, Y. Yi-Qun, L. Xiao-Qiang, X. Xiang-Yang, and L. Pei-Zhong,
“Overview of digital audio watermarking,” J. Commun., vol. 26, no. 2,
pp. 100–111, 2005.
[3] L. Wen-Nung and C. Li-Chun, “Robust and high-quality time-domain
audio watermarking subject to psycho acoustic masking,” in Proc.
IEEE Int. Symp. Circuits Syst., AZ, 2002, vol. 2, pp. 45–48.
[4] D. Megías, J. Herrera-Joancomartí, and J. Minguillón, “A robust audio
watermarking scheme based on MPEG 1 layer III compression,” in
Communications and Multimedia Security—CMS 2003. New York:
Springer-Verlag, 2003, vol. 963, LNCS, pp. 226–238.
[5] H. J. Kim, “Audio watermarking techniques,” in Pacific Rim Workshop
on Digital Steganography, Kyushu Institute of Technology, Kitakyushu,
Japan, Jul. 3–4, 2003.
[6] S. Sheng-He, L. Zhe-Ming, and N. Xia-Mu, Digital Watermarking
Technique. Beijing, China: Science, 2004.
[7] D. Kirovski and H. S. Malvar, “Spread spectrum watermarking of audio
signals,” IEEE Trans. Signal Process., vol. 51, no. 4, pp. 1020–1033,
Apr. 2003.
[8] C.-P. Wu, P.-C. Su, and C.-C. Jay Kuo, “Robust audio watermarking for
copyright protection,” in Proc. SPIE, Jul. 1999, vol. 3807, pp. 387–397.
[9] W. Li and X. Y. Xue, “Audio watermarking based on music content
analysis: Robust against time scale modification,” in Proc. 2nd Int.
Workshop on Digital Watermarking, Korea, 2003, pp. 289–300.
[10] H. O. Kim, B. K. Lee, and N. Y. Lee, Wavelet-based audio water-
marking techniques: Robustness and fast synchronization [Online].
Available: http://amath.kaist.ac.kr/research/paper/01–11.pdf
[11] W. Yong, H. Ji-Wu, and Y. Q. Shi, “Meaningful watermarking for
audio with fast resynchronization,” J. Comput. Res. Develop., vol. 40,
no. 20, pp. 215–220, 2003.
[12] J. W. Huang, W. Yong, and Y. Q. Shi, “A blind audio watermarking
algorithm with self-synchronization,” in Proc. IEEE Int. Symp. Circuits
Syst., AZ, 2002, vol. 3, pp. 627–630.
[13] M. Kutter and F. A. P. Petitcolas, “A fair benchmark for image wa-
termarking systems,” in Proc. Electron. Imag., 1999, vol. 3657, pp.
226–239.
On the High-SNR Conditional Maximum-Likelihood
Estimator Full Statistical Characterization
Alexandre Renaux, Student Member, IEEE,
Philippe Forster, Member, IEEE, Eric Chaumette, and
Pascal Larzabal, Member, IEEE
Abstract—In the field of asymptotic performance characterization of the
conditional maximum-likelihood (CML) estimator, asymptotic generally
refers to either the number of samples or the signal-to-noise ratio (SNR)
value. The first case has been already fully characterized, although the
second case has been only partially investigated. Therefore, this corre-
spondence aims to provide a sound proof of a result, i.e., asymptotic (in
SNR) Gaussianity and efficiency of the CML estimator in the multiple
parameters case, generally regarded as trivial but not so far demonstrated.
Index Terms—Array processing, high signal-to-noise ratio (SNR), max-
imum likelihood, statistical efficiency.
I. INTRODUCTION
Parameters estimation of multiple signals impinging on an antenna
array is a fundamental problem in signal processing with applications to
radar, sonar, digital communication and many other fields. A plethora
of algorithms have been proposed in the literature in this sense (see
[1]).
Perhaps the most well-known and frequently used model-based approach in
signal processing is the maximum-likelihood (ML) technique. When applying the
ML technique to a sensor array problem, two main methods have been
considered, depending on the model used for the signal waveforms. When the
source signals are modeled as Gaussian random processes, a stochastic ML
(SML) estimator is obtained. If, on the other hand, the source signals are
modeled as unknown deterministic quantities, the resulting estimator is
referred to as the conditional ML (CML) estimator (see, e.g., [2], for a
review of the two methods).
Manuscript received September 22, 2005; revised February 22, 2006. The
associate editor coordinating the review of this manuscript and approving it
for publication was Dr. Jean Pierre Delmas. This work has been performed in
the framework of the European Community Contract no. 507325, NEWCOM.
A. Renaux and P. Larzabal are with Ecole Normale Supérieure de Cachan,
SATIE Laboratory, 94235 Cachan Cedex, France (e-mail: renaux@satie.
ens-cachan.fr; larzabal@satie.ens-cachan.fr).
P. Forster is with University Paris 10, GEA Laboratory, 92410 Ville d’Avray,
France (e-mail: philippe.forster@cva.u-paris10.fr).
E. Chaumette is with Thales Naval France, 92200 Bagneux, France
(e-mail: eric.chaumette@fr.thalesgroup.com).
Digital Object Identifier 10.1109/TSP.2006.882072
Asymptotic statistical performance of these ML methods is an
important field of research. For that purpose, the estimation accuracy
is generally investigated by means of the Cramér–Rao bound. Since
two models are used for the different ML methods, two Cramér–Rao
bounds have been derived: the stochastic Cramér–Rao bound when
the source signals are modelled as Gaussian random processes and
the deterministic Cramér–Rao bound when the source signals are
modelled as unknown deterministic quantities (see, e.g., [2], for a
review of these two bounds).
In the array processing context, the term “asymptotic” can be un-
derstood in two different ways: in the number of samples or in the
signal-to-noise ratio (SNR) value. At large number of samples, the
statistical performance of these ML methods has been fully character-
ized (see [3]). Concerning the high SNR context, the nonefficiency (in
comparison with the stochastic Cramér–Rao bound) and the non-Gaus-
sianity of the SML have been recently proven in [4]. Concerning the
CML method in the high-SNR framework, it is generally accepted that
this estimator is Gaussian and efficient although, to our knowledge,
there is no sound proof of this result in the literature in the multi-pa-
rameters case. Indeed, to the best of our knowledge, the CML estimator
has been only partially investigated in [5], where the Gaussianity of the
CML estimates is proved in the single-parameter case by way of a
Gaussian observation model with parameterized mean. Moreover, the
asymptotic efficiency of the CML estimator in the high-SNR case has
never been demonstrated. This correspondence aims to complete Kay’s
result, i.e., to establish the Gaussianity and the efficiency (in compar-
ison with the deterministic Cramér–Rao bound) of the CML estimator
in the multiple-parameters case. Moreover, we show how these results
still hold for noncircular complex Gaussian noise. Monte Carlo simu-
lations are provided in order to show the accuracy of the analysis.
The notational convention adopted is as follows: italic indicates a scalar
quantity, as in $A$; lower case boldface indicates a vector quantity, as in
$\mathbf{a}$; upper case boldface indicates a matrix quantity, as in
$\mathbf{A}$. The $n$th row and $m$th column element of the matrix
$\mathbf{A}$ will be denoted by $A_{n,m}$. $\Re\{A\}$ is the real part of
$A$, and $\Im\{A\}$ is the imaginary part of $A$. The matrix transpose is
indicated by a superscript $T$, as in $\mathbf{A}^T$. $|\mathbf{A}|$ is the
determinant of the square matrix $\mathbf{A}$. $\mathbf{I}_M$ is the identity
matrix of order $M$. $E[\cdot]$ denotes the expectation operator and
$\|\cdot\|$ the norm. A sample of a random vector $\mathbf{a}$ is denoted
$\mathbf{a}(\omega)$, where $\omega$ belongs to the event space $\Omega$.
$o(\cdot)$ and $o_p(\cdot)$ denote, respectively, the small “o” and the
stochastic small “o” notation.
II. OBSERVATION MODEL AND MAXIMUM-LIKELIHOOD ESTIMATOR
A. Observation Model
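In the sequel, we consider the following general observation model:
$$\mathbf{x} = \mathbf{m}(\boldsymbol{\theta}_0) + \mathbf{n} \qquad (1)$$
where $\mathbf{x}$ is a real sample vector, $\boldsymbol{\theta}$ is the real
vector of unknown deterministic parameters of interest with true value
$\boldsymbol{\theta}_0$, $\mathbf{m}(\boldsymbol{\theta})$ is a real
deterministic vector depending (generally nonlinearly) on
$\boldsymbol{\theta}$, which is assumed to be identifiable from
$\mathbf{m}(\boldsymbol{\theta})$. $\mathbf{n}$ is the additive noise vector,
which is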