IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 54, NO. 12, DECEMBER 2006 4835
A Novel Synchronization Invariant Audio Watermarking
Scheme Based on DWT and DCT
Xiang-Yang Wang and Hong Zhao
Abstract—Synchronization attack is one of the key issues of digital audio watermarking. In this correspondence, a blind digital audio watermarking scheme against synchronization attack using adaptive quantization is proposed. The features of the proposed scheme are as follows: 1) a more stable synchronization code and a new embedding strategy are adopted to resist the synchronization attack more effectively; 2) the multiresolution characteristics of the discrete wavelet transform (DWT) and the energy-compression characteristics of the discrete cosine transform (DCT) are combined to improve the transparency of the digital watermark; 3) the watermark is embedded into the low-frequency components by adaptive quantization according to human auditory masking; and 4) the scheme can extract the watermark without the help of the original digital audio signal. Experimental results show that the proposed watermarking scheme is inaudible and robust against various signal processing operations such as noise addition, resampling, requantization, random cropping, and MPEG-1 Layer III (MP3) compression.
Index Terms—Audio watermarking, discrete cosine transform (DCT),
discrete wavelet transform (DWT), synchronization.
I. INTRODUCTION
With the rapid development of networks (especially the Internet) and multimedia techniques, the protection of intellectual property rights has become a key problem that must be solved. Against this background, digital watermarking has received a great deal of attention recently and has become a focus of network information security [1]. Digital watermarking can be classified into image watermarking, video watermarking, and audio watermarking according to the range of application. Current digital watermarking schemes mainly focus on image and video copyright protection, and only a few audio watermarking techniques have been reported [2]. In particular, it is hard to find robust audio watermarking algorithms that can resist the synchronization attack effectively [3]–[5].

Manuscript received January 11, 2006; revised January 26, 2006. This work was supported in part by the Natural Science Foundation of Liaoning Province of China under Grant 20032100 and by the Open Foundation of the State Key Laboratory of Information Security of China under Grant 03-02. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Zixiang Xiong.
X.-Y. Wang is with the School of Computer and Information Technique, Liaoning Normal University, Dalian 116029, China. He is also with the State Key Laboratory of Information Security, Institute of Software of Chinese Academy of Sciences, Beijing 100039, China (e-mail: wxy37@263.net).
H. Zhao is with the School of Computer and Information Technique, Liaoning Normal University, Dalian 116029, China (e-mail: fhq_xa@126.com).
Digital Object Identifier 10.1109/TSP.2006.881258
A synchronization attack does not remove the watermark information from the watermarked signal; instead, it changes the embedding position so that the detector cannot find the correct watermark. Up to now, four strategies have been adopted to make audio watermarking robust against the synchronization attack: all-list-search [2], combination of spread spectrum and spread spectrum code [6], [7], utilizing the important features of the original digital audio [8], [9] (also called the self-synchronization strategy), and synchronization code [10]–[12]. Among them, the all-list-search strategy requires a large amount of computation and has a high false positive rate, and the second strategy cannot achieve blind detection. Kirovski et al. [7] proposed several novel mechanisms for effective encoding and detection of direct-sequence spread-spectrum watermarks in audio signals; their method embeds a watermark shaped by the human auditory system (HAS) into modulated complex lapped transform (MCLT) coefficients. Current self-synchronization algorithms cannot extract feature points stably, and they usually need a large number of threshold values, which makes them difficult to apply. By contrast, the synchronization code strategy has more obvious technological advantages. Kim et al. [10] proposed a robust audio watermarking strategy using a common binary sequence as the synchronization code, but the relatively poor periodic and aperiodic correlation properties of a common binary sequence weaken its ability to resist the synchronization attack. The Barker code has better autocorrelation properties, so Wang et al. [11] and Huang et al. [12] choose it as the synchronization mark, embed it in the time domain, and then embed the watermark information in the DCT domain. This approach resists the synchronization attack effectively, but it has the following defects: 1) it uses a 12-bit Barker code, which is so short that it easily causes false synchronization; 2) it embeds the synchronization code only by modifying individual sample values, which greatly reduces robustness (especially against resampling and MP3 compression); and 3) it does not make full use of the human auditory masking effect.
Taking the aforementioned problems into consideration, we introduce a DWT- and DCT-based blind digital audio watermarking algorithm that can resist the synchronization attack effectively. We choose a 16-bit Barker code as the synchronization mark and embed it by modifying the mean value of several samples. In addition, to make full use of the auditory masking effect, we embed the watermark information in the DWT and DCT domain.
This correspondence proceeds as follows. Section II describes the basic principle of the proposed algorithm and the synchronization code. Section III introduces the proposed embedding algorithm for the synchronization code and the watermark information. In Section IV, the detection procedure is provided. The simulated experimental results are given in Section V, and conclusions are drawn in Section VI.
II. FUNDAMENTAL THEORY AND SYNCHRONIZATION
A. Fundamental Theory
In our audio watermarking scheme, the watermark is embedded into the host audio in three steps. First, the original digital audio is segmented, and each segment is cut into two sections. Second, the synchronization code is embedded into the first section with a spatial watermarking technique. Finally, the DWT and DCT are performed on the second section, and the watermark is embedded into the low-frequency components by quantization. The structure of the embedded information is shown in Fig. 1, and a diagram of our audio watermarking technique is shown in Fig. 2.
1053-587X/$20.00 © 2006 IEEE
Fig. 1. Construction of embedding information.
Fig. 2. Watermark embedding scheme.
B. Synchronization Code
Synchronization is one of the key issues of audio watermarking. Watermark detection starts by aligning the watermarked block with the detector. Losing synchronization causes false detection, and time-scale or frequency-scale modification makes the detector lose synchronization. We therefore need exact synchronization algorithms based on a robust synchronization code.
Generally, false synchronization should be avoided when selecting the synchronization code. Several factors contribute to false synchronization: 1) the style of the synchronization code, 2) the length of the synchronization code, and 3) the probability of 0 and 1 in the synchronization code. Among them, the length of the synchronization code is especially important: the longer it is, the more robust it is.
The proposed scheme embeds a Barker code in front of the watermark to locate the position where the watermark is embedded. Barker codes, which are subsets of PN sequences, are commonly used for frame synchronization in digital communication systems. Barker codes have low correlation sidelobes. A correlation sidelobe is the correlation of a codeword with a time-shifted version of itself. The correlation sidelobe $C_k$ for a $k$-symbol shift of an $N$-bit code sequence $\{X_j\}$ is given by
$$C_k = \sum_{j=1}^{N-k} X_j X_{j+k}$$
where $X_j$ is an individual code symbol taking values $+1$ or $-1$ for $1 \le j \le N$, and the adjacent symbols are assumed to be zero.
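The correlation sidelobes described above can be computed directly. The following is a minimal sketch, not the authors' code: the function name `correlation_sidelobes` is hypothetical, the 13-bit sequence is the standard length-13 Barker code, and the 16-bit sequence is the synchronization code quoted in Section V.

```python
def correlation_sidelobes(bits):
    """Aperiodic autocorrelation C_k for shifts k = 1 .. N-1.

    Bits 1/0 are mapped to symbols +1/-1; symbols outside the
    codeword are treated as zero, so each sum has N - k terms.
    """
    x = [1 if b else -1 for b in bits]
    n = len(x)
    return [sum(x[j] * x[j + k] for j in range(n - k)) for k in range(1, n)]


# Standard length-13 Barker code and the 16-bit code quoted in Section V.
barker13 = [1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1]
sync16 = [int(c) for c in "1111100110101110"]

peak13 = max(abs(c) for c in correlation_sidelobes(barker13))
peak16 = max(abs(c) for c in correlation_sidelobes(sync16))
print("peak sidelobe magnitude, 13-bit:", peak13)
print("peak sidelobe magnitude, 16-bit:", peak16)
```

A small peak sidelobe relative to the main peak (which equals the code length) is what makes a shifted copy of the code unlikely to be mistaken for the aligned one.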
III. WATERMARK EMBEDDING SCHEME
In order to guarantee the robustness and transparency of the watermarking, the proposed scheme embeds the synchronization code in the mean value of several samples.
Let $A = \{a(i),\ 0 \le i < Length\}$ represent a host digital audio signal with $Length$ samples. $W = \{w(i,j),\ 0 \le i < M,\ 0 \le j < N\}$ is a binary image to be embedded within the host audio signal, and $w(i,j) \in \{0,1\}$ is the pixel value at $(i,j)$. $F = \{f(t),\ 0 \le t < L_{syn}\}$ is a synchronization code with $L_{syn}$ bits, where $f(t) \in \{0,1\}$.
The main steps of the embedding procedure can be described as follows.
A. Preprocessing
To remove the spatial correlation between the pixels of the binary watermark image and to improve the security of the whole digital watermarking system, a watermark scrambling algorithm is applied first. In our watermark embedding scheme, the binary watermark image is scrambled from $W$ to $W_1$ by using the Arnold transform.
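The scrambling step can be illustrated with a short sketch. A common form of the Arnold transform for an N × N image maps pixel (x, y) to ((x + y) mod N, (x + 2y) mod N). The text does not state which variant or how many iterations are used, so the mapping, the `rounds` parameter, and the function names below are assumptions made for illustration.

```python
def arnold(img, rounds=1):
    """Scramble a square image (list of rows) with the Arnold cat map."""
    n = len(img)
    for _ in range(rounds):
        out = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                out[(x + y) % n][(x + 2 * y) % n] = img[x][y]
        img = out
    return img


def arnold_inverse(img, rounds=1):
    """Undo `rounds` iterations of arnold() by applying the inverse map."""
    n = len(img)
    for _ in range(rounds):
        out = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                # Inverse of (x, y) -> (x + y, x + 2y) modulo n.
                out[(2 * x - y) % n][(y - x) % n] = img[x][y]
        img = out
    return img


# Tiny 4 x 4 example with a single set pixel at (0, 1).
image = [[0] * 4 for _ in range(4)]
image[0][1] = 1
scrambled = arnold(image, rounds=3)
restored = arnold_inverse(scrambled, rounds=3)
```

The number of rounds acts as a simple key: the detector has to apply the same number of inverse rounds to recover the image.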
To improve the robustness of the proposed scheme against cropping and to let the detector recover when it loses synchronization, the audio is segmented first, and then the synchronization code and the watermark are embedded into each segment. Let $A^0$ denote a segment; $A^0$ is cut into two sections $A_1^0$ and $A_2^0$ with $L_1$ and $L_2$ samples, respectively. The synchronization code and the watermark are embedded into $A_1^0$ and $A_2^0$, respectively.
B. Synchronization Code Embedding
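The proposed synchronization code embedding method proceeds as follows.
1) The audio section $A_1^0$ is cut into $L_{syn}$ audio sub-segments, each sub-segment $P(m) = \{p_m(k),\ 0 \le k < n\}$, $0 \le m < L_{syn}$, having $n$ samples, where
$$n = \lfloor L_1 / L_{syn} \rfloor .$$
2) Calculate the mean value of $P(m)$, that is
$$\bar{p}_m = \frac{1}{n} \sum_{k=0}^{n-1} p_m(k).$$
3) The synchronization code is embedded into each $P(m)$ by quantizing the mean value $\bar{p}_m$. The rule is given by
$$p'_m(k) = p_m(k) + (\bar{p}'_m - \bar{p}_m)$$
where $p_m(k)$ is the original sample, $p'_m(k)$ is the modified sample, and
$$\bar{p}'_m = \begin{cases} IQ(\bar{p}_m)\, S_1 + S_1/2, & \text{if } Q(\bar{p}_m) = f(m) \\ IQ(\bar{p}_m)\, S_1 - S_1/2, & \text{if } Q(\bar{p}_m) \ne f(m) \end{cases}$$
$$IQ(\bar{p}_m) = \lfloor \bar{p}_m / S_1 \rfloor, \qquad Q(\bar{p}_m) = \mathrm{mod}(IQ(\bar{p}_m), 2)$$
where $\mathrm{mod}(x, y)$ returns the remainder of the division of $x$ by $y$, and $S_1$ is the quantization step.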
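The mean-quantization step can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an odd-even (parity) quantization rule in which the mean of a group of samples is moved to the centre of a quantization cell whose index parity equals the synchronization bit. The function names, the sample values, and the step size are hypothetical, and rounding to 16-bit integers and clipping are ignored.

```python
import math


def embed_sync_bit(samples, bit, step):
    """Shift all samples equally so that floor(mean / step) has parity `bit`."""
    mean = sum(samples) / len(samples)
    iq = math.floor(mean / step)
    if iq % 2 == bit:
        new_mean = iq * step + step / 2   # stay in the current cell
    else:
        new_mean = iq * step - step / 2   # move to the neighbouring cell
    delta = new_mean - mean
    return [s + delta for s in samples]


def extract_sync_bit(samples, step):
    """Read the bit back as the parity of the quantized mean."""
    mean = sum(samples) / len(samples)
    return math.floor(mean / step) % 2


group = [0.12, -0.05, 0.30, 0.07, -0.20]   # five samples per bit, as in Section V
step = 0.02                                 # illustrative quantization step
marked_one = embed_sync_bit(group, 1, step)
marked_zero = embed_sync_bit(group, 0, step)
```

Because the new mean sits in the middle of a cell, any disturbance that moves the mean by less than half a step leaves the extracted bit unchanged, which is the motivation for embedding in a mean rather than in a single sample.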
C. Watermark Embedding
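1) DWT: For each audio section $A_2^0$, an $H$-level DWT is performed, and we get the wavelet coefficients $A_2^{H}, D_2^{H}, D_2^{H-1}, \ldots, D_2^{1}$, where $A_2^{H}$ is the coarse signal and the detail signals are $D_2^{H}, D_2^{H-1}, \ldots, D_2^{1}$.
2) DCT: To take advantage of the low-frequency coefficients, which have a higher energy value and are robust against various signal processing operations, the DCT is performed only on the low-frequency coefficients $A_2^{H}$:
$$C = \mathrm{DCT}(A_2^{H}) = \{c(t),\ 0 \le t < L_2 / 2^{H}\}.$$
3) Watermark Embedding: In order to guarantee the robustness and transparency of the watermark, the proposed scheme embeds each watermark bit in the magnitude of a DCT coefficient by quantization. The proposed method embeds all watermark bits in every segment of the audio signal. The quantization function is given as follows:
$$c'(t) = \begin{cases} IQ(c(t))\, S_2 + S_2/2, & \text{if } Q(c(t)) = w_1(t) \\ IQ(c(t))\, S_2 - S_2/2, & \text{if } Q(c(t)) \ne w_1(t) \end{cases}$$
where $0 \le t < M \times N$, $IQ(c(t)) = \lfloor c(t) / S_2 \rfloor$, and $S_2$ is the quantization step, and
$$Q(c(t)) = \mathrm{mod}(IQ(c(t)), 2).$$
4) Inverse DCT: The inverse DCT is performed on the low-frequency coefficients $C' = \{c'(t)\}$ as follows:
$$A_2'^{H} = \mathrm{IDCT}(C').$$
5) Inverse DWT: After substituting the coefficients $A_2^{H}$ with $A_2'^{H}$, an $H$-level inverse DWT is performed, and then the watermarked digital audio section is $A_2'^{0}$.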
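The DWT, DCT, quantization, and inverse steps above can be sketched end to end in plain Python. This is a hedged illustration rather than the authors' implementation: it uses the Haar wavelet (Daubechies-1, the basis named in Section V), an orthonormal DCT-II written out directly, and an assumed parity quantization rule applied to the signed coefficient rather than to its magnitude. All function names and the test signal are hypothetical, and the O(n²) DCT is meant for readability only.

```python
import math


def haar_dwt(x):
    """One level of the orthonormal Haar (Daubechies-1) DWT."""
    s = math.sqrt(2.0)
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(half)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def _basis(n, k, i):
    """Orthonormal DCT-II basis value for frequency k at position i."""
    scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
    return scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))


def dct(x):
    n = len(x)
    return [sum(x[i] * _basis(n, k, i) for i in range(n)) for k in range(n)]


def idct(c):
    n = len(c)
    return [sum(c[k] * _basis(n, k, i) for k in range(n)) for i in range(n)]


def quantize(value, bit, step):
    """Assumed parity rule: move value to a cell centre whose index parity is bit."""
    iq = math.floor(value / step)
    return iq * step + step / 2 if iq % 2 == bit else iq * step - step / 2


def embed(section, bits, step, levels=3):
    """Embed bits into the lowest DCT coefficients of the coarse DWT band."""
    approx, details = list(section), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    coeffs = dct(approx)
    for t, bit in enumerate(bits):
        coeffs[t] = quantize(coeffs[t], bit, step)
    approx = idct(coeffs)
    for detail in reversed(details):
        approx = haar_idwt(approx, detail)
    return approx


def extract(section, n_bits, step, levels=3):
    """Blind extraction: repeat the forward transforms and read the parities."""
    approx = list(section)
    for _ in range(levels):
        approx = haar_dwt(approx)[0]
    return [math.floor(c / step) % 2 for c in dct(approx)[:n_bits]]


host = [math.sin(0.3 * i) for i in range(64)]   # toy section, length divisible by 2**3
payload = [1, 0, 1, 1, 0, 0, 1, 0]
marked = embed(host, payload, step=0.5)
recovered = extract(marked, len(payload), step=0.5)
```

The section length must be divisible by 2 raised to the number of levels, and the number of embedded bits cannot exceed the number of coarse coefficients. A larger step gives more robustness at the cost of larger distortion.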
Fig. 3. The original digital audio signals and watermark.
D. Repeat Embedding
In order to improve the robustness against cropping, the proposed scheme repeats the steps of Sections III-B and III-C to embed the synchronization code and the watermark into every audio segment.
In our experiments, the length of watermark embedding is fixed as
.
IV. WATERMARK DETECTING SCHEME
The watermark detecting procedure in the proposed method needs neither the original audio signal nor any other side information. The procedure is as follows.
1) The beginning position B of the watermarked segment is located using the frame synchronization technology of digital communications.
2) An $H$-level DWT is performed on each audio section $A_2^{*}$ after B, which gives the coefficients
$$A_2^{*H}, D_2^{*H}, D_2^{*H-1}, \ldots, D_2^{*1}.$$
3) The DCT is performed on the low-frequency DWT coefficients:
$$C^{*} = \mathrm{DCT}(A_2^{*H}) = \{c^{*}(t),\ 0 \le t < L_2 / 2^{H}\}.$$
4) The extraction rule is
$$w_1^{*}(t) = \mathrm{mod}(\lfloor c^{*}(t) / S_2 \rfloor, 2), \qquad 0 \le t < M \times N.$$
5) Finally, the watermark image $W^{*}$ can be obtained by descrambling.
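Step 1, locating the beginning position B, can be sketched as a sliding search for the synchronization code. The sketch below is an assumption-laden illustration: it reads each candidate bit as the parity of a quantized group mean (an assumed rule), scans every sample offset, and returns the first offset whose bits match the code within `max_errors`. The function names, the group size, and the step are hypothetical.

```python
import math

SYNC = [int(c) for c in "1111100110101110"]   # code quoted in Section V


def read_bits(samples, start, n_bits, group, step):
    """Read n_bits from consecutive groups of `group` samples starting at `start`."""
    bits = []
    for m in range(n_bits):
        chunk = samples[start + m * group:start + (m + 1) * group]
        mean = sum(chunk) / len(chunk)
        bits.append(math.floor(mean / step) % 2)
    return bits


def find_sync(samples, code, group, step, max_errors=0):
    """Return the first offset whose bits match `code`, or None if there is none."""
    span = len(code) * group
    for start in range(len(samples) - span + 1):
        bits = read_bits(samples, start, len(code), group, step)
        if sum(b != c for b, c in zip(bits, code)) <= max_errors:
            return start
    return None


# Toy signal: 7 leading samples, then the code with 5 constant samples per bit.
signal = [0.5] * 7
for b in SYNC:
    signal += [1.5 if b else 0.5] * 5
signal += [0.5] * 20
position = find_sync(signal, SYNC, group=5, step=1.0)
```

Because a group mean changes slowly with the offset, neighbouring offsets can also decode to the same code, so a practical detector would refine or verify the returned position, for example by checking that the watermark decoded after it is consistent.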
TABLE I
THE SYNCHRONIZATION CODE DETECTION RESULTS FOR VARIOUS ATTACKS
TABLE II
THE WATERMARK DETECTION RESULTS FOR VARIOUS ATTACKS (NC)
In addition, in order to eliminate the influence of subjective and objective factors, such as the experience, health condition, and experimental conditions of the observer, the normalized cross-correlation (NC) [13] is adopted to appraise the similarity between the extracted watermark and the original one. It is defined as
$$NC(W, W^{*}) = \frac{\sum_{i=0}^{M-1} \sum_{j=0}^{N-1} w(i,j)\, w^{*}(i,j)}{\sqrt{\sum_{i=0}^{M-1} \sum_{j=0}^{N-1} w(i,j)^2}\ \sqrt{\sum_{i=0}^{M-1} \sum_{j=0}^{N-1} w^{*}(i,j)^2}}$$
where $W$ and $W^{*}$ are the original watermark image and the extracted watermark image, respectively. If the NC exceeds a certain threshold, we conclude that the audio signal is protected; otherwise, it is not protected.
In this correspondence, reliability is measured as the bit error rate (BER) of the extracted watermark, defined as
$$BER = \frac{B}{M \times N} \times 100\%$$
where $B$ is the number of erroneously detected bits.
We conduct the signal-to-noise ratio (SNR) test, which serves as an objective measurement of audio signal quality. The SNR is measured by comparing the watermarked signal against the original signal. The SNR is defined as follows:
$$SNR = 10 \log_{10} \frac{\sum_{i=0}^{Length-1} a(i)^2}{\sum_{i=0}^{Length-1} \left( a(i) - a'(i) \right)^2}$$
where $a(i)$ and $a'(i)$ are samples of the original and the watermarked audio signals, respectively.
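The three evaluation measures can be computed with a few lines of Python. This sketch assumes common textbook forms: NC normalized by the product of the two image norms, BER expressed as a percentage of all watermark bits, and SNR in decibels. The exact normalizations and the function names are assumptions for illustration.

```python
import math


def nc(original, extracted):
    """Normalized cross-correlation of two equally sized binary images."""
    pairs = [(a, b) for ra, rb in zip(original, extracted) for a, b in zip(ra, rb)]
    num = sum(a * b for a, b in pairs)
    den = math.sqrt(sum(a * a for a, _ in pairs)) * math.sqrt(sum(b * b for _, b in pairs))
    return num / den


def ber(original, extracted):
    """Bit error rate in percent."""
    pairs = [(a, b) for ra, rb in zip(original, extracted) for a, b in zip(ra, rb)]
    return 100.0 * sum(a != b for a, b in pairs) / len(pairs)


def snr_db(original, watermarked):
    """Signal-to-noise ratio of the watermarked audio, in dB."""
    signal = sum(a * a for a in original)
    noise = sum((a - b) ** 2 for a, b in zip(original, watermarked))
    return 10.0 * math.log10(signal / noise)


w = [[1, 0], [1, 1]]
w_star = [[1, 0], [0, 1]]          # one of four bits is wrong
print(round(nc(w, w_star), 4), ber(w, w_star))
print(round(snr_db([1.0, 1.0, 1.0, 1.0], [1.1, 1.0, 1.0, 1.0]), 2))
```

Note that `nc` is undefined for an all-zero image and `snr_db` for identical signals, since both would divide by zero.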
V. EXPERIMENTAL RESULTS
To illustrate the inaudibility and robustness of our watermarking scheme, the proposed watermarking algorithm is applied to two digital audio pieces. All of the audio signals in the test are music, stored as 16-bit signed mono audio sampled at 44.1 kHz. We use a 64 × 64 bit binary image as the watermark for all audio signals and the 16-bit Barker code 1111100110101110 as the synchronization code. The Daubechies-1 wavelet basis is used. A smaller DWT level reduces the robustness of the watermark, and a larger one increases the amount of computation, so a 3-level DWT is performed in this test.
In our experiments, the length of each watermark embedding segment is fixed as 65536 samples. Each synchronization bit is embedded into the mean value of five samples.
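The parameters reported in this section can be checked with a little arithmetic. The sketch below assumes that the 65536 samples are the watermark section of each segment, that the synchronization section precedes it, and that segments are laid back to back; the text does not confirm this layout, and the variable names are hypothetical. The clip lengths of 9.75 s and 6.83 s are the ones reported for the two test signals.

```python
SAMPLE_RATE = 44_100            # Hz
SYNC_BITS = 16
SAMPLES_PER_SYNC_BIT = 5
WATERMARK_SECTION = 65_536      # samples
DWT_LEVELS = 3
WATERMARK_BITS = 64 * 64

sync_section = SYNC_BITS * SAMPLES_PER_SYNC_BIT        # 80 samples
coarse_coeffs = WATERMARK_SECTION // 2 ** DWT_LEVELS   # 8192 coefficients
segment = sync_section + WATERMARK_SECTION             # 65616 samples (assumed layout)


def full_segments(seconds):
    """Number of complete segments that fit in a clip of the given duration."""
    return round(seconds * SAMPLE_RATE) // segment


print(coarse_coeffs >= WATERMARK_BITS)   # enough coarse coefficients for all bits
print(full_segments(9.75), full_segments(6.83))
```

Under these assumptions the coarse band holds twice as many coefficients as there are watermark bits, and each test clip carries several complete copies of the synchronization code and watermark.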
TABLE III
THE WATERMARK DETECTION RESULTS FOR VARIOUS ATTACKS (BER)
TABLE IV
THE WATERMARK DETECTION RESULTS FOR VARIOUS ATTACKS (SNR)
Two original audio signals in the test are shown in Fig. 3(a) and (b). Their lengths are 9.75 and 6.83 s, respectively. The original watermark image is displayed in Fig. 3(c). The quantization steps of
the two test samples are
and ,
respectively.
To evaluate the robustness of our watermarking scheme, attacks including MPEG compression, resampling, requantization, noise addition, and random cropping are applied.
Table I summarizes the synchronization code detection results of the proposed scheme compared with those of scheme [12], indicating for each attack whether the synchronization code could be found.
Tables II–IV summarize the watermark detection results of the proposed scheme compared with those of scheme [12] against various attacks. In addition, the detection results of DCT and DWT are given in Table II, Table III, and Table IV, respectively.
The NC and BER of the watermark image and the SNR of the digital audio signal are also given in Table II, Table III, and Table IV.
VI. CONCLUSION
In this correspondence, we propose a novel synchronization-invariant digital audio watermarking algorithm based on the quantization of coefficients. To improve the robustness of the audio watermark, the proposed algorithm selects a robust Barker code as the synchronization code, embeds the synchronization code into the mean value of several samples, and embeds the watermark into DWT and DCT coefficients. The experimental results illustrate the robustness of our synchronization embedding scheme and the inaudibility of our watermarking scheme. In addition, the watermark can be extracted without the help of the original digital audio signal, and the scheme is easy to implement. Despite these strengths, the proposed method has a drawback: it is not very robust against pitch-invariant time-scale modification. Further research will focus on overcoming this problem.
REFERENCES
[1] I. J. Cox and M. L. Miller, “The first 50 years of electronic watermarking,” J. Appl. Signal Process., vol. 56, no. 2, pp. 225–230, 2002.
[2] L. Wei, Y. Yi-Qun, L. Xiao-Qiang, X. Xiang-Yang, and L. Pei-Zhong, “Overview of digital audio watermarking,” J. Commun., vol. 26, no. 2, pp. 100–111, 2005.
[3] L. Wen-Nung and C. Li-Chun, “Robust and high-quality time-domain audio watermarking subject to psycho acoustic masking,” in Proc. IEEE Int. Symp. Circuits Syst., AZ, 2002, vol. 2, pp. 45–48.
[4] D. Megías, J. Herrera-Joancomartí, and J. Minguillón, “A robust audio watermarking scheme based on MPEG 1 layer III compression,” in Communications and Multimedia Security—CMS 2003. New York: Springer-Verlag, 2003, vol. 963, LNCS, pp. 226–238.
[5] H. J. Kim, “Audio watermarking techniques,” in Pacific Rim Workshop on Digital Steganography, Kyushu Institute of Technology, Kitakyushu, Japan, Jul. 3–4, 2003.
[6] S. Sheng-He, L. Zhe-Ming, and N. Xia-Mu, Digital Watermarking Technique. Beijing, China: Science, 2004.
[7] D. Kirovski and H. S. Malvar, “Spread spectrum watermarking of audio signals,” IEEE Trans. Signal Process., vol. 51, no. 4, pp. 1020–1033, Apr. 2003.
[8] C.-P. Wu, P.-C. Su, and C.-C. Jay Kuo, “Robust audio watermarking for copyright protection,” in Proc. SPIE, Jul. 1999, vol. 3807, pp. 387–397.
[9] W. Li and X. Y. Xue, “Audio watermarking based on music content analysis: Robust against time scale modification,” in Proc. 2nd Int. Workshop on Digital Watermarking, Korea, 2003, pp. 289–300.
[10] H. O. Kim, B. K. Lee, and N. Y. Lee, “Wavelet-based audio watermarking techniques: Robustness and fast synchronization,” [Online]. Available: http://amath.kaist.ac.kr/research/paper/0111.pdf
[11] W. Yong, H. Ji-Wu, and Y. Q. Shi, “Meaningful watermarking for audio with fast resynchronization,” J. Comput. Res. Develop., vol. 40, no. 20, pp. 215–220, 2003.
[12] J. W. Huang, W. Yong, and Y. Q. Shi, “A blind audio watermarking algorithm with self-synchronization,” in Proc. IEEE Int. Symp. Circuits Syst., AZ, 2002, vol. 3, pp. 627–630.
[13] M. Kutter and F. A. P. Petitcolas, “A fair benchmark for image watermarking systems,” in Proc. Electron. Imag., 1999, vol. 3657, pp. 226–239.
... Watermark locating has been a longstanding challenge in the realm of audio watermarking [27,13,28]. In traditional audio watermarking, a marker known as the "synchronization code [13,28]" is usually added preceding the watermark segment to enable fast localization. ...
... Watermark locating has been a longstanding challenge in the realm of audio watermarking [27,13,28]. In traditional audio watermarking, a marker known as the "synchronization code [13,28]" is usually added preceding the watermark segment to enable fast localization. However, this method has proven vulnerable to desynchronization attacks (e.g., speed variation), which compromises the overall robustness of the watermarking system. ...
... Setup: Since we add multiple watermarks on an utterance, how to effectively locate watermarks has been the subject of extensive research in the field over the years [27,13,28]. To assess the effectiveness of the proposed shift module in conjunction with BFD for addressing this problem, we conducted the watermark locating test. ...
Preprint
Recent breakthroughs in zero-shot voice synthesis have enabled imitating a speaker's voice using just a few seconds of recording while maintaining a high level of realism. Alongside its potential benefits, this powerful technology introduces notable risks, including voice fraud and speaker impersonation. Unlike the conventional approach of solely relying on passive methods for detecting synthetic data, watermarking presents a proactive and robust defence mechanism against these looming risks. This paper introduces an innovative audio watermarking framework that encodes up to 32 bits of watermark within a mere 1-second audio snippet. The watermark is imperceptible to human senses and exhibits strong resilience against various attacks. It can serve as an effective identifier for synthesized voices and holds potential for broader applications in audio copyright protection. Moreover, this framework boasts high flexibility, allowing for the combination of multiple watermark segments to achieve heightened robustness and expanded capacity. Utilizing 10 to 20-second audio as the host, our approach demonstrates an average Bit Error Rate (BER) of 0.48\% across ten common attacks, a remarkable reduction of over 2800\% in BER compared to the state-of-the-art watermarking tool. See https://aka.ms/wavmark for demos of our work.
... Transform-domain methods are prevalent because they exploit signal features and human auditory properties. This category encompasses well-known techniques such as the discrete cosine transform (DCT) [3][4][5][6], discrete (or fast) Fourier transform (DFT/FFT) [7][8][9][10], discrete wavelet transform (DWT) [4,[11][12][13][14], and singular value decomposition (SVD) [15][16][17]. While transform-domain methods typically excel in terms of robustness and imperceptibility, they inevitably incur computational overhead due to transformations between different domains. ...
... Transform-domain methods are prevalent because they exploit signal features and human auditory properties. This category encompasses well-known techniques such as the discrete cosine transform (DCT) [3][4][5][6], discrete (or fast) Fourier transform (DFT/FFT) [7][8][9][10], discrete wavelet transform (DWT) [4,[11][12][13][14], and singular value decomposition (SVD) [15][16][17]. While transform-domain methods typically excel in terms of robustness and imperceptibility, they inevitably incur computational overhead due to transformations between different domains. ...
Article
Full-text available
Watermarking is a viable approach for safeguarding the proprietary rights of digital media. This study introduces an innovative fast Fourier transform (FFT)-based phase modulation (PM) scheme that facilitates efficient and effective blind audio watermarking at a remarkable rate of 508.85 numeric values per second while still retaining the original quality. Such a payload capacity makes it possible to embed a full-color image of 64 × 64 pixels within an audio signal of just 24.15 s. To bolster the security of watermark images, we have also implemented the Arnold transform in conjunction with chaotic encryption. Our comprehensive analysis and evaluation confirm that the proposed FFT–PM scheme exhibits exceptional imperceptibility, rendering the hidden watermark virtually undetectable. Additionally, the FFT–PM scheme shows impressive robustness against common signal-processing attacks. To further enhance the visual rendition of the recovered color watermarks, we propose using residual neural networks to perform image denoising and super-resolution reconstruction after retrieving the watermarks. The utilization of the residual networks contributes to noticeable improvements in perceptual quality, resulting in higher levels of zero-normalized cross-correlation in cases where the watermarks are severely damaged.
... While transmitting such data's, protection of these data is crucial which involves lot of technologies. The traditional steganography technologies such as Discrete Cosine Transform (DCT) [24], Discrete wavelet transform (DWT) [24] and Least Significant Bit (LSB) [19] have been developed to protect the data against attacks. These methods still have the problem in retrieving the data without any loss at the receiver side. ...
... While transmitting such data's, protection of these data is crucial which involves lot of technologies. The traditional steganography technologies such as Discrete Cosine Transform (DCT) [24], Discrete wavelet transform (DWT) [24] and Least Significant Bit (LSB) [19] have been developed to protect the data against attacks. These methods still have the problem in retrieving the data without any loss at the receiver side. ...
Article
Full-text available
Data hiding along with the security plays a vital role in the wireless network. In recent years, object detection, Feature Extraction (FE) and correlation based technique are developed to predict the pixels that is suitable for embedding. However, traditional data hiding method still suffered in the selection of less distortion pixels. To overcome these shortcomings, this method proposes (i) Moving Object (MO) detection, (ii) FE, (iii) data hiding, and (iv) data encryption. Initially, the MO in the video are detected using optical flow and tracking the weight of MO. Next, data hiding features are extracted using Optimization Fruit fly Algorithm (OFA) with Differential Evolution (DE) and Opposition Based Learning (OBL) algorithm. Then, a modified data hiding techniques based on correlation, regression and pixel segmentation with Prediction Error (PE) techniques are used for improving the performance. Finally, the hidden data is encrypted using Binary Tree Structure (BTS) to provide additional layer of security. Experimental result shows the proposed method is superior to other state-of-the-art methods in terms of Peak-Signal-to-Noise Ratio (PSNR), embedding capacity and security analysis.
... The watermark bit is embedded into the statistical average value of the low-frequency components. Wang and Zhao [107] proposed a watermarking technique for audio signals using DCT and DWT. In this technique, the DCT coefficients of the low-frequency wavelet coefficients are modified according to the watermark bits. ...
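The excerpt above does not say how the coefficients are modified. One common choice, assumed here purely for illustration, is parity-style quantization with a step size `delta`: each coefficient is moved to the lower or upper quarter of its quantization cell depending on the bit. The function names `embed_bit` and `extract_bit` and the sample values are hypothetical, and the coefficients stand in for DCT coefficients of a low-frequency wavelet subband.

```python
import math

def embed_bit(coeff: float, bit: int, delta: float) -> float:
    """Move coeff inside its quantization cell: 1/4 of the cell for bit 0, 3/4 for bit 1."""
    base = math.floor(coeff / delta) * delta  # lower edge of the cell containing coeff
    return base + (0.75 if bit else 0.25) * delta

def extract_bit(coeff: float, delta: float) -> int:
    """Blind extraction: read the bit from the position of coeff inside its cell."""
    residue = coeff - math.floor(coeff / delta) * delta
    return 1 if residue >= delta / 2 else 0

# Hypothetical coefficients standing in for DCT coefficients of a low-frequency subband
coeffs = [-3.7, 0.0, 1.2, 10.9]
bits = [1, 0, 1, 0]
marked = [embed_bit(c, b, 0.5) for c, b in zip(coeffs, bits)]
recovered = [extract_bit(m, 0.5) for m in marked]
```

Under this assumption, extraction needs only `delta`, not the original signal, and a bit survives any perturbation smaller than `delta / 4`. A larger step gives more robustness at the cost of larger changes to the coefficients.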
Article
Robustness, imperceptibility and embedding capacity are the preliminary requirements of any digital audio watermarking technique. However, research has concluded that these requirements are difficult to achieve at the same time. Thus, a watermarking technique depends closely on how it manages the robustness/imperceptibility trade-off. A large majority of research work has been devoted to improving this trade-off by implementing increasingly advanced techniques. For conciseness and efficiency, the comprehensive review reported in this paper mainly considers imperceptibility and robustness among the criteria, as they determine the key performance of most existing audio watermarking systems. In this paper we introduce the basic concepts of digital audio watermarking, the performance characteristics, and a classification of digital audio watermarking systems according to the extraction/detection process or to human perception. We also present various digital audio watermarking applications. Further, we present classifications of unintentional and intentional attacks that can be performed on audio watermarking systems and highlight the impact of these attacks on the quality of the watermarked audio. We present two classifications made by researchers: the first categorizes these attacks into basic and advanced attacks, while the second groups the attacks according to the process performed on the watermarked audio file. Furthermore, after an overview of the properties of the Human Auditory System (HAS), we present several evaluation aspects of audio watermarking systems and review various recent robust and imperceptible audio watermarking methods in the spatial, transform and hybrid domains.
... Consequently, the demand for secure data propagation has grown. Data hiding is a method of preserving secret information inside any medium without altering its original features [1,2]. Many excellent approaches have been proposed and put into practice. ...
Article
The development of multimedia technology has made protecting information more challenging. This paper proposes an effective method for data hiding in videos, based on Pixel Sequence (PS) and weight interpolation, to protect the secret data from intruders. Initially, noise is removed from the video with a median filter, and Moving Objects (MO) inside frames with different backgrounds are accurately detected using a Fast Region-based Convolution Neural Network (FR-CNN). Next, the MO and Non Moving Object (NMO) pixels from each frame are mapped to the adjacent frames to determine the weights by applying PS. Then, the cover video is up-scaled with interpolated pixels based on the weights of the MO and NMO. Finally, the secret data are hidden inside both the interpolated and non-interpolated pixels to increase the Embedding Capacity (EC). In interpolated pixels, both the Most Significant Bit (MSB) and the Least Significant Bit (LSB) of each frame are used to hide data, based on their weights, whereas in non-interpolated pixels a Quorum Function (QF) is used to embed data. Two different data embedding techniques are adopted in each frame to enhance security. The proposed method outperforms existing methods in terms of precision and recall. Experimental results also show that the stego-video quality is uncompromised and that the method is resistant to security attacks.
... If the watermark information is embedded directly into the signal samples, the technique is called a time-domain technique [3], [4], [5]. If the watermark is inserted into transform coefficients, it is called a transform-domain technique [6], [7], [8]. Transform-domain techniques are more robust than time-domain ones. ...
Conference Paper
This paper describes an audio watermarking scheme based on lossy compression. The main idea is taken from an image watermarking approach in which the JPEG compression algorithm is used to determine where and how the mark should be placed. Similarly, in the audio scheme suggested in this paper, an MPEG 1 Layer 3 algorithm is chosen for compression to determine the position of the mark bits and, thus, the psychoacoustic masking of the MPEG 1 Layer 3 compression is implicitly used. This methodology provides a high degree of robustness against compression attacks. The suggested scheme is also shown to succeed against most of the StirMark benchmark attacks for audio.
Article
This paper describes a novel technique for embedding watermark bits into digital audio signals. The proposed method is based on the patchwork algorithm on the wavelet domain and does not need the original audio signal in the watermark detection. It uses the wavelet transform generated by the low-pass analysis filter $h_n$ whose length is 2 and $h_0 = h_1 = 1$ to account for a fast synchronization between watermark embedding and detection parts. Several simulation results show that the proposed method is robust against various signal manipulations such as MPEG/Audio layer 3 compression and time scale modification.
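A length-2 low-pass analysis filter with both taps equal to 1 is an unnormalized Haar filter, so each approximation coefficient is simply the sum of two neighbouring samples. The sketch below illustrates only that filter, not the patchwork embedding itself. The function names `haar_lowpass` and `haar_approx` are hypothetical, and pairing samples as x[2k] + x[2k+1] is one of several possible indexing conventions, assumed here.

```python
def haar_lowpass(x):
    """One analysis level with taps (1, 1): sum each non-overlapping pair of samples.

    A trailing unpaired sample (odd-length input) is dropped in this sketch.
    """
    return [x[i] + x[i + 1] for i in range(0, len(x) - 1, 2)]

def haar_approx(x, levels):
    """Apply the low-pass branch repeatedly to get the level-`levels` approximation."""
    for _ in range(levels):
        x = haar_lowpass(x)
    return x

samples = [1, 2, 3, 4, 5, 6, 7, 8]
level1 = haar_lowpass(samples)    # sums of pairs
level2 = haar_approx(samples, 2)  # sums of blocks of four samples
```

With these taps, a level-k approximation coefficient equals the sum of 2^k consecutive samples, so it can be computed with additions only. That low cost is plausibly what makes a repeated synchronization search inexpensive.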
Article
This paper surveys audio watermarking schemes. The state of the art of current watermarking schemes and their implementation techniques is briefly summarized. The schemes are classified into five categories: quantization scheme, spread-spectrum scheme, two-set scheme, replica scheme, and self-marking scheme. Advantages and disadvantages of each scheme are also discussed. In addition, synchronization schemes are also surveyed.
Conference Paper
A blind audio information bit hiding algorithm with effective synchronization is proposed in this paper. The algorithm embeds synchronization signals in the time domain to resist attacks such as cropping while keeping the computation for resynchronization low. The watermark is placed in blockwise DCT coefficients of the original audio, exploiting features of the HAS (Human Auditory System). To lower the bit error rate of the watermark in extraction, error-correcting coding is applied. The experimental results show that the hidden imperceptible watermark is robust to the attacks caused by additive noise, MP3 coding, and cropping.
Article
In this paper, an adaptive notch filter is investigated for eliminating sinusoids embedded in noise. The algorithm estimates the sinusoidal frequencies directly from data samples to avoid the variation caused by small perturbations in the estimated coefficients of the filter. The notch filter is then designed in terms of the estimated frequencies. The stability of the adaptive notch filter can always be ensured without any stability monitoring during adaptive processing. It converges rapidly and attains the Cramér-Rao bound (CRB) for a sufficiently large data set. Simulation results are included to demonstrate the performance of the algorithm.
Conference Paper
Synchronization attacks such as random cropping and time scale modification are critical for audio watermarking techniques. To combat these attacks, a novel content-dependent, temporally localized robust audio watermarking method is proposed in this paper. The basic idea is to embed and detect the watermark in selected high-energy local regions that represent music transitions such as drum sounds or note onsets. Such regions correspond to music edges and will not be changed much, for the purpose of maintaining high auditory quality. In this way, the embedded watermark is expected to escape the damage caused by audio signal processing, random cropping, time scale modification, etc., as shown by the experimental results.
Article
Electronic watermarking can be traced back as far as 1954. The last 10 years have seen considerable interest in digital watermarking, due, in large part, to concerns about illegal piracy of copyrighted content. In this paper, we consider the following questions: is the interest warranted? What are the commercial applications of the technology? What scientific progress has been made in the last 10 years? What are the most exciting areas for research? And where might the next 10 years take us? In our opinion, the interest in watermarking is appropriate. However, we expect that copyright applications will be overshadowed by applications such as broadcast monitoring, authentication, and tracking content distributed within corporations. We further see a variety of applications emerging that add value to media, such as annotation and linking content to the Web. These latter applications may turn out to be the most compelling. Considerable progress has been made toward enabling these applications: perceptual modelling, security threats and countermeasures, and the development of a bag of tricks for efficient implementations. Further progress is needed in methods for handling geometric and temporal distortions. We expect other exciting developments to arise from research in informed watermarking.
Conference Paper
We propose in this paper a new method for embedding digital watermarks into audio signals in the time domain. By testing frequency-domain characteristics (i.e., the psychoacoustic model) and making appropriate adjustments, our algorithm is capable of keeping the watermark disturbance imperceptible to human listeners. A watermark can be extracted without knowledge of the original audio signals. Experiments show that our watermarking scheme leads to results with good audio quality (comparable to the original ones, according to some subjective tests) and is robust (a survival rate of more than 98%) to pirate attacks, such as MP3 compression, low-pass filtering, amplitude normalization, DA/AD same-rate reacquisition, and cropping.