
A Novel Synchronization Invariant Audio Watermarking Scheme Based on DWT and DCT

January 2007
IEEE Transactions on Signal Processing 54(12):4835 - 4840

DOI:10.1109/TSP.2006.881258

Source
IEEE Xplore

Authors:

Xiang-Yang Wang

Tasly Holding Group Co., Ltd.

Hong Zhao

MinNan Normal University

Synchronization attack is one of the key issues of digital audio watermarking. In this correspondence, a blind digital audio watermarking scheme against synchronization attack using adaptive quantization is proposed. The features of the proposed scheme are as follows: 1) a more steady synchronization code and a new embedding strategy are adopted to resist the synchronization attack more effectively; 2) the multiresolution characteristics of the discrete wavelet transform (DWT) and the energy-compression characteristics of the discrete cosine transform (DCT) are combined to improve the transparency of the digital watermark; 3) the watermark is embedded into the low-frequency components by adaptive quantization according to human auditory masking; and 4) the scheme can extract the watermark without the help of the original digital audio signal. Experimental results show that the proposed watermarking scheme is inaudible and robust against various signal processing operations such as noise adding, resampling, requantization, random cropping, and MPEG-1 Layer III (MP3) compression.

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 54, NO. 12, DECEMBER 2006 4835

Index Terms—Audio watermarking, discrete cosine transform (DCT),

discrete wavelet transform (DWT), synchronization.

Manuscript received January 11, 2006; revised January 26, 2006. This work was supported in part by the Natural Science Foundation of Liaoning Province of China under Grant 20032100 and by the Open Foundation of the State Key Laboratory of Information Security of China under Grant 03-02. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Zixiang Xiong.

X.-Y. Wang is with the School of Computer and Information Technique, Liaoning Normal University, Dalian 116029, China. He is also with the State Key Laboratory of Information Security, Institute of Software of Chinese Academy of Sciences, Beijing 100039, China (e-mail: wxy37@263.net).

H. Zhao is with the School of Computer and Information Technique, Liaoning Normal University, Dalian 116029, China (e-mail: fhq_xa@126.com).

Digital Object Identifier 10.1109/TSP.2006.881258

I. INTRODUCTION

With the rapid development of networks (especially the Internet) and multimedia techniques, the protection of intellectual property rights has become a key problem that must be solved. Against this background, digital watermarking has recently received a great deal of attention and has become a focus of network information security [1]. Digital watermarking can be classified into image watermarking, video watermarking, and audio watermarking according to the range of application. Current digital watermarking schemes mainly focus on image and video copyright protection, and only a few audio watermarking techniques have been reported [2]. In particular, it is hard to find robust audio watermarking algorithms that can resist synchronization attacks effectively [3]–[5].

A synchronization attack does not remove the watermark information from the watermarked signal; instead, it changes the embedding position so that the detector cannot find the correct watermark. Up to now, four strategies have been adopted to make audio watermarking robust against synchronization attacks: all-list-search [2], combination of spread spectrum and spread spectrum code [6], [7], utilization of the important features of the original digital audio [8], [9] (which we call the self-synchronization strategy), and synchronization code [10]–[12]. Among them, the all-list-search strategy requires a large amount of computation and has a high false positive rate, and the second strategy cannot achieve blind detection. Kirovski et al. [7] proposed several novel mechanisms for effective encoding and detection of direct-sequence spread-spectrum watermarks in audio signals; their method embeds a HAS-shaped watermark into modulated complex lapped transform (MCLT) coefficients. Current self-synchronization algorithms cannot extract feature points steadily, and they usually need a large number of threshold values, which makes them difficult to apply. By contrast, the synchronization code strategy has more obvious technological advantages. Kim et al. [10] proposed a robust audio watermarking strategy using a common binary sequence as the synchronization code, but the relatively poor periodic and aperiodic correlation of a common binary sequence weakens the ability to resist synchronization attacks. The Barker code has better autocorrelation properties, so Wang et al. [11] and Huang et al. [12] choose it as the synchronization mark, embed it in the time domain, and then embed the watermark information in the DCT domain. This approach can resist synchronization attacks effectively, but it has the following defects: 1) it chooses a 12-bit Barker code, which is so short that it easily causes false synchronization; 2) it embeds the synchronization code by modifying individual sample values only, which greatly reduces robustness (especially against resampling and MP3 compression); and 3) it does not make full use of the human auditory masking effect.

Taking the aforementioned problems into consideration, we introduce a DWT- and DCT-based blind digital audio watermarking algorithm that can resist synchronization attacks effectively. We choose a 16-bit Barker code as the synchronization mark and embed it by modifying the mean value of several samples. In addition, in order to make full use of the auditory masking effect, we embed the watermark information in the DWT and DCT domain.

This correspondence proceeds as follows. Section II describes the basic principle of the proposed algorithm and the synchronization code. Section III introduces the proposed embedding algorithm for the synchronization code and the watermark information. In Section IV, the detection procedure is provided. The simulated experimental results are given in Section V, and conclusions are drawn in Section VI.

II. FUNDAMENTAL THEORY AND SYNCHRONIZATION

A. Fundamental Theory

In our audio watermarking scheme, the watermark is embedded into the host audio in three steps. First, the original digital audio is segmented, and each segment is cut into two sections. Second, the synchronization code is embedded into the first section using a spatial-domain watermarking technique. Finally, the DWT and DCT are performed on the second section, and the watermark is embedded into the low-frequency components by quantization. The construction of the embedding information is shown in Fig. 1, and a diagram of our audio watermarking technique is shown in Fig. 2.

Fig. 1. Construction of embedding information.

Fig. 2. Watermark embedding scheme.

B. Synchronization Code

Synchronization is one of the key issues of audio watermarking. Watermark detection starts by aligning the watermarked block with the detector, and losing synchronization causes false detection. Time-scale or frequency-scale modification makes the detector lose synchronization, so we need exact synchronization algorithms based on a robust synchronization code.

Generally, false synchronization should be avoided when selecting a synchronization code. Several factors contribute to false synchronization: the style of the synchronization code, the length of the synchronization code, and the probability of “0” and “1” in the synchronization code. Among them, the length of the synchronization code is especially important: the longer the code, the more robust it is.

The proposed scheme embeds a Barker code in front of the watermark to locate the position where the watermark is embedded. Barker codes, which are subsets of PN sequences, are commonly used for frame synchronization in digital communication systems. Barker codes have low correlation sidelobes. A correlation sidelobe is the correlation of a codeword with a time-shifted version of itself. The correlation sidelobe $C_k$ for a $k$-symbol shift of an $N$-bit code sequence $\{X_j\}$ is given by

$$C_k = \sum_{j=1}^{N-k} X_j X_{j+k}$$

where $X_j$ is an individual code symbol taking values $+1$ or $-1$ for $1 \le j \le N$, and the adjacent symbols are assumed to be zero.
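As a small numerical check of the sidelobe definition, the following Python sketch correlates a code with shifted copies of itself. The function names are illustrative and not taken from the correspondence. The well-known 13-bit Barker code is used as the example because its sidelobe magnitudes never exceed one; the same function can be applied to the 16-bit synchronization code listed in Section V.

```python
def to_bipolar(bits):
    """Map a string of '0'/'1' characters to -1/+1 code symbols."""
    return [1 if b == "1" else -1 for b in bits]


def correlation_sidelobe(code, k):
    """Correlation of `code` with itself shifted by k symbols.

    Symbols outside the code are treated as zero, so only the
    overlapping N - k products contribute (zero-based indexing).
    """
    n = len(code)
    return sum(code[j] * code[j + k] for j in range(n - k))


# 13-bit Barker code: +++++--++-+-+
BARKER_13 = to_bipolar("1111100110101")

peak_13 = correlation_sidelobe(BARKER_13, 0)            # main lobe (k = 0)
sidelobes_13 = [correlation_sidelobe(BARKER_13, k)      # k = 1 .. N-1
                for k in range(1, len(BARKER_13))]

# The 16-bit synchronization code from Section V can be inspected the same way.
SYNC_16 = to_bipolar("1111100110101110")
sidelobes_16 = [correlation_sidelobe(SYNC_16, k) for k in range(1, len(SYNC_16))]
```

A large gap between the main lobe and the largest sidelobe magnitude is what makes a code suitable for frame synchronization: a misaligned detector sees a low correlation value.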

III. WATERMARK EMBEDDING SCHEME

In order to guarantee the robustness and transparency of the watermark, the proposed scheme embeds the synchronization code in the mean value of several samples.

Let the host digital audio signal have Length samples. The watermark is a binary image to be embedded within the host audio signal, and the synchronization code is a binary sequence with a fixed number of bits. The main steps of the embedding procedure can be described as follows.

A. Preprocessing

In order to remove the spatial correlation between the pixels of the binary watermark image and to improve the security of the whole digital watermarking system, a watermark scrambling algorithm is applied first. In our watermark embedding scheme, the binary watermark image is scrambled by using the Arnold transform.
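The scrambling step can be sketched in Python as follows. This is a minimal illustration that assumes the classic Arnold (cat map) transform on a square image, with the number of iterations acting as a key; the correspondence does not state which variant or how many iterations are used, and the function names are hypothetical.

```python
def arnold_scramble(image, iterations=1):
    """Scramble a square image (list of rows) with the Arnold cat map.

    Each pixel at (x, y) moves to ((x + y) mod n, (x + 2y) mod n).
    """
    n = len(image)
    for _ in range(iterations):
        out = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                out[(x + y) % n][(x + 2 * y) % n] = image[x][y]
        image = out
    return image


def arnold_unscramble(image, iterations=1):
    """Invert arnold_scramble by applying the inverse map the same number of times."""
    n = len(image)
    for _ in range(iterations):
        out = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                # Inverse of (x, y) -> (x + y, x + 2y) is (x', y') -> (2x' - y', y' - x').
                out[(2 * x - y) % n][(y - x) % n] = image[x][y]
        image = out
    return image
```

Because the map is a bijection on pixel positions, the detector can undo the scrambling exactly, provided that it knows the iteration count.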

To improve the robustness of the proposed scheme against cropping and to let the detector recover when it loses synchronization, the audio is segmented first, and then the synchronization code and the watermark are embedded into each segment. Each segment is cut into two sections, and the synchronization code and the watermark are embedded into the first and the second section, respectively.
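The segmentation can be illustrated with a short Python sketch. The parameter names `segment_len` and `sync_len` are hypothetical, and the choice to leave a trailing partial segment unmarked is an assumption, since the correspondence does not say how leftover samples are handled.

```python
def split_segments(audio, segment_len, sync_len):
    """Cut `audio` into full segments and split each into two sections.

    Returns a list of (sync_section, watermark_section) pairs. Samples
    after the last complete segment are ignored (an assumption).
    """
    segments = []
    for start in range(0, len(audio) - segment_len + 1, segment_len):
        segment = audio[start:start + segment_len]
        segments.append((segment[:sync_len], segment[sync_len:]))
    return segments


# Toy usage: ten samples, segments of four samples, one-sample sync section.
pairs = split_segments(list(range(10)), segment_len=4, sync_len=1)
```

Repeating the same payload in every segment is what allows a detector to recover the watermark from any surviving segment after cropping.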

B. Synchronization Code Embedding

The proposed synchronization code embedding method proceeds as follows.

1) The first section of the segment is cut into as many subsegments as the synchronization code has bits, each subsegment having the same number of samples.

2) The mean value of each subsegment is calculated.

3) One bit of the synchronization code is embedded into each subsegment by quantizing its mean value with a given quantization step. The embedding rule maps every original sample of the subsegment to a modified sample and relies on the remainder of a division.
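The three steps above can be sketched in Python. Because the exact quantization rule is not legible in this copy, the sketch assumes a common parity-based form: the mean is moved to the center of a quantization cell whose index parity equals the bit, and every sample in the subsegment is shifted by the same amount. The function names and this specific rule are assumptions, not necessarily the authors' formulation.

```python
import math


def embed_bit_in_mean(samples, bit, step):
    """Shift all samples so that floor(mean / step) has the parity of `bit`."""
    mean = sum(samples) / len(samples)
    index = math.floor(mean / step)
    if index % 2 == bit:
        new_mean = index * step + step / 2   # center of the current cell
    else:
        new_mean = index * step - step / 2   # center of the cell just below
    delta = new_mean - mean
    return [s + delta for s in samples]


def extract_bit_from_mean(samples, step):
    """Read the bit back as the parity of the quantized mean."""
    mean = sum(samples) / len(samples)
    return math.floor(mean / step) % 2


def embed_sync_code(section, code_bits, step):
    """Embed one code bit into the mean of each equal-length subsegment."""
    n = len(section) // len(code_bits)       # samples per subsegment
    out = list(section)
    for m, bit in enumerate(code_bits):
        out[m * n:(m + 1) * n] = embed_bit_in_mean(section[m * n:(m + 1) * n], bit, step)
    return out
```

Spreading each bit over the mean of several samples, rather than over a single sample value, is the design choice the correspondence credits for better resistance to resampling and MP3 compression.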

C. Watermark Embedding

1) DWT: For the second section of each audio segment, a multilevel DWT is performed, which yields the coarse (approximation) signal and the detail signals of every level.

2) DCT: To take advantage of the low-frequency coefficients, which have higher energy and are more robust against various signal processing operations, the DCT is performed only on the low-frequency (coarse) DWT coefficients.

3) Watermark Embedding: In order to guarantee the robustness and transparency of the watermark, the proposed scheme embeds each watermark bit in the magnitude of a DCT coefficient by quantization with a given quantization step. The proposed method embeds all watermark bits in every segment of the audio signal.

4) Inverse DCT: The inverse DCT is performed on the modified low-frequency coefficients.

5) Inverse DWT: After the original coarse coefficients are substituted with the modified ones, the multilevel inverse DWT is performed, which yields the watermarked digital audio signal.
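The five steps above can be sketched end to end in plain Python. This is an illustration under stated assumptions: the Haar filter is used because Section V names the Daubechies-1 basis, the DCT is the orthonormal DCT-II, the bits are placed in the first (lowest-frequency) coefficients, and the parity-based quantizer acts on the signed coefficient value rather than its magnitude. None of these details is confirmed by this copy of the correspondence, and all function names are hypothetical.

```python
import math


def haar_dwt(x):
    """One-level Haar (Daubechies-1) DWT: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(half)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


def dct(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of the orthonormal DCT-II."""
    n = len(c)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k]
                * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for k in range(n))
            for i in range(n)]


def quantize_to_bit(value, bit, step):
    """Move `value` to the center of a cell whose index parity equals `bit`."""
    index = math.floor(value / step)
    return index * step + step / 2 if index % 2 == bit else index * step - step / 2


def embed_watermark(section, bits, step, levels=3):
    """DWT -> DCT of the coarse band -> quantize -> inverse DCT -> inverse DWT."""
    approx, details = list(section), []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    coeffs = dct(approx)
    for t, bit in enumerate(bits):
        coeffs[t] = quantize_to_bit(coeffs[t], bit, step)
    approx = idct(coeffs)
    for detail in reversed(details):
        approx = haar_idwt(approx, detail)
    return approx


def extract_watermark(section, n_bits, step, levels=3):
    """Blind extraction: repeat the forward transforms and read the parities."""
    approx = list(section)
    for _ in range(levels):
        approx, _detail = haar_dwt(approx)
    coeffs = dct(approx)
    return [math.floor(coeffs[t] / step) % 2 for t in range(n_bits)]
```

The section length must be divisible by two raised to the number of levels, and the coarse band must hold at least as many coefficients as there are watermark bits. Extraction needs only the quantization step and the number of levels, which is what makes the scheme blind.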

Fig. 3. The original digital audio signals and watermark.

D. Repeat Embedding

In order to improve the robustness against cropping, the proposed scheme repeats the procedures of Sections III-B and III-C to embed the synchronization code and the watermark into every audio segment.

In our experiments, the length of watermark embedding are ﬁxed as

IV. WATERMARK DETECTING SCHEME

The watermark detecting procedure in the proposed method needs neither the original audio signal nor any other side information. The watermark detecting procedure is as follows.

1) The beginning position B of the watermarked segment is located by means of the frame synchronization technology of digital communications.

2) A multilevel DWT is performed on each audio segment after B to obtain the wavelet coefficients.

3) The DCT is performed on the low-frequency DWT coefficients.

4) The watermark bits are extracted from the DCT coefficients according to the extraction rule.

5) Finally, the watermark image is obtained by descrambling.
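Step 1 of the detection procedure can be illustrated with a brute-force search in Python. The correspondence only refers to frame synchronization in general terms, so the exhaustive exact-match search, the parity-of-quantized-mean bit reader, and the function names below are all assumptions made for the sake of a runnable sketch. A practical detector might instead tolerate a few bit errors.

```python
import math


def read_sync_bits(samples, start, n_bits, group, step):
    """Decode n_bits from consecutive groups of `group` samples starting at `start`."""
    bits = []
    for m in range(n_bits):
        block = samples[start + m * group:start + (m + 1) * group]
        mean = sum(block) / len(block)
        bits.append(math.floor(mean / step) % 2)
    return bits


def find_sync(samples, code_bits, group, step):
    """Return the first offset whose decoded bits equal the code, or -1 if none."""
    span = len(code_bits) * group
    for start in range(len(samples) - span + 1):
        if read_sync_bits(samples, start, len(code_bits), group, step) == code_bits:
            return start
    return -1


# Toy signal: the code 1,0,1,1 is carried by pairs of samples, preceded by
# three samples that decode to 0 (each value sits at a quantization cell center).
toy = [0.5, 0.5, 0.5, 1.5, 1.5, 0.5, 0.5, 1.5, 1.5, 1.5, 1.5, 0.5, 0.5, 0.5]
position = find_sync(toy, [1, 0, 1, 1], group=2, step=1.0)
```

Once the offset is known, the watermark section that follows the synchronization code can be passed to the DWT and DCT steps 2 to 4.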

TABLE I
SYNCHRONIZATION CODE DETECTION RESULTS FOR VARIOUS ATTACKS

TABLE II
THE WATERMARK DETECTION RESULTS FOR VARIOUS ATTACKS (NC)

In addition, in order to eliminate the influence of subjective and objective factors, such as the experience, health condition, and experimental conditions of the observer, the normalized cross correlation (NC) [13] is adopted to appraise the similarity between the extracted watermark and the original one. It is defined as

$$\mathrm{NC}(W, W') = \frac{\sum_{i}\sum_{j} w(i,j)\, w'(i,j)}{\sqrt{\sum_{i}\sum_{j} w(i,j)^2}\,\sqrt{\sum_{i}\sum_{j} w'(i,j)^2}}$$

where $W$ and $W'$ are the original watermark image and the extracted watermark image, respectively, and $w(i,j)$ and $w'(i,j)$ are their pixel values. If the NC exceeds a certain threshold, we conclude that the audio signal is protected; otherwise, it is not protected.
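For illustration, the NC between two binary images can be computed as below. The sketch assumes the usual normalized form (inner product divided by the product of the two norms) and images stored as lists of rows. The function name is hypothetical, and returning 0.0 for an all-zero image is a convention chosen here.

```python
import math


def normalized_correlation(original, extracted):
    """Normalized cross correlation of two equally sized images (lists of rows)."""
    inner = sum(w * v
                for row_w, row_v in zip(original, extracted)
                for w, v in zip(row_w, row_v))
    energy_w = sum(w * w for row in original for w in row)
    energy_v = sum(v * v for row in extracted for v in row)
    denominator = math.sqrt(energy_w * energy_v)
    return inner / denominator if denominator else 0.0
```

An NC of 1 means that the extracted watermark matches the original exactly, and values close to 0 indicate no similarity.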

In this correspondence, reliability is measured as the bit error rate (BER) of the extracted watermark, defined as

$$\mathrm{BER} = \frac{B}{N_W} \times 100\%$$

where $B$ is the number of erroneously detected bits and $N_W$ is the total number of watermark bits.
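A direct Python transcription of the BER is shown below, assuming that it is expressed as a percentage of the total number of watermark bits; the function name is illustrative.

```python
def bit_error_rate(original_bits, extracted_bits):
    """Percentage of positions where the extracted bit differs from the original."""
    errors = sum(1 for a, b in zip(original_bits, extracted_bits) if a != b)
    return 100.0 * errors / len(original_bits)


example_ber = bit_error_rate([1, 0, 1, 1], [1, 1, 1, 0])  # two of four bits differ
```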

We also conduct a signal-to-noise ratio (SNR) test, which serves as an objective measurement of audio signal quality. The SNR is measured by comparing the watermarked signal against the original signal, and is defined as

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{i=0}^{\mathit{Length}-1} a(i)^2}{\sum_{i=0}^{\mathit{Length}-1} \bigl(a(i) - a'(i)\bigr)^2}$$

where $a(i)$ and $a'(i)$ are samples of the original and the watermarked audio signals, respectively.
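The SNR can be evaluated with a few lines of Python. The sketch assumes the usual definition in decibels, that is, the energy of the original signal divided by the energy of the embedding distortion. The function name is hypothetical, and returning infinity for identical signals is a convention chosen here.

```python
import math


def snr_db(original, watermarked):
    """SNR in dB of the watermarked signal relative to the original."""
    signal_energy = sum(a * a for a in original)
    noise_energy = sum((a - b) ** 2 for a, b in zip(original, watermarked))
    if noise_energy == 0:
        return float("inf")
    return 10.0 * math.log10(signal_energy / noise_energy)


example_snr = snr_db([10, 10], [11, 9])  # energy ratio 200 / 2 = 100
```

A higher SNR means that the embedding changed the host signal less, which is the objective counterpart of the inaudibility requirement.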

V. EXPERIMENTAL RESULTS

In order to illustrate the inaudible and robust nature of our watermarking scheme, the proposed watermarking algorithm is applied to two digital audio pieces. All of the audio signals in the test are music, stored as 16-bit signed mono audio sampled at 44.1 kHz. We use a 64 bit binary image as our watermark for all audio signals and the 16-bit Barker code 1111100110101110 as the synchronization code. The Daubechies-1 wavelet basis is used. A smaller number of DWT levels reduces the robustness of the watermark, whereas a larger number increases the computational cost, so a 3-level DWT is performed in this test.

In our experiments, the length of each watermark embedding segment is fixed at 65536 samples. Each synchronization bit is embedded into the mean value of five samples.

TABLE III
WATERMARK DETECTION RESULTS FOR VARIOUS ATTACKS (BER)

TABLE IV
WATERMARK DETECTION RESULTS FOR VARIOUS ATTACKS (SNR)

The two original audio signals used in the test are shown in Fig. 3(a) and (b). Their lengths are 9.75 and 6.83 s, respectively. The original watermark image is displayed in Fig. 3(c). The quantization steps of

the two test samples are

and ,

respectively.
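The experimental parameters stated in this section imply some simple bookkeeping, worked out in the Python sketch below. It assumes that only complete 65536-sample segments carry a payload and that the quoted durations are exact; the constant and function names are illustrative.

```python
SAMPLE_RATE_HZ = 44_100       # sampling rate stated in Section V
SEGMENT_LEN = 65_536          # samples per embedding segment
SYNC_BITS = 16                # length of the synchronization code
SAMPLES_PER_SYNC_BIT = 5      # samples averaged per synchronization bit


def full_segments(duration_s):
    """Number of complete embedding segments in a clip of the given duration."""
    total_samples = round(duration_s * SAMPLE_RATE_HZ)
    return total_samples // SEGMENT_LEN


sync_section_len = SYNC_BITS * SAMPLES_PER_SYNC_BIT   # samples used by the sync code
segments_first_clip = full_segments(9.75)
segments_second_clip = full_segments(6.83)
```

Under these assumptions, the synchronization code occupies 80 samples of each segment, and the two clips hold six and four complete segments, so the payload is repeated several times in each test signal.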

In order to illustrate the robust nature of our watermarking scheme, attacks including MPEG compression, resampling, requantization, noise adding, and random cropping are used to estimate its robustness.

Table I summarizes the synchronization code detection results of the proposed scheme compared with those of scheme [12]; for each attack, it indicates whether the detector is able or unable to find the synchronization code.

Tables II–IV summarize the watermark detection results of the proposed scheme compared with those of scheme [12] against various attacks. In addition, the detection results of DCT and DWT are given in Table II, Table III, and Table IV, respectively. The NC and BER of the watermark image and the SNR of the digital audio signal are also given in Table II, Table III, and Table IV.

VI. CONCLUSION

In this correspondence, we propose a novel synchronization-invariant digital audio watermarking algorithm based on the quantization of coefficients. To improve the robustness of the audio watermark, the proposed algorithm selects a robust Barker code as the synchronization code, embeds the synchronization code into the mean value of several samples, and embeds the watermark into DWT and DCT coefficients. The experimental results illustrate the robustness of our synchronization embedding scheme and the inaudibility of our watermarking scheme. In addition, the watermark can be extracted without the help of the original digital audio signal, and the scheme can be easily implemented. Despite its success, the proposed method has a drawback: it is not very robust against pitch-invariant time-scale modification. Further research will focus on overcoming this problem.

REFERENCES

[1] I. J. Cox and M. L. Miller, “The first 50 years of electronic watermarking,” J. Appl. Signal Process., vol. 56, no. 2, pp. 225–230, 2002.
[2] L. Wei, Y. Yi-Qun, L. Xiao-Qiang, X. Xiang-Yang, and L. Pei-Zhong, “Overview of digital audio watermarking,” J. Commun., vol. 26, no. 2, pp. 100–111, 2005.
[3] L. Wen-Nung and C. Li-Chun, “Robust and high-quality time-domain audio watermarking subject to psycho acoustic masking,” in Proc. IEEE Int. Symp. Circuits Syst., AZ, 2002, vol. 2, pp. 45–48.
[4] D. Megías, J. Herrera-Joancomartí, and J. Minguillón, “A robust audio watermarking scheme based on MPEG 1 layer III compression,” in Communications and Multimedia Security, CMS 2003. New York: Springer-Verlag, 2003, vol. 963, LNCS, pp. 226–238.
[5] H. J. Kim, “Audio watermarking techniques,” in Pacific Rim Workshop on Digital Steganography, Kyushu Institute of Technology, Kitakyushu, Japan, Jul. 3–4, 2003.
[6] S. Sheng-He, L. Zhe-Ming, and N. Xia-Mu, Digital Watermarking Technique. Beijing, China: Science, 2004.
[7] D. Kirovski and H. S. Malvar, “Spread spectrum watermarking of audio signals,” IEEE Trans. Signal Process., vol. 51, no. 4, pp. 1020–1033, Apr. 2003.
[8] C.-P. Wu, P.-C. Su, and C.-C. Jay Kuo, “Robust audio watermarking for
[9] W. Li and X. Y. Xue, “Audio watermarking based on music content analysis: Robust against time scale modification,” in Proc. 2nd Int. Workshop on Digital Watermarking, Korea, 2003, pp. 289–300.
[10] H. O. Kim, B. K. Lee, and N. Y. Lee, Wavelet-based audio watermarking techniques: Robustness and fast synchronization [Online]. Available: http://amath.kaist.ac.kr/research/paper/01–11.pdf
[11] W. Yong, H. Ji-Wu, and Y. Q. Shi, “Meaningful watermarking for audio with fast resynchronization,” J. Comput. Res. Develop., vol. 40, no. 20, pp. 215–220, 2003.
[12] J. W. Huang, W. Yong, and Y. Q. Shi, “A blind audio watermarking algorithm with self-synchronization,” in Proc. IEEE Int. Symp. Circuits Syst., AZ, 2002, vol. 3, pp. 627–630.
[13] M. Kutter and F. A. P. Petitcolas, “A fair benchmark for image watermarking systems,” in Proc. Electron. Imag., 1999, vol. 3657, pp. 226–239.

WavMark: Watermarking for Audio Generation

Preprint

Aug 2023

Recent breakthroughs in zero-shot voice synthesis have enabled imitating a speaker's voice using just a few seconds of recording while maintaining a high level of realism. Alongside its potential benefits, this powerful technology introduces notable risks, including voice fraud and speaker impersonation. Unlike the conventional approach of solely relying on passive methods for detecting synthetic data, watermarking presents a proactive and robust defence mechanism against these looming risks. This paper introduces an innovative audio watermarking framework that encodes up to 32 bits of watermark within a mere 1-second audio snippet. The watermark is imperceptible to human senses and exhibits strong resilience against various attacks. It can serve as an effective identifier for synthesized voices and holds potential for broader applications in audio copyright protection. Moreover, this framework boasts high flexibility, allowing for the combination of multiple watermark segments to achieve heightened robustness and expanded capacity. Utilizing 10 to 20-second audio as the host, our approach demonstrates an average Bit Error Rate (BER) of 0.48\% across ten common attacks, a remarkable reduction of over 2800\% in BER compared to the state-of-the-art watermarking tool. See https://aka.ms/wavmark for demos of our work.

Hiding Full-Color Images into Audio with Visual Enhancement via Residual Networks

Article

Full-text available

Sep 2023

Watermarking is a viable approach for safeguarding the proprietary rights of digital media. This study introduces an innovative fast Fourier transform (FFT)-based phase modulation (PM) scheme that facilitates efficient and effective blind audio watermarking at a remarkable rate of 508.85 numeric values per second while still retaining the original quality. Such a payload capacity makes it possible to embed a full-color image of 64 × 64 pixels within an audio signal of just 24.15 s. To bolster the security of watermark images, we have also implemented the Arnold transform in conjunction with chaotic encryption. Our comprehensive analysis and evaluation confirm that the proposed FFT–PM scheme exhibits exceptional imperceptibility, rendering the hidden watermark virtually undetectable. Additionally, the FFT–PM scheme shows impressive robustness against common signal-processing attacks. To further enhance the visual rendition of the recovered color watermarks, we propose using residual neural networks to perform image denoising and super-resolution reconstruction after retrieving the watermarks. The utilization of the residual networks contributes to noticeable improvements in perceptual quality, resulting in higher levels of zero-normalized cross-correlation in cases where the watermarks are severely damaged.

Feature extraction based pixel segmentation techniques data hiding and data encryption

Article

Full-text available

Jul 2023
MULTIMED TOOLS APPL

Data hiding along with the security plays a vital role in the wireless network. In recent years, object detection, Feature Extraction (FE) and correlation based technique are developed to predict the pixels that is suitable for embedding. However, traditional data hiding method still suffered in the selection of less distortion pixels. To overcome these shortcomings, this method proposes (i) Moving Object (MO) detection, (ii) FE, (iii) data hiding, and (iv) data encryption. Initially, the MO in the video are detected using optical flow and tracking the weight of MO. Next, data hiding features are extracted using Optimization Fruit fly Algorithm (OFA) with Differential Evolution (DE) and Opposition Based Learning (OBL) algorithm. Then, a modified data hiding techniques based on correlation, regression and pixel segmentation with Prediction Error (PE) techniques are used for improving the performance. Finally, the hidden data is encrypted using Binary Tree Structure (BTS) to provide additional layer of security. Experimental result shows the proposed method is superior to other state-of-the-art methods in terms of Peak-Signal-to-Noise Ratio (PSNR), embedding capacity and security analysis.

Survey of imperceptible and robust digital audio watermarking systems

Article

Full-text available

Apr 2024
MULTIMED TOOLS APPL

Robustness, imperceptibility and embedding capacity are the preliminary requirements of any digital audio watermarking technique. However, research has concluded that these requirements are difficult to achieve at the same time. Thus, the watermarking technique is closely dependent on the solution that manages the robustness / imperceptibility trade-off. A large majority of research work has been devoted to improving this trade-off by implementing increasingly advanced techniques. For conciseness and efficiency, the comprehensive review reported in this paper mainly considers the following aspects imperceptibility and robustness among the criteria, as they determine the key performance of most existing audio watermarking systems. In this paper we have introduce the basic concepts of digital audio watermarking, the performance characteristics, and a classification of digital audio watermarking systems according to the extraction/detection process or to human perception. We have also presented various digital audio watermarking applications. Further, we have presented classifications of unintentional and intentional attacks that can be performed on audio watermarking systems and we have highlighted the impact of these attacks on the watermarked audio quality. We have presented two classifications made by researchers, the first one categorizes these attacks into basic and advanced attacks, while the second one classifies the attacks by group according to the process performed on the watermarked audio file. Furthermore, after presenting an overview of the properties of the Human Auditory System (HAS), we have presented several evaluation aspects of audio watermarking systems and we have reviewed various recent robust and imperceptible audio watermarking methods in the spatial, transform and hybrid domains.

A video data hiding technique based on pixel sequence, weight interpolation and quorum function

Article

Mar 2024
MULTIMED TOOLS APPL

The development of multimedia technology has increased the challenge of protecting information. This paper proposes an effective method for data hiding in videos based on the Pixel Sequence (PS) and weight interpolation to protect the secret data from intruders. Initially, the noise is removed from the video with the help of a median filter, and Moving Objects (MO) inside frames having different backgrounds are accurately detected using a Fast Region-based Convolution Neural Network (FR-CNN). Next, the MO and Non Moving Object (NMO) pixels from each frame are mapped to the adjacent frames to determine the weights by applying PS. Then, the cover video is up-scaled with interpolated pixels based on the weights of MO and NMO. Finally, the secret data are hidden inside both interpolated and non-interpolated pixels to elevate the Embedding Capacity (EC). In interpolated pixels, both the Most Significant Bit (MSB) and Least Significant Bit (LSB) of each frame are utilized to hide data, based on their weight, whereas in non-interpolated pixels the Quorum Function (QF) is used to embed data. Two different data embedding techniques are adopted in each frame to enhance the security. The proposed method outperforms the existing methods in terms of precision and recall. Experimental results also show that the stego-video quality is uncompromised and that the method is resistant to security attacks.

Localization of Copy-Move Forgery in Speech Signals Through Watermarking Using DCT-QIM

Article

Jul 2023

Unobtrusive Watermarking for Copyright Preservation and Authenticity Verification in Digital Images Using Hybrid HVS-Based Technique

Conference Paper

Mar 2024

Transient Detection-based Adaptive Audio Watermarking using Attack-Aware Optimization

Article

Dec 2023
DIGIT SIGNAL PROCESS

A novel numeric embedding scheme for hiding full-color images into audio

Conference Paper

Dec 2023

A Dual-Embedded Tamper Detection Framework Based on Block Truncation Coding for Intelligent Multimedia Systems

Article

Jul 2023
INFORM SCIENCES

A Robust Audio Watermarking Scheme Based on MPEG 1 Layer 3 Compression

Conference Paper

Oct 2003

This paper describes an audio watermarking scheme based on lossy compression. The main idea is taken from an image watermarking approach where the JPEG compression algorithm is used to determine where and how the mark should be placed. Similarly, in the audio scheme suggested in this paper, an MPEG 1 Layer 3 algorithm is chosen for compression to determine the position of the mark bits and, thus, the psychoacoustic masking of the MPEG 1 Layer 3 compression is implicitly used. This methodology provides a high degree of robustness against compression attacks. The suggested scheme is also shown to succeed against most of the StirMark benchmark attacks for audio.

Meaningful watermarking for audio with fast resynchronization

Article

Feb 2003

Audio Watermarking Techniques

Chapter

Feb 2004

Wavelet-based audio watermarking techniques: robustness and fast synchronization

Article

This paper describes a novel technique for embedding watermark bits into digital audio signals. The proposed method is based on the patchwork algorithm in the wavelet domain and does not need the original audio signal for watermark detection. It uses the wavelet transform generated by the low-pass analysis filter h_n, whose length is 2 and for which h_0 = h_1 = 1, to allow fast synchronization between the watermark embedding and detection parts. Several simulation results show that the proposed method is robust against various signal manipulations such as MPEG/Audio layer 3 compression and time scale modification.

Audio watermarking techniques

Article

Hyoung Joong Kim

This paper surveys audio watermarking schemes. The state of the art of current watermarking schemes and their implementation techniques are briefly summarized. They are classified into five categories: quantization scheme, spread-spectrum scheme, two-set scheme, replica scheme, and self-marking scheme. Advantages and disadvantages of each scheme are also discussed. In addition, synchronization schemes are also surveyed.

A blind audio watermarking algorithm with self-synchronization

Conference Paper

Feb 2002

A blind audio information bit hiding algorithm with effective synchronization is proposed in this paper. The algorithm embeds synchronization signals in the time domain to resist attacks such as cropping while keeping the computation needed for resynchronization low. The watermark is placed in blockwise DCT coefficients of the original audio, exploiting the HAS (Human Audio System) features. To lower the bit error rate of the watermark in extraction, error-correcting coding is applied. The experimental results show that the hidden imperceptible watermark is robust to the attacks caused by additive noise, MP3 coding, and cropping.

Adaptive notch filter by direct frequency estimation

Article

May 1992
SIGNAL PROCESS

In this paper, an adaptive notch filter is investigated for eliminating sinusoids embedded in noise. The algorithm estimates the sinusoidal frequencies directly from data samples to avoid the variation caused by small perturbations in the estimated coefficients of the filter. The notch filter is then designed in terms of the estimated frequencies. The stability of the adaptive notch filter can always be ensured without any stability monitoring during adaptive processing. It converges rapidly and attains the Cramer-Rao bound (CRB) for a sufficiently large data set. Simulation results are included to demonstrate the performance of the algorithm.

Audio Watermarking Based on Music Content Analysis: Robust against Time Scale Modification

Conference Paper

Oct 2003
Lect Notes Comput Sci

Synchronization attacks like random cropping and time scale modification are crucial issues for audio watermarking techniques. To combat these attacks, a novel content-dependent, temporally localized, robust audio watermarking method is proposed in this paper. The basic idea is to embed and detect the watermark in selected high-energy local regions that represent music transitions such as drum sounds or note onsets. Such regions correspond to music edges and will not be changed much, for the purpose of maintaining high auditory quality. In this way, the embedded watermark is expected to escape the damage caused by audio signal processing, random cropping, time scale modification, etc., as shown by the experimental results.

The First 50 Years of Electronic Watermarking

Article

Feb 2002
EURASIP J ADV SIG PR

Electronic watermarking can be traced back as far as 1954. The last 10 years has seen considerable interest in digital watermarking, due, in large part, to concerns about illegal piracy of copyrighted content. In this paper, we consider the following questions: is the interest warranted? What are the commercial applications of the technology? What scientific progress has been made in the last 10 years? What are the most exciting areas for research? And where might the next 10 years take us? In our opinion, the interest in watermarking is appropriate. However, we expect that copyright applications will be overshadowed by applications such as broadcast monitoring, authentication, and tracking content distributed within corporations. We further see a variety of applications emerging that add value to media, such as annotation and linking content to the Web. These latter applications may turn out to be the most compelling. Considerable progress has been made toward enabling these applications: perceptual modelling, security threats and countermeasures, and the development of a bag of tricks for efficient implementations. Further progress is needed in methods for handling geometric and temporal distortions. We expect other exciting developments to arise from research in informed watermarking.

Robust and high-quality time-domain audio watermarking subject to psychoacoustic masking

Conference Paper

Jun 2001

We propose in this paper a new method for embedding digital watermarks into audio signals in the time domain. By testing frequency domain characteristics (i.e., the psychoacoustic model) and making appropriate adjustments, our algorithm is capable of keeping the watermark disturbance imperceptible to human listeners. A watermark can be extracted without knowledge of the original audio signals. Experiments show that our watermarking scheme leads to results with good audio quality (comparable to the original ones, according to some subjective tests) and is robust (a survival rate above 98%) to pirate attacks such as MP3 compression, low-pass filtering, amplitude normalization, DA/AD same-rate reacquisition, and cropping.

Recommended publications

An Adaptive Digital Audio Watermarking Algorithm Based on HAS and Chaos Theory

Image digital watermarking algorithm in DCT domain for resisting brightness-and-contrast adjusting a...

A digital audio watermark embedding algorithm with WT and CCT

A New Adaptive Digital Audio Watermarking Based on Support Vector Regression