Mask-based fingerprinting scheme for digital
video broadcasting
Sabu Emmanuel & Mohan S. Kankanhalli
Published online: 6 October 2006
© Springer Science + Business Media, LLC 2006
Abstract In this paper we propose a novel method to achieve video fingerprinting and
confidentiality in a broadcasting environment. The fingerprinting technique can be used to
generate unique copies for individual subscribers and to identify the copyright violator.
Thus, for tracing the copyright violator, a unique copy per subscriber is needed, whereas
broadcasting requires a single copy to be transmitted to everyone. The proposed method
efficiently incorporates both these requirements. In addition to the fingerprinting
requirement to trace the subscriber who is violating the copyright, a confidentiality
requirement needs to be implemented against the non-subscribers in the broadcast region.
The proposed algorithm efficiently combines both the fingerprinting requirement and the
confidentiality requirement into one single atomic process. The proposed algorithm uses a
robust invisible watermarking technique for fingerprinting and a masking technique for
confidentiality. An additional advantage of the proposed scheme is that it also supports
MPEG-2 compressed domain processing, which is useful for many broadcasting standards.
Keywords Copyright protection · Digital watermarks · Digital video broadcasting ·
MPEG-2 · Pay TV · Video on demand · Video security · Video watermarking
1 Introduction
Video broadcasting systems employ digital techniques for the processing, storage and
transmission of video data. This is primarily due to the ease of handling these functions in
Multimed Tools Appl (2006) 31: 145–170
DOI 10.1007/s11042-006-0041-3
S. Emmanuel (*)
School of Computer Engineering, Nanyang Technological University,
Nanyang Avenue, Singapore 639798, Singapore
e-mail: asemmanuel@ntu.edu.sg
M. S. Kankanhalli
School of Computing, National University of Singapore,
Kent Ridge, Singapore 117543, Singapore
e-mail: mohan@comp.nus.edu.sg
the digital domain than in the analog domain. Being digital also brings the advantage of easy
and perfect replication of digital data. One of the major concerns of broadcasters is that
subscribers can easily make perfect copies of digital video and distribute them to
non-subscribers, thus violating the copyright. Hence, in digital video broadcasts, and
especially in pay channels, fingerprinting techniques must be employed to identify the
subscriber who violates the copyright. For a broadcast type of transmission, everyone in the broadcast region receives
broadcast data. In pay channels of digital video broadcasts, the broadcasters require that
only subscribers should be able to view the video clearly. Hence a confidentiality
requirement should be implemented against non-subscribers. The existing digital video pay
channels employ a conditional access system (CAS) for achieving confidentiality against
non-subscribers [12, 23, 34]. Our proposed scheme supports both the confidentiality
requirement and the copyright violator identification requirement (fingerprinting require-
ment). In our scheme, the copyright violator identification is made possible through the use
of a digital watermarking technique, and confidentiality against non-subscribers is obtained
through the use of a masking technique.
Several digital watermarking techniques have been devised to address the copyright
concerns of the content owners, broadcasters and sellers. They have been devised for a variety of
media, viz. text, digital audio, digital image and digital video. Various copyright
concerns include copyright violation detection/deterrence, copy protection, data
authentication and data tamper proofing. A digital watermark is an embedded piece of
information either visible or invisible (audible or inaudible, in the case of audio). In the
case of visible watermarking [2, 29, 33], the watermarks are embedded in a way that is
perceptible to a human viewer. Hence the watermarks convey an immediate claim of
ownership, provide credit to the owner and deter copyright violations. In the case of
invisible watermarking [9, 13, 18], the watermarks are embedded in an imperceptible
manner. The invisible watermarks can be fragile or robust. Fragile invisible watermarks
[30] attempt to achieve data integrity (tamper proofing). Fragile invisible watermarks
must be invisible to human observers, must be altered by the application of most common
image processing techniques, and should be quickly extractable by authorized persons.
The extracted watermark indicates where the alterations have taken place. Robust
invisible watermarks, however, attempt to achieve copyright violation detection. The desired
properties for robust invisible watermarks are that they must be invisible to a human
observer and must remain detectable/extractable by an authorized person even after
the media object is subjected to common signal processing techniques or to digital-to-analog
and analog-to-digital conversions. The watermark must be robust against attacks and
should resolve the rightful ownership problem (for which the watermark must be non-invertible
[11]). Many watermarking algorithms have been proposed in the literature [10, 14, 19, 28,
31, 36, 39, 40]. Thus, for the purpose of fingerprinting, robust invisible watermarking
techniques are suitable.
Digital video broadcasting uses a broadcast type of transmission, and therefore a single
copy of the video material is transmitted to everyone in the broadcast region. The broadcast
region consists of subscribers and non-subscribers. The pay channels of digital video
broadcasts require that a confidentiality requirement be implemented against the
non-subscribers. Current pay channels employ a CAS for this purpose. Primarily, the CAS uses a
scrambling technique for providing confidentiality [12, 23, 29, 42]. The broadcaster
scrambles the video data using a control word (CW) and broadcasts the scrambled video
data. The subscribers use the same CW to descramble the received scrambled video data to
obtain the clear video. The current CAS does not implement any fingerprinting technique to trace
the copyright violator. Since broadcasting requires a single copy to be transmitted, and the
copyright violator identification requires each individual subscriber's copy to be unique
(containing a different watermark), the watermarking for copyright violator identification
should be performed at each subscriber's end. This means that the current CAS at the
subscriber end has to implement the watermarking/fingerprinting process for copyright
violator identification in addition to the descrambling process. It is more secure if the watermarking and descrambling
processes are combined into a single atomic process. But it is hard to combine the current
descrambling process with the watermarking process into a single atomic process. Implementing
descrambling and watermarking processes as two separate processes in a single IC chip is not
secure as the control information to control the watermarking process can be selectively
removed from the broadcast video stream while retaining the control information to control
the descrambling process. Thus, one can obtain an unwatermarked clear video for viewing.
Our proposed scheme uses a masking technique for confidentiality and robust invisible
watermarking for copyright violator identification. The masking and watermarking
techniques are combined into a single atomic process, making the scheme more secure against attacks.
It is even more challenging to support confidentiality and copyright violator identification
in compressed (MPEG-2) domain broadcasts. The challenges are due to quantization,
which is lossy, and to interframe coding, which makes use of motion-compensated
predictions. However, compressed domain processing for compressed domain broadcast is
necessary for the following reasons. Firstly, decompression, processing and recompression
incur more computational overhead. Secondly, since MPEG-2 is a lossy compression
technique, recompression would degrade the video. Our proposed
scheme supports MPEG-2 compressed domain processing. The MPEG-2 intellectual
property management and protection (IPMP) standard [20, 24–26] provides placeholders
for signalling whether a packet is scrambled, which CAS is used, control messages such as
the entitlement control message (ECM) and the entitlement management message (EMM),
and copyright identifiers. The IPMP specification only addresses confidentiality and
watermarking, but the IPMP-X (intellectual property management and protection
extension) also includes an authentication function along with the confidentiality and
watermarking functions. The standard, however, does not specify the algorithms to be used
for confidentiality, watermarking or authentication.
In this paper we describe a novel uncompressed (spatial) and compressed domain
(MPEG-2) method to achieve the requirements of confidentiality against the non-subscribers,
protection against copyright violations, and a single, one-time-created copy for
transmission in a broadcast scenario. Confidentiality is obtained by using an opaque blending
mask, while copyright violator identification is ensured by robust invisible watermarking.
The proposed method efficiently combines the confidentiality requirement and copyright
violator identification requirement (fingerprinting requirement) into a single atomic process.
The method also supports dynamic join and leave. It requires fewer computational and
network resources from the broadcaster, yet is easy to implement.
In Section 2 we discuss the past work in the area of conditional access systems and
broadcast video watermarking. In Section 3 we describe the proposed scheme, in Section 4
the implementation and results, and in Section 5 a discussion, followed by the conclusion in
Section 6. A preliminary version of this paper appeared in [15].
2 Past work
In this section we discuss the past work related to conditional access systems and the
digital watermarking of broadcast video.
2.1 Conditional access system (CAS)
A conditional access system (CAS) is the system that makes it possible to charge the
subscriber a subscription fee. The existing conditional access TV modes are:
- pay TV (operated in the subscription mode)
- pay per view (payment for a single program feature as desired, which could be
  pre-booked or impulsive)
- pay per view per time (whereby the viewer's charges are a function of the time spent
  on the channel).
The pay TV CAS already in existence comprises two subsystems [12, 23, 29]. One
subsystem implements the scrambling/descrambling system and the other subsystem
implements the access control system. A scrambling system renders the basic service
content, i.e., audio and video, useless for an unauthorized receiver. The scrambling system
scrambles the basic service content by suitably modifying the digital video data, or altering
the audio data, through digital processing with the help of a control word (CW). Several
algorithms are available for the scrambling purpose, such as the data encryption standard (DES)
and the digital video broadcasting common scrambling algorithm (DVB-CSA) [8, 23, 34].
DES is widely used by companies in the USA. European legislation mandates the use
of DVB-CSA in all digital TVs in Europe. The standards adopted for digital TV systems
use MPEG-2 for the source coding. The transmission standard used in Europe is the digital
video broadcasting (DVB) standard, whereas in the USA it is the advanced television
systems committee (ATSC) standard. In DVB the scrambling can be carried out on the payload of an
MPEG-2 transport stream (TS) packet or on the payload of an MPEG-2 packetized elementary
stream (PES) packet. ATSC supports scrambling only on the payload of MPEG-2 TS
packets. The DVB supports the simulcrypt and multicrypt standards, whereas ATSC
supports only the simulcrypt standard [8, 23]. The simulcrypt standard allows the co-existence
of more than one CAS, simultaneously addressing different consumer bases, in one
transmission. This is made possible by all CASs using the same common scrambling
algorithm with the same CW. Each CAS constructs its own entitlement control message (ECM)
containing the description of the program and the encrypted CW. Each CAS encrypts the
CW using its own service key (SK). The ECMs of each participating CAS are then
multiplexed along with the MPEG-2 stream. The subscribers use their respective
CAS's SK to obtain the CW from that CAS's ECM. The CW is then used to
descramble the received scrambled video to obtain the clear video. In the case of multicrypt, no
program is available through more than one CA system.
The entitlement management messages (EMMs) are used to convey new entitlements, or
service keys (SKs), to every subscriber. These EMMs are encrypted with a programmer
distribution key (PDK). The EMMs can be transmitted along with the MPEG-2 stream or
through a separate channel. The SKs are specific to each conditional access system, whereas
the PDKs are specific to each subscriber. The PDKs are distributed to each subscriber by the
program provider/broadcaster. The PDK can be transmitted over the same transmission
channel by encrypting it using a key called the issuer key (IK) or using an already available
programmer distribution key. The issuer key IK is never transmitted over the transmission
channel, but is directly distributed to the subscribers during the initialization phase. The IK
is used to load/invalidate the PDK and SK. In fact, these keys (IK, PDK and SK), along
with the subscriber's entitlements, are stored in a processor in the subscriber's set-top box.
Therefore the processor used should be a secure processor [32].
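The key hierarchy described above can be pictured as a chain of encryptions. The following toy sketch illustrates only the flow of IK, PDK, SK and CW; the XOR stream cipher, the key values and the function names are hypothetical stand-ins for the real algorithms (e.g., DES or DVB-CSA) and the real ECM/EMM message formats:

```python
import hashlib

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher keyed via SHA-256 (stand-in for DES/DVB-CSA)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

toy_decrypt = toy_encrypt  # an XOR stream cipher is its own inverse

# Key hierarchy as described in the text (all values hypothetical):
IK  = b"issuer-key-distributed-offline"   # never transmitted over the channel
PDK = b"per-subscriber-distribution-key"  # one per subscriber
SK  = b"cas-service-key"                  # one per conditional access system
CW  = b"control-word-0001"                # scrambles the actual content

# Broadcaster side: CW travels in the ECM (under SK), SK in the EMM
# (under PDK), and the PDK itself is loaded under the IK at initialization.
ecm          = toy_encrypt(SK, CW)
emm          = toy_encrypt(PDK, SK)
pdk_delivery = toy_encrypt(IK, PDK)

# Subscriber side: unwind the chain starting from the pre-installed IK.
pdk = toy_decrypt(IK, pdk_delivery)
sk  = toy_decrypt(pdk, emm)
cw  = toy_decrypt(sk, ecm)
assert cw == CW  # the subscriber recovers the control word
```

The chain also shows why the IK and the entitlements must live in a secure processor: anyone holding the IK can unwind every layer down to the CW.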
The subscriber set-top box can be considered as consisting of a host module and a
CAS module. The host module typically contains an MPEG decoder, a tuner/demodulator
for signal reception and an input/output section. The conditional access module implements
the descrambling function and the management of the various keys (IK, PDK and SK), ECMs
and EMMs. The CAS module can be implemented in a removable secure processor such as a
PCMCIA card or on a smart card [6, 12].
The MPEG-2 standard supports control information for intellectual property management
and protection (IPMP), which includes conditional access and copyright management
[20, 24–26]. The control information is carried in IPMP elementary streams
(IPMP-ESs) and IPMP descriptors (IPMP-Ds). However, these IPMP messages are not used
by current conditional access systems for copyright management. The copyright management
function is envisaged to be implemented alongside the MPEG-2 decoder on the host module.
Since the host module is not part of the removable secure processor we argue that the
copyright management function can be turned off by removing the control bits belonging to
the copyright management. Removal of these control bits is particularly easy due to the fact
that the input to the MPEG-2 decoder and copyright management circuit are descrambled
streams. Thus, implementing the descrambling and copyright management (watermarking)
processes as two separate processes is not secure. It is noted that any watermark bits embedded
for copyright management before MPEG-2 decoding can be thought of as errors in
the MPEG-2 stream. These errors can cause drift problems while decoding. However, our
proposed algorithm combines the watermarking and descrambling processes into a single atomic
process and performs unmasking and watermarking after MPEG-2 decoding. Therefore our
proposed scheme is more secure and exhibits no drift problem. We next discuss the past work
in the area of digital watermarking of broadcast video.
2.2 Digital watermarking of broadcast video
Techniques for hiding watermarks in digital data have grown steadily more sophisticated
and increasingly robust against attacks. Many video researchers have used them to provide
copyright management for video.
The European Esprit VIVA project [38] uses the watermarking technique for broadcast
monitoring. The broadcast materials are watermarked in the spatial domain prior to
broadcasting and the watermark is detected using a correlation detector. The broadcast
chain consists of D/A conversion, A/D conversion, MPEG-2 compression, MPEG-2
decompression, D/A conversion and A/D conversion. The watermark can be detected even
after the watermarked video undergoes all this processing. This can be used for
verification of commercial transmissions, assessment of sponsorship effectiveness,
statistical data collection and analysis of broadcast content. But this scheme does not support
individual watermarking for copyright violator identification in a broadcasting environment,
nor can it be used for subscription-based video broadcasts where a confidentiality
requirement is needed against the non-subscribers.
Anderson & Manifavas have proposed the "Chameleon" scheme [1], which allows a single
broadcast ciphertext to be decrypted into slightly different plaintexts by users with slightly
different keys. As acknowledged by the authors, the watermarking capability of this scheme
is rather limited for MPEG video. This is because the encryption is done after MPEG
encoding and the decryption is done before decoding. The Chameleon decryption leaves behind
a watermark consisting of a few bit changes (equivalent to bit errors). A bit error rate lower
than 0.1% is required for acceptable viewing quality. Since the watermark bits are very few,
the number of distinct watermarks is also small. This affects scalability for broadcasting.
Our proposal does the masking in the compressed domain but unmasks, leaving behind a
watermark, after MPEG decoding. Therefore the watermarking can be performed up to the
just noticeable distortion (JND) level of perceptual quality of the video.
Brown, Perkins & Crowcroft propose the "Watercasting" technique [4], which has each
receiver in a multicast group receive a slightly different version of the multicast data. This
scheme requires that the source watermark, encrypt and transmit n copies of the data. The
network bandwidth requirement is high as the source transmits n copies. Each sender must
trust the chain of network routers; a chain of trusted network providers is required, each of
which has to be willing to reveal its tree topology to each sender. It also does not offer a
solution to distinguish the copies of receivers on the same subnet. Our scheme requires only
one masked copy. Therefore the resource requirements at the source as well as in the network
are lower compared to the above case. Our method does not ascribe any active role to the
network routers and can distinguish every receiver.
Chu, Qiao & Nahrstedt present a secure multicast protocol with copyright protection
[7]. The protocol creates two watermarked streams, assigns a unique random binary
sequence to each user and uses this sequence to arbitrate between the two watermarked
streams. The efficiency is hampered by the need to watermark, encrypt and transmit two
copies of the stream and by the significant amount of key message traffic. Also the authors
state that it may be susceptible to collusion attacks.
Briscoe & Fairman present [3] "Nark", a number of modular mechanisms to enable secure
sessions tailored to each individual multicast receiver. In addition to security, it also proposes a
solution for non-repudiation and copyright protection, essentially using the Chameleon
scheme. Besides the limitations of Chameleon, it also requires a tamper-resistant processor
at each receiver.
Judge & Ammar propose the "WHIM" scheme [27], which makes use of a hierarchy of
intermediaries for creating and embedding the watermark. This scheme suffers from a low
watermark embedding capacity. Also, each sender must trust the chain of active network
intermediaries and network providers. Since the scheme does not combine the watermarking
and decryption processes at the receiver into one single process, the watermarking process can be bypassed.
The method presented by Parviainen and Parnes [35] creates two distinctly watermarked
copies of each media packet. Both copies are then encrypted with two different randomly
generated encryption keys and are then broadcast/multicast. Any given receiver has
access to the key of only one of the two encrypted packets of each media packet. For a
medium with k packets the method requires 2k keys, and any one receiver possesses k keys.
But this scheme has only limited collusion resistance, as acknowledged by the authors.
In Table 1 we summarize the above-discussed past works in the digital broadcast video
watermarking area.
This paper is an extension of our earlier work presented in [15] and [16]. The main
contributions of this paper are the proofs for compressed domain scaling and mask blending. It
also presents a quantitative measure of the degradation, computational overhead and
compression overhead of the proposed algorithm.
We next discuss our proposed scheme.
3 The proposed scheme
The proposed scheme intends to provide copyright violator identification and confidenti-
ality in a broadcasting environment. The scheme supports spatial (uncompressed) and
compressed domain processing. We briefly describe our proposed scheme first.
Brief description The broadcaster first creates a masked video by blending/embedding an
opaque mask frame onto the original uncompressed/compressed video, frame by
frame. The mask blending process serves the purpose of confidentiality. The masked video is
then broadcast. The subscribers unmask the received masked video using an unmasking
frame (customized for each subscriber) leaving behind a residue in the form of a robust
invisible watermark in the unmasked video. In addition to removing the masking effect,
the unmasking process carries out the watermarking for copyright violator identification.
The masking process is done in the transform (compressed) domain at the encoder for
compressed domain processing, while for spatial (uncompressed) domain processing the
masking is performed in the spatial domain itself. The unmasking process is done in the
spatial domain for both compressed and uncompressed domain processing, by the decoder in
the subscriber set-top boxes. The proposed scheme is depicted in figure 1. In figure 1, x^m_n
is the nth masked video frame, x^wa_n is the nth watermarked video frame of subscriber A,
x^wb_n is the nth watermarked video frame of subscriber B, v_a is the unmasking frame for
subscriber A and v_b is the unmasking frame for subscriber B. We will now explain the
method in detail.
3.1 Confidentiality requirement
The confidentiality requirement is intended to force the non-subscribers to subscribe to the
broadcast service and is obtained through the use of a mask blending/embedding procedure, which
Table 1 Digital broadcast video watermarking techniques and comparisons

Scheme | Confidentiality | Copyright violator identification | Supports MPEG-2 compression | Remarks
VIVA | No | No | Yes | For broadcast monitoring
Chameleon | Yes | Yes | Drift problem | Low watermark embedding capacity
Watercasting | Yes | Yes | Yes | Active role by network routers needed; cannot distinguish the copies of receivers on the same subnet
Chu et al. | Yes | Yes | Yes | Susceptible to collusion attacks
Nark | Yes | Yes | Drift problem | Low watermark embedding capacity; tamper resistant processor needed
WHIM | Yes | Yes | Yes | Low watermark embedding capacity; active intermediaries required; watermarking and decryption processes are not combined into one single process
Parviainen et al. | Yes | Yes | Yes | Susceptible to collusion attacks
Proposed algorithm | Yes | Yes | Yes | High watermark embedding capacity; no active role for any intermediaries/routers; fingerprints every copy; not susceptible to collusion attacks; no tamper resistant processor needed; watermarking and decryption processes are combined into one single process
can be performed in the spatial domain or in the compressed domain. The masked video is
created only once and is then broadcast over the air or over a network. We next explain the
spatial domain mask blending, followed by the compressed domain mask blending.
3.1.1 Spatial domain mask blending process
A video is a set of K×L sized frames whose nth frame is denoted by x_n. Next, the broadcaster
constructs a K×L sized mask frame v. The opaque mask frame's purpose is to severely
degrade the viewing experience by obscuring the video. The intended effect is similar to
that of video scrambling. The mask frame v is blended onto every frame of the video:

    x^m_n(k, l) = α·x_n(k, l) + β·v(k, l)   ∀ k, l        (1)

where x^m_n is the nth masked video frame and α, β are scaling factors such that α + β = 1
and 0 < α, β < 1. The scaling factors can be used to adjust the strength of the mask. Eq. 1
defines the mask blending process in the spatial domain. The masked video is then
broadcast. The receivers who are non-subscribers would only be able to view x^m_n, which is
obscured. The requirement here is that the output of the decoder/set-top box to the display
should be x^m_n for non-subscribers. Next we explain how we meet this requirement in the
MPEG-2 compressed domain.
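A minimal numerical sketch of the spatial domain blending of Eq. 1, together with one plausible construction of a subscriber's unmasking frame, v_a = β·v − α·w_a, so that unmasking removes the mask but leaves subscriber A's watermark w_a behind. The frame sizes, values and this particular unmasking formula are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

K, L = 288, 352           # example frame size
alpha, beta = 0.6, 0.4    # scaling factors, alpha + beta = 1

rng = np.random.default_rng(0)
x_n = rng.integers(0, 256, size=(K, L)).astype(np.float64)  # original frame x_n
v   = rng.integers(0, 256, size=(K, L)).astype(np.float64)  # opaque mask frame v

# Eq. (1): the broadcaster blends the mask onto the frame.
x_m = alpha * x_n + beta * v

# Hypothetical subscriber-side unmasking frame for subscriber A:
# v_a = beta*v - alpha*w_a removes the mask while leaving the invisible
# watermark w_a behind (illustrative only).
w_a = rng.normal(0.0, 2.0, size=(K, L))   # subscriber A's watermark pattern
v_a = beta * v - alpha * w_a
x_wa = (x_m - v_a) / alpha                # unmasking -> x_n + w_a

assert np.allclose(x_wa, x_n + w_a)
```

The single atomic step (x_m − v_a)/α performs both descrambling and fingerprinting, which is the property the scheme relies on.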
3.1.2 Compressed domain processing
In our scheme we assume that the MPEG-2 compressed video stream is available for
broadcasting. The decoder/set-top box at the receiver decompresses the received MPEG-2
data before sending it to the display. For a non-subscriber the data sent to the display must be
x^m_n, the obscured video.
Fig. 1 The proposed scheme

This can be achieved by appropriately processing the video in the compressed domain at
the encoder before broadcasting. The obscured output at the non-subscriber's decoder
would then be given by Eq. 1 as x^m_n(k, l) = α·x̂_n(k, l) + β·v(k, l) ∀ k, l. The term x̂_n is
used instead of x_n to reflect the loss caused by MPEG-2 compression. We first show how
α·x̂_n is obtained by processing the quantized error Discrete Cosine Transform (DCT)
coefficients, and then the addition of β·v to α·x̂_n is shown. Figure 2 depicts the compressed
domain mask blending process. Acronyms in figures 2 and 3 and their expansions are as
follows: VLC: variable length code, SW: switch, MC: motion compensation, FDCT: forward
discrete cosine transform, IDCT: inverse discrete cosine transform, DC DCT: DC
coefficient of the discrete cosine transform [22].
We use the following notation in this section: φ[{⟨a, b⟩}] means that ⟨a, b⟩ refers to an
8×8 pixel block, where a is the DC DCT coefficient of the 8×8 pixel block and b
represents the 63 AC DCT coefficients of the 8×8 pixel block; {⟨a, b⟩} refers to the set of
all 8×8 pixel blocks constituting a frame; and φ[{⟨a, b⟩}] represents the operator φ
applied to all the blocks of the frame. Similarly, φ[{⟨c⟩}] means that ⟨c⟩ refers to an 8×8
pixel block, where c is the 8×8 pixel block; {⟨c⟩} refers to the set of all 8×8 pixel blocks
constituting a frame; and φ[{⟨c⟩}] represents the operator φ applied to all the blocks of
the frame.
3.1.2.1 Compressed domain scaling of video frames

In this subsection we describe how we compute α·x̂_n. We can observe in figure 2 that the
MPEG-2 compressed video stream is first passed to the VLC decoder and demultiplexer
box, which outputs the error DCT (discrete cosine transform) coefficients, the motion vectors
and the control parameters. We use these motion vectors and control parameters for scaling
the video frames and also for mask blending. The quantized error DCT coefficients are then
scaled by a factor α as required by Eq. 1. The box 'SW' in figures 2 & 3 is a switch. The
next box, which implements 'subtract (1−α)·128·N/s from DC_DCT' (where N comes from
the N×N point DCT and here N = 8, and s is the intra_DC_differential quantization
step-size, a control parameter used during the MPEG-2 encoding of the video frames), is
necessary due to the use of a fixed prediction value of 128 for the MB_intra blocks at the
decoder for MPEG-2 video streams.

Fig. 2 Compressed domain mask blending process

We now prove that this sequence of computation results in α·x̂_n.
Proof The input to each box in figures 2 & 3 is frame by frame, but each box implements
block-by-block processing on the input frame, and the blocks consist of 8×8 pixels. As per
the encoding/decoding order, the first frame in a group of pictures (GOP) is encoded as an I
frame. All the blocks in an I frame are intra coded. Let E_n(f_1, f_2), f_1 = 0, 1, ..., 7,
f_2 = 0, 1, ..., 7, be the quantized error DCT coefficients of a block of the nth original video
frame, which is encoded as an I frame. The intra coded ('MB_intra') blocks use a fixed
prediction value of 128, and hence we use the term 'error' and the notation E_n(f_1, f_2).
Let us assume that we transmit the VLC coded, scaled and shifted quantized error DC
DCT coefficient [α·E_n(0, 0) − (1−α)·128·N/s] and the scaled quantized error AC DCT
coefficients α·E_n(f_1, f_2) for f_1 = 0, ..., 7, f_2 = 0, ..., 7 except (f_1, f_2) = (0, 0), of all
the blocks in the I frame, block by block, i.e., we transmit the VLC code for

    α·E_n(0, 0) − (1−α)·128·N/s ;  α·E_n(f_1, f_2) for f_1 = 0, ..., 7, f_2 = 0, ..., 7
    except (f_1, f_2) = (0, 0)        (2)
Fig. 3 MPEG-2 decoder
Since VLC and inverse VLC are lossless, we have at the output of the inverse quantizer at
the MPEG-2 decoder (figure 3),

    Q⁻¹[ α·E_n(0, 0) − (1−α)·128·N/s ;  α·E_n(f_1, f_2) for f_1 = 0, ..., 7, f_2 = 0, ..., 7
         except (f_1, f_2) = (0, 0) ]
    = α·Ê_n(0, 0) − (1−α)·128·N ;  α·Ê_n(f_1, f_2) for f_1 = 0, ..., 7, f_2 = 0, ..., 7
      except (f_1, f_2) = (0, 0)        (3)

where Ê_n(f_1, f_2) is the inverse quantizer output of E_n(f_1, f_2).
After the inverse DCT (IDCT),

    IDCT[ α·Ê_n(0, 0) − (1−α)·128·N ;  α·Ê_n(f_1, f_2) for f_1 = 0, ..., 7, f_2 = 0, ..., 7
          except (f_1, f_2) = (0, 0) ]
    = α·x̂_n(k, l) − α·128 − (1−α)·128   ∀ k, l
    = α·x̂_n(k, l) − 128   ∀ k, l        (4)

The intra coded blocks use a fixed prediction value of 128. At the output of the MPEG-2
decoder, after adding the fixed prediction of 128, we have

    output = α·x̂_n(k, l) − 128 + 128   ∀ k, l
           = α·x̂_n        (5)
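The intra-block derivation in Eqs. (2)–(5) can be checked numerically. The sketch below ignores quantization (it takes s = 1 and works directly with dequantized coefficients) and builds an explicit orthonormal 8×8 DCT matrix; it scales all coefficients by α, shifts the DC coefficient by (1−α)·128·N, and verifies that the standard decoder steps (IDCT plus the fixed prediction 128) output α times the original block:

```python
import numpy as np

N = 8
alpha = 0.6

# Orthonormal N x N DCT-II matrix: DCT2D(x) = C @ x @ C.T
u = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos((2 * u[None, :] + 1) * u[:, None] * np.pi / (2 * N))
C[0, :] = np.sqrt(1.0 / N)

x = np.random.default_rng(1).integers(0, 256, size=(N, N)).astype(np.float64)

# Encoder: intra-block "error" coefficients w.r.t. the fixed prediction 128.
E = C @ (x - 128.0) @ C.T

# Scale all coefficients by alpha; shift the DC by (1 - alpha)*128*N
# (quantization is ignored, so Eq. (2)'s shift of (1 - alpha)*128*N/s is
# applied directly to the dequantized DC coefficient, i.e., s = 1).
E_mod = alpha * E
E_mod[0, 0] -= (1 - alpha) * 128.0 * N

# Standard decoder: IDCT, then add the fixed prediction 128 (Eqs. (4)-(5)).
out = C.T @ E_mod @ C + 128.0

assert np.allclose(out, alpha * x)  # decoder output is the alpha-scaled block
```

The DC shift works because the IDCT spreads a DC offset of (1−α)·128·N uniformly as (1−α)·128 per pixel, exactly cancelling the surplus from the fixed prediction.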
We see that the MPEG-2 decoder output is scaled by a factor α. This scaled version is used
as a prediction for the subsequent P and B frames of the current GOP (group of pictures).
The above-described processing is applied to the intra coded blocks of P and B frames as
well, to obtain the scaled version.
Let us assume that the following frame is encoded as a P frame. Let it be the (n+1)th
frame. P frames consist of inter/intra coded blocks. The intra coded blocks are processed
exactly the same way as described above to obtain the scaled version. But for the inter coded
blocks we note that there is a scaled version of the I frame at the MPEG-2 decoder for
prediction. For inter coded blocks we transmit the VLC coded, scaled quantized error (DC
and AC) DCT coefficients α·E_{n+1}(f_1, f_2) for f_1 = 0, 1, ..., 7, f_2 = 0, 1, ..., 7. For
simplicity of description we assume the P frame consists only of inter coded blocks. At the
MPEG-2 decoder, after the inverse quantizer, we have
Q1αEnþ1f1;f2
ðÞfor f1¼0;1;::7f2¼0;1; :::7
hifg
½
¼αb
Enþ1f1;f2
ðÞfor f1¼0;1;::7f2¼0;1; :::7
DEno
ð6Þ
where b
Enþ1f1;f2
ðÞis the inverse quantizer output of E
n+1
(f
1
,f
2
).
After the Inverse DCT (IDCT), since the $\hat{E}_{n+1}(f_1, f_2)$ are the motion compensated prediction error DCT coefficients, we can write

$$\text{IDCT}\left\{\alpha \hat{E}_{n+1}(f_1,f_2)\ \text{for}\ f_1 = 0,1,\ldots,7,\ f_2 = 0,1,\ldots,7\right\} = \alpha\left(\hat{x}_{n+1}(k,l) - \hat{x}_n(k-d_u, l-d_v)\right) \quad \forall k,l$$
$$= \alpha \hat{x}_{n+1}(k,l) - \alpha \hat{x}_n(k-d_u, l-d_v) \quad \forall k,l \quad (7)$$

where $d_u, d_v$ are motion vectors.
At the output of the MPEG-2 decoder, after adding the motion compensated prediction, we have

$$\text{output} = \alpha\left(\hat{x}_{n+1}(k,l) - \hat{x}_n(k-d_u, l-d_v)\right) + \alpha \hat{x}_n(k-d_u, l-d_v) \quad \forall k,l \quad = \alpha \hat{x}_{n+1} \quad (8)$$
We can see that the output is the scaled version of $\hat{x}_{n+1}$. The scaled versions of the I and P frames are used as a prediction for the next B frames. The B frames can consist of inter/intra coded blocks. The intra coded blocks are processed exactly the same way as the intra coded blocks in I frames to obtain the scaled version of the B frames at the output of the MPEG-2 decoder. An inter coded block would mean the prediction is a forward, backward or interpolated motion compensated prediction. The inter coded blocks are processed exactly the same way as the inter coded blocks in P frames to obtain the scaled version of the B frames at the output of the MPEG-2 decoder. In the case of interpolated motion compensated prediction there are two motion vectors, but it can easily be shown that the method for forward/backward motion compensated prediction applies to this case as well.
After multiplying the error DCT coefficients by the scaling factor (for intra coded blocks, the DC is also shifted) we round the result to an integer, as required by MPEG-2 and as shown in figure 2. The proof above did not take into account the loss due to rounding. This loss is small, as can be observed from figure 5b and d, and is inevitable in any case of additive watermarking (followed by MPEG compression) where scaling is applied before adding the appropriate strength of watermark. Next, we discuss how we obtain the addition of $\beta v$ to the $\alpha \hat{x}_n$ as required.
3.1.2.2 Compressed domain mask blending
The appropriately generated mask error DCT coefficients are then added to the rounded scaled quantized video error DCT coefficients, as shown in figure 2 (at the broadcaster site). These values, along with the motion vectors, control parameters and other control information, are then VLC encoded and transmitted. We use the same motion vectors, control parameters and control information obtained at the output of the VLC decoder and demultiplexer (figure 2) for generating the appropriate mask error DCT coefficients. This means that the same motion vectors, control parameters and control information that were used for video encoding are used for generating the appropriate mask error DCT coefficients. Further, any constraint placed on the control parameters applies to the video encoding as well as to the mask.
The $\alpha \hat{x}_n$ and $\beta v$ at the output of the MPEG-2 decoder (subscriber site) can be considered as the results of separate inputs to the MPEG-2 decoder system. One input (due to the video) to the MPEG-2 decoder causes $\alpha \hat{x}_n$ and the other input (due to the mask) causes the output $\beta v$. Therefore the masking process at the broadcaster site, which causes $\beta v$ at the output of the MPEG-2 decoder (subscriber site), can be treated separately (and it needs to be shown how we obtain the presence of $\beta v$ at the output of the MPEG-2 decoder). The input which causes $\alpha \hat{x}_n$ at the output is decoded properly by the MPEG-2 decoder. The masking process must be done with the aim that the masked video is obscured and also that, when the MPEG-2 decoder decodes the masked video, a frame $\beta v$ is added to every frame at the output of the MPEG-2 decoder, as seen in figure 3.
So we begin with a mask frame $v$ (luminance component only), which is then scaled by $\beta$ to obtain $\beta v$. All further processing is done block by block on the frame, and the blocks consist of 8 × 8 pixels. Note that the embedding process is done only for the luminance blocks. From this $\beta v$, we create two scaled versions of this frame: one frame mask_i, which consists of blocks to mask the corresponding intra coded blocks (which have no motion compensation) of the video, and another frame mask_n, which consists of blocks to mask the corresponding inter coded blocks of the video, using the following expressions:

$$\text{mask\_i} = \text{IDCT}\left\{\left[\frac{\beta V(0,0)}{s}\right],\ \left[\frac{\beta V(f_1,f_2)}{Q(i,j)}\right]\ \text{for}\ i = f_1 = 0,\ldots,7,\ j = f_2 = 0,\ldots,7\ \text{except}\ (i,j) = (f_1,f_2) = (0,0)\right\} \quad (9)$$

$$\text{mask\_n} = \text{IDCT}\left\{\left[\frac{\beta V(0,0)}{Q_2(0,0)}\right],\ \left[\frac{\beta V(f_1,f_2)}{Q(i,j)}\right]\ \text{for}\ i = f_1 = 0,\ldots,7,\ j = f_2 = 0,\ldots,7\ \text{except}\ (i,j) = (f_1,f_2) = (0,0)\right\} \quad (10)$$
where $\beta V(0,0)$ is the DC DCT coefficient of one block of $\beta v$ and $\beta V(f_1, f_2)$ for $f_1 = 0,\ldots,7$, $f_2 = 0,\ldots,7$ except $(f_1,f_2) = (0,0)$ are the 63 AC DCT coefficients of the same block of $\beta v$. We assume that the intra and inter quantization matrix values are the same for the AC DCT coefficients, i.e., Intra_Qmat(i,j) = Inter_Qmat(i,j) for $i = 0,\ldots,7$, $j = 0,\ldots,7$ except $(i,j) = (0,0)$, where Intra_Qmat(i,j) is the intra quantization matrix and Inter_Qmat(i,j) is the inter quantization matrix. (This assumption can be relaxed if the watermark used is robust; we use a robust spread spectrum based watermark.) Therefore,

$$Q_1(i,j) = Q_2(i,j) = Q(i,j)\ \text{for}\ i = 0,\ldots,7,\ j = 0,\ldots,7\ \text{except}\ (i,j) = (0,0) \quad (11)$$
where

$$Q_1(i,j) = \frac{2 \cdot q\_scale \cdot \text{Intra\_Qmat}(i,j)}{32}\ \text{for}\ i = 0,\ldots,7,\ j = 0,\ldots,7\ \text{except}\ (i,j) = (0,0) \quad (12)$$

$$Q_2(i,j) = \frac{2 \cdot q\_scale \cdot \text{Inter\_Qmat}(i,j)}{32}\ \text{for}\ i = 0,1,\ldots,7,\ j = 0,1,\ldots,7 \quad (13)$$

hence,

$$Q(i,j) = \frac{2 \cdot q\_scale \cdot \{\text{Intra or Inter}\}\_\text{Qmat}(i,j)}{32}\ \text{for}\ i = 0,\ldots,7,\ j = 0,\ldots,7\ \text{except}\ (i,j) = (0,0) \quad (14)$$

The factor q_scale is the quantization scale factor and is assumed to be constant. The intra DC differential quantization step size $s$ can be 2, 4 or 8. And,

$$Q_2(0,0) = \frac{2 \cdot q\_scale \cdot \text{Inter\_Qmat}(0,0)}{32} \quad (15)$$
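As an illustration of Eq. 9, the sketch below builds one block of mask_i from a block of $\beta v$ and checks it against the FDCT/quantization steps; the quantizer table Q and DC step s here are toy values chosen for the example (not taken from the paper), with s divisible by Q(0,0) as required:

```python
import numpy as np

N = 8
# Orthonormal 8x8 DCT-II basis matrix and 2-D transforms.
C = np.array([[np.sqrt((1 if k == 0 else 2) / N) * np.cos((2*n + 1) * k * np.pi / (2*N))
               for n in range(N)] for k in range(N)])
dct2 = lambda blk: C @ blk @ C.T
idct2 = lambda B: C.T @ B @ C

beta, s = 0.3, 8.0             # mask strength and intra DC step (toy values)
Q = 2.0 * np.ones((N, N))      # toy quantizer table; s is divisible by Q[0, 0]

rng = np.random.default_rng(4)
v_block = rng.integers(0, 256, (N, N)).astype(float)  # one block of the mask frame v
BV = dct2(beta * v_block)                             # DCT coefficients of beta*v

coeffs = BV / Q                # AC terms divided by Q(i, j)   (Eq. 9)
coeffs[0, 0] = BV[0, 0] / s    # DC term divided by the step s (Eq. 9)
mask_i_b = idct2(coeffs)       # one block of mask_i

# Re-applying the FDCT and undoing the division recovers beta*V exactly.
assert np.isclose(dct2(mask_i_b)[0, 0] * s, BV[0, 0])
assert np.isclose(dct2(mask_i_b)[3, 4] * Q[3, 4], BV[3, 4])
```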
For masking an intra type of block at location $(x,y)$ in the video frame, we just need to add the DCT of the prediction error between the block in mask_i (at the same location $(x,y)$) and $128/s$, as can be seen from figure 2. For masking the forward, backward or interpolated types of block at location $(x,y)$ in the video frame, we just add the DCT of the prediction error between the block in mask_n at the same location and the motion compensated prediction for the inter coded block. The motion compensated prediction for the inter coded block is divided by $Q_2(i,j)$. This will not be lossy as long as $s$ is divisible by $Q_2(0,0)$. For the skipped blocks nothing needs to be added; just the macroblock skip information is to be transmitted. But for the MB_pattern coded blocks, one has to use the union of the coded block patterns of the video and the masking process.
The above procedure for blending the mask in the compressed domain will result in a constant $\beta v$ frame at the output of the MPEG-2 decoder.

Proof We discuss the mask blending process, which causes $\beta v$ at the output of the MPEG-2 decoder. The input to each box in figures 2 and 3 is frame by frame, but each box implements block by block processing on the input frame, and the blocks consist of 8 × 8 pixels.
Let us mask an I frame consisting of intra coded blocks. Then the input of the FDCT (forward DCT) in figure 2 is the prediction error frame $\{\text{mask\_i\_b}(u,v) - 128/s\ \text{for}\ u = 0,1,\ldots,7,\ v = 0,1,\ldots,7\}$, where mask_i_b is one block of mask_i. The output of the FDCT is

$$\text{FDCT}\left\{\text{mask\_i\_b}(u,v) - \frac{128}{s}\ \text{for}\ u = 0,1,\ldots,7,\ v = 0,1,\ldots,7\right\}$$
$$= \left\{\left[\frac{\beta V(0,0)}{s} - \frac{128 \cdot N}{s}\right],\ \left[\frac{\beta V(f_1,f_2)}{Q(i,j)}\right]\ \text{for}\ i = f_1 = 0,\ldots,7,\ j = f_2 = 0,\ldots,7\ \text{except}\ (i,j) = (f_1,f_2) = (0,0)\right\} \quad (16)$$
Since VLC and Inverse VLC are lossless, the input of the inverse quantizer at the MPEG-2 decoder (figure 3) is the same as the output of the FDCT in figure 2. The output of the inverse quantizer is

$$Q^{-1}\left\{\left[\frac{\beta V(0,0)}{s} - \frac{128 \cdot N}{s}\right],\ \left[\frac{\beta V(f_1,f_2)}{Q(i,j)}\right]\ \text{for}\ i = f_1 = 0,\ldots,7,\ j = f_2 = 0,\ldots,7\ \text{except}\ (i,j) = (f_1,f_2) = (0,0)\right\}$$
$$= \left\{\left[\beta V(0,0) - 128 \cdot N\right],\ \beta V(f_1,f_2)\ \text{for}\ f_1 = 0,\ldots,7,\ f_2 = 0,\ldots,7\ \text{except}\ (f_1,f_2) = (0,0)\right\} \quad (17)$$

The output of the Inverse DCT is

$$\text{IDCT}\left\{\left[\beta V(0,0) - 128 \cdot N\right],\ \beta V(f_1,f_2)\ \text{for}\ f_1 = 0,\ldots,7,\ f_2 = 0,\ldots,7\ \text{except}\ (f_1,f_2) = (0,0)\right\} = \beta v(k,l) - 128 \quad \forall k,l \quad (18)$$
since intra coded blocks use a fixed prediction value of 128. At the output of the MPEG-2 decoder, after adding the fixed prediction 128, we have

$$\text{Output} = \left(\beta v(k,l) - 128\right) + 128 \quad \forall k,l \quad = \beta v \quad (19)$$
This $\beta v$ is used as a prediction for the next P and B frames. The above described processing is applied to the intra coded blocks of P and B frames as well.

Let us consider the masking of a P or B frame. The P or B frame consists of inter/intra coded blocks. The intra coded blocks are processed exactly the same way as described above. An inter coded block would mean the prediction is a forward, backward or interpolated motion compensated prediction. All these refer to the picture stores containing the previous and future pictures for prediction. But both the previous and future picture stores contain the same $\beta v$ frame; therefore the analysis for forward, backward or interpolated prediction is the same. In the case of interpolated motion compensated prediction there are two motion vectors, but it can easily be shown that the method for forward/backward motion compensated prediction applies to this case as well. For simplicity of description we assume the P or B frames consist only of inter coded blocks (with forward/backward motion compensated predictions).
For masking the inter coded blocks we use mask_n instead of mask_i. Then the input of the FDCT in figure 2 is the prediction error frame $\left\{\text{mask\_n\_b}(u,v) - \frac{\beta v\_b(u-d_u, v-d_v)}{Q_2(i,j)}\ \text{for}\ i = u = 0,\ldots,7,\ j = v = 0,\ldots,7\right\}$, where mask_n_b refers to one block of mask_n, $\beta v\_b$ refers to one block of $\beta v$, and $d_u, d_v$ are motion vectors. The output of the FDCT is

$$\text{FDCT}\left\{\text{mask\_n\_b}(u,v) - \frac{\beta v\_b(u-d_u, v-d_v)}{Q_2(i,j)}\ \text{for}\ i = u = 0,1,\ldots,7,\ j = v = 0,1,\ldots,7\right\}$$
$$= \text{FDCT}\left\{\text{mask\_n\_b}(u,v)\ \text{for}\ u = 0,\ldots,7,\ v = 0,\ldots,7\right\} - \frac{\text{FDCT}\left\{\beta v\_b(u-d_u, v-d_v)\ \text{for}\ u = 0,\ldots,7,\ v = 0,\ldots,7\right\}}{Q_2(i,j)}$$
$$= \left\{\left[\frac{\beta V(0,0)}{Q_2(0,0)}\right],\ \left[\frac{\beta V(f_1,f_2)}{Q(i,j)}\right]\ \text{for}\ i = f_1 = 0,\ldots,7,\ j = f_2 = 0,\ldots,7\ \text{except}\ (i,j) = (f_1,f_2) = (0,0)\right\} - \frac{\text{FDCT}\left\{\beta v\_b(u-d_u, v-d_v)\right\}}{Q_2(i,j)} \quad (20)$$
Since VLC and Inverse VLC are lossless, the input of the inverse quantizer at the MPEG-2 decoder (figure 3) is the same as the output of the FDCT in figure 2. The output of the inverse quantizer is

$$Q^{-1}\left[\left\{\left[\frac{\beta V(0,0)}{Q_2(0,0)}\right],\ \left[\frac{\beta V(f_1,f_2)}{Q(i,j)}\right]\ \text{for}\ i = f_1 = 0,\ldots,7,\ j = f_2 = 0,\ldots,7\ \text{except}\ (i,j) = (f_1,f_2) = (0,0)\right\} - \frac{\text{FDCT}\left\{\beta v\_b(u-d_u, v-d_v)\right\}}{Q_2(i,j)}\right]$$
$$= Q^{-1}\left\{\left[\frac{\beta V(0,0)}{Q_2(0,0)}\right],\ \left[\frac{\beta V(f_1,f_2)}{Q(i,j)}\right]\ \text{for}\ i = f_1 = 0,\ldots,7,\ j = f_2 = 0,\ldots,7\ \text{except}\ (i,j) = (f_1,f_2) = (0,0)\right\} - Q^{-1}\left[\frac{\text{FDCT}\left\{\beta v\_b(u-d_u, v-d_v)\right\}}{Q_2(i,j)}\right]$$
$$= \left\{\beta V(0,0),\ \beta V(f_1,f_2)\ \text{for}\ f_1 = 0,\ldots,7,\ f_2 = 0,\ldots,7\ \text{except}\ (f_1,f_2) = (0,0)\right\} - \text{FDCT}\left\{\beta v\_b(u-d_u, v-d_v)\ \text{for}\ u = 0,\ldots,7,\ v = 0,\ldots,7\right\} \quad (21)$$
The output of the Inverse DCT is

$$\text{IDCT}\left[\left\{\beta V(0,0),\ \beta V(f_1,f_2)\ \text{for}\ f_1 = 0,\ldots,7,\ f_2 = 0,\ldots,7\ \text{except}\ (f_1,f_2) = (0,0)\right\} - \text{FDCT}\left\{\beta v\_b(u-d_u, v-d_v)\ \text{for}\ u = 0,\ldots,7,\ v = 0,\ldots,7\right\}\right]$$
$$= \beta v(k,l) - \text{IDCT}\left[\text{FDCT}\left\{\beta v\_b(u-d_u, v-d_v)\ \text{for}\ u = 0,\ldots,7,\ v = 0,\ldots,7\right\}\right] \quad \forall k,l$$
$$= \beta v(k,l) - \beta v(k-d_u, l-d_v) \quad \forall k,l \quad (22)$$
The output of the MPEG-2 decoder, after adding the motion compensated prediction, is

$$\text{Output} = \left(\beta v(k,l) - \beta v(k-d_u, l-d_v)\right) + \beta v(k-d_u, l-d_v) \quad \forall k,l \quad = \beta v \quad (23)$$
The presence of $\beta v$ will cause the video to be obscured. Subscribers would be provided with an unmasking frame to view the video clearly. Unmasking is done in the spatial domain (i.e., after the decoding), as explained in Section 3.2.2. We will now explain the methods of obtaining copyright violator identification.
3.2 Copyright violator identification
The copyright violator identification property is obtained through the use of a robust invisible watermark created specifically for each subscriber by the broadcaster and kept secret by the broadcaster. The unmasking frame carries this robust invisible watermark to the subscriber set top box, and the watermark gets embedded in the video during the unmasking process. The unmasking process is a single atomic process, which combines the watermarking for the copyright violator identification requirement and the unmasking for the confidentiality requirement. We now explain the unmasking frame construction and the unmasking process.
3.2.1 Unmasking frame construction
Whenever a new subscriber wants to subscribe to the broadcast, the subscriber sends a join request containing a verifiable subscriber identity and makes arrangements to pay the necessary subscription fee. The broadcaster then verifies the subscriber's identity and creates a robust invisible watermark $W_{bi}$ specifically for the subscriber $R_i$. The robust invisible watermark construction is explained in Section 3.3. The robust invisible watermark is kept secret by the broadcaster. The broadcaster then creates an unmasking frame $v_i$ for subscriber $R_i$ (in figure 1 we use $v_a$ and $v_b$ instead of $v_i$; the subscripts $a$ and $b$ indicate that the unmasking frames $v_a$ and $v_b$ are for subscriber A and subscriber B, respectively):

$$v_i(k,l) = \beta v(k,l) - \alpha\, W_{bi}(k,l) \quad \forall k,l \quad (24)$$
The broadcaster then transmits $v_i$ to the subscriber through a secure (encrypted) channel. The broadcaster also stores the subscriber's identity and watermark $W_{bi}$ in a table named SubscriberInfoTable.
3.2.2 Unmasking process
To view the unobscured channel broadcast, subscriber $R_i$'s set top box performs this computation:

$$x^{w_i}_n(k,l) = \left(x^m_n(k,l) - v_i(k,l)\right) \cdot (1/\alpha) \quad \forall k,l \quad (25)$$

where $x^{w_i}_n$ is the watermarked video frame for $R_i$. Eq. 25 defines the unmasking process. In the case of compressed domain broadcasts, the output of the MPEG-2 decoder in the set top box is the masked video $x^m_n$. The unmasking is applied after the MPEG-2 decoding.
Notice that $x^{w_i}_n$ contains the robust invisible watermark $W_{bi}$ left behind as a residue, i.e.,

$$x^{w_i}_n(k,l) = x_n(k,l) + W_{bi}(k,l) \quad \forall k,l \quad (26)$$

Thus the unmasking process defined by Eq. 25 is a single atomic process, which combines the watermarking for the copyright violator identification requirement and the unmasking for the confidentiality requirement. In the case of compressed domain broadcasts, Eq. 26 will contain $\hat{x}_n$ instead of $x_n$ to reflect the fact that MPEG-2 is a lossy compression technique. The unmasking frame $v_i$ is for the exclusive use of subscriber $R_i$. If subscriber $R_i$ leaks/sells $v_i$ or $x^{w_i}_n$ to a non-subscriber, the illegal video would contain $R_i$'s watermark. Thus, any piracy done by $R_i$ is easily detected because of the invisible watermark present in the pirated video.
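The algebra of Eqs. 24–26 is easy to check in a spatial-domain toy model that ignores MPEG-2 quantization and rounding; the frame size and watermark amplitude below are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(1)
K, L = 16, 16
alpha, beta = 0.7, 0.3

x_n  = rng.integers(0, 256, (K, L)).astype(float)  # original video frame
v    = rng.integers(0, 256, (K, L)).astype(float)  # mask frame
W_bi = rng.choice([-2.0, 2.0], (K, L))             # R_i's robust invisible watermark

x_m  = alpha * x_n + beta * v        # masked (obscured) broadcast frame
v_i  = beta * v - alpha * W_bi       # unmasking frame for subscriber R_i (Eq. 24)
x_wi = (x_m - v_i) / alpha           # unmasking at the set top box       (Eq. 25)

assert np.allclose(x_wi, x_n + W_bi)  # the residue is exactly the watermark (Eq. 26)
```

A single subtract-and-scale step thus both reveals the video and embeds $W_{bi}$, which is the atomicity claimed above.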
We use the spread spectrum watermark proposed by Hartung et al. [18] in our implementation; however, one could use any robust invisible watermark. We use the correlation receiver technique for the watermark detection [18]. We will now provide details about the watermark construction and detection procedures.
3.3 Watermark construction
We construct a watermark frame [18] of dimension $K$ pixels by $L$ pixels, the same as that of the video frames. We consider the watermark frame as a 1-dimensional signal acquired by raster-scanning (scanning left to right and then top to bottom). Assume that the information to be embedded consists of bits having values $\{-1, 1\}$. Let us create a sequence $a_j$ out of it (the watermark information to be embedded). Let

$$a_j,\ a_j \in \{-1, 1\},\ j = 0, 1, \ldots, N \quad (27)$$

be a sequence of bits, which is then spread using the chip rate $C_r$ to obtain the spread sequence $b_i$. The $C_r$ and $N$ are selected in such a way that $C_r \times N = K \times L$, the frame dimension.

$$b_i = a_j,\quad j C_r \le i < (j+1) C_r,\ \forall j \quad (28)$$
The spreading provides redundancy and improves the robustness to geometrical attacks such as cropping. The spread sequence is then multiplied with a pseudo random noise sequence $p_i$, where $p_i \in \{-1, 1\}$. It is then amplified by a scaling factor $k$ (a positive number, selected in such a way that the watermark still remains invisible in the watermarked frames and is also detectable) to get the watermark:

$$w_i = k\, b_i\, p_i \quad \forall i \quad (29)$$

The watermark $w_i$ can be arranged as a frame (dimension $K \times L$), which is the watermark frame.
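A minimal sketch of Eqs. 27–29 (the frame size, bit count and amplitude k below are arbitrary example values):

```python
import numpy as np

K, L = 8, 8              # watermark frame dimension
N_bits = 4               # number of information bits a_j
Cr = (K * L) // N_bits   # chip rate, chosen so that Cr * N_bits == K * L
k = 2.0                  # watermark amplitude

rng = np.random.default_rng(2)
a = rng.choice([-1, 1], N_bits)   # information bits a_j in {-1, 1}          (Eq. 27)
b = np.repeat(a, Cr)              # spreading: b_i = a_j on each chip window (Eq. 28)
p = rng.choice([-1, 1], K * L)    # pseudo random noise sequence p_i
w = k * b * p                     # watermark samples                        (Eq. 29)
W = w.reshape(K, L)               # arranged as the K x L watermark frame
```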
3.4 Watermark detection
The detection of the hidden information $a_j$ is done by employing the correlation receiver [18]. The correlation receiver does not require the original unwatermarked video signal for the detection. To detect $a_j$ we multiply the watermarked video $x^w_i$ by the same pseudo random noise sequence $p_i$ that was used for the watermark construction, followed by a summation over the window for each embedded information bit, yielding the correlation $s_j$. The sign of $s_j$ is the $a_j$.

$$s_j = \sum_{i=jC_r}^{(j+1)C_r - 1} p_i\, x^w_i = \sum_{i=jC_r}^{(j+1)C_r - 1} p_i\, x_i + \sum_{i=jC_r}^{(j+1)C_r - 1} p_i\, w_i = \sum_{i=jC_r}^{(j+1)C_r - 1} p_i\, x_i + \sum_{i=jC_r}^{(j+1)C_r - 1} p_i^2\, k\, b_i \quad (30)$$
where $x_i$ is the original unwatermarked video. The first term in Eq. 30 is zero if $p_i$ and $x_i$ are uncorrelated. However, this is not always the case in reality, so to obtain a better result we first prefilter the watermarked video $x^w_i$ and remove most of the unwatermarked video content. But if we have the original unwatermarked video, we just need to subtract it from the watermarked video $x^w_i$. Assuming that the first term in Eq. 30 is almost zero,

$$s_j = \sum_{i=jC_r}^{(j+1)C_r - 1} p_i^2\, k\, b_i = \sum_{i=jC_r}^{(j+1)C_r - 1} p_i^2\, k\, a_j = k\, a_j \sum_{i=jC_r}^{(j+1)C_r - 1} p_i^2 = a_j\, k\, A_p^2 \quad (31)$$

where $A_p^2 = \sum_{i=jC_r}^{(j+1)C_r - 1} p_i^2$.
Since $k$ and $A_p^2$ are positive, we have

$$\text{sign}(s_j) = \text{sign}\left(a_j\, k\, A_p^2\right) = a_j \quad (32)$$
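Construction and detection can be exercised end to end in a few lines; in this sketch the original frame is available and is subtracted, so the first term of Eq. 30 vanishes exactly (the sizes and amplitude are example values):

```python
import numpy as np

rng = np.random.default_rng(3)
K, L, N_bits, k = 32, 32, 8, 2.0
Cr = (K * L) // N_bits

a = rng.choice([-1, 1], N_bits)        # embedded bits
b = np.repeat(a, Cr)                   # spread sequence
p = rng.choice([-1, 1], K * L)         # pseudo random noise sequence
w = k * b * p                          # watermark (Eq. 29)

x  = rng.integers(0, 256, K * L).astype(float)  # raster-scanned original frame
xw = x + w                                      # watermarked frame

# Correlation receiver (Eq. 30): subtracting the known original makes the
# first term exactly zero, leaving s_j = a_j * k * A_p^2 (Eq. 31).
s = (p * (xw - x)).reshape(N_bits, Cr).sum(axis=1)
a_hat = np.sign(s)                              # Eq. 32

assert np.array_equal(a_hat, a)  # all embedded bits recovered
```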
Next we explain the protocol used to identify the copyright violator from the unauthorized
copy found with the non-subscriber.
3.5 Copyright violator identification protocol
Suppose a legal recipient makes multiple copies of the unmasked watermarked video xwi
n
or the unmasking frame v
i
and redistributes to non-subscribers. The broadcaster can identify
the subscriber who has redistributed the video by detecting the watermark W
bi
present in
the unauthorized copy found with the non-subscriber. For this purpose the broadcaster picks
up one by one the watermarks created by him, the W
bi
s from the SubscriberInfoTable and
then correlates it with the copy found with the non-subscriber. The highest correlation value
with certain minimum threshold value is used to identify the watermark W
bi
present in the
copy of the video. If the correlation value is smaller than the minimum threshold we declare
that the watermark is not found. Once the watermark W
bi
is identified it could be obtained
from the SubscriberInfoTable the identity of the subscriber R
i
who is the legal recipient.
The broadcaster can then initiate necessary legal measures and prove to the judge the
existence of W
bi
in the unauthorized copy.
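A sketch of this protocol, with the SubscriberInfoTable modeled as a dictionary and the normalized-correlation threshold chosen arbitrarily for the example (the residue is obtained by subtracting the original, which the broadcaster holds):

```python
import numpy as np

def identify_violator(residue, subscriber_info, threshold):
    """Return the id whose stored watermark correlates best with the residue,
    or None if no correlation exceeds the minimum threshold."""
    best_id, best_corr = None, threshold
    for sub_id, W in subscriber_info.items():
        corr = float(np.sum(residue * W) / np.sum(W * W))  # normalized correlation
        if corr > best_corr:
            best_id, best_corr = sub_id, corr
    return best_id

rng = np.random.default_rng(5)
shape = (32, 32)
table = {"A": rng.choice([-2.0, 2.0], shape),  # per-subscriber watermarks W_bi
         "B": rng.choice([-2.0, 2.0], shape)}

x = rng.integers(0, 256, shape).astype(float)  # original video frame
pirated = x + table["A"]                       # copy leaked by subscriber A
residue = pirated - x                          # broadcaster subtracts the original

assert identify_violator(residue, table, threshold=0.5) == "A"
```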
4 Implementation and results
We have implemented our technique for the spatial and MPEG-2 compressed domains, and tested it on several video clips. However, we show here only the results for the compressed domain, as the results for the spatial domain are similar. We apply the proposed scheme only on the luminance channel of the video frames; however, it is possible to implement it on the chrominance channels as well. We have worked with various sets of scaling factors $\alpha$, $\beta$ and also various mask images. The higher the $\beta$ value is set, the greater the obscurity, and mask frames with high saturation values also produce more obscurity. The unmasked watermarked video frames for $R_i$ contain the invisible watermark for $R_i$. The watermarks in these unmasked watermarked video frames can be detected using the correlation receiver. Figure 4 depicts the full masking and unmasking results for one frame of one of the test videos with frame dimension 720 × 576.
4.1 Quantitative measure of degradation
To evaluate the degradation caused and to evaluate the performance of the proposed scheme, a quantitative measurement is required. For this purpose the signal to noise ratio in dB ($\text{SNR}_{\text{indB}}$) is used, defined by

$$\text{SNR}_{\text{indB}} = 10 \log_{10} \frac{E\left\{\left(\hat{x}_n(k,l)\right)^2\right\}}{E\left\{\left(\hat{x}_n(k,l) - x^{w_i}_n(k,l)\right)^2\right\}} \quad \forall k,l \quad (33)$$

where $E\{\cdot\}$ is the expectation operator. The $\hat{x}_n$ is used instead of $x_n$ to reflect the loss caused by MPEG-2 compression. This quantitative value, however, does not truly reflect the perceptual quality of the unmasked watermarked video; the numbers give us a quantitative measure of the degradation. The signal to noise ratio has been calculated for several video clips of the MPEG-7 video categories and is plotted in figure 5. The MPEG-7 video set consists of ten categories with 30 items, but our test covers only eight categories, comprising 27 items, 12:50:47 h of video and 1,210,642 frames. The video clips used have frame dimension 352 × 288 and the transmission rate used is 5 Mbits/s.
The signal to noise ratio calculation is performed frame wise; SNRmax refers to the highest signal to noise ratio, SNRavg refers to the average of the signal to noise ratios and SNRmin is the minimum signal to noise ratio. The signal to noise ratio in dB with watermark is plotted in figure 5c and that without watermark in figure 5d. It can be observed that the degradation is very small. The degradation has two components: one due to the rounding operation and the other due to the watermark.

Fig. 4 Masking and unmasking results

The watermark amplitudes used are +2 and −2 (i.e., the $k$ used is 2). This degradation is not visible in the unmasked watermarked video in any of the clips, which are perceptually similar to the original video. The amplitude of the watermark should be selected in such a way that the degradation is not visible. The degradation due to other processing errors with zero watermark strength shows that the processing degradation is negligible. Figures 5a and b are the plots of noise power with and without watermark, respectively. NPWRmax, NPWRavg and NPWRmin refer to the maximum, average and minimum noise powers, respectively.
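Eq. 33 with the expectations replaced by frame averages can be computed as follows (the test frame and ±2 watermark residue below are illustrative):

```python
import numpy as np

def snr_in_db(x_hat, x_w):
    """Eq. 33 with expectations replaced by frame averages:
    decoded-signal power over distortion power, in dB."""
    noise = x_hat - x_w
    return 10.0 * np.log10(np.mean(x_hat ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(6)
x_hat = rng.integers(0, 256, (64, 64)).astype(float)   # decoded frame
x_w = x_hat + rng.choice([-2.0, 2.0], (64, 64))        # +/-2 watermark residue
print(round(snr_in_db(x_hat, x_w), 1))                 # roughly 37 dB for this frame
```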
4.2 Computation overhead
The computation overhead of the mask blending process in comparison to the MPEG-2 compression was investigated. A macroblock size of 16 pixel rows by 16 pixel columns (consisting of four blocks) was taken as the unit of our investigation. Except for the motion estimation, all other processing is done block (eight pixel rows by eight pixel columns) wise.
The following assumptions were made while finding the computational costs. We assume that the two dimensional forward discrete cosine transform (FDCT) and the inverse discrete cosine transform (IDCT) are implemented using Haque's fast block matrix decomposed algorithm [17, 37]. For a block size of $N$ pixel rows by $N$ pixel columns, this algorithm requires $(3/4)N^2 \log_2 N$ real multiplications and $3N^2 \log_2 N - 2N^2 + 2N$ real additions. The inter coded macroblocks can be forward, backward or interpolated coded. We assume the interpolation function used is the pixel averaging function. For motion vector estimation, we assume that a fast six step algorithm is used, and further we assume the use of the sum of absolute differences (SAD) as the measure of best match.

Fig. 5 Plots of noise power and signal to noise ratio

The SAD is defined for a 16 pixel row by 16 pixel column macroblock as follows:

$$\text{SAD}(k,l) = \sum_{i=0}^{15}\sum_{j=0}^{15} \left| x_n(k+i,\ l+j) - x_m(k+dk+i,\ l+dl+j) \right| \quad (34)$$
where $x_n(k+i, l+j)$ is the pixel intensity value at macroblock position $(k,l)$ in source picture $n$ and $x_m(k+dk+i, l+dl+j)$ is the pixel intensity at macroblock position $(k+dk, l+dl)$ in reference picture $m$. The 16 × 16 array in picture $m$ is displaced horizontally by $dk$ and vertically by $dl$. By convention, $(k,l)$ refers to the upper left corner of the macroblock, indices $(i,j)$ refer to values to the right and down, and displacements $(dk,dl)$ are positive when to the right and down. For the six step algorithm, in the first step we evaluate the SAD at nine displacements, in the subsequent four steps we calculate the SAD at eight displacements, and in a last step for 0.5 pixel precision at four displacements. The last step SAD is computed between interpolated values of the macroblock. We ignore the computation cost of this interpolation as it is not substantial. We also do not consider the computation cost involved in the variable length coding, which is a part of MPEG-2 compression. With these assumptions the computational costs of MPEG-2 compression and mask blending in the MPEG-2 compressed domain are found and are depicted in Table 2.
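A direct transcription of Eq. 34 (the picture size and displacement in the usage below are illustrative):

```python
import numpy as np

def sad(src, ref, k, l, dk, dl):
    """Eq. 34: sum of absolute differences between the 16 x 16 macroblock with
    upper left corner (k, l) in the source picture and the macroblock displaced
    by (dk, dl) in the reference picture."""
    a = src[k:k + 16, l:l + 16]
    b = ref[k + dk:k + dk + 16, l + dl:l + dl + 16]
    return float(np.abs(a - b).sum())

rng = np.random.default_rng(7)
pic = rng.integers(0, 256, (48, 48)).astype(float)
ref = np.roll(pic, 3, axis=0)               # reference shifted down by 3 rows

assert sad(pic, pic, 16, 16, 0, 0) == 0.0   # perfect match at zero displacement
assert sad(pic, ref, 16, 16, 3, 0) == 0.0   # best match at the true displacement
```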
We see that, when compared to the computation requirement of MPEG-2 compression, the mask blending process requires less computation. The main contributor of computation to the MPEG-2 compression is the motion estimation, which is not present in the mask blending process. The actual computation cost of MPEG-2 compression would have been higher had we considered the computation cost due to variable length coding and also the motion estimation for 0.5 pixel accuracy. The mask blending is done only once, irrespective of the number of subscribers.
4.3 Compression overhead
In the case of raw video broadcasting, the mask blending does not increase the message size, i.e., the original video and the masked video are of the same size. But in the case of compressed domain processing, the compression ratio is affected, as seen from figure 6. The compression ratio is defined as follows:

$$\text{Compression Ratio} = \frac{\text{Size of Compressed Masked Video}}{\text{Size of Compressed Original Video}} \quad (35)$$
Table 2 The computation overhead

  Macroblock                     MPEG-2 compression          Mask blending process
                                 Mult./div.   Add./sub.      Mult./div.   Add./sub.
  Intra coded I & P              1,664        3,968          832          2,368
  Intra coded B                  832          2,112          832          2,368
  Forward coded P                1,664        26,963         1,088        2,368
  Forward or backward coded B    832          25,107         1,088        2,368
  Interpolated coded B           1,920        50,470         1,344        2,624
For the various video categories we find the compression ratio defined by Eq. 35; it is plotted in figure 6. We see that there is a small compression overhead, ranging from 0.001 to 0.4%. The compression overhead increases when the mask strength is increased from β = 0.3 to β = 0.7.
5 Discussion
We now discuss in detail some of the salient aspects of our scheme.
Key Management The unmasking frame acts as the access control key in our proposed scheme. For a frame dimension of 352 × 288, the size of the unmasking frame is 101,376 bytes (99 kB). In order to have better security and also to support the dynamic leave feature, the mask could be changed frequently. A receiver can join the broadcast anytime he wishes by sending a join request to the broadcaster; therefore the proposed scheme supports the dynamic join and leave feature. The key revocation to close off a subscriber who is no longer paying can be done at the time of a mask change. Whenever the mask is changed, the corresponding unmasking frame for each subscriber has to be generated and transmitted. The unmasking frame is handed over to the subscriber through a secure channel. Assuming a mask frame validity period of 30 min, a video frame dimension of 352 × 288, a video frame rate of 25 frames/s and a typical MPEG-2 compression ratio of 1:10, the size of the MPEG-2 video is (352 × 288) × 25 × 60 × 30/10 = 456,192,000 bytes. Therefore, the key message size is just 0.022% of the 30 min MPEG-2 video size, which is small. The control word size of a current CAS system is in bits (168 for ATSC), which is very small in comparison to the unmasking frame size. But the current CAS systems do not support fingerprinting, and it is hard to integrate fingerprinting and the current descrambling process of CAS into a single atomic process, as discussed in Section 2.1. In the case of digital cinema broadcast to cinema halls, the fingerprinting requirement is as important as that of CAS due to the high value of new digital movie releases.
Fig. 6 Plot of compression overhead with (α = 0.7, β = 0.3), (α = 0.5, β = 0.5) and (α = 0.3, β = 0.7)

Security Concerns The security of watermarks has been well studied [19, 21]. In our scheme we make use of the robust invisible watermark of Hartung et al. [18], which is a spread spectrum based watermark; therefore the security of the watermark in our scheme is similar to that of [18]. There are many remedies and counter attacks presented in [21] to make spread spectrum based watermarks more resistant against attacks.
Suppose $n$ subscribers collude to make an unwatermarked video by averaging the corresponding frames of each subscriber's video, or they collude by averaging the unmasking frames to make an unmasking frame which does not carry watermark (more precisely, inverse watermark) information. Boneh & Shaw [5, 21] have shown how to construct watermark signals to defeat this kind of averaging collusion attack. In the event of collusion, Boneh & Shaw's schemes would point out the colluding parties. There is another kind of collusion where the colluding parties assemble a video by randomly selecting frames from each of their watermarked videos; but since the watermarking is frame wise, this would reveal the colluding parties. The subscribers can also assemble an unmasking frame strip by strip, by switching between different unmasking frames, or can create an unmasked video strip by strip from each corresponding frame, by switching between different unmasked watermarked videos. In either case, to defeat this kind of collusion the watermark signal/information bits have to be pseudo-randomly distributed to pixels using the pseudo random noise sequence $p_i$ [21].
The inversion/ambiguity attack can be carried out for a false ownership claim of watermarked data. The attacker first guesses a false watermark and then derives a false original from the watermarked data using the guessed watermark. By producing the guessed watermark and the derived false original, the attacker then claims ownership of the watermarked data. This attack can be defeated by designing a non-invertible watermark [21]. One way of designing a non-invertible watermark is by using cryptographically secure time stamps provided by trusted third parties and encoded in the watermark [41]. Another way is by making the watermark dependent on the original data in a one way fashion (for example, using a hash function) [11, 41].
Another attack is to estimate the mask frame from the masked video frames or the watermark frame from the unmasked watermarked video frames. In order to make the estimation of the mask frame or watermark frame using a Wiener estimator more difficult, the power spectra of the mask frame and watermark frame should be scaled versions of the video signal power spectrum [21]. It can be shown that it is hard to separate the mask from the masked video through a brute force technique, as it requires an enormous amount of computing power coupled with human interaction to identify the correct/acceptable original video frame and mask frame. The broadcaster must make sure that the mask frame is not made known (either through the mask being applied on a blank frame or through some other means) and cannot be easily guessed by the receivers (subscribers and non-subscribers). To make the proposed method more robust against these attacks, one can design a porous mask and robust invisible watermark, meaning that we mask and watermark only some random pixel locations (substantially large in number in order to have the opacity effect), or change the mask often, or use multiple masks interchangeably.
Other Advantages In the proposed scheme the operations performed for masking and
unmasking are simple additions, so the scheme is not compute-intensive: all operations are
O(n), where n is the size of the frame. The masked video frames are created only once for a
video, but the unmasking frames and the robust invisible watermark are computed whenever a
new subscriber subscribes to the service. The unmasking frames are also recomputed
whenever the mask frames are changed (for better security), but the robust invisible
watermark need not be recreated then. The unmasking frame is handed over to the subscriber
through a secure channel and becomes the access control mechanism. The proposed scheme
combines the unmasking and watermarking process into one single atomic process, forcing an
attacker to put in more effort to get away with an unwatermarked video.
Multimed Tools Appl (2006) 31: 145–170 167
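The single-copy broadcast with per-subscriber atomic unmasking can be sketched as follows. This is a hedged sketch: mod-256 addition stands in for the paper's exact blending operator, and the frame size and watermark amplitude are assumed values.

```python
import numpy as np

rng = np.random.default_rng(1)
shape = (288, 352)  # CIF frame size, chosen for illustration

video = rng.integers(0, 256, size=shape).astype(np.int32)  # stand-in frame
mask = rng.integers(0, 256, size=shape).astype(np.int32)   # broadcaster's secret mask
wmark = rng.integers(-2, 3, size=shape).astype(np.int32)   # low-amplitude fingerprint

# Broadcast side: one masked copy, created once, sent to everyone.
masked = (video + mask) % 256

# Per subscriber: the unmasking frame folds that subscriber's watermark into
# the mask removal, so unmasking and fingerprinting are one atomic addition.
unmask_frame = (wmark - mask) % 256
received = (masked + unmask_frame) % 256   # video plus watermark, never video alone
```

Note that the subscriber never sees the mask or the watermark in isolation: the only way to remove the mask with the issued unmasking frame is to simultaneously embed the fingerprint.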
The proposed scheme supports MPEG-2 compressed domain processing. The masking for the
confidentiality requirement is carried out on the quantized error-DCT coefficients of the
MPEG-2 stream. Therefore, in order to mask an already MPEG-2 compressed video, the
broadcaster needs only partial decoding (run-length decoding). After masking, the masked
error-DCT coefficients are VLC coded and transmitted. Though the masking is carried out on
quantized error-DCT coefficients at the broadcaster site, it is done in such a way that
unmasking can be performed after MPEG-2 decoding (in the uncompressed domain) at the
subscriber site. Unmasking the masked video using the unmasking frame (which carries the
watermark information) results in watermarking of the video. Since the watermarking is
carried out after MPEG-2 decoding, there is no drift problem during decoding.
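The broadcaster-side compressed-domain step can be sketched on a single quantized 8×8 error-DCT block. The coefficient values and mask amplitudes below are synthetic stand-ins; the paper's actual coefficient selection and mask design may differ.

```python
import numpy as np

rng = np.random.default_rng(7)

# One quantized 8x8 error-DCT block, as obtained after run-length (partial)
# decoding of an MPEG-2 stream; the values here are synthetic stand-ins.
block = rng.integers(-64, 65, size=(8, 8)).astype(np.int32)

# Additive mask applied per coefficient in the quantized DCT domain, so the
# broadcaster never needs a full decode before VLC re-coding and transmission.
dct_mask = rng.integers(-32, 33, size=(8, 8)).astype(np.int32)
masked_block = block + dct_mask

# The subscriber fully decodes the masked stream and then removes the mask's
# pixel-domain effect with the unmasking frame, after MPEG-2 decoding.
```

Because DCT is linear, an additive mask in the coefficient domain corresponds to an additive mask in the pixel domain, which is what allows the unmasking frame to be applied after full decoding at the subscriber.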
6 Conclusion
We have developed a scheme to simultaneously obtain confidentiality and fingerprinting for
spatial-domain and MPEG-2 compressed broadcast video. Broadcasting demands a single copy
for transmission, whereas fingerprinting demands several individually watermarked copies.
The proposed scheme satisfies both demands. Confidentiality is achieved by blending in an
additive mask frame. By selecting a proper mask and by controlling the masking operation,
one can achieve transparency ranging continuously from fully transparent to absolutely
opaque. Fingerprinting is obtained through additive robust invisible watermarking. The
proposed scheme combines the unmasking and watermarking process into one single process,
which forces an attacker to put in more effort to get away with an unwatermarked copy. The
proposed scheme supports dynamic join and leave, requires low resources in terms of
computing power and bandwidth, and is not complex to implement. Our future direction is to
extend the scheme to the audio stream of the broadcasts.
References
1. Anderson R, Manifavas C (1997, January) Chameleon: a new kind of stream cipher. Fast Software Encryption, Haifa
2. Braudaway GW, Magerlein KA, Mintzer F (1997) Protecting publicly available images with a visible image watermark. Int Conf Image Proc 1:524–527
3. Briscoe B, Fairman I (1999, June) Nark: receiver-based multicast key management and non-repudiation. BT Technical Report
4. Brown I, Crowcroft J, Perkins C (1999, November) Watercasting: distributed watermarking of multicast media. Networked Group Communications, Italy, pp 286–300
5. Boneh D, Shaw J (1998, September) Collusion-secure fingerprinting for digital data. IEEE Trans Inf Theory 44:1897–1905
6. Buer M, Wallace J (1996, August) Integrated security for digital video broadcast. IEEE Trans Consum Electron 42(3):500–503
7. Chu HH, Qiao L, Nahrstedt K (1999, January) A secure multicast protocol with copyright protection. Proceedings of SPIE Symposium on Electronic Imaging: Science and Technology
8. Clayson PL, Dallard NS (1997, September) Systems issues in the implementation of DVB simulcrypt conditional access. Int Broadcast Conv, pp 470–475
9. Cox IJ, Kilian J, Leighton T, Shamoon T (1997, December) Secure spread spectrum watermarking for multimedia. IEEE Trans Image Process 6(12):1673–1687
10. Cox IJ, Miller ML, Bloom JA (2001) Digital watermarking. Morgan Kaufmann, San Francisco
11. Craver S, Memon N, Yeo B, Yeung MM (1998, May) Resolving rightful ownerships with invisible watermarking techniques: limitations, attacks and implications. IEEE J Sel Areas Commun 16(4):573–586
12. Cutts DJ (1997, February) DVB conditional access. Electron Commun Eng J 9(1):21–27
13. Dittmann J, Stabenau M, Steinmetz R (1998) Robust MPEG video watermarking technologies. ACM International Multimedia Conference, pp 71–80
14. Doërr G, Dugelay J-L (2003) A guide tour of video watermarking. Signal Process Image Commun 18(4):263–282
15. Emmanuel S, Kankanhalli MS (2001, August) Copyright protection for MPEG-2 compressed broadcast video. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2001), Tokyo
16. Emmanuel S, Kankanhalli MS (2003) A digital rights management scheme for broadcast video. Multimedia Systems Journal 8(6):444–458
17. Haque MA (1985, December) A two-dimensional fast cosine transform. IEEE Trans Acoust Speech Signal Process 33(6):1532–1539
18. Hartung F, Girod B (1998, May) Watermarking of uncompressed and compressed video. Signal Process 66(3):283–301
19. Hartung F, Kutter M (1999, July) Multimedia watermarking techniques. Proc IEEE 87(7):1079–1107
20. Hartung F, Ramme F (2000, November) Digital rights management and watermarking of multimedia content for m-commerce applications. IEEE Commun Mag 38(11):78–84
21. Girod B, Hartung F, Su JK (1999, January) Spread spectrum watermarking: malicious attacks and counterattacks. Proceedings of the SPIE, Electronic Imaging '99, San Jose, USA, vol 3657, pp 147–158
22. Haskell BG, Netravali AN, Puri A (1997) Digital video: an introduction to MPEG-2. Chapman & Hall / International Thomson
23. ATSC Standard A/70: Conditional access system for terrestrial broadcast, with amendment. http://www.atsc.org/standards.html
24. The MPEG home page: text of ISO/IEC 13818-1. http://www.chiariglione.org/mpeg/index.htm
25. ISO/IEC 13818-1: Generic coding of moving pictures and associated audio: systems (MPEG-2 Systems)
26. ISO/IEC 14496-1: Coding of audiovisual objects: systems (MPEG-4 Systems)
27. Ammar M, Judge P (2000, June) WHIM: watermarking multicast video with a hierarchy of intermediaries. NOSSDAV, North Carolina
28. Kalker T (1999) System issues in digital image and video watermarking for copyright protection. IEEE Int Conf Multimedia Comput Syst 1:562–567
29. Kankanhalli MS, Rajmohan, Ramakrishnan KR (1999) Adaptive visible watermarking of images. IEEE Int Conf Multimedia Comput Syst 1:568–573
30. Kundur D, Hatzinakos D (1999, July) Digital watermarking for telltale tamper proofing and authentication. Proc IEEE 87(7):1167–1180
31. Linnartz JP, Depovere G, Kalker T. Philips Electronics response to call for proposals issued by the Data Hiding SubGroup, Copy Protection Technical Working Group
32. Macq BM, Quisquater JJ (1995, June) Cryptology for digital TV broadcasting. Proc IEEE 83(6):944–957
33. Meng J, Chang SF (1998) Embedding visible video watermarks in the compressed domain. Int Conf Image Proc 1:474–477
34. Mooij W (1997, September) Advances in conditional access technology. Int Broadcast Conv, pp 461–464
35. Parviainen R, Parnes P (2001, May) Large scale distributed watermarking of multicast media through encryption. Proceedings of CMS 2001, Germany
36. Piva A, Barni M, Bartolini F, Cappellini V (1997) DCT-based watermark recovering without resorting to the uncorrupted original image. Int Conf Image Proc 1:520–523
37. Rao KR, Yip P (1990) Discrete cosine transform: algorithms, advantages, applications. Academic Press
38. Strycker LD, Termont P, Vandewege J, Haitsma J, Kalker A, Maes M, Depovere G (2000, August) Implementation of a real-time digital watermarking process for broadcast monitoring on a TriMedia VLIW processor. IEE Proceedings, Visual Image Signal Processing 147(4)
39. Su K, Kundur D, Hatzinakos D (2004) Spatially localized image-dependent watermarking for statistical invisibility and collusion resistance. IEEE Trans Multimedia
40. Voyatzis G, Pitas I (1999, July) The use of watermarking in the protection of digital multimedia products. Proc IEEE 87(7)
41. Wolfgang RB, Delp EJ (1997, June) A watermarking technique for digital imagery: further studies. Proceedings of the International Conference on Imaging Science, Systems and Applications (CISST '97), Las Vegas, NV, USA, pp 279–287
42. Zeng W, Lei S (1999, November) Efficient frequency domain digital video scrambling for content access control. ACM Multimedia '99 Proceedings, Orlando, Florida
Sabu Emmanuel received his B.E. (Electronics & Communication Engineering) from Regional Engineering
College, Durgapur (1988), M.E. (Electrical Communication Engineering) from Indian Institute of Science
(IISc.), Bangalore (1998), and Ph.D. (Computer Science) from National University of Singapore (NUS)
(2002). He is an Assistant Professor at the School of Computer Engineering, Nanyang Technological
University, Singapore. His current research interests are in media forensics, digital rights management
(DRM), and wireless communication security. He has been a member of the technical program committee of
several international conferences.
Mohan Kankanhalli obtained his BTech (Electrical Engineering) from the Indian Institute of Technology,
Kharagpur and his MS/PhD (Computer and Systems Engineering) from the Rensselaer Polytechnic Institute.
He is a Professor at the School of Computing at the National University of Singapore. He is on the editorial
boards of several journals including the ACM Transactions on Multimedia Computing, Communications,
and Applications, IEEE Transactions on Multimedia, ACM/Springer Multimedia Systems Journal, Pattern
Recognition Journal and the IEEE Transactions on Information Forensics and Security. His current research
interests are in Multimedia Systems (content processing, retrieval) and Multimedia Security (surveillance,
authentication and digital rights management).