Computer and Information Science; Vol. 5, No. 5; 2012
ISSN 1913-8989 E-ISSN 1913-8997
Published by Canadian Center of Science and Education
Hybrid-Based Compressed Domain Video
Fingerprinting Technique
Abbass S. Abbass1, Aliaa A. A. Youssif1 & Atef Z. Ghalwash1
1 Faculty of computers and information, Helwan University, Cairo, Egypt
Correspondence: Abbass S. Abbass, Faculty of computers and information, Helwan University, Cairo, Egypt.
E-mail: abbass1652@yahoo.com; aliaay@yahoo.com; aghalwash@edara.gov.eg
Received: June 4, 2012 Accepted: June 18, 2012 Online Published: July 15, 2012
doi:10.5539/cis.v5n5p25 URL: http://dx.doi.org/10.5539/cis.v5n5p25
Abstract
Video fingerprinting is a relatively new research area, also referred to in the literature as "content-based video copy detection" or "content-based video identification". The goal is to locate videos with segments substantially identical to segments of a query video while tolerating the common artifacts of video processing. Its value as a tool to curb piracy and legally monetize content has become increasingly apparent in recent years with the wide spread of Internet video through user-generated content (UGC) sites such as YouTube. Its practical applications overlap to a certain extent with those of digital watermarking, which requires adding artificial information to the content. Fingerprints are compact content-based signatures that summarize a video signal or another media signal. Several video fingerprinting methods have been proposed in which fingerprints are extracted by analyzing the video in both the spatial and temporal dimensions. These conventional methods share one characteristic, however: video decompression is still required to extract the fingerprint from a compressed video. In practice, faster computation can be achieved if the fingerprint is extracted directly in the compressed domain, yet few compressed-domain video fingerprinting methods have been proposed so far.
This paper presents a video fingerprinting technique that works directly in the compressed domain. Experimental
results show that the proposed fingerprint is highly robust against most signal processing transformations.
Keywords: video fingerprinting, compressed domain, perceptual hash
1. Introduction
Text, image, audio, and video can all be represented as digital data. The explosion of Internet applications has drawn people into the digital world, where communication via digital data has become commonplace. However, new issues have also arisen and been explored, such as data security in digital communications, copyright protection of digitized properties, and invisible communication via digital media (Nianhua, 2011). Due to the rapid development of
video production technology and the decreasing cost of video acquisition tools and storage, a vast amount of
video data is generated around the world every day, including feature films, television programs,
personal/home/family videos, surveillance videos, game videos, etc. Digital video has opened up the potential of
using video sources in ways other than the traditional serial playback. However, this requires the development of
new technologies for accessing and manipulating digital video (Moxley, 2010).
Additionally, the amount of digital video data, which has the potential of becoming much greater than that of
traditional analog video, necessitates the development of digital video management tools for handling massive
video databases. Also, the ease with which all digital media can be flawlessly copied makes the development of
appropriate rights protection and authentication tools highly desirable. This necessitates techniques for automatically managing this vast amount of information, so that users can structure it quickly, understand its content, and organize it efficiently. An emerging technology that is useful for the management of video, particularly with respect to rights protection, is fingerprinting (Cherubini, 2009; Peng, 2010), also known as perceptual hashing or replica detection. It is defined as the identification of a video segment using a representation called a fingerprint (or sometimes a perceptual hash), which is extracted from the video content (Saikia, 2011).
The fingerprint must uniquely identify a video segment, but does not necessarily need to represent its content.
Additionally, it must remain the same when a video segment is manipulated by common video processing operations such as resizing, cropping, histogram equalization, and compression. Fingerprints can be
used for establishing whether two given segments are either identical or derived from each other, and also for
establishing whether a video segment is identical with (or derived from) any segment within a given video
database (Liu, 2010).
Several video fingerprinting algorithms that work at the pixel level have been proposed. Working directly with pixels is nowadays computationally feasible and accurate. However, those solutions do not address the magnitude of the resources needed for such a system, and little analysis is provided on deploying a video fingerprinting solution in practical cases. Compressed-domain processing techniques use information extracted directly from the compressed bitstream, and are therefore computationally more advantageous than uncompressed-domain techniques. To achieve this, the information already inherent in the video stream, which was included during the compression stage, is utilized (Maria, 2009).
A partial decompression must still be performed to extract the information necessary for processing; however, this overhead is small compared to full decompression of the video stream. It has been shown that in MPEG video decompression, approximately 40% of the CPU time is spent in Inverse Discrete Cosine Transform (IDCT) calculations, even when fast DCT algorithms are used (Young-min, 2000). This paper proposes a video fingerprinting technique that works directly in the compressed domain at the stage of variable-length decoding (Figure 1), so it is computationally more efficient than uncompressed-domain techniques and even than methods that utilize partially decoded DCT coefficients.
[Figure: MPEG decoding pipeline — parsing, variable-length decoding, inverse quantization, IDCT, motion compensation with reference frames, and post-processing to raw video]
Figure 1. Full decompression versus partial decompression of the compressed video
The bold segmented rectangle marks the zone where the partially-decoded-based techniques work, and the rounded rectangle marks where the proposed technique works.
The paper is organized as follows: related work is given in Section 2; Section 3 describes our proposed fingerprinting technique; experimental results, comparisons, and discussion are presented in Section 4; finally, Section 5 concludes the paper and points out some directions for future work.
2. Related Works
Some video similarity detection methods use uncompressed MPEG video to extract features directly. The content of the frames, the DC values of macroblocks, or motion vectors are used as features. Ardizzone et al. (1999) use motion vectors for feature extraction, taking a global motion feature or a motion-based segmented feature as the signature of the video. In the global motion extraction step, the statistical distribution of directions (i.e., an angle histogram) is calculated: the [-180, 180] interval is divided into subintervals, and the sum of the magnitudes of the motion vectors falling in each subinterval constructs the histogram. In motion-based segmentation, motion vectors are clustered and labeled, with labels assigned according to the similarity of the motion vectors or the histogram of motion vector magnitudes. Dominant regions are taken into account in the comparison step.
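As an illustrative sketch of the angle-histogram idea described above (not Ardizzone et al.'s exact implementation), assuming motion vectors are given as (dx, dy) pairs and using a coarse 4-bin split of [-180, 180] in the example:

```python
import numpy as np

def angle_histogram(motion_vectors, n_bins=8):
    """Magnitude-weighted angle histogram of motion vectors.

    The [-180, 180] degree range is split into n_bins subintervals, and
    each vector contributes its magnitude to the bin of its direction.
    """
    mvs = np.asarray(motion_vectors, dtype=float)        # shape (N, 2): (dx, dy)
    angles = np.degrees(np.arctan2(mvs[:, 1], mvs[:, 0]))  # directions in degrees
    magnitudes = np.hypot(mvs[:, 0], mvs[:, 1])
    hist, _ = np.histogram(angles, bins=n_bins, range=(-180.0, 180.0),
                           weights=magnitudes)
    return hist

# Two vectors pointing right (0 deg, magnitudes 3 and 1) and one pointing
# up (90 deg, magnitude 2): the 4-bin histogram is [0, 0, 4, 2].
h = angle_histogram([(3, 0), (1, 0), (0, 2)], n_bins=4)
```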
Joly, Frelicot and Buisson extract local fingerprints around interest points in (Joly, 2003). These interest points
are detected with the Harris detector and compared using the Nearest Neighbor method. They propose statistical
similarity search in (Joly & Buisson, 2005) and (Joly, 2005). Joly et al. use this method and propose
distortion-based probabilistic approximate similarity search technique in order to speed up scanning in content
based video retrieval framework (Joly, Buisson, & Frelicot, 2007). Zhao et al. extract PCA-SIFT descriptors and use them for video matching in (Zhao, 2007), with nearest neighbor search for matching and SVMs for learning matching patterns with their duplicates. Law-To et al. propose a video indexing method using temporal contextual information extracted from local descriptors of interest points in (Law-To, 2006) and
(Law-To & Gouet-Brunet, 2006). They use this contextual information in a voting function.
Poullot et al. present a method for monitoring a real-time TV channel in (Poullot, 2007), comparing the incoming data with indexed videos in a database. The innovations of the method are a z-grid for building indexes, uniformity-based sorting, and adapted partitioning of the components. Lienhart et al. (Yang, 2004) use a color coherence vector to characterize the key frames of the video. Sanchez et al. (1999) discuss using color histograms of key frames for copy detection; they test the developed system on TV commercials, and the system is sensitive to color variations. Hampapur (2000) uses edge features but ignores color variations. Indyk et al. (1999) use the distance between two scenes as a signature; however, it is a weak and limited signature.
So far, few methods have been proposed for video fingerprinting in the compressed domain. In one of them, the DC coefficient is used to model the fingerprint (Mikhalev et al., 2008): given the extracted DC coefficients, the method reconstructs the video frames, which are evaluated to obtain the key frames of the video; these are then further analyzed to generate the fingerprints. Naphade (1999) uses histogram intersection of the YUV histograms of the DC sequence of the MPEG video, which is an efficient method in terms of compression. Recently, AlBaqir (2009) proposed a video fingerprinting method in which motion vectors are used to model the fingerprint. He considers utilizing the motion vectors to construct an approximated motion field, since motion vectors are commonly generated during video compression to exploit the temporal redundancy within a video.
3. Proposed Technique
In general, MPEG classifies video frames into I (intra) frames, P (predicted) frames, and B (bi-directional) frames, and each frame is divided into macroblocks (MBs), which are 16x16-pixel motion compensation units within a frame. I frames can contain only intra blocks, while macroblocks in P and B frames can take different modes according to the motion content. The macroblock type modes in P and B frames are given in Figure 2 and Figure 3, respectively. If intra coding is selected, the corresponding MB is encoded individually by exploiting the two-dimensional discrete cosine transform (2D-DCT) coefficients (Equation 1). On the other hand, if inter coding is selected, the MB is encoded using motion estimation/motion compensation (ME/MC) algorithms (Richardson, 2003; Heath, 2002).
x(n_1, n_2) = \frac{2}{M} \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} c_i c_j X_{i,j} \cos\frac{(2n_1+1)i\pi}{2M} \cos\frac{(2n_2+1)j\pi}{2M}   (1)
where c_0 = 1/\sqrt{2}, c_k = 1 for k > 0, and X_{i,j} is the block of 2D-DCT coefficients of the M×M block of pixels.
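As a sanity check of the notation, Equation (1) can be transcribed directly in NumPy; recovering a constant block from its single DC coefficient confirms the normalization (illustrative code, not part of the proposed method):

```python
import numpy as np

def idct2(X):
    """2-D inverse DCT of an MxM coefficient block as in Equation (1):
    x(n1, n2) = (2/M) * sum_{i,j} c_i c_j X[i, j]
                * cos((2*n1+1)*i*pi/(2M)) * cos((2*n2+1)*j*pi/(2M)),
    with c_0 = 1/sqrt(2) and c_k = 1 otherwise."""
    M = X.shape[0]
    c = np.ones(M)
    c[0] = 1.0 / np.sqrt(2.0)
    n = np.arange(M)
    # Cosine basis: B[i, n] = cos((2n + 1) * i * pi / (2M))
    B = np.cos(np.outer(np.arange(M), 2 * n + 1) * np.pi / (2 * M))
    A = c[:, None] * B            # A[i, n] = c_i * B[i, n]
    return (2.0 / M) * (A.T @ X @ A)

# A lone DC coefficient of value M reconstructs an all-ones 8x8 block:
# (2/8) * (1/sqrt(2)) * (1/sqrt(2)) * 8 = 1.
X = np.zeros((8, 8))
X[0, 0] = 8.0
block = idct2(X)
```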
Figure 2. Macroblock type modes in P pictures
The purpose of using ME/MC is to reduce redundancy in temporal direction. To exploit temporal redundancy,
MPEG algorithms compute an interframe difference called prediction error (Pei, 1999).
\mathrm{prediction\ error}(i, j) = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} \left[ I(m, n, t) - I(m-i, n-j, t-1) \right]^2   (2)
where M and N are the macroblock dimensions. For a given macroblock, the encoder first determines whether it is motion compensated (MC) or non motion compensated (NO_MC), and a scheme is used to determine whether the current block is intra- or inter-coded based on the prediction error. The scheme can be quite complex, but the general idea is to code the difference between the target macroblock and the reference macroblock when the prediction error is small, and otherwise to intra-code the macroblock. For normal scenes, prediction is performed in P and B frames. When there is a scene change, the prediction quality drops significantly, which leads some macroblocks to be encoded in intra mode. Strictly speaking, if a frame lies inside a shot, its macroblocks should be predicted well from previous or next frames. However, when frames lie on a shot boundary, they cannot be predicted from the related macroblocks, and a high prediction error occurs (Yeo, 1995). This causes most of the macroblocks of the P frames to be intra-coded instead of motion compensated.
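The intra/inter decision described above can be sketched as follows; the threshold value and the candidate motion vector set are illustrative assumptions, not the (deliberately unspecified) scheme of a real encoder:

```python
import numpy as np

def prediction_error(frame_t, frame_prev, top, left, i, j, M=16, N=16):
    """Mean squared prediction error of Equation (2) for candidate motion
    vector (i, j): the current MxN macroblock at (top, left) against the
    displaced block in the previous frame."""
    cur = frame_t[top:top + M, left:left + N]
    ref = frame_prev[top - i:top - i + M, left - j:left - j + N]
    return np.mean((cur - ref) ** 2)

def choose_mode(frame_t, frame_prev, top, left, candidates, intra_threshold=500.0):
    """Pick the best candidate motion vector; fall back to intra coding
    when even the best prediction error is large (threshold is an
    illustrative value)."""
    errors = {mv: prediction_error(frame_t, frame_prev, top, left, *mv)
              for mv in candidates}
    best = min(errors, key=errors.get)
    return ("intra", None) if errors[best] > intra_threshold else ("inter", best)

# A frame shifted by a global motion of (2, 3) is predicted perfectly by
# that vector, so the macroblock is inter-coded.
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(64, 64)).astype(float)
cur = np.roll(np.roll(prev, 2, axis=0), 3, axis=1)
mode = choose_mode(cur, prev, 16, 16, [(0, 0), (1, 1), (2, 3)])
```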
Figure 3. Macroblock type modes in B pictures
The proposed technique fuses the macroblock type information and the motion field generated from the motion vectors in the MPEG stream to capture the intrinsic content of the video. Only VLC decoding and a number of addition operations proportional to the number of macroblocks are necessary to obtain the MB data (see the rounded rectangle zone in Figure 1).
The proposed technique (Figure 4) is divided into two stages, namely the fingerprint extraction stage and the similarity matching stage. In the first stage, the compressed MPEG video clip is parsed to extract the macroblock information and the motion vector data. Then, a 10-bin one-dimensional histogram is constructed from the macroblock types (Figure 5). The macroblock types used to generate this histogram aggregate not only the normal types occurring in normal scenes but also the types that occur in scene-change-like scenes, so this histogram carries important clues to the spatial and temporal content of the video. Lastly, the proposed technique merges this feature with the motion field feature (AlBaqir, 2009) to generate a two-part fingerprint. We call the first part of the proposed fingerprint the MBTH (Macroblock Type Information Histogram) and the second part the MFH (Motion Field Histogram).
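A minimal sketch of the MBTH extraction, assuming the macroblock type labels have already been parsed from the stream; the ten type names below are placeholders for the categories of Figure 5:

```python
from collections import Counter

# Placeholder labels for the 10 macroblock-type bins (the exact categories
# are those of Figure 5 in the paper; these names are illustrative).
MB_TYPES = ["intra", "intra_q", "mc_coded", "mc_not_coded", "no_mc_coded",
            "no_mc_not_coded", "skipped", "forward", "backward", "interpolated"]

def mbth(frame_mb_types):
    """10-bin macroblock-type histogram (MBTH) for one frame, given the
    decoded type label of every macroblock in the frame. The bins sum to
    the number of macroblocks in the frame."""
    counts = Counter(frame_mb_types)
    return [counts.get(t, 0) for t in MB_TYPES]

# A toy frame of four macroblocks.
hist = mbth(["intra", "skipped", "skipped", "mc_coded"])
```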
Figure 4. Proposed technique block diagram
\mathrm{MBTH}_t = \left( X_1^t, X_2^t, X_3^t, \ldots, X_{10}^t \right)   (3)
\sum_{i=1}^{10} X_i^t = \eta_t   (4)
where \eta_t is the total number of MBs in frame t.
To further reduce redundancy in the resultant fingerprint, the proposed technique quantizes the generated fingerprint into a binary sequence (for each part separately, as Figure 4 shows) using median-based thresholding: the median of each histogram is fixed as a threshold level, and every bin value within the histogram is quantized against it. By having a fingerprint as a binary sequence, a more efficient matching process can be performed
using the Hamming distance between the compared fingerprints, instead of the histogram intersection (Equation 5) or the Jaccard coefficient (Equation 6).
W(Q, R) = \sum_{i=1}^{N} \min\left( Q^{(i)}, R^{(i)} \right)   (5)
J(Q, R) = \frac{|Q \cap R|}{|Q \cup R|}   (6)
where Q and R are a pair of fingerprints, each containing N bins.
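A small sketch of the median-based binarization and the resulting Hamming-distance matching, alongside the histogram intersection of Equation (5) for comparison (all values illustrative):

```python
import statistics

def binarize(hist):
    """Quantize each bin to 1/0 against the histogram's own median
    (the median-based thresholding described above)."""
    thr = statistics.median(hist)
    return [1 if v > thr else 0 for v in hist]

def hamming(a, b):
    """Number of disagreeing positions between two binary fingerprints."""
    return sum(x != y for x, y in zip(a, b))

def histogram_intersection(q, r):
    """W(Q, R) of Equation (5), for comparison on unquantized histograms."""
    return sum(min(x, y) for x, y in zip(q, r))

# Two slightly different 10-bin histograms binarize to the same pattern,
# so their Hamming distance is 0.
q = binarize([9, 1, 7, 0, 4, 4, 2, 8, 3, 5])
r = binarize([8, 2, 6, 1, 4, 4, 2, 9, 2, 6])
```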
Figure 5. The macroblock types used in generating the first part of the proposed fingerprint
The fusion of the two-part fingerprint in the second stage is performed as follows:
\mathrm{Similarity\ Score} = \alpha\, \Psi(\mathrm{MFH}_1, \mathrm{MFH}_2) + (1 - \alpha)\, \Psi(\mathrm{MBTH}_1, \mathrm{MBTH}_2)   (7)
where \alpha is the fusion parameter, \Psi is the distance metric, and \mathrm{MFH}_x and \mathrm{MBTH}_x are the two fingerprint parts for video x.
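Equation (7) reduces to a weighted sum of two distances; the sketch below assumes a normalized Hamming distance for Ψ, which the paper leaves generic:

```python
def normalized_hamming(a, b):
    """One plausible choice for the distance metric Psi: the fraction of
    disagreeing bits between two binary fingerprints."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def similarity_score(mfh1, mfh2, mbth1, mbth2, alpha=0.2):
    """Equation (7): weighted fusion of the motion-field (MFH) and
    macroblock-type (MBTH) distances; alpha weights the MFH term."""
    return (alpha * normalized_hamming(mfh1, mfh2)
            + (1 - alpha) * normalized_hamming(mbth1, mbth2))

# Identical MFH parts, MBTH parts differing in 1 bit of 4:
# 0.2 * 0 + 0.8 * 0.25 = 0.2
s = similarity_score([1, 0, 1, 0], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0])
```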
4. Experimental Results
To evaluate the performance of the proposed technique, a test set of 200 ocean videos taken from the ReefVid database was used. Attacks were then individually mounted on the videos, generating a new test set of 3,600 videos. The mounted attacks included an added watermark, a mosaic effect, an embossment effect, flipping, a blindness effect, cropping, contrast adjustment, brightness modification, and bit rate change, as Table 1 illustrates. The PC used for the experiments was an Intel dual-core running at 2 GHz with 3 GB of memory; however, all tests were run using a single core. Finally, for a fair comparison, the technique compared against is the motion field technique (AlBaqir, 2009), which also operates in the compressed domain.
Table 1. Distortions used in the study
Index Distortion
1 Watermark
2 Mosaic
3 Embossment
4 Horizontal Flipping
5 Vertical Flipping
6 Adding Horizontal Lines
7 Adding Vertical Lines
8 Cropping (Big Window)
9 Cropping (Small Window)
10 Contrast Adjustment (negative image)
11 Contrast Adjustment (maximum contrast)
12 Brightness Adjustment (-50%)
13 Brightness Adjustment (-25%)
14 Brightness Adjustment (+50%)
15 Brightness Adjustment (+25%)
16 Different Bit Rate (512 Kbps)
17 Different Bit Rate (800 Kbps)
Four sets of experiments were conducted to study the following issues:
1) Determining the best value of the fusion parameter.
2) Studying the binarization issue.
3) Studying the behavior of the proposed technique against the content-preserving attacks of Table 1, to investigate the robustness and uniqueness of the proposed fingerprint.
4) Comparing the proposed work against an existing technique working in the compressed domain (the motion field technique (AlBaqir, 2009)).
The average retrieval rate across all 17 attacks for the proposed technique, with and without binarization, is shown in Figure 6. The figure shows that the proposed work improves when binarization is used, and that α = 0.2 gives the best result in both cases.
[Figure: average retrieval rate (%) versus the fusion parameter α, for the proposed technique with and without the median-based binarization]
Figure 6. Performance of the proposed technique with and without using the median to binarize the proposed fingerprint
Figure 7 depicts the results of comparing the proposed technique against the baseline technique (the motion field technique (AlBaqir, 2009)) in detail, using the aforementioned distortions (1 to 17 in Table 1). It is clear that the proposed technique outperforms the motion field technique. Finally, Figure 8 shows that the proposed fingerprint outperforms its constituent fingerprints, and that the macroblock types are more important than the motion vectors. A plausible explanation of these results is as follows: the motion field technique tries to build the motion trajectories of the video content using the information available in the compressed domain, but not all macroblocks in the compressed domain carry motion information; rather, many of them are labeled as NO_MC or skipped macroblocks. The proposed technique alleviates this drawback by using the information associated with each macroblock in the compressed stream (the macroblock types), and by properly merging the motion field and the macroblock types.
[Figure: bar chart of average retrieval rate (%) per distortion (1 to 17 of Table 1) for the proposed technique versus the motion field technique]
Figure 7. The performance of the proposed technique against the baseline technique across all used distortions
[Figure: average retrieval rate (%) by feature — Motion Field Feature: 74.11%, Macroblock Type Feature: 82.35%, Proposed: 93.52%]
Figure 8. Comparison of the different fingerprint extraction methods
5. Conclusion
This paper proposes a video fingerprinting method in the compressed domain that utilizes the macroblock types and the motion vector information in a hybrid way. The proposed work gives promising results against a large spectrum of content-based video transformations despite its low computational overhead. The proposed work also shows that the macroblock types are more important than the motion vectors as an intrinsic content-preserving feature in the video compressed domain. One direction for future work is combining this technique with compressed-domain watermarking methods to design a robust content management methodology and apply it to broadcast monitoring. The proposed work can also be adapted to real-time environments such as cellular phones coupled with cloud computing.
References
AlBaqir, M. (2009). Video fingerprinting in compressed domain. MSc. Thesis, Delft University of Technology.
Ardizzone, E., Cascia, M. L., Avanzato, A., & Bruna, A. (1999). Video indexing using mpeg motion
compensation vectors. In ICMCS '99: Proceedings of the IEEE International Conference on Multimedia
Computing and Systems, (Washington, DC, USA), p. 725, IEEE Computer Society.
http://dx.doi.org/10.1109/MMCS.1999.778574
Cherubini, M., de Oliveira, R., & Oliver, N. (2009). Understanding near-duplicate videos: A user-centric
approach. In Proc. ACM Conference on Multimedia, pp. 35-44.
Hampapur, A., & Bolle, R. (2000). Feature based indexing for media tracking. IEEE International Conference on
Multimedia, 3, 1709-1712.
Heath, T., Howlett, T., & Keller, J. (2002). Automatic Video Segmentation in the Compressed Domain. IEEE
Aerospace Conference.
Indyk, G., & Shivakumar, N. (1999). Finding pirated video sequences on the internet. Stanford Infolab Technical
Report.
Joly, A., Buisson, O., & Frelicot, C. (2005). Statistical similarity search applied to content-based video copy
detection. In ICDEW ’05: Proceedings of the 21st International Conference on Data Engineering
Workshops, (Washington, DC, USA), p. 1285, IEEE Computer Society.
http://dx.doi.org/10.1109/ICDE.2005.291
Joly, A., Buisson, O., & Frelicot, C. (2007). Content-based copy retrieval using distortion-based probabilistic
similarity search. IEEE Transactions on Multimedia, 9, 293-306.
http://dx.doi.org/10.1109/TMM.2006.886278
Joly, A., Frelicot, C., & Buisson, O. (2003). Robust content-based video copy identification in a large reference
database. In Proceedings of ACM International Conference on Image and Video Retrieval (CIVR), vol.
2728, pp. 511-516.
Joly, A., Frelicot, C., & Buisson, O. (2005). Content-based video copy detection in large databases: A local
fingerprints statistical similarity search approach. ICIP 2005, IEEE International Conference on Image
Processing, vol. 1, pp. I-505-8. http://dx.doi.org/10.1109/ICIP.2005.1529798
Law-To, J., Buisson, O., Gouet-Brunet, V., & Boujemaa, N. (2006). Robust voting algorithm based on labels of
behavior for video copy detection. In MULTIMEDIA '06: Proceedings of the 14th annual ACM
international conference on Multimedia, (New York, NY, USA), pp. 835-844, ACM.
Law-To, J., Gouet-Brunet, V., Buisson, O., & Boujemaa, N. (2006). Local behaviours labelling for content based
video copy detection. ICPR 2006. 18th International Conference on Pattern Recognition, vol. 3, pp. 232-235.
http://dx.doi.org/10.1109/ICPR.2006.767
Liu, Y., & Yao, L. (2010). Research of Robust Video Fingerprinting. In Proceedings International Conference on
Computer Application and System Modeling, pp. 43-46.
Maria, C., & Athanassios, N. S. (2009). Real-time keyframe extraction towards video content identification. 16th
International Conference on Digital Signal Processing, pp.1-6.
Mikhalev, A. et al. (2008). Video fingerprint structure, database construction and search algorithms, Direct Video
& Audio Content Search Engine (DIVAS) project, Deliverable number D 4.2.
Moxley, E., Mei, T., & Manjunath, B. S. (2010). Video annotation through search and graph reinforcement
mining. IEEE Transactions on Multimedia, 12(3), 183-193. http://dx.doi.org/10.1109/TMM.2010.2041101
Naphade, M. R., Yeung, M. M., & Yeo, B. L. (1999). Novel scheme for fast and efficient video sequence
matching using compact signatures. SPIE, vol. 3972, pp. 564-572. http://dx.doi.org/10.1117/12.373590
Nianhua, X., Li, L., Xianglin, Z., & Maybank, S. (2011). A Survey on Visual Content-Based Video Indexing and
Retrieval. IEEE SMC, pp. 797-819.
Pei, S. C., & Chou, Y. Z. (1999). Efficient MPEG Compressed Video Analysis Using Macroblock Type
Information. IEEE Transactions on Multimedia, 1(4), 321-333. http://dx.doi.org/10.1109/6046.807952
Peng, C., Zhipeng, W., Shuqiang, J., & Qingming, H. (2010). Fast copy detection based on Slice Entropy
Scattergraph. IEEE International Conference on Multimedia (ICME), pp 1236-1241.
Poullot, S., Buisson, O., & Crucianu, M. (2007). Z-grid-based probabilistic retrieval for scaling up
content-based copy detection. In CIVR '07: Proceedings of the 6th ACM international conference on Image
and video retrieval, (New York, NY, USA), pp. 348-355, ACM.
ReefVid: Free Reef Video Clip Database. Retrieved from http://www.reefvid.org/
Richardson, I. E. G. (2003). H.264 and MPEG4 Video Compression - Video Coding for Next Generation
Multimedia. England: Wiley & Sons.
Saikia, N., & Bora, P. K. (2011). Robust video hashing using the 3D-DWT. National Conference on
Communications (NCC), pp 1-5. http://dx.doi.org/10.1109/NCC.2011.5734750
Sanchez, J. M., Binefa, X., Vitria, J., & Radeva, P. (1999). Local color analysis for scene break detection applied
to TV commercials recognition. In Visual Information Systems, vol. 1614 of Lecture Notes in Computer
Science, Springer Berlin / Heidelberg.
Yang, X., Tian, Q., & Chang, E. C. (2004). A color fingerprint of video shot for content identification. In
Proceedings of the 12th annual ACM international conference on Multimedia Systems, pp. 276-279.
Yeo, B. L., & Liu, B. (1995). Rapid Scene Analysis on Compressed Video. IEEE Transactions on Circuits and
Systems for Video Technology, 5(6), 533-544. http://dx.doi.org/10.1109/76.475896
Young-min, K., Sung, W. C., & Seong-whan, L. (2000). Fast Scene Change Detection Using Direct Feature
Extraction from MPEG Compressed Videos. International Conference on Pattern Recognition (ICPR'00),
vol. 3.
Zhao, W. L., Ngo, C. W., Tan, H. K., & Wu, X. (2007). Near-duplicate Keyframe identification with interest
point matching and pattern learning. IEEE Transactions on Multimedia, 9, 1037-1048.
http://dx.doi.org/10.1109/TMM.2007.898928