Computer and Information Science; Vol. 5, No. 5; 2012
ISSN 1913-8989 E-ISSN 1913-8997
Published by Canadian Center of Science and Education
Hybrid-Based Compressed Domain Video
Fingerprinting Technique
Abbass S. Abbass1, Aliaa A. A. Youssif1 & Atef Z. Ghalwash1
1 Faculty of Computers and Information, Helwan University, Cairo, Egypt
Correspondence: Abbass S. Abbass, Faculty of Computers and Information, Helwan University, Cairo, Egypt.
E-mail: abbass1652@yahoo.com; aliaay@yahoo.com; aghalwash@edara.gov.eg
Received: June 4, 2012 Accepted: June 18, 2012 Online Published: July 15, 2012
doi:10.5539/cis.v5n5p25 URL: http://dx.doi.org/10.5539/cis.v5n5p25
Abstract
Video fingerprinting is a relatively new research area, also called "content-based video copy detection" or "content-based video identification" in the literature. The goal is to locate videos with segments substantially identical to segments of a query video while tolerating the common artifacts of video processing. Its value as a tool to curb piracy and legally monetize content has become increasingly apparent in recent years with the wide spread of Internet video through user-generated content (UGC) sites such as YouTube. Its practical applications overlap to a certain extent with those of digital watermarking, which requires adding artificial information to the content. A fingerprint is a compact content-based signature that summarizes a video signal or another media signal. Several video fingerprinting methods have been proposed for identifying video, in which fingerprints are extracted by analyzing the video in both the spatial and temporal dimensions. However, these conventional methods share one trait: video decompression is still required to extract the fingerprint from a compressed video. In practice, faster computation can be achieved if the fingerprint is extracted directly in the compressed domain, yet so far only a few compressed-domain video fingerprinting methods have been proposed. This paper presents a video fingerprinting technique that works directly in the compressed domain. Experimental results show that the proposed fingerprint is highly robust against most signal-processing transformations.
Keywords: video fingerprinting, compressed domain, perceptual hash
1. Introduction
Text, image, audio, and video can all be represented as digital data. The explosion of Internet applications has led people into the digital world, where communication via digital data has become commonplace. However, new issues have also arisen and been explored, such as data security in digital communications, copyright protection of digitized properties, and invisible communication via digital media (Nianhua, 2011). Due to the rapid development of
video production technology and the decreasing cost of video acquisition tools and storage, a vast amount of
video data is generated around the world every day, including feature films, television programs,
personal/home/family videos, surveillance videos, game videos, etc. Digital video has opened up the potential of
using video sources in ways other than the traditional serial playback. However, this requires the development of
new technologies for accessing and manipulating digital video (Moxley, 2010).
Additionally, the amount of digital video data, which has the potential of becoming much greater than that of
traditional analog video, necessitates the development of digital video management tools for handling massive
video databases. Also, the ease with which all digital media can be flawlessly copied makes the development of
appropriate rights protection and authentication tools highly desirable. Techniques are therefore needed for automatically managing this vast amount of information, so that users can structure it quickly, understand its content, and organize it efficiently. An emerging technology that is useful for the
management of video, particularly with respect to rights protection, is fingerprinting (Cherubini, 2009; Peng, 2010), also known as perceptual hashing or replica detection. This is defined as the identification of a
video segment using a representation called fingerprint (or sometimes perceptual hash), which is extracted from
the video content (Saikia, 2011).
The fingerprint must uniquely identify a video segment, but does not necessarily need to represent its content.
Additionally, it must remain the same when a video segment is manipulated, usually by common video processing operations such as resizing, cropping, histogram equalization, compression, etc. Fingerprints can be
used for establishing whether two given segments are either identical or derived from each other, and also for
establishing whether a video segment is identical with (or derived from) any segment within a given video
database (Liu, 2010).
Several video fingerprinting algorithms that work at the pixel level have been proposed. Working directly with
pixels is, nowadays, computationally feasible and accurate. However, those solutions do not address the magnitude of the resources such a system needs, and little analysis is provided on using a video fingerprinting solution in practical cases. Compressed-domain processing techniques use information extracted directly from the compressed bitstream, and are therefore computationally more advantageous than uncompressed-domain ones. To achieve this, they utilize the information already inherent in the video stream, included during the compression stage (Maria, 2009).
A partial decompression must still be done to extract the information necessary for processing; however, this overhead is small compared to full decompression of the video stream. It has been shown that in MPEG video decompression, approximately 40% of the CPU time is spent in Inverse Discrete Cosine Transform (IDCT) calculations, even when fast DCT algorithms are used (Young-min, 2000). This paper proposes a video fingerprinting technique that works directly in the compressed domain at the stage of variable-length decoding (Figure 1), so it is computationally more efficient than uncompressed-domain techniques and even than methods that utilize partially decoded DCT coefficients.
[Figure 1 diagram: the MPEG decoding pipeline of parsing, variable-length decoding, inverse quantization, IDCT, motion compensation against raw reference frames, and post-processing.]
Figure 1. Full decompression versus partial decompression of the compressed video
The bold segmented rectangle marks the zone where the partially-decoded-based techniques work, and the rounded rectangle marks where the proposed technique works.
The paper is organized as follows: related work is given in Section 2; Section 3 describes our proposed fingerprinting technique; experimental results, comparisons, and discussion are presented in Section 4; finally, Section 5 concludes the paper and points out some directions for future work.
2. Related Works
Some video similarity detection methods use uncompressed MPEG video to directly extract the features. The content of the frames, the DC values of macroblocks, or motion vectors are used as features. Ardizzone et al. (1999) use motion vectors for feature extraction, taking a global motion feature or a motion-based segmented feature as the signature of the video. In the global motion extraction step, the statistical distribution of directions (i.e., an angle histogram) is calculated: the [-180, 180] degree interval is divided into subintervals, and the sum of the magnitudes of the motion vectors falling in each subinterval constructs the angle histogram. In motion-based segmentation, motion vectors are clustered and labeled, with labels assigned according to the similarity of the motion vectors or the histogram of motion-vector magnitudes. Dominant regions are taken into account in the comparison step.
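To make this construction concrete, here is a minimal numpy sketch of such a magnitude-weighted angle histogram; the (dx, dy) array layout and the bin count are illustrative assumptions rather than details fixed by Ardizzone et al.:

```python
import numpy as np

def angle_histogram(motion_vectors, n_bins=8):
    """Magnitude-weighted histogram of motion-vector directions.

    Sketch of the global-motion signature described above: the
    [-180, 180] degree range is split into n_bins subintervals and
    each vector adds its magnitude to the bin of its direction.
    motion_vectors is assumed to be an (N, 2) array of (dx, dy).
    """
    mv = np.asarray(motion_vectors, dtype=float)
    angles = np.degrees(np.arctan2(mv[:, 1], mv[:, 0]))  # in [-180, 180]
    magnitudes = np.hypot(mv[:, 0], mv[:, 1])
    hist, _ = np.histogram(angles, bins=n_bins, range=(-180, 180),
                           weights=magnitudes)
    return hist
```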
Joly, Frelicot and Buisson extract local fingerprints around interest points in (Joly, 2003). These interest points
are detected with the Harris detector and compared using the Nearest Neighbor method. They propose statistical
similarity search in (Joly & Buisson, 2005) and (Joly, 2005). Joly et al. use this method and propose a distortion-based probabilistic approximate similarity search technique to speed up scanning in a content-based video retrieval framework (Joly, Buisson, & Frelicot, 2007). Zhao et al. extract PCA-SIFT descriptors and use them for video matching in (Zhao, 2007). They use nearest-neighbor search for matching and SVMs for learning matching patterns between videos and their duplicates. Law-To et al. propose a video indexing method using temporal
contextual information which is extracted from local descriptors of interest points in (Law-To, 2006) and
(Law-To & Gouet-Branet, 2006). They use this contextual information in a voting function.
Poullot et al. present a method for monitoring a real-time TV channel in (Poullot, 2007), using it to compare the incoming data with indexed videos in a database. The innovations of the method are a z-grid for building indexes, uniformity-based sorting, and adapted partitioning of the components. Lienhart et al. (Yang, 2004) use a color coherence vector to characterize the key frames of the video. Sanchez et al. (1999) discuss using color histograms of key frames for copy detection; they test the developed system on TV commercials, and the system is sensitive to color variations. Hampapur (2000) uses edge features but ignores color variations. Indyk (1999) uses the distance between two scenes as a signature; however, it is a weak and limited signature.
So far, only a few methods have been proposed for video fingerprinting in the compressed domain. In one of them, the DC coefficient is used to model the fingerprint (Mikhalev et al., 2008): given the extracted DC coefficients, the method constructs an approximate video frame, which is evaluated to obtain the key frames of the video; these are then further analyzed to generate the fingerprints. Naphade (1999) uses histogram intersection of the YUV histograms of the DC sequence of the MPEG video, which is an efficient method in terms of computation. More recently, AlBaqir (2009) proposed a video fingerprinting method in which motion vectors are used to model the fingerprint. He constructs an approximated motion field from the motion vectors, since these are generated during video compression anyway to exploit the temporal redundancy within a video.
3. Proposed Technique
In general, MPEG classifies video frames into I (intra) frames, P (predicted) frames, and B (bi-directional) frames, and each frame is divided into macroblocks (MBs), which are 16x16-pixel motion compensation units within a frame. I frames can contain only intra blocks, while the macroblocks of P and B pictures can take different modes according to the motion content; the macroblock type modes in P and B frames are given in Figure 2 and Figure 3 respectively. If intra coding is selected, the corresponding MB is encoded individually using its two-dimensional discrete cosine transform (2D-DCT) coefficients (Equation 1). On the other hand, if inter coding is selected, the MB is encoded using motion estimation/motion compensation (ME/MC) algorithms (Richardson, 2003; Heath, 2002).
$$x(n_1, n_2) = \sum_{i=0}^{M-1} \sum_{j=0}^{M-1} c_i \, c_j \, X_{i,j} \cos\frac{(2n_1 + 1)\,i\,\pi}{2M} \cos\frac{(2n_2 + 1)\,j\,\pi}{2M} \qquad (1)$$

where $c_0 = \sqrt{1/M}$, $c_k = \sqrt{2/M}$ for $k > 0$, and $X_{i,j}$ are the 2D-DCT coefficients of the $M \times M$ block of pixels $x(n_1, n_2)$.
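For reference, the transform pair of Equation 1 can be exercised with standard library routines. The sketch below applies scipy's orthonormal 2D-DCT to a hypothetical 8x8 block (MPEG's block size); the 'ortho' normalization plays the role of the $c_i$, $c_j$ constants:

```python
import numpy as np
from scipy.fft import dctn, idctn

# Hypothetical 8x8 pixel block; MPEG applies the 2D-DCT per 8x8 block.
block = np.random.randint(0, 256, size=(8, 8)).astype(float)

coeffs = dctn(block, norm='ortho')           # X_{i,j}: the DCT coefficients
reconstructed = idctn(coeffs, norm='ortho')  # x(n1, n2) as in Equation 1

assert np.allclose(block, reconstructed)     # the transform is invertible
print(coeffs[0, 0])                          # DC coefficient (8x the block mean)
```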
Figure 2. Macroblock type modes in P pictures
The purpose of using ME/MC is to reduce redundancy in the temporal direction. To exploit this redundancy, MPEG algorithms compute an interframe difference called the prediction error (Pei, 1999).
$$\text{prediction error}(i, j) = \frac{1}{MN} \sum_{m} \sum_{n} \left[ I(m, n, t) - I(m + i,\, n + j,\, t - 1) \right]^2 \qquad (2)$$
where M and N are macroblock sizes. For a given macroblock, the encoder first determines whether it is Motion
Compensated (MC) or Non Motion Compensated (NO_MC) and a scheme is used to determine whether the
current block is intra/inter coded based on the prediction error. The scheme can be quite complex, but the general
idea is to code the difference between target macroblock and reference macroblock when the prediction error is
small, and otherwise to intra-code the macroblock. For normal scenes, prediction works well in P and B frames. When there is a scene change, prediction quality drops significantly, which leads some macroblocks to be encoded in intra mode. Strictly speaking, if a frame lies inside a shot, its macroblocks should be predicted well from the previous or next frames. However, when frames lie on a shot boundary, they cannot be predicted from the related macroblocks, and a high prediction error occurs (Yeo, 1995). This causes most of the macroblocks of the P frames to be intra coded instead of motion compensated.
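A minimal sketch of this mode decision built on Equation 2 follows; the exhaustive search window, the threshold T, and the omission of frame-boundary checks are all simplifying assumptions:

```python
import numpy as np

def prediction_error(frame_t, frame_prev, y, x, i, j, M=16, N=16):
    """Equation 2: mean squared error between the MxN macroblock at
    (y, x) in frame t and the block displaced by (i, j) in frame t-1."""
    cur = frame_t[y:y + M, x:x + N].astype(float)
    ref = frame_prev[y + i:y + i + M, x + j:x + j + N].astype(float)
    return np.mean((cur - ref) ** 2)

def macroblock_mode(frame_t, frame_prev, y, x, search=8, T=500.0):
    """Intra-code when even the best motion candidate predicts poorly,
    which is what happens on shot boundaries."""
    best = min(prediction_error(frame_t, frame_prev, y, x, i, j)
               for i in range(-search, search + 1)
               for j in range(-search, search + 1))
    return 'intra' if best > T else 'inter'
```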
Figure 3. Macroblock type modes in B pictures
The proposed technique fuses the macroblock type information and the motion field generated from the motion vectors in the MPEG stream to capture the intrinsic content of the video. Only VLC decoding, plus roughly one addition per macroblock, is necessary to obtain the MB data (see the rounded rectangle zone in Figure 1).
The proposed technique (Figure 4) is divided into two stages: the fingerprint extraction stage and the similarity matching stage. In the first stage, the compressed MPEG video clip is parsed to extract the macroblock information and motion vector data. Then a 10-bin one-dimensional histogram is constructed from the macroblock types (Figure 5). The macroblock types used to generate this histogram aggregate not only the normal types occurring in normal scenes but also the types that occur in scene-change-like scenes, so this histogram carries important clues to the spatial and temporal content of the video. Lastly, the proposed technique merges the aforementioned feature with the motion field feature (AlBaqir, 2009) to generate a two-part fingerprint. We call the first part of the proposed fingerprint MBTH (Macroblock Type Information Histogram) and the second part MFH (Motion Field Histogram).
Figure 4. Proposed technique block diagram
$$\text{MBTH}_t = \left( X_1^t, X_2^t, X_3^t, \ldots, X_{10}^t \right) \qquad (3)$$

$$\sum_{i=1}^{10} X_i^t = \eta_t \qquad (4)$$

where $\eta_t$ is the total number of MBs in frame $t$.
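A sketch of the per-frame MBTH of Equations 3 and 4; mapping the VLC-decoded mb_type fields onto integer codes 0 to 9 for the ten types of Figure 5 is assumed to have happened upstream:

```python
import numpy as np

def mbth(mb_type_codes):
    """Equation 3: the 10-bin macroblock-type histogram of one frame.
    mb_type_codes is assumed to hold one integer in 0..9 per MB."""
    hist = np.bincount(np.asarray(mb_type_codes), minlength=10)
    assert hist.sum() == len(mb_type_codes)  # Equation 4: bins sum to eta_t
    return hist
```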
To further reduce redundancy in the resultant fingerprint, the proposed technique quantizes the generated fingerprint into a binary sequence (for each part separately, as Figure 4 shows) using the median as a threshold: a threshold level is fixed, and every bin value within each histogram is quantized against it. With the fingerprint in a binary sequence, a more efficient matching process can be performed using the Hamming distance between the compared fingerprints, instead of the histogram intersection (Equation 5) or the Jaccard coefficient (Equation 6).
$$W(Q, R) = \sum_{i=1}^{N} \min\left( Q^{(i)}, R^{(i)} \right) \qquad (5)$$

$$J(Q, R) = \frac{|Q \cap R|}{|Q \cup R|} \qquad (6)$$

where $Q$ and $R$ are a pair of fingerprints, each containing $N$ bins.
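The median binarization and the three comparison measures can be sketched as follows (a minimal illustration; the per-histogram median serves as the threshold, matching the median-based quantization described above):

```python
import numpy as np

def binarize(hist):
    """Quantize each bin to a bit using the histogram median as threshold."""
    return (np.asarray(hist) > np.median(hist)).astype(np.uint8)

def hamming(a, b):
    """Distance used on the binarized fingerprints."""
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))

def histogram_intersection(q, r):
    """Equation 5: W(Q, R) = sum_i min(Q_i, R_i)."""
    return float(np.minimum(q, r).sum())

def jaccard(q, r):
    """Equation 6 on binary fingerprints: |Q and R| / |Q or R|."""
    q, r = np.asarray(q, bool), np.asarray(r, bool)
    union = np.count_nonzero(q | r)
    return np.count_nonzero(q & r) / union if union else 1.0
```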
Figure 5. The macroblock types used in generating the first part of the proposed fingerprint
The fusion of the two-part fingerprint in the second stage is performed as follows:
$$\text{Similarity Score} = \alpha \, \Psi(\text{MFH}_1, \text{MFH}_2) + (1 - \alpha) \, \Psi(\text{MBTH}_1, \text{MBTH}_2) \qquad (7)$$

where $\alpha$ is the fusion parameter, $\Psi$ is the distance metric, and $\text{MFH}_x$ is the MFH for video $x$.
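Putting the pieces together, Equation 7 is a weighted sum of the two per-part distances. A sketch, taking the Hamming distance on the binarized parts as Ψ and anticipating α = 0.2, the best value found in Section 4:

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between binary fingerprints (as sketched above)."""
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))

def similarity_score(mfh_1, mfh_2, mbth_1, mbth_2, alpha=0.2, psi=hamming):
    """Equation 7: fuse the per-part distances with fusion parameter alpha.
    When psi is a distance, lower scores mean more similar videos."""
    return alpha * psi(mfh_1, mfh_2) + (1 - alpha) * psi(mbth_1, mbth_2)
```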
4. Experimental Results
To evaluate the performance of the proposed technique, a test set of 200 videos, all taken from the ReefVid ocean clip database, was used. Attacks were then individually mounted on the videos, generating a new test set of 3,600 videos. The mounted attacks included an added watermark, mosaic effect, embossment effect, flipping, blindness effect, cropping, contrast adjustment, brightness modification, and bit-rate change, as Table 1 illustrates. The PC used for the experiments was an Intel dual-core running at 2 GHz with 3 GB of memory; however, all tests were run using a single core. Finally, for a fair comparison, the technique compared against is the motion field technique (AlBaqir, 2009), which, like the proposed one, operates in the compressed domain.
Table 1. Distortions used in the study

Index  Distortion
1      Watermark
2      Mosaic
3      Embossment
4      Horizontal Flipping
5      Vertical Flipping
6      Adding Horizontal Lines
7      Adding Vertical Lines
8      Cropping (Big Window)
9      Cropping (Small Window)
10     Contrast Adjustment (negative image)
11     Contrast Adjustment (maximum contrast)
12     Brightness Adjustment (-50%)
13     Brightness Adjustment (-25%)
14     Brightness Adjustment (+50%)
15     Brightness Adjustment (+25%)
16     Different Bit Rate (512 Kbps)
17     Different Bit Rate (800 Kbps)
Four sets of experiments were conducted to study the following issues:
1) Determining the best value of the fusion parameter.
2) Studying the binarization issue.
3) Studying the behavior of the proposed technique against content-preserving attacks (Table 1), to investigate the robustness and uniqueness of the proposed fingerprint.
4) Comparing the proposed work against an existing technique working in the compressed domain (the motion field technique (AlBaqir, 2009)).
The average retrieval rate across all 17 attacks for the proposed technique, with and without binarization, is shown in Figure 6. The figure shows that the proposed work improves when binarization is used, and that α = 0.2 gives the best result in both cases.
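A hypothetical sketch of the sweep behind Figure 6; queries, references, and the retrieve matcher are placeholders for the dataset and matching pipeline, not artifacts of the paper:

```python
import numpy as np

def sweep_alpha(queries, references, retrieve):
    """For each fusion value, match every attacked query against the
    reference set and report the average retrieval rate, as in Figure 6.
    retrieve(query, references, alpha) is assumed to return the id of the
    best-matching reference; query.source_id is the ground truth."""
    for alpha in np.arange(0.0, 1.01, 0.1):
        hits = sum(retrieve(q, references, alpha) == q.source_id
                   for q in queries)
        rate = 100.0 * hits / len(queries)
        print(f"alpha={alpha:.1f}  average retrieval rate={rate:.1f}%")
```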
[Figure 6 plot: average retrieval rate (%) versus the fusion parameter α (0 to 1), for the proposed technique with and without median binarization.]
Figure 6. The proposed technique performance using the median to binarize the proposed fingerprint
Figure 7 depicts the results of comparing the proposed technique against the baseline technique (the motion field technique (AlBaqir, 2009)) in detail, using the aforementioned distortions (1 to 17 of Table 1). It is clear that the proposed technique outperforms the motion field technique. Finally, Figure 8 shows that the proposed fingerprint outperforms its constituent fingerprints, and that the macroblock types are more important than the motion vectors. A plausible explanation of these results is as follows: the motion field technique tries to build the motion trajectories of the video content using the information available in the compressed domain, but not all the macroblocks in the compressed domain carry motion information; rather, many of them are labeled as NO_MC or skipped macroblocks. The proposed technique alleviates this drawback by using the information associated with each macroblock in the compressed stream (the macroblock types), and by properly merging the motion field and the macroblock types.
[Figure 7 bar chart: average retrieval rate (%) of the proposed technique versus the motion field technique for each of the 17 distortions of Table 1.]
Figure 7. The performance of the proposed technique against the baseline technique across all used distortions
[Figure 8 bar chart: average retrieval rate (%) per feature: Motion Field Feature 74.11%, Macroblock Type Feature 82.35%, Proposed (fused) 93.52%.]
Figure 8. Comparison of the different fingerprint extraction methods
5. Conclusion
This paper proposes a video fingerprinting method in the compressed domain that utilizes the macroblock and motion vector information in a hybrid way. The proposed work gives promising results against a large spectrum of content-preserving video transformations despite its low computational overhead. It also shows that the macroblock types are more important than the motion vectors as an intrinsic content-preserving feature in the video compressed domain. One direction for future work is combining this technique with compressed-domain watermarking methods to design a robust content management methodology and apply it to broadcast monitoring. The proposed work could also be adapted to real-time environments such as cellular phones backed by cloud computing.
References
AlBaqir, M. (2009). Video fingerprinting in compressed domain. MSc thesis, Delft University of Technology.
Ardizzone, E., Cascia, M. L., Avanzato, A., & Bruna, A. (1999). Video indexing using mpeg motion
compensation vectors. In ICMCS '99: Proceedings of the IEEE International Conference on Multimedia
Computing and Systems, (Washington, DC, USA), p. 725, IEEE Computer Society.
http://dx.doi.org/10.1109/MMCS.1999.778574
Cherubini, M., de Oliveira, R., & Oliver, N. (2009). Understanding near-duplicate videos: A user-centric
approach. In Proc. ACM Conference on Multimedia, pp. 35-44.
Hampapur, A., & Bolle, R. (2000). Feature based indexing for media tracking. IEEE International Conference on
Multimedia, 3, 1709-1712.
Heath, T., Howlett, T., & Keller, J. (2002). Automatic Video Segmentation in the Compressed Domain. IEEE
Aerospace Conference.
Indyk, G., & Shivakumar, N. (1999). Finding pirated video sequences on the internet. Stanford Infolab Technical
Report.
Joly, A., Buisson, O., & Frelicot, C. (2005). Statistical similarity search applied to content-based video copy
detection. In ICDEW ’05: Proceedings of the 21st International Conference on Data Engineering
Workshops, (Washington, DC, USA), p. 1285, IEEE Computer Society.
http://dx.doi.org/10.1109/ICDE.2005.291
Joly, A., Buisson, O., & Frelicot, C. (2007). Content-based copy retrieval using distortion-based probabilistic
similarity search. IEEE Transactions on Multimedia, 9, 293-306.
http://dx.doi.org/10.1109/TMM.2006.886278
Joly, A., Frelicot, C., & Buisson, O. (2003). Robust content-based video copy identification in a large reference
database. In Proceedings of ACM International Conference on Image and Video Retrieval (CIVR), vol.
2728, pp. 511-516.
Joly, A., Frelicot, C., & Buisson, O. (2005). Content-based video copy detection in large databases: A local
fingerprints statistical similarity search approach. ICIP 2005, IEEE International Conference on Image
Processing, vol. 1, pp. I-505-8. http://dx.doi.org/10.1109/ICIP.2005.1529798
Law-To, J., Buisson, O., Gouet-Brunet, V., & Boujemaa, N. (2006). Robust voting algorithm based on labels of
behavior for video copy detection. In MULTIMEDIA '06: Proceedings of the 14th annual ACM
international conference on Multimedia, (New York, NY, USA), pp. 835-844, ACM.
Law-To, J., Gouet-Brunet, V., Buisson, O., & Boujemaa, N. (2006). Local behaviours labelling for content based
video copy detection. ICPR 2006. 18th International Conference on Pattern Recognition, vol. 3, pp. 232-235.
http://dx.doi.org/10.1109/ICPR.2006.767
Liu, Y., & Yao, L. (2010). Research of Robust Video Fingerprinting. In Proceedings International Conference on
Computer Application and System Modeling, pp. 43-46.
Maria, C., & Athanassios, N. S. (2009). Real-time keyframe extraction towards video content identification. 16th
International Conference on Digital Signal Processing, pp.1-6.
Mikhalev, A. et al. (2008). Video fingerprint structure, database construction and search algorithms, Direct Video
& Audio Content Search Engine (DIVAS) project, Deliverable number D 4.2.
Moxley, E., Mei, T., & Manjunath, B. S. (2010). Video annotation through search and graph reinforcement
mining. IEEE Transactions on Multimedia, 12(3), 183-193. http://dx.doi.org/10.1109/TMM.2010.2041101
Naphade, M. R., Yeung, M. M., & Yeo, B. L. (1999). Novel scheme for fast and efficient video sequence
matching using compact signatures. SPIE, vol. 3972, pp. 564-572. http://dx.doi.org/10.1117/12.373590
Nianhua, X., Li, L., Xianglin, Z., & Maybank, S. (2011). A Survey on Visual Content-Based Video Indexing and
Retrieval. IEEE SMC, pp. 797-819.
Pei, S. C., & Chou, Y. Z. (1999). Efficient MPEG Compressed Video Analysis Using Macroblock Type
Information. IEEE Transactions on Multimedia, 1(4), 321-333. http://dx.doi.org/10.1109/6046.807952
Peng, C., Zhipeng, W., Shuqiang, J., & Qingming, H. (2010). Fast copy detection based on Slice Entropy
Scattergraph. IEEE International Conference on Multimedia (ICME), pp 1236-1241.
Poullot, S., Buisson, O., & Crucianu, M. (2007). Z-grid-based probabilistic retrieval for scaling up
content-based copy detection. In CIVR '07: Proceedings of the 6th ACM international conference on Image
and video retrieval, (New York, NY, USA), pp. 348-355, ACM.
ReefVid: Free Reef Video Clip Database. Retrieved from http://www.reefvid.org/
Richardson, I. E. G. (2003). H.264 and MPEG4 Video Compression - Video Coding for Next Generation
Multimedia. England: Wiley & Sons.
Saikia, N., & Bora, P. K. (2011). Robust video hashing using the 3D-DWT. National Conference on
Communications (NCC), pp 1-5. http://dx.doi.org/10.1109/NCC.2011.5734750
Sanchez, J. M., Binefa, X., Vitria, J., & Radeva, P. (1999). Local color analysis for scene break detection applied
to TV commercials recognition. In Visual Information Systems, vol. 1614 of Lecture Notes in Computer
Science, Springer Berlin / Heidelberg.
Yang, X., Tian, Q., & Chang, E. C. (2004). A color fingerprint of video shot for content identification. In
Proceedings of the 12th annual ACM international conference on Multimedia Systems, pp. 276-279.
Yeo, B. L., & Liu, B. (1995). Rapid Scene Analysis on Compressed Video. IEEE Transactions on Circuits and
Systems for Video Technology, 5(6), 533-544. http://dx.doi.org/10.1109/76.475896
Young-min, K., Sung, W. C., & Seong-whan, L. (2000). Fast Scene Change Detection Using Direct Feature
Extraction from MPEG Compressed Videos. International Conference on Pattern Recognition (ICPR'00),
vol. 3.
Zhao, W. L., Ngo, C. W., Tan, H. K., & Wu, X. (2007). Near-duplicate Keyframe identification with interest
point matching and pattern learning. IEEE Transactions on Multimedia, 9, 1037-1048.
http://dx.doi.org/10.1109/TMM.2007.898928