CNN spatiotemporal features and fusion for surveillance video forgery detection

Abstract

Surveillance cameras are widely used to provide protection and security, and their videos are used as strong evidence in court. With the availability of video editing tools, it has become easy to distort this evidence. Sometimes, post-processing operations are performed after editing to hide the traces of forgery. Hence, scientifically validating the authenticity and integrity of surveillance videos has become urgent. In this paper, we propose a system for detecting inter-frame forgeries (frame deletion, frame insertion, and frame duplication) that uses a 2D convolutional neural network (2D-CNN) of spatiotemporal information and fusion for automatic deep feature extraction; a Gaussian RBF multi-class support vector machine (RBF-MSVM) is used for the classification process. The experimental results show that the proposed system is efficient at detecting all inter-frame forgeries, even when the forged videos have undergone additional post-processing operations such as Gaussian noise, Gaussian blurring, brightness modifications, and compression.
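The classifier named in the abstract relies on a Gaussian RBF kernel. The following is a minimal sketch of that kernel only, not the paper's classifier: the three-dimensional feature vectors, the `gamma` value, and the function names are assumptions made for illustration.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2). gamma is an assumed value."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(features, gamma=0.5):
    """Pairwise kernel values for a list of feature vectors."""
    return [[rbf_kernel(u, v, gamma) for v in features] for u in features]

# Hypothetical feature vectors, one per video shot (not taken from the paper).
shots = [[0.1, 0.2, 0.3], [0.1, 0.2, 0.3], [0.9, 0.8, 0.1]]
K = gram_matrix(shots)
```

Identical feature vectors give a kernel value of 1, and the value decays toward 0 as vectors move apart; a multi-class SVM would be trained on top of such a kernel matrix.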
... Various video tampering detection techniques have been proposed in the literature to detect inter-frame tampering; these techniques are based on extracting manual features, such as statistical features [10][11][12], pixel and texture characteristics [13][14][15], motion residual, and optical flow [16][17][18], and a few are based on deep learning [8], [19][20][21]. The manual features are sensitive to post-processing operations like blurring, brightness, noise, and compression. ...
... First, there is limited applicability; many video tampering detection techniques are restricted by factors like the number of tampered frames, frame rate, and video format, which limits their practical use [25,26]. For example, the deep learning-based method proposed in Ref. [21] can only detect inter-frame tampering if the tampered frames exist in multiples of 10, failing if there are fewer than 25 tampered frames. Similarly, the method proposed by Bakas and Naskar in [27] cannot detect frame duplication of more than 20 frames. ...
... Moreover, the proposed technique does not impose any constraint on the minimum number of inserted/deleted frames in a video to make the tampering detectable; it can detect the insertion and deletion of as few as ten frames, along with the type of tampering. On the contrary, the method in Ref. [21] detects tampering if tampered frames exist in multiples of 10 and cannot detect tampering of less than 25 frames. ...
Article
Inter-frame tampering in surveillance videos undermines the integrity of video evidence, potentially influencing law enforcement investigations and court decisions. This type of tampering is the most common tampering method and is often imperceptible to the human eye. Until now, various algorithms based on handcrafted features have been proposed to identify such tampering. Automatically detecting and localizing tampering and determining its type, while maintaining accuracy and processing speed, remains a challenge. We propose a novel method for detecting inter-frame tampering. It exploits a 2D convolutional neural network (2D-CNN) of spatiotemporal information and fusion for automatic deep feature extraction; employs an autoencoder to significantly reduce the computational overhead by reducing the dimensionality of the feature space; analyzes long-range dependencies within video frames using long short-term memory (LSTM) and gated recurrent units (GRU), which helps to detect tampering traces; and finally adds a fully connected (FC) layer with softmax activation for classification. The structural similarity index measure (SSIM) is utilized to localize tampering. We perform extensive experiments on datasets comprising challenging videos with different complexity levels. The results demonstrate that the proposed method can identify and pinpoint tampering regions with more than 90% accuracy, irrespective of video frame rates, video formats, number of tampered frames, and the compression quality factor.
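To illustrate how SSIM can point at a tampering location, here is a simplified sketch that computes a single-window (global) SSIM between adjacent grayscale frames and reports the transition with the lowest score. This is an assumption-laden illustration, not the paper's procedure: standard SSIM is computed over local windows, and the tiny four-pixel frames and the function names are hypothetical.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equal-length grayscale pixel lists."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants from the usual SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def weakest_transition(frames):
    """Return (i, scores) where the pair (i, i + 1) has the lowest SSIM."""
    scores = [global_ssim(frames[i], frames[i + 1])
              for i in range(len(frames) - 1)]
    return min(range(len(scores)), key=scores.__getitem__), scores

# Hypothetical flattened frames; the fourth one changes abruptly.
frames = [
    [10, 20, 30, 40],
    [11, 21, 31, 41],
    [12, 22, 32, 42],
    [200, 50, 180, 20],
    [201, 51, 181, 21],
]
index, scores = weakest_transition(frames)
```

Here `index` identifies the transition between the third and fourth frames, where the content jumps. A real scene cut would produce the same drop, so a low score on its own is only a candidate location.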
... Fadl et al. [60] discovered frame duplication and frame shuffling using a temporal average of each shot and statistical textural features in the digital videos. Fadl et al. [61] found the frame insertion, deletion and duplication using 2D convolution neural network of spatiotemporal information and fusion. ...
... Fadl et al. [61] have presented a technique in which an input digital video is divided into smaller shots, each of size z. Then, the STP algorithm is used to fuse the spatial and temporal information, resulting in STP images. ...
...
Wang et al. [20] √ × × √ × × × × ×
Hsu et al. [12] √ × × × × × × × ×
Wang et al. [21] √ × × × × × × × ×
Su et al. [22] × × √ × × × × × ×
Kobayashi et al. [23] √ × × × × × × × ×
Su et al. [24] × × × × × × × × √
Xu et al. [25] × × × × × × × × √
Lin et al. [26] × × × √ × × × × ×
Subramanyam et al. [28] √ × × × × × × × ×
Bestagini et al. [90] √ × × × × × × × ×
Liao et al. [29] × × × √ × × × × ×
Karthikasini et al. [30] √ × × × × × × × ×
Pandey et al. [19] √ × × × × × × × ×
Wu et al. [31] × × × × √ × × × ×
Zhang et al. [32] × √ × × × × × × ×
Wang et al. [33] × √ × × × × × × ×
Gironi et al. [34] × √ × × × × × × ×
Anshida et al. [6] √ × × × × × × × ×
Bidokhti et al. [35] √ × × × × × × × ×
Zheng et al. [36] × √ × × × × × × ×
Singh et al. [37] × × × √ × × × × ×
Su et al. [93] √ × × × × × × × ×
Bagiwa et al. [38] × × × × × × × × √
Yang et al. [39] × × × √ × × × × ×
Aghamaleki et al. [40] × √ × × × × × × ×
Yu et al. [41] × × √ × × × × × ×
Ulutas et al. [42] × × × × × × × √ ×
Bozkurt et al. [43] × × × √ × × × × ×
Liu et al. [44] × × × √ × × × × ×
Kingra et al. [45] × × × × × √ × × ×
Liu et al. [46] × × × × × × × × √
Yao Y et al. [47] √ × × × × × × × ×
Long et al. [48] × × × √ × × × × ×
Ulutas et al. [49] × √ × × × × × × ×
Li et al. [50] × √ × × × × × × ×
Zhao et al. [51] × × × × × √ × × ×
Su et al. [53] √ × × × × × × × ×
Long et al. [54] × √ × × × × × × ×
Singh et al. [56] √ × × √ × × × × ×
Saddique et al. [57] √ × × × × × × × ×
Bakas J et al. [58] × × √ × × × × × ×
Kharat et al. [59] × × × √ × × × × ×
Fadl et al. [60] × × × × × × √ × ×
Fadl et al. [61] × × × × × √ × × ×
Kaur et al. [78] √ × × × × × × × ×
Ren et al. [62] × × × √ × × × × ×
...
Article
In real-world scenarios, digital videos are captured to record personal memories, for security purposes, as truthful evidence, and so on. With the rise of multimedia technology, digital videos are routinely posted over the internet on social websites and applications such as Facebook, Instagram, WhatsApp, and YouTube. The contents of these social digital videos can be manipulated with easily available editing tools and software. Therefore, the security of digital videos on social media is a pressing requirement. This paper presents a systematic survey of Copy-Move forgery in digital videos and its detection with several passive techniques. First, Copy-Move forgery and its various types are introduced. Then, a survey is provided of the existing passive techniques for detecting Copy-Move forgery in digital videos. A brief review of these techniques is also presented in tabular form, listing the features, datasets, and parameters they use, the type of forgery they address, and their limitations. Furthermore, this survey describes in detail the parameters and datasets used to evaluate existing techniques. In addition, detection tools for Copy-Move video forgery and future directions are detailed. This paper also identifies new challenges for the automatic detection of Copy-Move video forgery in the realm of deep learning.
... This technique suffered from high computational cost and complexity. Fadl et al. [6] devised a technique for detecting inter-frame forgeries. The technique considers frame deletion, frame insertion, and frame duplication, utilizing a 2-dimensional CNN (2D-CNN) of spatiotemporal data and fusing deep features to detect the forgeries. ...
... The developed technique is analyzed by comparison with classical approaches, such as DA-Taylor-ROA-enabled DCNN [28], 3FAT [11], 2D-CNN [6], AVIBE [26], PSO-based RMDL, genetic algorithm (GA)-based RMDL, CSO-based RMDL, SFO-based RMDL, and the proposed CSSFOA-based RMDL. ...
Article
Nowadays, surveillance cameras are used extensively to provide security. With easier access to tools such as video editors, it has become simple to destroy evidence. Various detection techniques are in use; however, for many reasons their applicability is limited. Hence, competitive swarm sunflower optimization algorithm (CSSFOA)-based random multimodal deep learning (RMDL) is proposed for discovering forgeries. CSSFOA integrates the competitive swarm optimizer and sunflower optimization. Here, keyframes are extracted using the discrete cosine transform and Tanimoto distance. Face detection is performed with the Viola-Jones algorithm, considering the light coefficients and face image coefficients, by extracting the local optimal oriented pattern. The deep composite images are obtained using RMDL, which is trained with the developed CSSFOA. The proposed CSSFOA-based RMDL shows superior performance, with a maximum accuracy of 96.6%, true positive rate of 95.0%, and true negative rate of 95.5%.
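The abstract does not say how the DCT coefficients and the Tanimoto distance are combined, so the following is only a sketch of one plausible reading: each frame is represented by a feature vector (standing in for its DCT coefficients), and a frame becomes a keyframe when its Tanimoto distance from the last keyframe exceeds a threshold. The feature values, the threshold, and the function names are all assumptions.

```python
def tanimoto_distance(a, b):
    """1 - (a.b) / (|a|^2 + |b|^2 - a.b) for real-valued vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denom == 0:  # both vectors are all zeros, treat them as identical
        return 0.0
    return 1.0 - dot / denom

def select_keyframes(features, threshold=0.2):
    """Keep a frame when it is far enough from the last kept keyframe."""
    if not features:
        return []
    keyframes = [0]
    for i in range(1, len(features)):
        last = features[keyframes[-1]]
        if tanimoto_distance(last, features[i]) > threshold:
            keyframes.append(i)
    return keyframes

# Hypothetical per-frame feature vectors (placeholders for DCT coefficients).
features = [
    [1.0, 0.0, 0.0],
    [1.0, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.1],
]
keys = select_keyframes(features)
```

With these values, the first and third frames are kept: the second and fourth are nearly identical to the keyframe that precedes them.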
... Detecting a carefully constructed inter/intra-frame forgery with the above-stated machine learning techniques, which rely on constant statistical measures, is a complex task. Because the copied objects and the background of the frame into which they are pasted are shot under the same surveillance camera, they exhibit similar statistical properties and are hence indistinguishable [7]-[10]. CMF seems to be the most challenging problem to tackle in the field of video forensics. ...
Article
The technique of video copy-move forgery (CMF) is commonly employed in various industries; digital videography is regularly used as the foundation for vital graphic evidence, which may be modified using this method. In the past few decades, forgery in digital images has been detected via machine intelligence. A second issue is that consecutive parallel frames with relevant backgrounds are erroneously detected as CMF regions, leading to false implications. A third is that, because CMF is divided into inter-frame and intra-frame forgeries, most existing methods cannot detect video copy. Thus, this research presents the dual deep network (DDN) for efficient and effective video copy-move forgery detection (VCMFD). DDN comprises two networks: the first detection network (DetNet1) extracts general deep features and the second detection network (DetNet2) extracts custom deep features; the two networks are interconnected, with the output of DetNet1 given to DetNet2. Furthermore, a novel algorithm is introduced for forged frame detection and for optimization of falsely detected frames. DDN is evaluated on two benchmark datasets, REWIND and the video tampering dataset (VTD), using different metrics, and is compared with a recent existing model. DDN outperforms the existing model in terms of various metrics.
... The authors made available a video inter-frame forgery dataset (VIFFD) that includes duplication, insertion, and deletion forgeries for experimental analysis. A 2D convolutional neural network (2D CNN) and a structural similarity (SSIM) evaluation were employed by the researcher [141] to identify inter-frame tampering in footage obtained from surveillance cameras. The categorization in the study is done with a Gaussian RBF multi-class SVM. ...
Article
People across the world aspire to settle in urban areas for better opportunities in career, education, and healthcare facilities. The increased proportion of people living in urban areas requires an improvement of smart habitat(s) robust enough to deal with the daily needs of citizens such as personal information management, security and surveillance to deal with anomalous activities in real-time. The present paper aims to provide an extensive review based on 213 research articles published from 2001 to 2023 highlighting various technologies for smart cities and intelligent video surveillance techniques, in order to: (i) Highlight the significance of smart city surveillance as well as the current research tendencies in this field. (ii) To present and explicate a standardized model of a smart city based on video surveillance. (iii) To analyze the current status and highlight challenges, and limitations of surveillance systems in different smart city applications. The paper outlines the critical role of video surveillance as a necessary feature of every subdomain of the smart city model. The fundamental element that defines the soon-to-come victorious period is the most recent technological developments for the detection of anomalous activity, fire, digital tampering, and objects, which are thoroughly examined in existing research papers and elucidated. The article further presents a well outlined bibliographic classification of state-of-the-art techniques. A comparison of the existing video surveillance datasets has also been thoroughly analyzed. Finally, the current work identifies major research challenges and future opportunities in this domain.
... However, because this system required tailored training for various settings, the exceedingly sluggish training tempo limited its full implementation. Sondos Fadl et al. [35] suggested using a Gaussian RBF multiple class SVM (RBF-MSVM) to identify fake surveillance videos. ...
... Ren et al. [21] employ spatial-temporal inconsistencies of the forged video for forgery detection and introduce a new temporal modeling paradigm in the temporal inconsistency module. Fadl et al. [22] introduce an inter-frame forgery detection system using spatial-temporal information and an integrated two-dimensional convolutional neural network (2D-CNN), which automatically performs deep feature extraction and utilizes a Gaussian RBF multi-class support vector machine for the classification process. The spatial-temporal fusion feature-based detection method can consider the temporal and spatial features of the video in an integrated manner, thus providing a more comprehensive framework for video face forgery detection. ...
Article
In recent years, the nefarious exploitation of video face forgery technology has emerged as a grave threat, not only to personal property security but also to the broader stability of states and societies. Although numerous models and methods have emerged for video face forgery detection, these methods fall short in recognizing subtle traces of forgery in local regions, and the performance of the detection models is often affected to some extent when dealing with specific forgery strategies. To solve this problem, we propose a model based on multiple feature fusion network (MFF-Net) for video face forgery detection. The model employs Res2Net50 to extract texture features of the video, which realizes deeper texture feature extraction. By integrating the extracted texture and frequency feature into a temporal feature extraction module, which includes a three-layer LSTM network, the detection model fully incorporates the diverse features of the video information, thus identifying the subtle artifacts more effectively. To further enhance the discrimination ability of the model, we have also introduced a texture activation module (TAM) in the texture feature extraction section. It helps to enhance the saliency of subtle forgery traces, thus improving the detection of specific forgery strategies. In order to verify the effectiveness of the proposed method, we conduct experiments on several generalized datasets such as FaceForensics++ and DFD. The experimental results demonstrate that the MFF-Net model can recognize subtle forgery traces more effectively, especially in the case of a particular forgery strategy, and the model exhibits excellent performance and high detection accuracy.
Article
Research on verifying the integrity of digital multimedia data has gained momentum in recent years. Accordingly, the number of studies on digital multimedia security has been observed to increase day by day. This shows that digital multimedia security is still a current and active research area. People without professional training in audio, image, and video can easily modify audio, image, and video data using tools such as mobile phones, smart devices, and various web applications. These modifications compromise the accuracy, integrity, and authenticity of the data. Data whose integrity and authenticity have been compromised can be used for various purposes, such as misleading judicial authorities, disturbing public order, serving as false evidence in court, and deceiving automatic speaker verification systems. For this reason, detecting forgeries in digital multimedia data is a very important issue today. Existing studies group forgery detection methods for digital multimedia data into two categories: active and passive techniques. In the literature, it has been found that work on detecting forgeries in digital data, audio signals in particular, concentrates on active techniques, and that studies on passive techniques are relatively fewer than those on active techniques. This research article aims to categorize the studies carried out in recent years on copy-move and splicing forgery detection, which are passive techniques.
Article
In this paper, we present a passive blind scheme consisting of two different algorithms to detect frame and region duplication forgeries in videos. We examine video frame duplication forgery in three different forms: duplication of a sequence of consecutive video frames at a long continuous running position, duplication of many such sequences of different lengths at many different locations, and duplication from other videos with the same or different dimensions, which can raise a serious problem in real-world scenarios. Algorithm I of the proposed scheme detects these three forms of copy-moved frame duplication forgery by obtaining the mean features of each video frame and evaluating the correlation between sequences. We also analyse forged regular and irregular regions copied within the same frame at different locations, and from another frame to one or more sequences of consecutive frames of the same video at the same locations. Detecting this copy-move forgery is challenging because the pixel intensity values in the duplicated region change only slightly and correlate as highly as an authentic region. Algorithm II of the proposed scheme detects these copy-moved region duplication forgeries by locating the position of the error with a threshold process in order to calculate the similarities between regions of two frames or within an affected frame. The experimental results show that the proposed scheme achieves higher detection accuracy and better execution time than the latest algorithms, with satisfactory performance.
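The idea behind Algorithm I, per-frame mean features compared by correlation between sequences, can be sketched roughly as follows. The window length, the threshold, the choice of Pearson correlation, and the toy frames are assumptions; the abstract does not specify them.

```python
import math

def frame_means(frames):
    """Mean intensity per frame; frames are flat lists of grayscale pixels."""
    return [sum(f) / len(f) for f in frames]

def pearson(a, b):
    """Pearson correlation; returns 0.0 when either sequence is constant."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                     * sum((y - mean_b) ** 2 for y in b))
    return cov / norm if norm else 0.0

def duplicated_windows(means, window=4, threshold=0.999):
    """Pairs (i, j) of non-overlapping windows whose mean sequences correlate."""
    hits = []
    last_start = len(means) - window
    for i in range(last_start + 1):
        for j in range(i + window, last_start + 1):
            if pearson(means[i:i + window], means[j:j + window]) >= threshold:
                hits.append((i, j))
    return hits

# Hypothetical two-pixel frames; frames 6..9 repeat frames 0..3.
levels = [10, 50, 20, 90, 35, 60, 10, 50, 20, 90, 5, 70]
frames = [[v - 1, v + 1] for v in levels]
hits = duplicated_windows(frame_means(frames))
```

The repeated run shows up as the pair `(0, 6)` in `hits`. Pearson correlation ignores offset and scale, so short windows can also match by chance; a practical detector would likely need longer windows or an additional check on the raw differences.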
Article
Frame insertion, deletion and duplication are common inter-frame tampering operations in digital videos. In this paper, based on similarity analysis, a passive-blind forensics scheme for video shots is proposed to detect inter-frame forgeries. This method is composed of two parts: HSV (Hue-Saturation-Value) color histogram comparison and SURF (Speeded Up Robust Features) feature extraction together with FLANN (Fast Library for Approximate Nearest Neighbors) matching for double-checking. We mainly calculate H-S and S-V color histograms of every frame in a video shot and compare the similarity between histograms to detect and locate tampered frames in the shot. Then we utilize SURF feature extraction and FLANN matching to further confirm the forgery types in the tampered locations. Experimental results demonstrate that the proposed detection method is efficient and accurate in terms of forgery identification and localization. In contrast to other inter-frame forgery detection methods, our scheme can detect three kinds of forgery operations and has its own superiority and applicability as a passive-blind detection method.
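As a rough illustration of the histogram comparison step, the sketch below builds a hue-saturation histogram per frame and flags transitions where adjacent histograms overlap poorly. The bin counts, the use of histogram intersection as the similarity measure, the threshold, and the synthetic frames are assumptions; the abstract does not state which similarity measure is used, and the SURF/FLANN double-check is not covered.

```python
import colorsys

def hs_histogram(pixels, h_bins=8, s_bins=4):
    """Normalized hue-saturation histogram of (r, g, b) pixels in [0, 1]."""
    hist = [0.0] * (h_bins * s_bins)
    for r, g, b in pixels:
        h, s, _ = colorsys.rgb_to_hsv(r, g, b)
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        hist[hi * s_bins + si] += 1.0
    total = len(pixels)  # assumes a non-empty frame
    return [count / total for count in hist]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def suspicious_transitions(frames, threshold=0.5):
    """Indices i where frames i and i + 1 have dissimilar H-S histograms."""
    hists = [hs_histogram(f) for f in frames]
    return [i for i in range(len(hists) - 1)
            if intersection(hists[i], hists[i + 1]) < threshold]

# Hypothetical frames: two red frames followed by two blue frames.
red = [(1.0, 0.0, 0.0)] * 4
blue = [(0.0, 0.0, 1.0)] * 4
frames = [red, red, blue, blue]
flagged = suspicious_transitions(frames)
```

Only the red-to-blue transition is flagged. An S-V histogram could be built the same way by binning saturation and value instead of hue and saturation.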
Article
Duplicating a sequence of frames in a video to cover up or replicate a scene is a form of video forgery. There are methods to authenticate video files, but embedding authentication information into videos requires extra hardware or software. It is possible to detect frame duplication forgery by carefully inspecting the content to discover high correlation among groups of frames. A new frame duplication detection method based on the Bag-of-Words (BoW) model is proposed in this paper. BoW is a model that was first used in textual analysis and later in image and video retrieval. We used BoW to create visual words and build a dictionary from Scale-Invariant Feature Transform (SIFT) keypoints of frames in the video. Frame features, i.e., visual word representations at keypoints, are used to detect sequences of duplicated parts in the video. The method computes thresholds depending on the content to improve both robustness and performance. The proposed method is tested on 31 test videos selected from the Surrey University Library for Forensic Analysis (SULFA) and from various movies. Experimental results show better detection performance and reduced run time compared to similar methods reported in the literature.
Article
In the midst of low cost and easy-to-use multimedia editing software, which make it exceedingly simple to tamper with digital content, the domain of digital multimedia forensics has attained considerable significance. This research domain deals with production of tools and techniques that enable authentication of digital evidence prior to its use in various critical and consequential matters, such as politics, criminal investigations, defense planning. This paper presents a forensic scheme for detection of frame-based tampering in digital videos, especially those captured by surveillance cameras. Frame-based tampering, which involves insertion, removal or duplication of frames into or from video sequences, is usually very difficult to detect via simple visual inspection. Such forgeries, however, disturb the temporal correlation among successive frames of the tampered video. These disturbances, when analyzed in an appropriate manner, help reveal the evidence of forgery. The forensic technique presented in this paper relies on objective analysis of prediction residual and optical flow gradients for the detection of frame-based tampering in MPEG-2 and H.264 encoded videos. The proposed technique is also capable of determining the exact location of the forgery in the given video sequence. Results of extensive experimentation in diverse and realistic forensic set-ups show that the proposed technique can detect and locate tampering with an average accuracy of 83% and 80% respectively, regardless of the number of frames inserted, removed or duplicated.
Article
Videos are acceptable as evidence in a court of law, provided their authenticity and integrity are scientifically validated. Videos recorded by surveillance systems are susceptible to malicious alterations of visual content by perpetrators, locally or remotely. Such malicious alterations of video content (called video forgeries) are categorized into inter-frame and intra-frame forgeries. In this paper, we propose inter-frame forgery detection techniques using tamper traces from the spatio-temporal and compressed domains. Pristine videos containing frames recorded during a sudden camera zooming event may be wrongly classified as tampered videos, leading to an increase in false positives. To address this issue, we propose a method for zooming detection and incorporate it into video tampering detection. Frame shuffling detection, which had not been explored so far, is also addressed in our work. Our method is capable of differentiating various inter-frame tamper events and localizing them in the temporal domain. The proposed system is tested on 23,586 videos, of which 2346 are pristine and the rest are inter-frame forged videos. Experimental results show that we have successfully detected frame shuffling with encouraging accuracy rates. We have achieved improved accuracy on forgery detection in frame insertion, frame deletion and frame duplication.
Article
Nowadays, surveillance systems are used to control crime. Establishing the authenticity of a digital video therefore improves the accuracy of deciding whether to admit it as legal evidence. Inter-frame duplication is the most common type of video forgery. Many methods have been proposed for detecting this type of forgery, but they require high computational time and are impractical. In this study, we propose an efficient inter-frame duplication detection algorithm based on the standard deviation of residual frames. The standard deviation of each residual frame is used to select some frames and ignore others, which represent a static scene. Then, the entropy of the discrete cosine transform coefficients is calculated for each selected residual frame to represent its discriminating feature. Duplicated frames are then detected exactly using subsequence feature analysis. The experimental results demonstrate that the proposed method effectively identifies and localizes inter-frame duplication forgery with acceptable running time.
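The first two steps described above can be sketched as follows: compute residual frames, skip those whose standard deviation indicates a static scene, and take the entropy of the DCT coefficients of the rest. The threshold, the rounding of coefficients before computing entropy, the naive unnormalized DCT, and the 2x2 frames are assumptions made to keep the example small; the subsequence feature analysis is not covered.

```python
import math
from collections import Counter

def residual(prev, curr):
    """Pixel-wise difference between two frames (lists of rows)."""
    return [[c - p for p, c in zip(rp, rc)] for rp, rc in zip(prev, curr)]

def std_dev(frame):
    """Population standard deviation of all pixel values in a frame."""
    values = [v for row in frame for v in row]
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def dct2(block):
    """Naive unnormalized 2-D DCT-II; fine for tiny blocks, slow for real frames."""
    rows, cols = len(block), len(block[0])
    return [[sum(block[x][y]
                 * math.cos(math.pi * (2 * x + 1) * u / (2 * rows))
                 * math.cos(math.pi * (2 * y + 1) * v / (2 * cols))
                 for x in range(rows) for y in range(cols))
             for v in range(cols)] for u in range(rows)]

def coefficient_entropy(block):
    """Shannon entropy (bits) of the rounded DCT coefficients."""
    coeffs = [round(c) for row in dct2(block) for c in row]
    n = len(coeffs)
    return -sum((k / n) * math.log2(k / n) for k in Counter(coeffs).values())

def frame_features(frames, std_threshold=1.0):
    """Map frame index -> entropy feature, skipping near-static residuals."""
    features = {}
    for i in range(1, len(frames)):
        res = residual(frames[i - 1], frames[i])
        if std_dev(res) > std_threshold:
            features[i] = coefficient_entropy(res)
    return features

# Hypothetical 2x2 frames: the second repeats the first (static scene).
frames = [
    [[0, 0], [0, 0]],
    [[0, 0], [0, 0]],
    [[10, 0], [0, 50]],
]
features = frame_features(frames)
```

Only the third frame yields a feature here, because the residual between the first two frames is all zeros and is filtered out by the standard deviation check.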