Fig 8 - uploaded by Gaobo Yang
An example of CIP. 

Source publication
Article
Motion-compensated frame interpolation (MCFI), a frame interpolation technique that increases the motion continuity of low frame-rate video, can be used by counterfeiters to fake high-bitrate video or to splice videos with different frame rates. The performance of existing MCFI detectors degrades under real-world scenarios such as H.264...

Context in source publication

Context 1
... 4. Use the selected IP to check the identified results of interpolated frames, and correct abnormal frames. Figure 8 shows an example of CIP. How to label candidate IP is a key issue for CIP. ...

Similar publications

Article
As a simple yet effective operation, frame deletion is widely used in video forgery. Many video forensic techniques have been developed to detect this manipulation. Some inter-frame continuity based methods are capable of detecting frame deletion as well as locating the frame deletion points precisely. However, due to the simple principle, this kind of...

Citations

... In recent years, Discrete Tchebichef Transforms (DTTs) have gained increasing popularity as an effective signal processing tool for video technology. The DTT computes Tchebichef Moments (TMs) and has been applied in various signal processing applications including image analysis [32], security [6,21], digital watermarking [14], pattern recognition [46], interpolating frames in video [12], and image and video coding [9,25]. While the DTT exhibits properties very close to those of the DCT, such as coefficient decorrelation and energy compaction, it has been shown to provide better image compression results in some cases [13,40,43]. ...
... The images were partitioned into 8 × 8 blocks and transformed using the proposed matrices and matrices from [29,34,35,40] based on Eqs. (4), (9), (12), (15), (34), and (35). The transformed images were then compressed using a range of quality factors, and their quality was assessed using two image quality metrics, namely peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM). ...
Article
In recent years, the Discrete Tchebichef Transform (DTT) has gained popularity as a signal processing tool for image and video compression due to its efficient coding and decorrelation properties. However, in the context of real-time applications and embedded systems, it is critical to develop approximate algorithms with reduced complexity and energy consumption. While three DTT approximations have been proposed to date, there is still room for further improvements. To address this gap, we propose two new low-complexity DTT approximations that employ a modified deviation metric, resulting in better compression efficiency and reduced complexity. We validate our proposed methods by implementing them on the Xilinx Virtex-6 XC6VSX475T-1FF1759-2 Field Programmable Gate Array (FPGA) through rapid prototyping. Our proposed transformations exhibit superior performance in terms of hardware resources and energy consumption, particularly for 1D 8 inputs. Furthermore, compared to the state-of-the-art DTT approximations in image compression, our proposed transformations demonstrate a quality gain of up to 2 dB. Overall, our proposed approximations provide a promising trade-off between image quality, hardware resources, and energy consumption, making them ideal for real-time applications and embedded systems.
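The citing passage above assesses compressed images with PSNR, and the abstract reports gains in dB. As a reminder of what that number measures, here is a minimal sketch of PSNR for two equally sized grayscale images. It assumes 8-bit data (peak value 255) and plain nested lists; the function name `psnr` is illustrative and is not taken from the paper.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images.

    Images are nested lists of pixel values. `peak` is the maximum possible
    pixel value (255 is an assumption for 8-bit images).
    """
    squared_error = 0.0
    count = 0
    for ref_row, dist_row in zip(reference, distorted):
        for r, d in zip(ref_row, dist_row):
            squared_error += (r - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)
```

A 2 dB difference in this measure corresponds to a mean squared error ratio of about 1.58 between the two methods being compared.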
... Recently, face manipulation techniques empowered by deep generative models have made considerable progress [1]-[4], which makes Deepfake media more lifelike. Realistic Deepfake videos are likely to be utilized by attackers for malicious purposes, such as creating and distributing fake news or defaming celebrities, leading to serious security problems. ...
Preprint
Deepfake technologies empowered by deep learning are rapidly evolving, creating new security concerns for society. Existing multimodal detection methods usually capture audio-visual inconsistencies to expose Deepfake videos. More seriously, the advanced Deepfake technology realizes the audio-visual calibration of the critical phoneme-viseme regions, achieving a more realistic tampering effect, which brings new challenges. To address this problem, we propose a novel Deepfake detection method to mine the correlation between Non-critical Phonemes and Visemes, termed NPVForensics. Firstly, we propose the Local Feature Aggregation block with Swin Transformer (LFA-ST) to construct non-critical phoneme-viseme and corresponding facial feature streams effectively. Secondly, we design a loss function for the fine-grained motion of the talking face to measure the evolutionary consistency of non-critical phoneme-viseme. Next, we design a phoneme-viseme awareness module for cross-modal feature fusion and representation alignment, so that the modality gap can be reduced and the intrinsic complementarity of the two modalities can be better explored. Finally, a self-supervised pre-training strategy is leveraged to thoroughly learn the audio-visual correspondences in natural videos. In this manner, our model can be easily adapted to the downstream Deepfake datasets with fine-tuning. Extensive experiments on existing benchmarks demonstrate that the proposed approach outperforms state-of-the-art methods.
... Video forensics, which attempts to verify the trustworthiness of digital videos, has attracted wide attention in the information security community for more than a decade [14], [15]. All kinds of forensic strategies have been developed to detect both intra-frame and inter-frame manipulations, which are referred to as spatial-domain [16], [17] and temporal-domain forgeries [18], [19], [20], respectively. Some research has also focused on the blind forensics of conventional video inpainting techniques, summarized as follows. ...
... Currently, deep learning-based approaches for detecting recompression [18,34], frame dropping [26], and deepfake [14,40] are presented, while a CNN-based forensic system for detecting FRUC forensic clues does not exist. Although conventional approaches [8,9,16,21,23,48-50] have exhibited high performance, they did not consider robustness to the various types of interpolation methods (e.g., nearest neighbor interpolation (NNI), bilinear interpolation (BI), and motion-compensated interpolation (MCI)), which is an important requirement for FRUC detection. Furthermore, these approaches require considerable processing time to perform detection due to the need to observe all frames. ...
... Although these two approaches [49,50] perform well for high-quality video, their performance degrades for low-quality or low-motion videos. Subsequently, Ding et al. analyzed residual energy distribution within interpolated frames and modeled temporal inconsistencies in artifact regions using Tchebichef moments as shape descriptors [16]. This method is robust to signal processing operations such as blurring and noise addition. ...
... The previous studies described above [8,9,16,21,23,48-50] have exhibited high performance for specific targeted interpolation algorithms. However, they do not fully satisfy the fundamental requirements of video forensics for detecting frame-rate conversion because they are not robust against various types of interpolation schemes (e.g., NNI, BI, and MCI). ...
Preprint
With the advance in user-friendly and powerful video editing tools, anyone can easily manipulate videos without leaving prominent visual traces. Frame-rate up-conversion (FRUC), a representative temporal-domain operation, increases the motion continuity of videos with a lower frame-rate and is used by malicious counterfeiters in video tampering such as generating fake frame-rate video without improving the quality or mixing temporally spliced videos. FRUC is based on frame interpolation schemes and subtle artifacts that remain in interpolated frames are often difficult to distinguish. Hence, detecting such forgery traces is a critical issue in video forensics. This paper proposes a frame-rate conversion detection network (FCDNet) that learns forensic features caused by FRUC in an end-to-end fashion. The proposed network uses a stack of consecutive frames as the input and effectively learns interpolation artifacts using network blocks to learn spatiotemporal features. This study is the first attempt to apply a neural network to the detection of FRUC. Moreover, it can cover the following three types of frame interpolation schemes: nearest neighbor interpolation, bilinear interpolation, and motion-compensated interpolation. In contrast to existing methods that exploit all frames to verify integrity, the proposed approach achieves a high detection speed because it observes only six frames to test its authenticity. Extensive experiments were conducted with conventional forensic methods and neural networks for video forensic tasks to validate our research. The proposed network achieved state-of-the-art performance in terms of detecting the interpolated artifacts of FRUC. The experimental results also demonstrate that our trained model is robust for an unseen dataset, unlearned frame-rate, and unlearned quality factor.
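To make the two simplest interpolation schemes named in the abstract concrete, the sketch below doubles a frame rate by inserting one new frame between each pair of neighbours. It assumes that nearest neighbor interpolation amounts to repeating the previous frame and that bilinear interpolation amounts to averaging the two neighbouring frames; that reading, the function name `upconvert_2x`, and the mode names are illustrative assumptions, not the paper's implementation. Motion-compensated interpolation needs motion estimation and is not covered here.

```python
from typing import List

Frame = List[List[float]]  # a grayscale frame as nested lists of pixel values


def upconvert_2x(frames: List[Frame], mode: str = "blend") -> List[Frame]:
    """Double the frame rate by inserting one frame between each pair.

    mode="repeat": copy the previous frame (assumed reading of NNI).
    mode="blend":  average the two neighbours (assumed reading of BI).
    """
    if mode not in ("repeat", "blend"):
        raise ValueError("mode must be 'repeat' or 'blend'")
    out: List[Frame] = []
    for cur, nxt in zip(frames, frames[1:]):
        out.append(cur)
        if mode == "repeat":
            # Inserted frame is an exact copy of the current frame.
            out.append([row[:] for row in cur])
        else:
            # Inserted frame is the pixel-wise mean of its two neighbours.
            out.append([[(a + b) / 2 for a, b in zip(ra, rb)]
                        for ra, rb in zip(cur, nxt)])
    if frames:
        out.append(frames[-1])
    return out
```

Both modes leave a regular pattern in the output: every second frame is synthetic, which is the kind of periodic trace a FRUC detector looks for.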
... However, their performance deteriorates seriously when the tampered frame-rate is much higher than the original one, especially when it is more than four times higher. Subsequently, we discuss the localization problem of interpolated frames under real-world scenarios such as H.264/AVC compression, noise or blur [12]. In this method, the candidate artifact regions are first selected after investigating the strong correlations between artifact regions and high residual energies. ...
... The 14th original frame of the "Football" sequence and its interpolated frames [12] by MSU, MVTools, DSME [45] and MCMP [23] ...
Article
Motion Compensated Frame Interpolation (MCFI), a frame-based video operation to increase the motion continuity of low frame rate video, can be adopted by falsifiers for forging high bitrate video or splicing videos with different frame-rates. The performance of existing MCFI detectors is degraded by stronger video compression and noise. To deal with this problem, we propose a blind forensics method to detect the adopted MCFI operation. After investigating the synthetic process of interpolated frames, we discover that the motion regions of interpolated frames contain slight local artifacts, which cause optical flow based inter-frame discontinuity. To capture these irregularities introduced by various MCFI techniques, compact features are designed, which are calculated as the Temporal Frame Difference-weighted histogram of Local Binary Pattern computed on the Optical Flow field (TFD-OFLBP). Meanwhile, Local Inter-block and Edge-block difference Features (LIEF) are further proposed to detect interpolated frames with stable content. Besides, a set of forensic tools is adopted to eliminate the side effects of possible interference from scene changes, sudden lighting changes, focus vibration, and some original frames with inherent local artifacts. Experimental results on four representative MCFI software and techniques show that the proposed approach outperforms existing MCFI detectors and is also robust to compression and noise.
... For each CU derived from recursive quad-tree partitioning, it supports up to 20 intra- and inter-prediction modes, from which the optimal mode is selected with the minimum rate-distortion cost (RD Cost) by rate-distortion optimization (RDO). Especially for dependent views, the exhaustive mode decision and associated motion estimation (ME) [7][8][9][10] and disparity estimation (DE) [11] induce an extremely high computational burden, which limits the practical applications of 3D-HEVC. Thus, fast mode decision is a crucial issue for the optimization of the 3D-HEVC encoder. ...
Article
As an extension of the High-Efficiency Video Coding (HEVC) standard, 3D-HEVC needs to encode multiple texture views and depth maps, which further increases the computational complexity. To reduce the complexity of dependent texture view coding, a fast prediction unit (PU) and coding unit (CU) decision method is proposed for 3D-HEVC based on a hybrid stopping model. The inter-view correlation is used as a priori information to roughly predict the possible optimal PU and CU sizes. Then, by exploiting the encoded posterior information, the rate distortion cost correlation and the code block flag, the predicted PU and CU sizes are further examined to determine whether they are optimal. Experimental results show that the proposed fast PU and CU decision method achieves 52.7% encoding time saving on average with negligible loss of coding efficiency for 3D-HEVC-dependent texture view coding.
... Thirdly, FRUC was originally presented to improve the motion continuity of low frame-rate videos, but it can also be regarded as a special inter-frame forgery since it enhances both frame rate and bitrate [8]. In recent years, a few FRUC detectors have been proposed in the field of blind video forensics [7,9,36] to verify the authenticity and integrity of digital videos. Meanwhile, anti-forensics techniques, which are countermeasures to video forensics, have been designed to hide video inter-frame forgery [17,32]. ...
Article
A spatio-temporal saliency-based frame rate up-conversion (FRUC) approach is proposed, which achieves better quality of interpolated frames and invalidates existing texture variation-based FRUC detectors. A spatio-temporal saliency model is designed to select salient frames. After obtaining the initial motion vector field by texture- and color-based bilateral motion estimation, two motion vector refining (MVR) schemes are adopted for high and low saliency frames to hierarchically refine the motion vectors, respectively. To produce high-quality interpolated frames, image enhancement is performed for salient frames after frame interpolation. Due to the distinct MVR schemes, there are different degrees of texture information in interpolated frames. Some edge and texture information is supplemented into salient frames as post-processing, which can invalidate existing texture variation-based FRUC detectors. Experimental results show that the proposed approach outperforms state-of-the-art works in both objective and subjective qualities of interpolated frames, and achieves the purpose of FRUC anti-forensics.
... (11) where median(·) denotes the median of the input vector. Finally, the standard deviation of the video is generated as follows, ...
Article
Frame repetition (FR) is a common temporal-domain tampering operator, which is often used to increase the frame rate of video sequences. Existing methods detect FR forgery by analyzing residual variation or similarity between video frames; however, these methods are easily disturbed by noise, which affects the stability of detection performance. This paper proposes a noise-level based detection method which tracks the varying noise level over time to determine whether the video is forged by FR. Wavelet coefficients are first computed for each video frame, and the median absolute deviation (MAD) of the wavelet coefficients is used to estimate the standard deviation of the Gaussian noise mixed in each video frame. Then, the fast Fourier transform (FFT) is used to calculate the amplitude spectrum of the standard deviation curve of the video sequence, and to provide the peak-mean ratio (PMR) of the amplitude spectrum. Finally, according to the PMR obtained, a hard threshold decision is taken to determine whether the standard deviation bears periodicity in the temporal domain, so that FR forgery can be automatically identified. The experimental results show that the proposed method ensures a large PMR for the forged video, and presents a better detection performance when compared with the existing detection methods.
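The pipeline in this abstract (per-frame noise estimate, spectrum of the resulting curve, peak-mean ratio, hard threshold) can be sketched in a few lines. Everything specific below is an assumption made for illustration: a one-level Haar detail stands in for the paper's wavelet, the constant 0.6745 is the usual MAD-to-sigma factor for Gaussian noise, the DC term is dropped before taking the ratio, and the threshold of 4.0 is arbitrary. A plain DFT is used instead of an FFT to keep the example dependency-free.

```python
import cmath
import math
import statistics


def haar_detail(row):
    """One-level Haar detail coefficients of a 1-D sequence (assumed wavelet)."""
    return [(row[i] - row[i + 1]) / math.sqrt(2)
            for i in range(0, len(row) - 1, 2)]


def estimate_noise_sigma(frame):
    """Estimate Gaussian noise sigma of a frame (list of rows) from the MAD."""
    coeffs = [d for row in frame for d in haar_detail(row)]
    # 0.6745 converts the median absolute value to a Gaussian sigma.
    return statistics.median(abs(c) for c in coeffs) / 0.6745


def peak_mean_ratio(curve):
    """Peak-to-mean ratio of the DFT amplitude spectrum, DC term excluded."""
    n = len(curve)
    mean = sum(curve) / n
    centered = [v - mean for v in curve]
    amps = []
    for k in range(1, n // 2 + 1):
        s = sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
        amps.append(abs(s))
    avg = sum(amps) / len(amps)
    return max(amps) / avg if avg > 0 else 0.0


def looks_periodic(sigmas, threshold=4.0):
    """Hard decision: is the noise-level curve periodic? (threshold is hypothetical)"""
    return peak_mean_ratio(sigmas) > threshold
```

In use, `estimate_noise_sigma` would be applied to every frame to build the curve passed to `looks_periodic`; a curve that dips at a fixed interval, as repeated frames would produce, concentrates its spectrum in one bin and yields a large ratio.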
Article
Mosaicing is a prevalent signal processing approach to hide or protect critical information from users. Conventional mosaic schemes simply remove the original signal content a sender wants to conceal or add artificial noise to it. Thus the content cannot be recovered simply by use of a key that can be represented as a very short sequence compared to the concealed signal content. In this work, we extend our previous effort in recoverable image mosaicing to design a novel audio mosaicing approach using hierarchical permutations. Besides, we establish the mathematical relationship between the popular signal-quality metric, namely signal-to-noise ratio (SNR), and our previously proposed signal-destructuring metric, namely Kullback-Leibler divergence of discrete cosine transform (DCT-KLD), so that the mosaicing or signal-destructuring effect in terms of DCT-KLD and the general signal-quality measure in terms of SNR can be translated into each other. As a result, one can easily judge whether the mosaicked signal reaches the concealability which is equivalent to the maximum SNR to eliminate the intelligibility of an utterance. The new relationship between DCT-KLD and SNR we develop can thus be very useful to qualify an audio mosaic method without any need for human listening tests.