Article · Publisher preview available

Fake-checker: A fusion of texture features and deep learning for deepfakes detection

Authors:
  • Head of Management Information Systems Department, College of Business Administration (CBA), Jazan University, Saudi Arabia

Abstract and Figures

The evolution of sophisticated deep learning algorithms such as Generative Adversarial Networks has made it possible to create deepfake videos with convincing realism. Deepfake identification is important for countering internet disinformation campaigns and lessening negative social media effects. Existing studies use either handcrafted features or deep learning-based models for deepfake detection. To effectively combine the strengths of both approaches, this paper presents a fusion of deep features with handcrafted texture features to create a powerful fused feature vector for accurate deepfake detection. We propose a Directional Magnitude Local Hexadecimal Pattern (DMLHP) to extract 320-D texture features, and extract a 2048-D deep feature vector using Inception V3. Next, we employ Principal Component Analysis to reduce the deep feature dimensions to 320 for a balanced representation of features after fusion. The deep and handcrafted features are combined to form a fused feature vector of 640-D. Further, we employ the proposed features to train an XGBoost model to classify frames as genuine or forged. We evaluated our proposed model on the FaceForensics++ and Deepfake Detection Challenge Preview (DFDC-P) datasets. Our method achieved an accuracy and area under the curve of 97.7% and 99.3% on FaceForensics++, and 90.8% and 93.1% on DFDC-P, respectively. Moreover, we performed cross-set and cross-dataset evaluations to show the generalization capability of our model.
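The feature-fusion pipeline above can be sketched as follows. This is a minimal illustration of the dimension flow only, using random arrays in place of the real Inception V3 and DMLHP features, and a plain SVD-based PCA in place of a fitted one; the paper's actual extractors and its XGBoost classifier are not reproduced here.

```python
import numpy as np

# Hypothetical stand-ins for the paper's real inputs: random "deep" features
# (2048-D, as from Inception V3) and "texture" features (320-D, as from DMLHP).
rng = np.random.default_rng(0)
n_frames = 400
deep = rng.normal(size=(n_frames, 2048))    # Inception V3 feature vectors
texture = rng.normal(size=(n_frames, 320))  # DMLHP texture feature vectors

# PCA via SVD: reduce the 2048-D deep features to 320-D so both feature
# families contribute equally after fusion.
deep_centered = deep - deep.mean(axis=0)
_, _, vt = np.linalg.svd(deep_centered, full_matrices=False)
deep_320 = deep_centered @ vt[:320].T       # project onto top 320 components

# Fuse: concatenate to one 640-D vector per frame; this fused vector is what
# would then be fed to a binary classifier (XGBoost in the paper).
fused = np.concatenate([deep_320, texture], axis=1)
print(fused.shape)  # (400, 640)
```

Reducing the deep features to 320-D before concatenation keeps either feature family from dominating the fused representation purely by dimensionality.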
Multimedia Tools and Applications (2024) 83:49013–49037
https://doi.org/10.1007/s11042-023-17586-x
Fake‑checker: Afusion oftexture features anddeep learning
fordeepfakes detection
NoorulHuda1· AliJaved2· KholoudMaswadi3· AliAlhazmi4· RehanAshraf5
Received: 17 February 2023 / Revised: 20 September 2023 / Accepted: 18 October 2023 /
Published online: 3 November 2023
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023
Keywords Deepfakes· Deep Convolutional Neural Networks· Generative Adversarial
Networks
* Rehan Ashraf
rehan@ntu.edu.pk
1 Department ofComputer Science, University ofEngineering andTechnology, Taxila47050,
Pakistan
2 Department ofSoftware Engineering, University ofEngineering andTechnology, Taxila47050,
Pakistan
3 Department ofManagement Information Systems, Jazan University, 45142Jazan, SaudiArabia
4 College ofComputer Science andInformation Technology, Jazan University, 45142Jazan,
SaudiArabia
5 Department ofComputer Science, National Textile University, Faisalabad, Pakistan
Article
Full-text available
In videos, anomaly detection is challenging due to its diverse nature across application domains. Reconstruction- and prediction-based methods have been widely employed to detect anomalies. Due to the generalization capability of a deep neural network, it sometimes recreates irregular patterns along with regular ones. This paper presents a novel autoencoder-based framework called deep multiplicative attention-based autoencoder (DeMAAE) to detect anomalies in a video sequence. A global attention mechanism is used at the decoder side of DeMAAE for better feature learning during the decoding phase. An attention map is created by taking the dot product between all of the encoder's hidden states and the previously generated decoder hidden state. The final output of the decoder is then determined by the context vector, computed as the weighted sum of all encoder hidden states using the attention weights. DeMAAE delivers an improved runtime of 0.015 s (∼ 67 fps) for detecting anomalies during testing. Extensive experiments have been performed on two diverse and widely used datasets (UCSD Pedestrian and CUHK Avenue) to compare the efficacy of DeMAAE with different state-of-the-art methods.
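The global attention step described in this abstract can be sketched in a few lines. This is a generic dot-product attention illustration with made-up sizes, not DeMAAE's actual implementation.

```python
import numpy as np

# Minimal sketch of global attention: the sizes and variable names here are
# illustrative, not taken from the paper's implementation.
rng = np.random.default_rng(1)
T, d = 10, 64                          # encoder time steps, hidden size
enc_states = rng.normal(size=(T, d))   # all encoder hidden states
dec_state = rng.normal(size=(d,))      # previously generated decoder state

# Attention map: dot product of the decoder state with every encoder state,
# normalized with a softmax to get attention weights.
scores = enc_states @ dec_state
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Context vector: weighted sum of the encoder hidden states.
context = weights @ enc_states
print(context.shape)  # (64,)
```

The context vector lets each decoding step draw on the whole encoded sequence rather than only the final encoder state.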
Article
Full-text available
Automatic detection, localization and interpretation of an unusual event in a video sequence is a challenging task due to its equivocal and complex nature. The development of deep neural networks has paved the way for more efficient recognition and analysis of anomalous events in video data. With the introduction of the convolutional neural network (CNN) and long short-term memory (LSTM), spatial and temporal feature extraction became easier. In this paper, we propose an end-to-end trainable Inter-fused Autoencoder (IFA), designed from an assemblage of CNN and LSTM layers, to detect unusual events in a video sequence. The proposed architecture is capable of exploiting both the spatial and temporal variation of video data. The reconstruction error is computed in terms of both MSE and PSNR for each testing video. A comparison is also carried out between MSE and PSNR to show which assessment technique is better for a reconstructive model recreating the video sequence. A well-optimized threshold is calculated that decides whether a testing event is usual or unusual. Multiple experiments were carried out on benchmark datasets to demonstrate the efficacy of the proposed architecture.
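The MSE/PSNR thresholding idea in this abstract can be illustrated with toy data. The frames, noise levels, and the threshold value below are all invented for the sketch; only the scoring logic (low PSNR, i.e. high reconstruction error, flags an unusual event) reflects the described approach.

```python
import numpy as np

def mse(a, b):
    # Mean squared error between a frame and its reconstruction.
    return float(np.mean((a - b) ** 2))

def psnr(a, b, peak=1.0):
    # Peak signal-to-noise ratio in dB; higher means a better reconstruction.
    e = mse(a, b)
    return float("inf") if e == 0 else 10.0 * np.log10(peak ** 2 / e)

rng = np.random.default_rng(2)
frame = rng.uniform(size=(64, 64))
normal_recon = frame + rng.normal(scale=0.01, size=frame.shape)   # good recon
anomaly_recon = frame + rng.normal(scale=0.2, size=frame.shape)   # poor recon

# A PSNR below the threshold flags an unusual event (threshold is arbitrary
# here; the paper learns a well-optimized one).
threshold = 25.0  # dB
print(psnr(frame, normal_recon) > threshold)   # True  -> usual event
print(psnr(frame, anomaly_recon) > threshold)  # False -> unusual event
```

Because PSNR is a log transform of MSE, the two rank reconstructions identically; PSNR mainly adds an interpretable dB scale for picking the threshold.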
Article
Full-text available
The ever-growing threat of deepfakes and their large-scale societal implications has propelled the development of deepfake forensics to ascertain the trustworthiness of digital media. A common theme of existing detection methods is using Convolutional Neural Networks (CNNs) as a backbone. While CNNs have demonstrated decent performance in learning local discriminative information, they fail to learn relative spatial features and lose important information due to constrained receptive fields. Motivated by these challenges, this work presents DFDT, an end-to-end deepfake detection framework that leverages the unique characteristics of transformer models to learn hidden traces of perturbations from both local image features and the global relationship of pixels at different forgery scales. DFDT is specifically designed for deepfake detection and consists of four main components: patch extraction & embedding, a multi-stream transformer block, attention-based patch selection, and a multi-scale classifier. DFDT's transformer layer benefits from a re-attention mechanism instead of a traditional multi-head self-attention layer. To evaluate the performance of DFDT, a comprehensive set of experiments is conducted on several deepfake forensics benchmarks. The results demonstrate DFDT's superior detection rate, achieving 99.41%, 99.31%, and 81.35% on FaceForensics++, Celeb-DF (V2), and WildDeepfake, respectively. Moreover, DFDT's excellent cross-dataset & cross-manipulation generalization provides additional strong evidence of its effectiveness.
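The patch extraction & embedding stage named in this abstract follows the standard vision-transformer recipe, which can be sketched as below. Image size, patch size, and the random linear projection are illustrative assumptions; DFDT's real embedding is learned.

```python
import numpy as np

# Sketch of patch extraction & embedding (illustrative sizes only).
rng = np.random.default_rng(3)
img = rng.uniform(size=(224, 224, 3))
patch = 16                              # patch side length
h = w = 224 // patch                    # 14 x 14 grid of patches

# Slice the image into non-overlapping patches and flatten each one
# into a token vector.
patches = img.reshape(h, patch, w, patch, 3).transpose(0, 2, 1, 3, 4)
tokens = patches.reshape(h * w, patch * patch * 3)

# A random linear projection stands in for the learned patch embedding
# that would feed the transformer blocks.
proj = rng.normal(size=(patch * patch * 3, 64)) / np.sqrt(patch * patch * 3)
embedded = tokens @ proj
print(tokens.shape, embedded.shape)  # (196, 768) (196, 64)
```

Operating on patch tokens is what lets the transformer attend across the whole image, sidestepping the constrained receptive fields the abstract attributes to CNN backbones.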
Chapter
In recent years, we have witnessed a tremendous evolution in generative adversarial networks, resulting in the creation of highly realistic fake multimedia content termed deepfakes. Deepfakes are created by superimposing one person's real facial features, expressions, or lip movements onto another person. Apart from their benefits, deepfakes have been widely misused to propagate disinformation about influential persons such as celebrities and politicians. Since deepfakes are created using different generative algorithms and exhibit high realism, detecting them is a challenging task. Existing deepfake detection methods have shown lower performance on forged videos generated using different algorithms, as well as on videos that are low resolution, compressed, or computationally more complex. To counter these issues, we propose a novel fused truncated DenseNet121 model for deepfake video detection. We employ transfer learning to reduce resources and improve effectiveness, truncation to reduce the parameters and model size, and feature fusion to strengthen the representation by capturing more distinct traits of the input video. Our fused truncated DenseNet model lowers the DenseNet121 parameter count from 8.5 to 0.5 million. This makes our model more effective and lightweight, enabling deployment on portable devices for real-time deepfake detection. Our proposed model can reliably detect various types of deepfakes, including deepfakes from different generative methods. We evaluated our model on two diverse datasets: the large-scale FaceForensics (FF)++ dataset and the World Leaders (WL) dataset. Our model achieves a remarkable accuracy of 99.03% on the WL dataset and 87.76% on FF++, which shows the effectiveness of our method for deepfake detection. Keywords: Deepfakes detection · DenseNet121 · FaceForensics++ · Fused truncated DenseNet · World Leaders dataset
Article
Deepfakes are generated using sophisticated deep-learning models to create fake images or videos. As the techniques for creating deepfakes improve, issues like defamation, impersonation, fraud, and misinformation on social media are becoming more prevalent. Existing deep learning-based deepfake detection models are not interpretable and do not generalize well when tested across diverse deepfake generation techniques and datasets. Therefore, reliable and effective deepfake detection algorithms are required that are not only generalizable but also interpretable. This paper introduces a novel graph neural network-based architecture to identify hyper-realistic deepfake content. To date, very limited efforts have been made to address deepfake detection using graph neural networks. The proposed model is based on a pyramid structure that takes advantage of the multi-scale property of images by extracting features with progressively smaller spatial sizes as layer depth increases. The method first slices the image into patches, referred to as nodes, and then constructs a graph by connecting the nearest neighbors. To transform and exchange information between all nodes, the proposed model has two basic modules: GraphNet, which uses graph convolution layers to aggregate and update graph information, and FFN, which has linear layers for the transformation of node features. The effectiveness of the method is assessed using the diverse Deepfake Detection Challenge (DFDC) dataset, FaceForensics++ (FF++), the World Leaders dataset (WLRD), and Celeb-DF. To demonstrate the generalizability of the proposed method for accurate deepfake detection, open/close-set, cross-set, and cross-corpora evaluations were also performed. The AUC values of 0.98 on FF++, 0.95 on Celeb-DF, 0.92 on DFDC, and 1.00 on most of the sets of WLRD demonstrate the efficacy of the method for identifying manipulated facial images produced using various deepfake techniques.
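The patch-to-graph construction this abstract describes (patches as nodes, edges to nearest neighbors, then aggregation) can be sketched with toy features. Node count, feature size, k, and the simple mean-aggregation update are assumptions for illustration, not the paper's GraphNet/FFN modules.

```python
import numpy as np

# Toy patch graph: each node (patch feature vector) connects to its k
# nearest neighbors in feature space (illustrative sizes only).
rng = np.random.default_rng(4)
n_nodes, d, k = 20, 32, 4
feats = rng.normal(size=(n_nodes, d))   # one feature vector per patch/node

# Pairwise Euclidean distances, then pick the k closest other nodes.
dists = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
np.fill_diagonal(dists, np.inf)         # exclude self-loops
neighbors = np.argsort(dists, axis=1)[:, :k]

# One graph-convolution-style update: blend each node with the mean of
# its neighbors (a stand-in for the learned aggregation in GraphNet).
updated = (feats + feats[neighbors].mean(axis=1)) / 2.0
print(neighbors.shape, updated.shape)  # (20, 4) (20, 32)
```

Connecting patches by feature similarity rather than only spatial adjacency lets distant but related regions (e.g. two eyes) exchange information in one hop.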
Article
This paper presents a novel attention-based adversarial autoencoder network (A3N) that consists of a two-stream decoder to detect abnormal events in video sequences. The first stream of the decoder is a reconstructive model responsible for recreating the input frame sequence. The second stream is a future-predictive model used to predict the future frame sequence through adversarial learning. A global attention mechanism is employed at the decoder side that helps decode the encoded sequences effectively. The training of A3N is carried out on normal video data. The attention-based reconstructive model is used during the inference stage to compute the anomaly score. A3N delivers a considerable average speed of 0.0227 s (∼ 44 fps) for detecting anomalies in the testing phase on the used datasets. Several experiments and ablation analyses have been performed on the UCSD Pedestrian, CUHK Avenue and ShanghaiTech datasets to validate the efficiency of the proposed model.
Article
Deepfakes have ignited intense research interest in both academia and industry due to their potential security threats, and many countermeasures have been proposed to mitigate such risks. Current deepfake detection methods achieve superior performance on low-visual-quality deepfake media, which can be distinguished by obvious visual artifacts. However, with the development of deep generative models, the realism of deepfake media has significantly improved and now poses a tough challenge to current detection models. In this paper, we propose a frame inference-based detection framework (FInfer) to solve the problem of high-visual-quality deepfake detection. Specifically, we first learn the referenced representations of the current and future frames' faces. Then, the current frames' facial representations are used to predict the future frames' facial representations via an autoregressive model. Finally, a representation-prediction loss is devised to maximize the discriminability of real and fake videos. We demonstrate the effectiveness of our FInfer framework through information-theoretic analyses. The entropy and mutual information analyses indicate that the correlation between predicted and referenced representations is higher in real videos than in high-visual-quality deepfake videos. Extensive experiments demonstrate that the performance of our method is promising in terms of in-dataset detection performance, detection efficiency, and cross-dataset detection performance on high-visual-quality deepfake videos.
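The core intuition of this abstract, that predicted representations correlate better with referenced ones in temporally coherent (real) video than in inconsistent (fake) video, can be demonstrated on synthetic sequences. Everything below is fabricated for illustration: a random-walk sequence stands in for real facial representations, added noise for deepfake inconsistency, and a trivial identity predictor replaces FInfer's learned autoregressive model.

```python
import numpy as np

# Synthetic frame representations: a smooth random walk ("real") vs the
# same walk with large per-frame noise ("fake", temporally inconsistent).
rng = np.random.default_rng(5)
d, T = 32, 50
base = np.cumsum(rng.normal(scale=0.05, size=(T, d)), axis=0)
real = base
fake = base + rng.normal(scale=1.0, size=(T, d))

def pred_corr(seq):
    # Identity predictor: "predict" frame t+1 as frame t, then measure the
    # correlation between predictions and the referenced next frames.
    pred, ref = seq[:-1].ravel(), seq[1:].ravel()
    return float(np.corrcoef(pred, ref)[0, 1])

# Coherent sequences yield higher prediction-reference correlation.
print(pred_corr(real) > pred_corr(fake))  # True
```

A detector built on this idea thresholds the correlation (or a learned loss on it) rather than looking for pixel-level artifacts, which is why it degrades less on high-visual-quality fakes.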