Article · Publisher preview available

Fake-checker: A fusion of texture features and deep learning for deepfakes detection

Authors:
  • Head of Management Information Systems Department, College of Business Administration (CBA), Jazan University, Saudi Arabia

Abstract and Figures

The evolution of sophisticated deep learning algorithms such as Generative Adversarial Networks has made it possible to create deepfake videos with convincing realism. Deepfake identification is important for countering internet disinformation campaigns and lessening negative social media effects. Existing studies use either handcrafted features or deep learning-based models for deepfake detection. To effectively combine the strengths of both approaches, this paper presents a fusion of deep features with handcrafted texture features to create a powerful fused feature vector for accurate deepfake detection. We propose a Directional Magnitude Local Hexadecimal Pattern (DMLHP) to extract 320-D texture features, and extract a 2048-D deep feature vector using Inception V3. Next, we employ Principal Component Analysis to reduce the deep feature dimensions to 320 for a balanced representation of features after fusion. The deep and handcrafted features are combined to form a fused feature vector of 640-D. Further, we employ the proposed features to train an XGBoost model to classify frames as genuine or forged. We evaluated our proposed model on the FaceForensics++ and Deepfake Detection Challenge Preview (DFDC-P) datasets. Our method achieved an accuracy and area under the curve of 97.7% and 99.3% on FaceForensics++, and 90.8% and 93.1% on DFDC-P, respectively. Moreover, we performed cross-set and cross-dataset evaluations to show the generalization capability of our model.
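The feature-fusion pipeline above can be sketched as follows. This is a minimal illustration of the dimension flow only, using random arrays in place of the real Inception V3 and DMLHP features, and a plain SVD-based PCA in place of a fitted one; the paper's actual extractors and its XGBoost classifier are not reproduced here.

```python
import numpy as np

# Hypothetical stand-ins for the paper's real inputs: random "deep" features
# (2048-D, as from Inception V3) and "texture" features (320-D, as from DMLHP).
rng = np.random.default_rng(0)
n_frames = 400
deep = rng.normal(size=(n_frames, 2048))    # Inception V3 feature vectors
texture = rng.normal(size=(n_frames, 320))  # DMLHP texture feature vectors

# PCA via SVD: reduce the 2048-D deep features to 320-D so both feature
# families contribute equally after fusion.
deep_centered = deep - deep.mean(axis=0)
_, _, vt = np.linalg.svd(deep_centered, full_matrices=False)
deep_320 = deep_centered @ vt[:320].T       # project onto top 320 components

# Fuse: concatenate to one 640-D vector per frame; this fused vector is what
# would then be fed to a binary classifier (XGBoost in the paper).
fused = np.concatenate([deep_320, texture], axis=1)
print(fused.shape)  # (400, 640)
```

Reducing the deep features to 320-D before concatenation keeps either feature family from dominating the fused representation purely by dimensionality.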
Multimedia Tools and Applications (2024) 83:49013–49037
https://doi.org/10.1007/s11042-023-17586-x
Fake‑checker: Afusion oftexture features anddeep learning
fordeepfakes detection
NoorulHuda1· AliJaved2· KholoudMaswadi3· AliAlhazmi4· RehanAshraf5
Received: 17 February 2023 / Revised: 20 September 2023 / Accepted: 18 October 2023 /
Published online: 3 November 2023
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023
Keywords Deepfakes· Deep Convolutional Neural Networks· Generative Adversarial
Networks
* Rehan Ashraf
rehan@ntu.edu.pk
1 Department ofComputer Science, University ofEngineering andTechnology, Taxila47050,
Pakistan
2 Department ofSoftware Engineering, University ofEngineering andTechnology, Taxila47050,
Pakistan
3 Department ofManagement Information Systems, Jazan University, 45142Jazan, SaudiArabia
4 College ofComputer Science andInformation Technology, Jazan University, 45142Jazan,
SaudiArabia
5 Department ofComputer Science, National Textile University, Faisalabad, Pakistan
Article
Full-text available
In videos, anomaly detection is challenging due to its diverse nature across application domains. Reconstruction- and prediction-based methods have been widely employed to detect anomalies. Due to the generalization capability of a deep neural network, it sometimes recreates irregular patterns along with regular ones. This paper presents a novel autoencoder-based framework called deep multiplicative attention-based autoencoder (DeMAAE) to detect anomalies in a video sequence. A global attention mechanism is used at the decoder side of DeMAAE for better feature learning during the decoding phase. An attention map is created by taking the dot product between all of the encoder's hidden states and the previously generated decoder hidden state. The final output of the decoder is then determined by the context vector, computed as the weighted sum of all encoder hidden states using the attention weights. DeMAAE delivers an improved runtime of 0.015 s (∼ 67 fps) for detecting anomalies during testing. Extensive experiments have been performed on two diverse and widely used datasets (UCSD Pedestrian and CUHK Avenue) to compare the efficacy of DeMAAE with different state-of-the-art methods.
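The global attention step described in this abstract can be sketched in a few lines. This is a generic dot-product attention illustration with made-up sizes, not DeMAAE's actual implementation.

```python
import numpy as np

# Minimal sketch of global attention: the sizes and variable names here are
# illustrative, not taken from the paper's implementation.
rng = np.random.default_rng(1)
T, d = 10, 64                          # encoder time steps, hidden size
enc_states = rng.normal(size=(T, d))   # all encoder hidden states
dec_state = rng.normal(size=(d,))      # previously generated decoder state

# Attention map: dot product of the decoder state with every encoder state,
# normalized with a softmax to get attention weights.
scores = enc_states @ dec_state
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Context vector: weighted sum of the encoder hidden states.
context = weights @ enc_states
print(context.shape)  # (64,)
```

The context vector lets each decoding step draw on the whole encoded sequence rather than only the final encoder state.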
Article
Full-text available
Automatic detection, localization and interpretation of an unusual event in a video sequence is a challenging task due to its equivocal and complex nature. The development of deep neural networks has paved the way for more efficient recognition and analysis of anomalous events in video data. With the introduction of the convolutional neural network (CNN) and long short-term memory (LSTM), spatial and temporal feature extraction became easier. In this paper, we propose an end-to-end trainable Inter-fused Autoencoder (IFA), designed from an assemblage of CNN and LSTM layers, to detect unusual events in a video sequence. The proposed architecture is capable of exploiting both the spatial and temporal variation of video data. The reconstruction error is computed in terms of both MSE and PSNR for each testing video. A comparison is also carried out between MSE and PSNR to show which assessment technique is better for a reconstructive model recreating the video sequence. A well-optimized threshold is calculated that decides whether a testing event is usual or unusual. Multiple experiments were carried out on benchmark datasets to demonstrate the efficacy of the proposed architecture.
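The MSE/PSNR thresholding idea in this abstract can be illustrated with toy data. The frames, noise levels, and the threshold value below are all invented for the sketch; only the scoring logic (low PSNR, i.e. high reconstruction error, flags an unusual event) reflects the described approach.

```python
import numpy as np

def mse(a, b):
    # Mean squared error between a frame and its reconstruction.
    return float(np.mean((a - b) ** 2))

def psnr(a, b, peak=1.0):
    # Peak signal-to-noise ratio in dB; higher means a better reconstruction.
    e = mse(a, b)
    return float("inf") if e == 0 else 10.0 * np.log10(peak ** 2 / e)

rng = np.random.default_rng(2)
frame = rng.uniform(size=(64, 64))
normal_recon = frame + rng.normal(scale=0.01, size=frame.shape)   # good recon
anomaly_recon = frame + rng.normal(scale=0.2, size=frame.shape)   # poor recon

# A PSNR below the threshold flags an unusual event (threshold is arbitrary
# here; the paper learns a well-optimized one).
threshold = 25.0  # dB
print(psnr(frame, normal_recon) > threshold)   # True  -> usual event
print(psnr(frame, anomaly_recon) > threshold)  # False -> unusual event
```

Because PSNR is a log transform of MSE, the two rank reconstructions identically; PSNR mainly adds an interpretable dB scale for picking the threshold.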
Article
Full-text available
The ever-growing threat of deepfakes and their large-scale societal implications has propelled the development of deepfake forensics to ascertain the trustworthiness of digital media. A common theme of existing detection methods is using Convolutional Neural Networks (CNNs) as a backbone. While CNNs have demonstrated decent performance in learning local discriminative information, they fail to learn relative spatial features and lose important information due to constrained receptive fields. Motivated by these challenges, this work presents DFDT, an end-to-end deepfake detection framework that leverages the unique characteristics of transformer models to learn hidden traces of perturbations from both local image features and the global relationship of pixels at different forgery scales. DFDT is specifically designed for deepfake detection and consists of four main components: patch extraction & embedding, a multi-stream transformer block, attention-based patch selection, and a multi-scale classifier. DFDT's transformer layer benefits from a re-attention mechanism instead of a traditional multi-head self-attention layer. To evaluate the performance of DFDT, a comprehensive set of experiments is conducted on several deepfake forensics benchmarks. The results demonstrate DFDT's superior detection rate, achieving 99.41%, 99.31%, and 81.35% on FaceForensics++, Celeb-DF (V2), and WildDeepfake, respectively. Moreover, DFDT's excellent cross-dataset & cross-manipulation generalization provides additional strong evidence of its effectiveness.
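The patch extraction & embedding stage named in this abstract follows the standard vision-transformer recipe, which can be sketched as below. Image size, patch size, and the random linear projection are illustrative assumptions; DFDT's real embedding is learned.

```python
import numpy as np

# Sketch of patch extraction & embedding (illustrative sizes only).
rng = np.random.default_rng(3)
img = rng.uniform(size=(224, 224, 3))
patch = 16                              # patch side length
h = w = 224 // patch                    # 14 x 14 grid of patches

# Slice the image into non-overlapping patches and flatten each one
# into a token vector.
patches = img.reshape(h, patch, w, patch, 3).transpose(0, 2, 1, 3, 4)
tokens = patches.reshape(h * w, patch * patch * 3)

# A random linear projection stands in for the learned patch embedding
# that would feed the transformer blocks.
proj = rng.normal(size=(patch * patch * 3, 64)) / np.sqrt(patch * patch * 3)
embedded = tokens @ proj
print(tokens.shape, embedded.shape)  # (196, 768) (196, 64)
```

Operating on patch tokens is what lets the transformer attend across the whole image, sidestepping the constrained receptive fields the abstract attributes to CNN backbones.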
Chapter
In recent years, we have witnessed a tremendous evolution in generative adversarial networks, resulting in the creation of highly realistic fake multimedia content termed deepfakes. Deepfakes are created by superimposing one person's real facial features, expressions, or lip movements onto another person. Apart from their benefits, deepfakes have been widely misused to propagate disinformation about influential persons such as celebrities and politicians. Since deepfakes are created using different generative algorithms and exhibit high realism, detecting them is a challenging task. Existing deepfake detection methods have shown lower performance on forged videos generated using different algorithms, as well as on videos that are low resolution, compressed, or computationally more complex. To counter these issues, we propose a novel fused truncated DenseNet121 model for deepfake video detection. We employ transfer learning to reduce resources and improve effectiveness, truncation to reduce the parameters and model size, and feature fusion to strengthen the representation by capturing more distinct traits of the input video. Our fused truncated DenseNet model lowers the DenseNet121 parameter count from 8.5 to 0.5 million. This makes our model more effective and lightweight, enabling deployment on portable devices for real-time deepfake detection. Our proposed model can reliably detect various types of deepfakes, including deepfakes from different generative methods. We evaluated our model on two diverse datasets: the large-scale FaceForensics (FF)++ dataset and the World Leaders (WL) dataset. Our model achieves a remarkable accuracy of 99.03% on the WL dataset and 87.76% on FF++, which shows the effectiveness of our method for deepfake detection. Keywords: Deepfakes detection · DenseNet121 · FaceForensics++ · Fused truncated DenseNet · World Leaders dataset
Article
Deepfakes are generated using sophisticated deep-learning models to create fake images or videos. As the techniques for creating deepfakes improve, issues like defamation, impersonation, fraud, and misinformation on social media are becoming more prevalent. Existing deep learning-based deepfake detection models are not interpretable and do not generalize well when tested across diverse deepfake generation techniques and datasets. Therefore, reliable and effective deepfake detection algorithms are required that are not only generalizable but also interpretable. This paper introduces a novel graph neural network-based architecture to identify hyper-realistic deepfake content. To date, very limited efforts have been made to address deepfake detection using graph neural networks. The proposed model is based on a pyramid structure that takes advantage of the multi-scale property of images by extracting features with progressively smaller spatial sizes as layer depth increases. The method first slices the image into patches, referred to as nodes, and then constructs a graph by connecting the nearest neighbors. To transform and exchange information between all nodes, the proposed model has two basic modules: GraphNet, which uses graph convolution layers to aggregate and update graph information, and FFN, which has linear layers for the transformation of node features. The effectiveness of the method is assessed using the diverse Deepfake Detection Challenge (DFDC) dataset, FaceForensics++ (FF++), the World Leaders dataset (WLRD), and Celeb-DF. To demonstrate the generalizability of the proposed method for accurate deepfake detection, open/close-set, cross-set, and cross-corpora evaluations were also performed. The AUC values of 0.98 on FF++, 0.95 on Celeb-DF, 0.92 on DFDC, and 1.00 on most of the sets of WLRD demonstrate the efficacy of the method for identifying manipulated facial images produced using various deepfake techniques.
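The patch-to-graph construction this abstract describes (patches as nodes, edges to nearest neighbors, then aggregation) can be sketched with toy features. Node count, feature size, k, and the simple mean-aggregation update are assumptions for illustration, not the paper's GraphNet/FFN modules.

```python
import numpy as np

# Toy patch graph: each node (patch feature vector) connects to its k
# nearest neighbors in feature space (illustrative sizes only).
rng = np.random.default_rng(4)
n_nodes, d, k = 20, 32, 4
feats = rng.normal(size=(n_nodes, d))   # one feature vector per patch/node

# Pairwise Euclidean distances, then pick the k closest other nodes.
dists = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
np.fill_diagonal(dists, np.inf)         # exclude self-loops
neighbors = np.argsort(dists, axis=1)[:, :k]

# One graph-convolution-style update: blend each node with the mean of
# its neighbors (a stand-in for the learned aggregation in GraphNet).
updated = (feats + feats[neighbors].mean(axis=1)) / 2.0
print(neighbors.shape, updated.shape)  # (20, 4) (20, 32)
```

Connecting patches by feature similarity rather than only spatial adjacency lets distant but related regions (e.g. two eyes) exchange information in one hop.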
Article
This paper presents a novel attention-based adversarial autoencoder network (A3N) that consists of a two-stream decoder to detect abnormal events in video sequences. The first stream of the decoder is a reconstructive model responsible for recreating the input frame sequence. The second stream is a future-predictive model used to predict the future frame sequence through adversarial learning. A global attention mechanism is employed at the decoder side that helps decode the encoded sequences effectively. The training of A3N is carried out on normal video data. The attention-based reconstructive model is used during the inference stage to compute the anomaly score. A3N delivers a considerable average speed of 0.0227 s (∼ 44 fps) for detecting anomalies in the testing phase on the used datasets. Several experiments and ablation analyses have been performed on the UCSD Pedestrian, CUHK Avenue and ShanghaiTech datasets to validate the efficiency of the proposed model.
Article
Deepfakes have ignited intense research interest in both academia and industry due to their potential security threats, and many countermeasures have been proposed to mitigate such risks. Current deepfake detection methods achieve superior performance on low-visual-quality deepfake media, which can be distinguished by obvious visual artifacts. However, with the development of deep generative models, the realism of deepfake media has significantly improved and now poses a tough challenge to current detection models. In this paper, we propose a frame inference-based detection framework (FInfer) to solve the problem of high-visual-quality deepfake detection. Specifically, we first learn the referenced representations of the current and future frames' faces. Then, the current frames' facial representations are used to predict the future frames' facial representations via an autoregressive model. Finally, a representation-prediction loss is devised to maximize the discriminability of real and fake videos. We demonstrate the effectiveness of our FInfer framework through information-theoretic analyses. The entropy and mutual information analyses indicate that the correlation between predicted and referenced representations is higher in real videos than in high-visual-quality deepfake videos. Extensive experiments demonstrate that the performance of our method is promising in terms of in-dataset detection performance, detection efficiency, and cross-dataset detection performance on high-visual-quality deepfake videos.
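The core intuition of this abstract, that predicted representations correlate better with referenced ones in temporally coherent (real) video than in inconsistent (fake) video, can be demonstrated on synthetic sequences. Everything below is fabricated for illustration: a random-walk sequence stands in for real facial representations, added noise for deepfake inconsistency, and a trivial identity predictor replaces FInfer's learned autoregressive model.

```python
import numpy as np

# Synthetic frame representations: a smooth random walk ("real") vs the
# same walk with large per-frame noise ("fake", temporally inconsistent).
rng = np.random.default_rng(5)
d, T = 32, 50
base = np.cumsum(rng.normal(scale=0.05, size=(T, d)), axis=0)
real = base
fake = base + rng.normal(scale=1.0, size=(T, d))

def pred_corr(seq):
    # Identity predictor: "predict" frame t+1 as frame t, then measure the
    # correlation between predictions and the referenced next frames.
    pred, ref = seq[:-1].ravel(), seq[1:].ravel()
    return float(np.corrcoef(pred, ref)[0, 1])

# Coherent sequences yield higher prediction-reference correlation.
print(pred_corr(real) > pred_corr(fake))  # True
```

A detector built on this idea thresholds the correlation (or a learned loss on it) rather than looking for pixel-level artifacts, which is why it degrades less on high-visual-quality fakes.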