Figure 4 - uploaded by Kai Shu
A comparison of bot scores on users related to fake and real news on the PolitiFact dataset.

Source publication
Article
Full-text available
Social media has become a popular means for people to consume and share news. At the same time, however, it has also enabled the wide dissemination of fake news, that is, news with intentionally false information, causing significant negative effects on society. To mitigate this problem, research on fake news detection has recently received...

Contexts in source publication

Context 1
... set the threshold of 0.5 on the bot score returned by Botometer to determine bot accounts. Figure 4 shows the ratio of bot and human users involved in tweets related to fake and real news. We can see that bots are more likely than human users to post tweets related to fake news. ...
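For concreteness, here is a minimal sketch of this kind of threshold-based comparison, assuming we already have one (bot score, news label) pair per user; the data and function names are illustrative, not the paper's code:

```python
# Classify users as bots via a 0.5 score threshold and compare the
# bot ratio between the fake-news and real-news user cohorts.

def bot_ratio(user_scores, news_label, threshold=0.5):
    """Fraction of users in one cohort whose bot score meets the threshold."""
    cohort = [score for score, label in user_scores if label == news_label]
    if not cohort:
        return 0.0
    return sum(score >= threshold for score in cohort) / len(cohort)

# Illustrative (bot score, news label) pairs, one per user.
user_scores = [(0.81, "fake"), (0.12, "fake"), (0.65, "fake"),
               (0.22, "real"), (0.09, "real"), (0.57, "real")]

print("bot ratio among fake-news users:", bot_ratio(user_scores, "fake"))
print("bot ratio among real-news users:", bot_ratio(user_scores, "real"))
```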
Context 2
... using features like user profile metadata and the historical tweets of users who spread fake news, along with the social network, one could analyze the differences in user characteristics to cluster them as malicious or not. Through a preliminary study in Figure 4, we have shown that bot users are more likely to be involved in the fake news spreading process. Although existing works have studied bot detection in general, few studies investigate the influence of social bots on fake news spreading. ...

Citations

... The features of network-based analysis help in finding the unique characteristics of certain networks (Shu et al. 2018) and the similarities and differences between online accounts. Two kinds of network analysis give a brief idea of this: the friendship network, which indicates the follower/followee structure of users who post related tweets or articles, and the diffusion network (Kwon et al. 2013), which helps in tracking the route of the spread of news. ...
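As an illustration of these two network views (not code from the cited works), a sketch with networkx; the edges, timestamps, and user names are invented:

```python
import networkx as nx

# Friendship network: directed follower -> followee structure.
friendship = nx.DiGraph()
friendship.add_edges_from([("u1", "u2"), ("u3", "u2"), ("u2", "u4")])

# Diffusion network: who re-shared the news from whom, with timestamps,
# so the route of the spread can be traced.
diffusion = nx.DiGraph()
diffusion.add_edge("u2", "u1", time=1)  # u1 re-shared u2's post at t=1
diffusion.add_edge("u2", "u3", time=2)
diffusion.add_edge("u3", "u4", time=5)

# Simple structural signals of the kind such analyses rely on:
print("u2 follower count:", friendship.in_degree("u2"))
print("spread route u2 -> u4:", nx.shortest_path(diffusion, "u2", "u4"))
```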
... Li et al. 2020b)
Number of pages that the user liked: (Sahoo and Gupta 2019)
Number of stories shared: (Sahoo and Gupta 2019), (Shu et al. 2018)
Number of tags: (Sahoo and Gupta 2019), (Jarrahi and Safari 2021), (Shu et al. 2018), (Castillo et al. 2011)
Hashtags shared: (Sahoo and Gupta 2019), (Shu et al. 2018), (Castillo et al. 2011), (Singhal et al. 2019)
Number of posts/tweets: (Shu et al. 2018), (Sahoo and Gupta 2019), (Jarrahi and Safari 2021), (Castillo et al. 2011), (Indu and Thampi 2019), (Singhal et al. 2019), (Alrubaian et al. 2021)
Current location: (Sahoo and Gupta 2019), (Jarrahi and Safari 2021), (Castillo et al. 2011), (Indu and Thampi 2019), (Singhal et al. 2019), (Alrubaian et al. 2021)
Friendships count: (Alrubaian et al. 2021), (Indu and Thampi 2019), (Sahoo and Gupta 2019), (Jarrahi and Safari 2021), (Castillo et al. 2011), (Singhal et al. 2019)
Time of posting: (Sahoo and Gupta 2019), (Alrubaian et al. 2021), (Castillo et al. 2011), (Jarrahi and Safari 2021), (Indu and Thampi 2019)
Spam-filled messages: (Sahoo and Gupta 2019)
Number of followers: (Alrubaian et al. 2021), (Jarrahi and Safari 2021), (Indu and Thampi 2019), (Castillo et al. 2011), (Shu et al. 2018), (Jin et al. 2016)
Max. length of a tweet: (Wani et al. 2021)
Retweets: (Sahoo and Gupta 2019), (Wani et al. 2021), (Castillo et al. 2011)
Tweets with mentions: (Shu et al. 2018), (Sahoo and Gupta 2019), (Kwon et al. 2013), (Alrubaian et al. 2021)
Overall count of likes: (Sahoo and Gupta 2019), (Shu et al. 2018), (Singhal et al. 2019)
3 Distribution features: (Kwon et al. 2013), (Castillo et al. 2011), (Varol et al. 2017), (Shu et al. 2018), (Jin et al. 2016), (Vosoughi et al. 2017), (Wu and Liu 2018)
4 Temporal features: (Castillo et al. 2011), (Kwon et al. 2013), (Vosoughi et al. 2017), (Ma et al. 2016), (Chen et al. 2019), (Varol et al. 2017), (Jin et al. 2016), (Reis et al. 2019), (Habib et al. 2019), (Huang et al. 2020)
5 Psycho-linguistic features: (Verma et al. 2021)
6 Linguistic features: (Mahyoob et al. 2020), (Varol et al. 2017), (Verma et al. 2021), (Kwon et al. 2013), (Pérez-Rosas et al. 2018), (Vosoughi et al. 2017), (Reis et al. 2019)
7 Visual features: (Sharma and Garg 2021), (Yang et al. 2018), (Jin et al. 2016), (Xue et al. 2021)
8 ...
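To make the tabulated profile and content features concrete, a hedged sketch of extracting a few of them from a raw tweet record; the field names are illustrative rather than the schema of any dataset cited above:

```python
import re

def extract_features(tweet):
    """Assemble a small user/content feature vector from one tweet record."""
    text = tweet["text"]
    user = tweet["user"]
    return {
        "n_hashtags":  len(re.findall(r"#\w+", text)),   # hashtags shared
        "n_mentions":  len(re.findall(r"@\w+", text)),   # tweets with mentions
        "tweet_len":   len(text),                        # length of the tweet
        "n_followers": user["followers_count"],          # number of followers
        "n_friends":   user["friends_count"],            # friendships count
        "n_statuses":  user["statuses_count"],           # number of posts/tweets
        "posted_hour": tweet["created_hour"],            # time of posting
    }

tweet = {"text": "Breaking: #election result leaked! cc @newsdesk",
         "created_hour": 23,
         "user": {"followers_count": 42, "friends_count": 500,
                  "statuses_count": 9000}}
print(extract_features(tweet))
```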
... FakeNewsNet (Shu et al. 2018) published a dataset that contains 211 fake news and 211 true news stories culled from BuzzFeed.com and PolitiFact.com. ...
Article
Full-text available
As social media and web-based forums have grown in popularity, the fast-spreading trend of fake news has become a major threat to governments and other agencies. With the rise of social media and internet platforms, misinformation may quickly spread across borders and language boundaries. Detecting and neutralizing fake news in several languages can help to protect the integrity of global elections, political discourse, and public opinion. The lack of a robust multilingual database for training classification models makes detecting fake news a difficult task. This paper approaches the problem by describing several forms of fake news (serious fabrications, large-scale hoaxes, stance news, deceptive news, satire, clickbait, misinformation, and rumour). This review covers the steps, features, and tools for mitigating the scourge of information pollution, as well as the available datasets. It presents a taxonomy for fake news detection, giving a comprehensive overview and analysis of existing DL-based algorithms across diverse techniques. The paper also covers monolingual and multilingual fake news detection models, and ends with the open technical challenges.
... As baselines, we train a Multinomial Naive Bayes (MNB), and a Support Vector Machine (SVM), both using bag-of-words to encode the textual features. MNB and SVM are classic machine learning baselines commonly adopted in other disinformation detection works [4,13,20,21]. We also train transformer-based models, mBERT and XLM-RoBERTa, as three of the relevant multilingual datasets discussed in Section 1 use mBERT as a baseline [2,10,21], and XLM-R is used by the fourth dataset Li et al. [13]. ...
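A minimal sketch of baselines of this kind: bag-of-words features feeding a Multinomial Naive Bayes and a linear SVM via scikit-learn. The training texts and settings are illustrative, not the paper's setup:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["official report confirms the figures",
         "shocking secret cure they hide from you",
         "council publishes the annual budget",
         "celebrity exposed in fabricated scandal"]
labels = [0, 1, 0, 1]  # 0 = trustworthy, 1 = disinformation

# Bag-of-words encoding shared by both classic baselines.
mnb = make_pipeline(CountVectorizer(), MultinomialNB()).fit(texts, labels)
svm = make_pipeline(CountVectorizer(), LinearSVC()).fit(texts, labels)

print(mnb.predict(["annual figures published by the council"]))
print(svm.predict(["secret scandal they hide"]))
```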
Preprint
This work introduces EUvsDisinfo, a multilingual dataset of trustworthy and disinformation articles related to pro-Kremlin themes. It is sourced directly from the debunk articles written by experts leading the EUvsDisinfo project. Our dataset is the largest resource to date in terms of the overall number of articles and distinct languages. It also provides the largest topical and temporal coverage. Using this dataset, we investigate the dissemination of pro-Kremlin disinformation across different languages, uncovering language-specific patterns targeting specific disinformation topics. We further analyse the evolution of topic distribution over an eight-year period, noting a significant surge in disinformation content before the full-scale invasion of Ukraine in 2022. Lastly, we demonstrate the dataset's applicability in training models to effectively distinguish between disinformation and trustworthy content in multilingual settings.
... Our experiments use three real-world fake news benchmark datasets. We utilize PolitiFact and Gossipcop, drawn from the FakeNewsNet benchmark (Shu et al., 2020), which focus on political claims and celebrity rumors, respectively. Additionally, we incorporate Constraint (Felber, 2021), a dataset specifically addressing COVID-19 related social media posts. ...
... This paper utilizes a diverse range of datasets sourced from publicly available fake news detection resources, selected for their unique topical focuses. We utilize PolitiFact and Gossipcop, drawn from the FakeNewsNet benchmark (Shu et al., 2020), addressing political claims and celebrity rumors, respectively. Furthermore, we include Constraint (Felber, 2021), a dataset focused on COVID-19 related social media posts. ...
Preprint
The spread of fake news negatively impacts individuals and is regarded as a significant social challenge that needs to be addressed. A number of algorithmic and insightful features have been identified for detecting fake news. However, with the recent LLMs and their advanced generation capabilities, many of the detectable features (e.g., style-conversion attacks) can be altered, making it more challenging to distinguish from real news. This study proposes adversarial style augmentation, AdStyle, to train a fake news detector that remains robust against various style-conversion attacks. Our model's key mechanism is the careful use of LLMs to automatically generate a diverse yet coherent range of style-conversion attack prompts. This improves the generation of prompts that are particularly difficult for the detector to handle. Experiments show that our augmentation strategy improves robustness and detection performance when tested on fake news benchmark datasets.
... Three datasets are used to conduct domain-adaptive few-shot FND experiments. Snopes (Popat et al., 2017) is a domain-agnostic dataset extracted from a fact-checking website that provides various news articles and corresponding labels. Politifact (Shu et al., 2020) is a politics-related dataset collected from another fact-checking website specialized in the US political system. CoAID (Cui and Lee, 2020) is a healthcare dataset containing COVID-19 related news from websites and social platforms. ...
Preprint
Most Fake News Detection (FND) methods often struggle with data scarcity in emerging news domains. Recently, prompt learning based on Pre-trained Language Models (PLMs) has emerged as a promising approach to domain-adaptive few-shot learning, since it greatly reduces the need for labeled data by bridging the gap between pre-training and the downstream task. Furthermore, external knowledge is also helpful in verifying emerging news, as emerging news often involves timely knowledge that may not be contained in the PLM's outdated prior knowledge. To this end, we propose COOL, a Comprehensive knOwledge enhanced prOmpt Learning method for domain-adaptive few-shot FND. Specifically, we propose a comprehensive knowledge extraction module to extract both structured and unstructured knowledge that is positively or negatively correlated with news from external sources, and adopt an adversarial contrastive enhanced hybrid prompt learning strategy to model the domain-invariant news-knowledge interaction pattern for FND. Experimental results demonstrate the superiority of COOL over various state-of-the-art methods.
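Not the COOL method itself, but a minimal illustration of the prompt-learning idea it builds on: a masked language model scores verbalizer words ("fake" vs. "real") inside a cloze-style prompt, so classification needs no task-specific head. The model choice and prompt wording are assumptions:

```python
from transformers import pipeline

# Masked LM used as a zero-/few-shot classifier via a cloze-style prompt.
fill = pipeline("fill-mask", model="bert-base-uncased")

news = "Miracle fruit cures all known diseases overnight, experts say."
prompt = f"{news} In short, this news is [MASK]."

# Restrict the mask predictions to the two verbalizer tokens and
# compare their scores as class evidence.
for pred in fill(prompt, targets=["fake", "real"]):
    print(pred["token_str"], round(pred["score"], 4))
```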
... Existing public datasets for news recommendation do not contain news veracity information, while datasets for fake news detection usually lack comprehensive information about the reading histories of individual users. Fortunately, we found that user-news interactions and news veracity information can be extracted from FakeNewsNet [24], a commonly used dataset. In addition, we retrieved the political polarity of each news item according to its source from a commonly used media bias/fact-check website. ...
Conference Paper
Full-text available
In the era of information explosion, news recommender systems are crucial for users to effectively and efficiently discover the news they are interested in. However, most existing news recommender systems face two major issues that hamper recommendation quality. Firstly, they often oversimplify users' reading interests, neglecting their hierarchical nature, which spans from high-level event-related interests (e.g., US Election) to low-level, article-specific interests. Secondly, existing work often assumes a simplistic context, disregarding the prevalence of fake news and political bias in real-world settings. This oversight leads to recommendations of biased or fake news, posing risks to individuals and society. This paper addresses these gaps by introducing a novel framework, the Hierarchical and Disentangling Interest learning framework (HDInt). HDInt incorporates a hierarchical interest learning module and a disentangling interest learning module. The former captures users' high- and low-level interests, enhancing next-news recommendation accuracy. The latter effectively separates polarity and veracity information from news contents and models them more specifically, promoting fairness- and truth-aware reading interest learning for unbiased and true news recommendations. Extensive experiments on two real-world datasets demonstrate HDInt's superiority over state-of-the-art news recommender systems in delivering accurate, unbiased, and true news recommendations.
... These systems employ artificial intelligence to automatically identify and categorize fake news, helping users discriminate between true and false information (Ganegedara, 2022). However, the dataset is the main determinant of the credibility and trustworthiness of machine learning and deep learning approaches (Helmstetter & Paulheim, 2018; Shu et al., 2020). The existing literature suffers from a shortage of high-quality, reliable fake news datasets. ...
Article
Full-text available
Twitter is a powerful platform for communication and information sharing but is also susceptible to spreading false information. This false information has adverse consequences for society and can significantly impact public perception, decision-making, and political outcomes. Therefore, there is an urgent need to build a fake news detection system that can accurately catch false information before it is disseminated. Building such a system requires good-quality, trustworthy labeled datasets. The limitations of the existing datasets are undeniable: most are not updated to reflect the advanced generation patterns of new fake news creators. The Truth Seeker research team offered a large-scale fake news dataset labeled via Amazon Mechanical Turk. The dataset was collected between 2009 and 2022 and then validated according to a robust procedure to ensure its quality and reliability. However, the credibility and trustworthiness of this dataset are still questionable. In this paper, we study and analyze the feasibility of building a deep-learning-based fake news detection model using the Truth Seeker dataset. Mainly, we investigate the impact of different text representation techniques on the accuracy of deep learning models. We also investigate the importance of the hand-crafted features associated with the dataset in the final results. The results show that the Truth Seeker dataset has the potential to help social media platforms detect fake news. Moreover, deep contextualized text representations produced more accurate results than word2vec and TF-IDF techniques. The impact of hand-crafted features on the final performance of deep learning models is often negligible, and we suggest excluding them from the final models.
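As a hedged sketch of the representation comparison described above: a sparse TF-IDF vector versus a deep contextualized embedding. The model name is illustrative, and the paper's exact setup may differ:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import pipeline

docs = ["the senator denied the viral claim",
        "viral claim about the senator spreads"]

# Sparse, order-insensitive representation.
tfidf = TfidfVectorizer().fit_transform(docs)
print("TF-IDF matrix shape:", tfidf.shape)

# Dense, context-sensitive token representations from a transformer.
encoder = pipeline("feature-extraction", model="distilbert-base-uncased")
contextual = np.array(encoder(docs[0]))
print("contextualized output shape:", contextual.shape)  # (1, tokens, hidden)
```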
... Textual veracity distortion encompasses three types of rumors: natural, artificial, and GPT-generated rumors. Unlike [11][12][13][14], which focus solely on single-source, text-only rumors, MMFakeBench incorporates text-image rumors using highly relevant real or AI-generated images. Visual veracity distortion filters existing PS-edited images [15,16] according to misinformation standards and incorporates high-quality AI-generated images. ...
... Politifact [12] ...
... Gossipcop [12] ...
Preprint
Current multimodal misinformation detection (MMD) methods often assume a single source and type of forgery for each sample, which is insufficient for real-world scenarios where multiple forgery sources coexist. The lack of a benchmark for mixed-source misinformation has hindered progress in this field. To address this, we introduce MMFakeBench, the first comprehensive benchmark for mixed-source MMD. MMFakeBench includes 3 critical sources: textual veracity distortion, visual veracity distortion, and cross-modal consistency distortion, along with 12 sub-categories of misinformation forgery types. We further conduct an extensive evaluation of 6 prevalent detection methods and 15 large vision-language models (LVLMs) on MMFakeBench under a zero-shot setting. The results indicate that current methods struggle under this challenging and realistic mixed-source MMD setting. Additionally, we propose an innovative unified framework, which integrates rationales, actions, and tool-use capabilities of LVLM agents, significantly enhancing accuracy and generalization. We believe this study will catalyze future research into more realistic mixed-source multimodal misinformation and provide a fair evaluation of misinformation detection methods.
... However, if the pre-training material contains content related to these entities, it can give the model prior background knowledge, leading to a distorted evaluation of entity recognition performance. • Fake News Detection [3,12,40,57,112,130,145,149,160,179]: Articles and comments associated with news events that constitute a benchmark for the fake news detection task might be used as pre-training data, leading to a risk of BDC. An event is usually covered by more than one media outlet, and different outlets may take different positions and use different language. ...
Preprint
Full-text available
The rapid development of Large Language Models (LLMs) like GPT-4, Claude-3, and Gemini has transformed the field of natural language processing. However, it has also resulted in a significant issue known as Benchmark Data Contamination (BDC). This occurs when language models inadvertently incorporate evaluation benchmark information from their training data, leading to inaccurate or unreliable performance during the evaluation phase of the process. This paper reviews the complex challenge of BDC in LLM evaluation and explores alternative assessment methods to mitigate the risks associated with traditional benchmarks. The paper also examines challenges and future directions in mitigating BDC risks, highlighting the complexity of the issue and the need for innovative solutions to ensure the reliability of LLM evaluation in real-world applications.
... Synthetic (generated) datasets typically use both hierarchical and arbitrary topologies [14], [15]. [14], [15], [24], [73], [74]. ...
... Then, for the subsequent nodes defined in the set [73], [118], [121], [140]. Table (2): determining a similarity threshold equal to the median of the degree measure requires ...
... Moreover, there are research datasets both for detecting fake news and for studying its propagation. Among the most popular are FakeNewsNet [73], PHEME [152] and CoAID [153]. The data available there usually come from the Twitter platform and comprise source tweets, reactions to them, and the associated comments. For the analyzed case, an unofficial report from 2020 that Pope Francis had contracted COVID-19 [154] was selected. ...
Thesis
Full-text available
In today's world, social networks play a pervasive role in shaping public opinion, making false news source identification critically important. History offers numerous examples where carefully orchestrated disinformation campaigns have achieved their intended impact. One such instance involves public panic caused by rumours of fuel shortages at gas stations, which ultimately led to actual shortages and massive queues. This work aims to propose a suite of techniques to identify the initiators of such disinformation more rapidly, enabling better control, prevention, and a deeper understanding of the mechanisms behind the spread of falsified information. This dissertation conducts a comprehensive review of existing solutions for disinformation source detection in social networks.
It identifies the strengths and weaknesses of these approaches and proposes improvements. In this regard, an innovative contribution is made by developing a dedicated analytical-simulation tool, RPaSDT, which serves as a foundation for future research. For the first time, the problem of rumour propagation outbreak detection has been defined as a key aspect of the source detection problem. Moreover, the study conducts the first detailed examination of network partitioning techniques in the context of source identification. Based on these analyses, a new detection method, BLOCD, has been proposed, which is more effective both in locating propagation outbreaks and in detecting sources. For reconstructing propagation graphs, the SHNI technique has been proposed; it utilizes structural analysis of social networks and, despite its low computational complexity, achieved satisfactory results. Additionally, the study challenges the previous assumption that propagation outbreaks are inherently single-source in nature. A new approach involving the selection of multiple sources within a single outbreak has been proposed, thereby increasing the effectiveness of identification. Moreover, this dissertation confirmed that ensemble techniques are more effective in identifying the sources of disinformation than individual methods. The effectiveness of the developed techniques has been confirmed in a real-world case of a disinformation campaign regarding the alleged illness of Pope Francis due to COVID-19. The findings of this dissertation offer a meaningful step forward in tackling disinformation by introducing tested, effective methods for tracing the origins of false information. As a result, this study provides valuable insights and tools for understanding and mitigating the spread of disinformation in social networks.
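Not the BLOCD method from the thesis, but a classic distance-center baseline that makes the single-source detection task concrete: estimate the source as the observed spreader with minimum eccentricity within the infected subgraph. The graph and infected set below are illustrative:

```python
import networkx as nx

G = nx.karate_club_graph()
infected = [0, 1, 2, 3, 7, 13]   # illustrative set of observed spreaders
H = G.subgraph(infected)

def jordan_center(graph):
    """Node minimizing its maximum distance to all other infected nodes."""
    ecc = nx.eccentricity(graph)
    return min(ecc, key=ecc.get)

print("estimated source:", jordan_center(H))
```

Ensemble approaches of the kind the thesis finds more effective would combine several such estimators (e.g., distance center, degree, rumor centrality) and vote on the source.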
... A meme is a visual representation, either an image, a short animated image, or a video, meant to convey humour and often used on social media. Some of the images in datasets used for fake news and harmful language detection are actually memes, some with text inserted within them [52], [76], [93], [99]. ...
... The crux in both works is the computation of semantic similarity between textual and visual features. [65] proposed an approach based on Hierarchical Attention Network (HAN), image captioning and forensic analysis to tackle the task of fake news detection with a specific focus on fake images in multimedia news contents. In addition, they used headline-to-content matching together with other algorithms such as Noise Variance Inconsistency (NVI) and Error Level Analysis (ELA) specifically to detect fake images. The survey table interleaved with this passage (model | architecture | modalities | fusion strategy | datasets) reads:

MVAE [51] | Variational autoencoder | T&I | Late/Summation | Twitter [13], [14] and Weibo [46]
SpotFake [51] | Fully-connected neural network | T&I | Intermediate/Concatenation | Twitter [13], [14] and Weibo [46]
MCNN [106] | Attention-based BiGRU | T&I | Late/Summation | MC [106], Twt [13], [14], PFact [93], TI [110]
MCAN [105] | Co-attention networks | T&I | Intermediate (progressive)/Concatenation | Twitter [13], [14] and Weibo [46]
EM-FEND [78] | Co-attention transformer | T&I | Intermediate/Concatenation | TI-CNN [110] and Weibo [46]
CAFE [18] | Attention ResNet and multichannel CNN | T&I | Intermediate/Multiplication | Twitter [13], [14] and Weibo [46]
TTEC [42] | Contrastive learning | T&I | Early/Concatenation | ReCOVery [114]
MMFN [117] | Textual and visual Transformers | T&I | Early/Concatenation | GCop [93], Twitter [13] and Weibo [46]
UCNet [72] | LSTM | T&V | Intermediate/Concatenation | VAVD [72] and FVC [73]
FANVM [22] | Adversarial networks | T&V | Early/Concatenation | MYVC [22], VAVD [72] and FVC [73]
FVDM [23] | TextCNN/Attention-BiLSTM | T&V | Intermediate/Concatenation | MYVC [22], VAVD [72] and FVC [73]
[66] | CNN/Memory Fusion Network | A&V | Late | DFDC [28] and TIMIT [87]
[24] | Dissonance score | A&V | Late | DFDC [28] and TIMIT [87]
[116] | CNN/RNN | A&V | Late | Subset of DFDC [28] and FF++ [83]
[48] | Pretrained CNN-based models | A&V | Intermediate/Concatenation | FakeAVCeleb [49]
AVFakeNet [43] | Dense swin transformer network | A&V | Late/Maximum | FakeAVCeleb [49]
AVForensics [118] | CNN encoder/Contrastive learning | A&V | Intermediate (Progressive) | FF++ [83], DFDC [28], DF [45] and FS [57]
[86] | EfficientNet/Time-Delay Neural Network | A&V | Hybrid | FkAVCD [49], DFDC [28] and TIMIT [87]
Multimodaltrace [82] | MLP-Mixer layers | A&V | Intermediate/Summation | FkAVCD [49], PDD [88] and WLDD [1]
TikTec [90] | Co-attention fusion | T&A&V | Intermediate (Progressive) | COVID-19 Video Dataset [90]
SV-FEND [77] | Transformers | T&A&V | Intermediate (Progressive)/Concatenation | FakeSV dataset [77]
NEED [79] | Transformers | T&A&V | Intermediate/Concatenation | FakeSV dataset [77] ...
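As an illustrative sketch (not any specific model from the table) contrasting the two fusion families that dominate it: early fusion concatenates modality features before a joint classifier, while late fusion combines per-modality decision scores. Dimensions and layers are assumptions:

```python
import torch
import torch.nn as nn

text_feat  = torch.randn(8, 768)   # e.g., transformer text embeddings
image_feat = torch.randn(8, 512)   # e.g., CNN image embeddings

# Early fusion: concatenate modality features, then classify jointly.
early_clf = nn.Linear(768 + 512, 2)
early_logits = early_clf(torch.cat([text_feat, image_feat], dim=1))

# Late fusion: classify each modality separately, then sum decision scores.
text_clf, image_clf = nn.Linear(768, 2), nn.Linear(512, 2)
late_logits = text_clf(text_feat) + image_clf(image_feat)

print(early_logits.shape, late_logits.shape)  # torch.Size([8, 2]) twice
```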
Article
Full-text available
The detection of fake news and harmful language has become increasingly important in today's digital age. As the prevalence of fake news and harmful language continues to increase, so too does the corresponding negative impact on individuals and society. Researchers are exploring new techniques to identify and combat these issues. Deep neural networks (DNNs) have found a wide range of applications in diverse problem domains, including but not limited to fake news and harmful language detection. Fake news and harmful language are currently increasing online, and their mode of dissemination is fast changing from the traditional unimodal form to multiple data forms, including text, audio, images, and video. Multimedia content containing fake news and harmful language poses more complex challenges than unimodal content, and the choice and efficacy of methods for fusing multimedia content is among the most challenging issues. Our focus is multimodal deep learning techniques that combine diverse data forms to improve detection accuracy. In this review, we delve into the current state of research, the evolution of deep learning techniques proposed for multimodal fake news and harmful language detection, and the state-of-the-art (SOTA) multimedia data fusion methods. In all cases, we discuss the prospects, relationships, breakthroughs, and challenges.