Improving Arabic Fake News Detection Using Optimized Feature Selection

Abstract

Social media has undoubtedly brought important benefits, but it has also been abused in several ways, one of which is the distribution of fake news. Fake news can change public opinion, so it is necessary to detect it. Despite the importance of this problem, little research has addressed it for Arabic posts. The few works that studied it for the Arabic language paid little attention to optimizing the feature selection process, which can play an important role in improving detection accuracy. This work improves fake news detection performance by optimizing the feature selection phase. Experiments show that this optimization improved the detection accuracy of traditional machine learning methods.
Bilal Hawashin
Department of Artificial Intelligence
Alzaytoonah University of Jordan
Amman, Jordan
b.hawashin@zuj.edu.jo
Shadi AlZubi
Department of Computer Science
Alzaytoonah University of Jordan
Amman, Jordan
smalzubi@zuj.edu.jo
Ahmad Althunibat
Department of Software Engineering
Alzaytoonah University of Jordan
Amman, Jordan
a.thunibat@zuj.edu.jo
Tarek Kanan
Department of Artificial Intelligence
Alzaytoonah University of Jordan
Amman, Jordan
tarek.kanan@zuj.edu.jo
Yousef Sharrab
Department of Artificial Intelligence
Isra’a University
Amman, Jordan
sharrab@iu.edu.jo
Keywords: Arabic Fake News Detection, Social Media,
Classification, Machine Learning, Data Science.
I. INTRODUCTION
With the advent of social media, the world has become a
small place. Social media has proved its ability to increase
connectivity. Furthermore, it has been used as an educational
tool and as a means of raising awareness of important issues,
building virtual communities, supporting noble causes, and
much more. No one can doubt the great benefits of social
media. However, it has also been abused in several ways, one
of which is the spreading of fake news.
Spreading fake news on social media is a new phenomenon
that aims at changing public opinion on some issue. Such
fake news can manipulate people's opinions to serve the
interests of individuals. Therefore, this topic has been gaining
more and more attention in recent years. Thanks to advances
in artificial intelligence, several solutions have been used to
detect such fake posts automatically, one of which is text
classification.
Text classification is the process of labeling text with one
or more predefined labels. It has applications in a wide
range of domains, such as sentiment analysis, document
classification, author authentication, and many more.
Although several works have been proposed in recent
years to detect fake news in the English language via
classification, such as [1][2][3][4][5], very few works have
been proposed to detect Arabic fake news despite its
importance. Detecting such posts is clearly necessary, and
the shortage of works in this direction may be due to the
lack of Arabic datasets and the lack of attention to this
important issue. Even the works that tackled this issue for
the Arabic language have limitations, such as giving little
attention to the feature selection process despite its
importance. Feature selection is one of the important
phases in natural language processing, and it can play a
vital role in improving accuracy and reducing time.
Although some previous works proposed solutions based on
deep learning, these solutions come with a time and
computational cost. It is therefore worthwhile to optimize
the accuracy of the traditional, less time-consuming
methods.
In this work, we optimize the feature selection phase and
compare the optimized performance with the original one. As
part of this work, we use several classification methods:
Logistic Regression, Support Vector Machines (SVM),
Random Forest, K Nearest Neighbor (KNN), Naïve Bayes,
and AraBERT. We use a publicly available dataset of Arabic
fake news [21]. For evaluation, we use recall, precision,
and F1, which are commonly used in classification
evaluation.
The contributions of this paper are as follows.
• Further improving Arabic fake news detection by
optimizing the feature selection phase.
• Increasing awareness of this important issue in
the Arabic language.
The remainder of the paper is organized as follows. Section II
presents the literature review. Section III describes the
methodology. Section IV presents the experiments and the
discussion, while Section V gives the conclusions and future work.
II. LITERATURE REVIEW
Several studies have been conducted in the field of fake
news detection. In this section, we present the important
works in this field and then outline their limitations.
[1] discussed different characteristics and types of fake news
and stressed the importance of handling them properly.
Furthermore, it proposed a fake news detection algorithm for
online social media (OSM) networks. The best achieved accuracy
was 93% when using bi-LSTM.
[2] compared several machine learning approaches, natural
language processing techniques, and social network analysis
methods. Furthermore, they made a thorough survey of
different means used for fake news identification and
mitigation.
[3] proposed an intelligent approach to recognize rumors on
microblogging websites. It benefits from time series information
from social media websites, such as user comments and
retweet dynamics, in order to enhance the performance of
rumor detection.
[4] tackled the fake news problem from a new perspective by
using different propagation characteristics, textual features,
and social features. Next, it evaluated several machine
learning methods according to their performance in detecting
fake news based on these features.
[5] handled the issue of domain bias when detecting fake
news. With such bias, a trained classifier does not work well
and fails to detect fake news if the domain is unknown. Therefore,
they proposed a solution for cross-domain detection. They
suggested the use of paired news to improve the accuracy in
this scenario.
[6] also proposed a solution for multi-domain fake news
detection. As part of their work, they used a history news
environment perception framework, which played an
important role in improving the accuracy.
[7] proposed a model named Modality and Event Adversarial
Networks to detect fake news. This work concentrated on the
multi-modality case and how to learn efficiently when text,
images, and other modalities exist together.
[8] provided a survey on the methods proposed recently to
detect fake news using machine learning. They also proposed
a hybrid method composed of Naïve Bayes and LSTM for this
sake.
[9] surveyed the recent works in this domain and discussed the
challenges and the opportunities for improvements.
[10] proposed the use of multi-layer Bi-LSTM for fake news
detection.
[11] adopted the use of sequential models, specifically the
recurrent neural network (RNN) architecture, to detect rumors
on microblogging platforms by finding temporal dependencies
in the data.
[12] proposed the use of a convolutional neural network (CNN)
and argued for its importance and efficiency in the domain of
fake news detection. As part of their work, they used both
content-based features and user-based features.
As for Arabic fake news detection, a few works were proposed
in this direction. For example, [13] proposed a multi-modal
fake news detection method for the Arabic language. They used
MARBERTv2 for textual feature extraction and a
combination of VGG-19 and ResNet50 for visual feature
extraction. In their experimental results, textual features
proved their efficiency in this task.
[14] proposed a novel rumor detection method for the Arabic
language. They compared AraBERT and MARBERT for this task.
[15] introduced a novel dataset for fake news training. The
dataset was related to Covid-19 and was extracted from
Facebook and Twitter. In their work, they also compared the
performance of two pretrained models, BERT and
AraBERT. The experimental results showed that
AraBERT outperformed BERT.
[16] provided an analysis of the challenges facing this issue in
the Arabic language. They found that the most studied platform
for the Arabic language was Twitter, and recommended more
studies on other platforms. Moreover, they recommended more
work on dialects.
[17] provided a dataset composed of real and fake news for
training purposes in the field of Covid-19. They argued for the
importance of detecting such fake news due to its negative
effect on public opinion.
[18] proposed the use of a BERT-based method for detecting
fake news. The proposed method proved its efficiency in
detecting fake news in the Arabic language.
[19] proposed the first large dataset in this domain. The dataset
was composed of around 600,000 articles. They used both
traditional machine learning and deep learning methods.
[20] proposed the use of text analysis in order to detect fake
news.
III. METHODOLOGY
The literature review makes clear that, although several
works have been proposed for English fake news detection,
only a few were proposed to improve Arabic fake news
detection. Furthermore, these works did not give much
attention to the optimization of feature selection. Some
works showed the superiority of deep learning models, such
as [14][15]; however, this performance has its own cost. Deep
learning models are very complex and require much more
training time and computational power. This motivates our
work to find an efficient feature selection method for
Arabic fake news detection.
In this work, we optimize one of the well-known feature
selection methods, namely chi-square. The classification
performance of various classifiers is tested under different
settings of this method. These results are compared with
classification without feature selection, which serves as
the baseline.
In general, preprocessing is composed of several tasks,
including the following:
1- Tokenization: splits the text into words.
2- Stopword removal: removes non-important
terms.
3- Normalization: unifies a word that is written in
different forms, for example by removing Harakat or
Hamza.
4- Stemming: finds the stem of each term.
5- Feature selection: selects a subset of the terms instead
of using all of them.
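The first three preprocessing tasks can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the Unicode ranges chosen for the Harakat, the set of Alef variants unified for Hamza, and the tiny `STOPWORDS` list are all hypothetical choices made for the example. Stemming and feature selection are not covered here.

```python
import re

# Assumption: Harakat are the diacritics U+064B..U+0652; the tatweel
# (U+0640) elongation mark is removed as well.
HARAKAT = re.compile(r"[\u064B-\u0652\u0640]")
# Assumption: Alef carrying Hamza or Madda is unified to a bare Alef.
ALEF_VARIANTS = re.compile(r"[\u0623\u0625\u0622]")
# Anything that is neither a word character nor whitespace (punctuation).
PUNCTUATION = re.compile(r"[^\w\s]")


def normalize(text: str) -> str:
    """Unify different written forms of the same word."""
    text = HARAKAT.sub("", text)
    text = ALEF_VARIANTS.sub("\u0627", text)
    return PUNCTUATION.sub(" ", text)


# Hypothetical, very small stopword list used only for illustration.
STOPWORDS = {normalize(w) for w in ("في", "من", "على", "إلى", "عن")}


def preprocess(text: str) -> list[str]:
    """Normalize, tokenize on whitespace, and drop stopwords."""
    tokens = normalize(text).split()
    return [t for t in tokens if t not in STOPWORDS]
```

Normalizing the stopword list with the same function keeps the comparison consistent, since a stopword such as "إلى" no longer matches its raw form once Hamza has been removed from the text.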
As stated earlier, we compare the performance of
various classifiers under various feature selection
settings. As for the classifiers, we use SVM, Naïve Bayes,
Logistic Regression, KNN, Random Forest, and AraBERT.
As for feature selection, we use chi-square.
The classification results are provided in the experimental
section.
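To make the chi-square step concrete, the sketch below scores each term against the fake/real label and keeps the k highest-scoring terms. It assumes the classical 2x2 contingency formulation used in text categorization (term present or absent versus class); the paper does not state which formulation or library routine was used, and the function names `chi_square` and `select_top_k` are hypothetical.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic of a 2x2 term/class contingency table.

    a: documents of the class that contain the term
    b: documents of the other class that contain the term
    c: documents of the class without the term
    d: documents of the other class without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(docs: list[set[str]], labels: list[int], k: int) -> list[str]:
    """Return the k terms with the highest chi-square score.

    docs holds one set of tokens per document; labels holds 1 (fake) or 0 (real).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    vocab = set().union(*docs)
    scores = {}
    for term in vocab:
        a = sum(1 for doc, y in zip(docs, labels) if y == 1 and term in doc)
        b = sum(1 for doc, y in zip(docs, labels) if y == 0 and term in doc)
        scores[term] = chi_square(a, b, n_pos - a, n_neg - b)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

A term that appears equally often in both classes scores zero and is dropped first, which is the behaviour the feature selection phase relies on. In scikit-learn, the same step is commonly expressed with `SelectKBest` and the `chi2` scoring function, although that routine computes the statistic from feature values rather than from presence counts.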
IV. EXPERIMENTS
A. Data set
The dataset used is from [21]. It is composed of 606912
posts from 134 different sources. Misbar was used to
annotate the data as credible, not credible, or not
sure. In our experiments, only the first two labels were used,
as they were assigned with high confidence. We used a
balanced subset of the dataset composed of 30000 fake
news posts and 30000 real news posts.
B. Evaluation Measurements
As for the evaluation measurements, we adopted recall,
precision, and the F1 measurement. They are defined
as follows.
• Recall: the ratio of true positives to the sum of
true positives and false negatives. It measures the
ability of the classifier to correctly recognize all
the records of the target label.
• Precision: the ratio of true positives to the sum of
true positives and false positives. It measures how
likely a record assigned to the target label is to
truly belong to it.
• F1 measurement: the harmonic mean of recall and
precision, given in Equation 1.
$$F_1 = \frac{2 \cdot R \cdot P}{R + P} \qquad (1)$$
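As a small worked example of these definitions, the Python sketch below computes the three measurements from raw counts. The function names and the counts in the usage note are illustrative assumptions, not values from the experiments.

```python
def f1_from(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision, as in Equation 1."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute (precision, recall, F1) from true/false positive and negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(recall, precision)
```

For instance, with 90 true positives, 10 false positives, and 30 false negatives, precision is 0.9, recall is 0.75, and F1 is approximately 0.818. Applying `f1_from` to the recall of 0.94 and precision of 0.96 reported for the baseline SVM in Table II gives roughly 0.95, which agrees with the reported F1.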
C. Experimental Settings
For our experiments, we used an Intel® Core i7-8550U 1.8
GHz CPU and 16 GB RAM, running the Microsoft Windows 10
operating system. We implemented the classifiers in Python
using Jupyter Notebook.
D. The Compared Classifiers
The following subsections provide more information about
the compared classifiers.
Support Vector Machines
This classifier has been used widely in the literature
due to its superior accuracy and relatively fast
learning time. It uses the support vectors as a
discriminative tool to find the best hyperplane
between the given classes. SVM can be either linear
or nonlinear based on the type of the hyperplane.
Logistic Regression
This classifier uses the concept of regression to predict the
probability of the positive label given the input data. It
is considered the basis of the artificial neural network
classifier.
K Nearest Neighbor
This classifier belongs to the lazy classifiers, as it does
not learn a model. Instead, whenever a test record
arrives, it assigns the majority label of the k closest
training records.
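The lazy behaviour of K Nearest Neighbor can be illustrated with a short sketch: nothing is fitted in advance, and all the work happens when a query arrives. Euclidean distance and the function name `knn_predict` are assumptions made for this example; the paper does not specify the distance measure.

```python
import math
from collections import Counter


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    # Distance from the query to every training record (no model is learned).
    distances = sorted(
        (math.dist(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]
```

Because every prediction scans the whole training set, the cost grows with the number of training records, which is the usual trade-off of lazy classifiers.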
Random Forest
This classifier has gained wide attention in the
literature due to its relatively high accuracy. It uses the
concept of bootstrapping to generate several datasets
from the original one, and uses bagging to build
different tree models from these datasets. Finally, it
uses majority voting to reach the final decision.
Naïve Bayes
This classifier finds the probability of each label based
on the given data. For this purpose, it uses Bayes' theorem,
which provides the posterior probability. This classifier
assumes that features are independent, which explains
the "naïve" in its name. It is well known for its very
fast training time even when the data size is large.
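A minimal sketch of the Naïve Bayes idea for text follows. It assumes the multinomial variant with add-one (Laplace) smoothing, which is a common choice for term counts; the paper does not state which variant was used, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count class frequencies and per-class term frequencies."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    vocab = {term for doc in docs for term in doc}
    return class_counts, term_counts, vocab


def predict_nb(model, doc):
    """Return the label with the highest posterior log-probability."""
    class_counts, term_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total_terms = sum(term_counts[label].values())
        # Log prior plus the sum of log likelihoods (independence assumption).
        score = math.log(count / n_docs)
        for term in doc:
            if term in vocab:  # terms unseen in training are ignored
                likelihood = (term_counts[label][term] + 1) / (total_terms + len(vocab))
                score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Training only counts terms, which is why this classifier remains fast on large datasets. Working in log space avoids numerical underflow when many small probabilities are multiplied.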
E. Experimental Results and Discussion
As mentioned earlier, we used a balanced dataset of 30000
records for each of the two labels (fake, real). We first
removed stopwords and normalized the data by removing
punctuation, duplicate letters, and all the harakat. This
step unifies the appearance of the same term. Next, we
applied a TF-IDF vectorizer to find the weight of each term
in each post. We selected the first 10,000 features and
removed features that appeared in fewer than three
documents. The next step was to optimize the classifiers, as
optimization plays a vital role in the results. For this
purpose, we used GridSearchCV in Python. We used 10-fold
cross validation, as it has been used widely in the
literature and proved its efficiency.
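The vectorization step described above can be sketched in plain Python as follows. Two points are assumptions made for illustration: "the first 10,000 features" is interpreted as the 10,000 most frequent terms in the corpus, and the weight uses the textbook form tf * (ln(N / df) + 1). Library vectorizers differ in detail; scikit-learn's `TfidfVectorizer`, for example, applies a smoothed idf and L2 normalization by default.

```python
import math
from collections import Counter


def tfidf_matrix(docs, max_features=10000, min_df=3):
    """Weight each term of each document by TF-IDF.

    docs is a list of token lists. Returns (vocabulary, rows), where each
    row is a dict mapping a kept term to its weight in that document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Corpus frequency: total occurrences, used to rank the features.
    total = Counter(term for doc in docs for term in doc)
    kept = [term for term in df if df[term] >= min_df]
    kept.sort(key=lambda term: (-total[term], term))
    vocab = kept[:max_features]
    idf = {term: math.log(n_docs / df[term]) + 1.0 for term in vocab}
    rows = []
    for doc in docs:
        counts = Counter(term for term in doc if term in idf)
        rows.append({term: tf * idf[term] for term, tf in counts.items()})
    return vocab, rows
```

Dropping terms below the document-frequency threshold removes rare tokens such as misspellings before any classifier or feature selection method sees them.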
TABLE I. BEST PARAMETERS OF THE COMPARED CLASSIFIERS

| Classifier | Best Parameters |
|---|---|
| SVM | C = 0.1, Kernel = RBF, gamma = 1 |
| Logistic Regression | C = 0.5 |
| Naïve Bayes | - |
| RF | N_estimators = 100 |
| KNN | K = 10 |
| AraBERT | Default |

After the optimization process, we compared the classifiers,
using their best parameters, according to their performance in
detecting Arabic fake news. The optimized parameters are
provided in Table I. The results of applying the classification
methods are provided in Table II. Note that these results
were obtained without feature selection. They are the baseline
results.
TABLE II. ACCURACIES OF VARIOUS CLASSIFIERS WITHOUT
FEATURE SELECTION

| Classifier | Recall | Precision | F1 |
|---|---|---|---|
| SVM | 0.94 | 0.96 | 0.95 |
| Logistic Regression | 0.93 | 0.93 | 0.93 |
| Naïve Bayes | 0.86 | 0.86 | 0.86 |
| RF | 0.93 | 0.93 | 0.93 |
| KNN | 0.88 | 0.88 | 0.88 |
| AraBERT | 0.96 | 0.98 | 0.97 |

From Table II, it can be clearly seen that the deep learning
model AraBERT outperformed the traditional machine learning
methods, reaching an F1 of 0.97. The best performance
among the traditional machine learning methods was that of SVM.
Both results were expected, as deep learning and SVM
have proved their high performance in the literature. The worst
performances were those of KNN and NB, as the former does not
learn a model and the latter depends merely on a probability
model.
Next, we conducted a set of experiments to optimize the
chi-square feature selection method by finding the best number
of features. Different numbers of features can lead to
different F1 scores in the classification phase. Therefore,
we used several values for the number of features, ranging
from 200 to 1200. For each value, we performed feature
selection and classification using SVM and computed the
F1 score. The results are provided in Table III.
TABLE III. OPTIMIZING THE SELECTED NUMBER OF FEATURES FOR CHI
SQUARE BY FINDING F1 FOR SVM CLASSIFIER USING DIFFERENT VALUES

| Feature Selection Method | 200 | 600 | 800 | 1000 | 1200 |
|---|---|---|---|---|---|
| Chi Squared | 0.871 | 0.929 | 0.946 | 0.958 | 0.953 |

From the table, it can be noted that the F1 score
increases rapidly at the beginning and becomes more
stable with a larger number of features. At 1000
features, the performance exceeded that of the baseline SVM,
with an F1 score of 0.958. The performance starts to degrade
beyond that point. The gain over the baseline can be
attributed to the elimination of noisy features that were
present in the baseline SVM. When they were eliminated, the
best performance was attained. However, when more features
were added, which tended to be noisier, the performance
started to degrade. Table IV compares the baseline SVM with
both the SVM after feature selection and AraBERT.
TABLE IV. COMPARING THE BASELINE PERFORMANCE WITH THE
OPTIMIZED PERFORMANCE USING FEATURE SELECTION

| Classifier | Recall | Precision | F1 |
|---|---|---|---|
| AraBERT | 0.96 | 0.98 | 0.97 |
| SVM Baseline | 0.94 | 0.96 | 0.95 |
| SVM + Chi | 0.943 | 0.969 | 0.958 |

It is clear from the table that the best performance was that
of the AraBERT deep learning method. However, this method has
a high computational cost and training time. The difference in
F1 score between the optimized SVM after feature selection and
AraBERT was about one percentage point (0.958 versus 0.97 in
Table IV). However, this difference is not the only
factor that must be considered. Although the accuracy of
the optimized SVM is lower than that of deep learning, the
optimized SVM trains much faster than AraBERT. Therefore,
the best choice depends on the application domain.
If accuracy is needed regardless of model complexity and
training time, deep learning would be the first choice.
If model complexity and training time are key factors for
the domain, feature selection provides a considerable
improvement in training time and model complexity with a
slight decrease in accuracy. Therefore, despite the
importance of further improving deep learning methods, it is
equally important to shed more light on optimizing simple
models to get the best performance out of them.
V. CONCLUSIONS AND FUTURE WORKS
In this work, we proposed an optimized fake news
classification method for Arabic text. Experiments
showed that optimizing feature selection can improve the
performance of fake news classification in comparison with
no feature selection, and that such performance can be close
to that of deep learning methods with a large improvement in
model complexity and training time.
Future work can optimize other parts of the
preprocessing phase. Furthermore, more studies are needed
to provide more Arabic fake news datasets and to direct more
work toward the detection of fake news in Arabic.
REFERENCES
[1] X. Jose, S.M. Kumar, & P. Chandran, “Characterization,
Classification and Detection of Fake News in Online Social
Media Networks,” In 2021 IEEE Mysore Sub Section
International Conference (MysuruCon), pp. 759-765, 2021.
[2] K. Sharma, F. Qian, H. Jiang, N. Ruchansky, M. Zhang & Y.
Liu, “Combating fake news: A survey on identification and
mitigation techniques,” ACM Transactions on Intelligent
Systems and Technology (TIST), vol. 10, no. 3, pp. 1-42, 2019.
[3] J. Ma, W. Gao, Z. Wei, Y. Lu, & K. F. Wong, “Detect rumors
using time series of social context information on
microblogging websites,” In Proceedings of the 24th ACM
international on conference on information and knowledge
management, pp. 1751-1754, 2015.
[4] K. Shu, A. Sliva, S. Wang, J. Tang, & H. Liu, “Fake news
detection on social media: A data mining perspective,” ACM
SIGKDD explorations newsletter, vol. 19, no. 1, pp. 22-36,
2017.
[5] S. Kato, L. Yang, & D. Ikeda, “Domain Bias in Fake News
Datasets Consisting of Fake and Real News Pairs,” In 2022
12th International Congress on Advanced Applied Informatics
(IIAI-AAI), pp. 101-106, 2022.
[6] W. Yu, J. Ge, Z. Yang, Y. Dong, Y. Zheng, & H. Dai, “Multi-
domain Fake News Detection for History News Environment
Perception,” In 2022 IEEE 17th Conference on Industrial
Electronics and Applications (ICIEA), pp. 428-433, 2022.
[7] P. Wei, F. Wu, Y. Sun, H. Zhou, & X.Y. Jing, “Modality and
Event Adversarial Networks for Multi-Modal Fake News
Detection,” IEEE Signal Processing Letters, vol. 29, pp. 1382-
1386, 2022.
[8] D. Rohera, et al., “A taxonomy of fake news classification
techniques: Survey and implementation aspects,” IEEE
Access, vol. 10, pp. 30367-30394, 2022.
[9] X. Zhou & R. Zafarani, “A survey of fake news: Fundamental
theories, detection methods, and opportunities,” ACM
Computing Surveys (CSUR), vol. 53, no. 5, pp. 1-40, 2020.
[10] A. R. Merryton, & M. G. Augasta, “A Novel Framework for
Fake News Detection using Double Layer BI-LSTM,” In 2023
5th International Conference on Smart Systems and Inventive
Technology (ICSSIT), pp. 1689-1696, 2023.
[11] J. Ma, W. Gao, P. Mitra, S. Kwon, B.J. Jansen, K.F. Wong, & M.
Cha, “Detecting rumors from microblogs with recurrent neural
networks,” 3818, 2016.
[12] Y. Yang, L. Zheng, J. Zhang, Q. Cui, Z. Li, & P.S. Yu, “TI-CNN:
Convolutional neural networks for fake news
detection,” arXiv preprint arXiv:1806.00749, vol. 2, no. 6,
2018.
[13] R. M. Albalawi, A. T. Jamal, A. O. Khadidos, & A. M.
Alhothali, “Multimodal Arabic Rumors Detection,” IEEE
Access, vol. 11, pp. 9716-9730, 2023.
[14] N.O. Bahurmuz, G. A. Amoudi, F. Baothman, A. T. Jamal, H.
S. Alghamdi, & A. M. Alhothali, “Arabic Rumor
Detection Using Contextual Deep Bidirectional Language
Modeling,” IEEE Access, vol. 10, pp. 114907-114918, 2022.
[15] S. B. Ali, Z. Kechaou, & A. Wali, “Arabic fake news
detection in social media Based on AraBERT,” In 2022 IEEE
21st International Conference on Cognitive Informatics &
Cognitive Computing (ICCI*CC), pp. 214-220, 2022.
[16] H. Rahab, A. Zitouni, & M. Djoudi, “Arabic Fake News and
Spam Handling: Methods, Resources and Opportunities,”
In 2021 International Conference on Artificial Intelligence for
Cyber Security Systems and Privacy (AI-CSP), pp. 1-7, 2021.
[17] D. Mohdeb, M. Laifa, & M. Naidja, “An Arabic Corpus for
Covid-19 related Fake News,” In 2021 International
Conference on Recent Advances in Mathematics and
Informatics (ICRAMI) , pp. 1-5, 2021.
[18] W. Shishah, “JointBert for Detecting Arabic Fake
News,” IEEE Access, vol. 10, pp. 71951-71960, 2022.
[19] A. Khalil, M. Jarrah, M. Aldwairi, and Y. Jararweh, “Detecting
arabic fake news using machine learning,” In 2021 Second
International Conference on Intelligent Data Science
Technologies and Applications (IDSTA) , pp. 171-17, 2021.
[20] H.T. Himdi, & F. Y. Assiri, “Development of Classification
Model based on Arabic Textual Analysis to Detect Fake News:
Case Studies,” In 2023 1st International Conference on
Advanced Innovations in Smart Cities (ICAISC), pp. 1-6,
2023.
[21] A. Khalil, M. Jarrah, and M. Aldwairi, Arabic Fake News
Dataset (AFND), Accessed May 2023.
... Wotaifi and Dhannoon [30] proposed a deep learning-based hybrid model to improve the detection of Arabic fake news with 91.4% accuracy. Hawashin et al. [31] applied feature selection for Arabic fake news detection. These findings collectively emphasize the necessity for culturally informed strategies to effectively address the intricacies of misinformation in the Arab world. ...
Article
Full-text available
The proliferation of fake news poses a substantial and persistent threat to information integrity, necessitating the development of robust detection mechanisms. In response to this challenge, this research specifically focuses on the detection of Arabic fake news, employing a sophisticated approach that leverages textual features and a powerful stacking classifier. The proposed model ingeniously combines bagging, boosting, and baseline classifiers, strategically harnessing the unique strengths of each to create a resilient ensemble. Through a series of extensive experiments and the integration of Embeddings from Language Models (ELMO) word embedding, the proposed approach achieves remarkable results in the realm of Arabic fake news detection. The model’s effectiveness is further heightened by the utilization of advanced stacking techniques, coupled with meticulous textual feature extraction. This capability enables the model to effectively distinguish between real and fake news in Arabic, highlighting its potential to enhance the accuracy of information. The findings of this study hold significant implications for the field of fake news detection, especially within the context of the Arabic language. The proposed model emerges as a valuable tool, contributing to the enhancement of information veracity and fostering a more informed public discourse in the face of misinformation challenges. Furthermore, the excellence of the proposed model is substantiated by its outstanding performance metrics, boasting a 99% accuracy, precision, recall, and F-score. This substantiation is underscored through a comprehensive performance comparison with other state-of-the-art models, affirming the model’s reliability in the domain of Arabic fake news detection.
... The study shows that the combination of different ensemble approaches can effectively enhance the performance of detection. An optimized feature selection-based approach is presented in (Hawashin et al. 2023) for identifying fake information from Arabic news. Six different ML models were employed for classification. ...
Article
Full-text available
The increased propagation of fake news is the significant concern in the digital era. Identification of fake news from social media platforms is critical to strengthen public trust and ensure social stability. This research presents an effective and accurate framework for identifying fake news that combines different steps of natural language processing (NLP) technique along with a neural network architecture. A novel semantic veracity enhancement (SVE) classifier is designed and implemented in this work for detecting fake news. The proposed approach leverages the effectiveness of sentiment analysis for identifying misleading or deceptive content and its subsequent implications on the sentiment and behaviour of social media users. A BERT model is used in this research for analysing the sentiments and classifying the texts from the social media platform. By examining the sentiments, the SVE classifier differentiates between real news and fabricated content. To achieve this, three different datasets comprising both actual content and fabricated (tweaked) tweets are employed for training the SVE classifier. The potentiality of the SVE classifier is evaluated and compared with different optimization techniques. The outcome of the experimental analysis shows that the proposed approach exhibits an excellent performance in terms of classifying misinformation from the original information with an outstanding accuracy of 99% compared to other state of art methods.
... Study [21] by Bilal Hawashin further improves fake news detection performance by optimizing the feature selection phase. Empirical work has shown that such optimization improved the detection accuracy for traditional machine learning methods. ...
Article
The quick spread of fake news in different languages on social platforms has become a global scourge threatening societal security and the government. Fake news is usually written to deceive readers and convince them that this false information is correct; therefore, stopping the spread of this false information becomes a priority of governments and societies. Building fake news detection models for the Arabic language comes with its own set of challenges and limitations. Some of the main limitations include 1) lack of annotated data, 2) dialectal variations where each dialect can vary significantly in terms of vocabulary, grammar, and syntax, 3) morphological complexity with complex word formations and root-and-pattern morphology, 4) semantic ambiguity that make models fail to accurately discern the intent and context of a given piece of information, 5) cultural context and 6) diacrasy. The objective of this paper is twofold: first, we design a large corpus of annotated fake new data for the Arabic language from multiple sources. The corpus is collected from multiple sources to include different dialects and cultures. Second, we build fake detection by building machine learning models as model head over the fine-tuned large language models. These large language models were trained on Arabic language, such as ARBERT, AraBERT, CAMeLBERT, and the popular word embedding technique AraVec. The results showed that the text representations produced by the CAMeLBERT transformer are the most accurate because all models have outstanding evaluation results. We found that using the built deep learning classifiers with the transformer is generally better than classical machine learning classifiers. Finally, we could not find a stable conclusion concerning which model works well with each text representation method because each evaluation measure has a different favored model.
Article
Full-text available
In the field of Artificial Intelligence (AI), Smart Enterprise Management Systems (Smart EMS) and big data analytics are the most prominent computing technologies. A key component of the Smart EMS system is E-commerce, especially Session-based Recommender systems (SRS), which are typically utilized to enhance the user experience by providing recommendations analyzing user behavior encoded in browser sessions. Also the work of the recommender is to predict users’ next actions (click on an item) using the sequence of actions in the current session. Current developments in session-based recommendation have primarily focused on mining more information accessible within the current session. On the other hand, those approaches ignored sessions with identical context for the current session that includes a wealth of collaborative data. Therefore this paper proposed Context-aware and Click session-based graph pattern mining with recommendations for Smart EMS through AI. It employs a novel Triple Attentive Neural Network (TANN) for SRS. Specifically, TANN contains three main components, i.e., Enhanced Sqrt-Cosine Similarity based Neighborhood Sessions Discovery (NSD), Frequent Subgraph Mining (FSM) using Neighborhood Click session-based graph pattern mining and Top-K possible Next-clicked Items Discovery (TNID). The NSD module uses a session-level attention mechanism to find m most similar sessions of the query session, and the FSM module also extracts the frequent subgraphs from the already discovered m most similar sessions of the query session via item-level attention. Then, TNID module is used to discover the top-K possible next-clicked items using the NSD and FSM module via a target-level attention. Finally, we perform comprehensive experiments on one big dataset, DIGINETICA, to verify the effectiveness of the TANN model, and the results of this experiment clearly illustrate the performance of TANN.
Article
Recently, the use of social media platforms has increased thanks to their ease of use and fast accessibility, making such platforms a place of rumor proliferation owing to a lack of posting constraints and content authentication. Therefore, there is a need to leverage artificial intelligence techniques to detect rumors on social media platforms and prevent their adverse effects on society and individuals. Most existing works that detect rumors in Arabic only target the textual features of the tweet content. Nevertheless, tweets contain different types of content, such as text, images, videos, and URLs, and the visual features of tweets play an essential role in rumor diffusion. This study proposes an Arabic rumor detection model that detects rumors on Twitter from textual and visual image features through two types of multimodal fusion: early and late fusion. In addition, we leveraged transfer learning from pre-trained language and vision models. Different experiments were conducted to select the best textual and visual feature extractors for building a multimodal model. MARBERTv2 was used as the textual feature extractor, whereas an ensemble of VGG-19 and ResNet50 was used as the visual feature extractor. Subsequently, the single-modality language and vision models were used as baselines against which the multimodal models were compared. Finally, the experimental results demonstrate the effectiveness of textual features in rumor detection tasks compared with multimodal models.
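The difference between the early and late fusion mentioned in this abstract can be illustrated with a minimal sketch. It is not the cited study's implementation: the function names, the logistic scorer, and the `text_share` weighting are all hypothetical choices made for illustration, and the feature vectors stand in for the outputs of whatever text and image extractors are used.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(features, weights, bias=0.0):
    """Hypothetical logistic scorer standing in for a trained classifier."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sigmoid(sum(f * w for f, w in zip(features, weights)) + bias)


def early_fusion(text_feats, image_feats, joint_weights):
    """Early fusion: concatenate both modalities, then classify once."""
    fused = list(text_feats) + list(image_feats)
    return linear_score(fused, joint_weights)


def late_fusion(text_feats, image_feats, text_weights, image_weights,
                text_share=0.5):
    """Late fusion: classify each modality separately, then mix the outputs.

    text_share is an assumed mixing weight for the text prediction.
    """
    p_text = linear_score(text_feats, text_weights)
    p_image = linear_score(image_feats, image_weights)
    return text_share * p_text + (1.0 - text_share) * p_image
```

In the early variant a single classifier sees the joint feature vector, so it can learn interactions between modalities; in the late variant each modality is scored independently and only the predictions are combined.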
Article
In today’s world, news outlets have changed dramatically; newspapers are obsolete, and radio is no longer in the picture. People look for news online and on social media, such as Twitter and Facebook. Social media contributors share information and trending stories before verifying their truthfulness, thus spreading rumors. Early identification of rumors on social media has attracted many researchers. However, relatively few studies have focused on other languages, such as Arabic. In this study, an Arabic rumor detection model is proposed. The model was built using a transformer-based deep learning architecture. According to the literature, transformers are neural networks with outstanding performance in natural language processing tasks. Two transformer-based models, AraBERT and MARBERT, were employed, tested, and evaluated using three recently developed Arabic datasets. These models are extensions of BERT (Bidirectional Encoder Representations from Transformers), a deep learning model that uses the transformer architecture to learn text representations and leverages the attention mechanism. We have also mitigated the challenges introduced by imbalanced training datasets by employing two sampling techniques. Our proposed approaches achieved an accuracy of 0.97. This result demonstrates the effectiveness of the proposed method, which outperformed other existing Arabic rumor detection methods.
Article
The rapid rise in the use of social media platforms has resulted in a recent surge of fake rumours, particularly in Arab countries. Such false information could be detrimental to individuals and society. Detecting and blocking the spread of fraudulent news in Arabic is critical. Many artificial intelligence algorithms, including contemporary transformer models such as BERT, have been employed to detect fake news in the past. This paper implements a combined BERT architecture for detecting fake news in Arabic. Extensive experiments were conducted to test the technique on real-world Arabic fake news datasets. On two of the fake news datasets, covid19fakes and Satirical, the suggested technique had a higher accuracy score than the current state-of-the-art Arabic fake news model. A comparable result can be achieved in other datasets; however, the proposed strategy fails to do so. All datasets except AraNews show an average F1 score improvement of 10% when the proposed strategy is applied. It was found that the proposed method was effective and superior to numerous other baselines of Arabic fake news detection models.
Conference Paper
Fake news is information that has been carefully manipulated to mislead readers by using false facts and figures. Since the introduction of the Internet and social media, fake news has grown to be a significant problem. Identifying fake news has become an important area of research in Natural Language Processing (NLP). The key challenge is determining the veracity of news stories. It is increasingly difficult to study and design a technological strategy to combat fake news without compromising speed and collaborative access to high-quality information. Although various technologies have been developed to assist in the detection of false news, and despite significant breakthroughs, identifying fake news remains ineffective. In this research, a new framework is proposed that uses a Porter Stemmer and a TF-IDF vectorizer for pre-processing and a double-layer Bi-LSTM for extracting refined features to obtain better learning. In this model, the summarized input vector is first formed by concatenating the most relevant text attributes, such as headlines and news, for further processing. The proposed model has been validated by evaluating its performance on three experimental datasets, namely Kaggle fake_real_news, Liar, and Politifact Fake_Real.
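The TF-IDF weighting named in this abstract can be sketched as follows. This is one common textbook variant (term frequency normalized by document length, inverse document frequency as log(N/df)) and is an assumption here; the cited work's vectorizer may use different smoothing or normalization. The function name `tf_idf` and the whitespace tokenizer are hypothetical, and the stemming step is omitted.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Tokenization is a plain lowercase whitespace split (no stemming).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

Under this variant a term that occurs in every document receives weight zero, so only terms that discriminate between documents contribute to the representation passed on to the classifier.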
Article
With the popularity of news on social media, fake news has become an important issue for the public and government. There exist some fake news detection methods that focus on information exploration and utilization from multiple modalities, e.g., text and image. However, how to effectively learn both modality-invariant and event-invariant discriminant features is still a challenge. In this paper, we propose a novel approach named Modality and Event Adversarial Networks (MEAN) for fake news detection. It contains two parts: a multi-modal generator and a dual discriminator. The multi-modal generator extracts latent discriminant feature representations of text and image modalities. A decoder is adopted to reduce information loss in the generation process for each modality. The dual discriminator includes a modality discriminator and an event discriminator. The discriminator learns to classify the event or the modality of features, and network training is guided by the adversarial scheme. Experiments on two widely used datasets show that MEAN can perform better than state-of-the-art related multi-modal fake news detection methods.