
Improving Arabic Fake News Detection Using Optimized Feature Selection

June 2023


DOI:10.1109/ICIT58056.2023.10225974

Conference: Improving Arabic Fake News Detection Using Optimized Feature Selection
At: IEEE Conference

Authors:

Shadi AlZu'bi

Al-Zaytoonah University of Jordan

Ahmad Al Thunibat

Al-Zaytoonah University of Jordan

Tarek Kanan

Al-Zaytoonah University of Jordan




Improving Arabic Fake News Detection Using Optimized Feature Selection

Bilal Hawashin

Department of Artificial Intelligence

Alzaytoonah University of Jordan

Amman, Jordan

b.hawashin@zuj.edu.jo

Shadi AlZu’bi

Department of Computer Science

Alzaytoonah University of Jordan

Amman, Jordan

smalzubi@zuj.edu.jo

Ahmad Althunibat

Department of Software Engineering

Alzaytoonah University of Jordan

Amman, Jordan

a.thunibat@zuj.edu.jo

Tarek Kanan

Department of Artificial Intelligence

Alzaytoonah University of Jordan

Amman, Jordan

tarek.kanan@zuj.edu.jo

Yousef Sharrab

Department of Artificial Intelligence

Isra’a University

Amman, Jordan

sharrab@iu.edu.jo

Abstract— There is no doubt that the advent of social media has brought several important benefits. However, social media has also been abused in several ways, one of which is the distribution of fake news. Fake news can change public opinion, so it is necessary to detect it. Despite the importance of this problem, little research has addressed it for Arabic posts. The few works that studied this topic in the Arabic language paid little attention to optimizing the feature selection process, which can play an important role in improving detection accuracy. This work improves fake news detection performance by optimizing the feature selection phase. Experiments show that this optimization improved the detection accuracy of traditional machine learning methods.

Keywords— Arabic Fake News Detection, Social Media,

Classification, Machine Learning, Data Science.

I. INTRODUCTION

With the advent of social media, the world has become a small place. Social media has proved its ability to increase connectivity. Furthermore, it has been used as an educational tool, a means of raising awareness of important issues, a way to build virtual communities and support noble causes, and much more. No one can doubt the great benefits of social media. However, it has also been abused in several ways, one of which is the spreading of fake news.

Spreading fake news on social media is a new phenomenon that aims to change public opinion on some issue. Such fake news can manipulate people's opinions to serve the interests of individuals. Therefore, this topic has been gaining more and more attention in recent years. Thanks to advances in artificial intelligence, several solutions have been used to detect such fake posts automatically, one of which is text classification.

Text classification is the process of labeling text with one or more predefined labels. It has applications in a wide range of domains, such as sentiment analysis, document classification, author authentication, and many more.

Although several works have been proposed in recent years to detect fake news in the English language via classification, such as [1][2][3][4][5], very few works have been proposed to detect Arabic fake news despite its importance. Detecting such posts is clearly necessary, and the shortage of work in this direction may be due to the lack of Arabic datasets and the lack of attention to this important issue. Even the works that tackled this issue in the Arabic language have limitations, such as paying little attention to the feature selection process despite its importance. Feature selection is one of the important phases in natural language processing, and it can play a vital role in improving accuracy and reducing time. Although some previous works proposed solutions based on deep learning, these solutions come with a time and computational cost. It is therefore worthwhile to optimize the accuracy of the traditional, less time-consuming methods.

In this work, we optimize the feature selection phase and compare the optimized performance with the original one. As part of this work, we use several classification methods: Logistic Regression, Support Vector Machines (SVM), Random Forest, K Nearest Neighbor (KNN), Naïve Bayes, and AraBERT. We used a publicly available dataset of Arabic fake news [21]. As evaluation measurements, we used recall, precision, and F1, which are commonly used in evaluating classification.

The contributions of this paper are as follows.

• Further improving Arabic fake news detection by optimizing the feature selection phase.

• Increasing awareness of this important issue in the Arabic language.

The remainder of the paper is organized as follows. Section II is the literature review. Section III presents the methodology. Section IV presents the experimental work and the discussion, while Section V presents the conclusions and future work.

II. LITERATURE REVIEW

Several studies have been conducted in the field of fake news detection. In this section, we present the important works in this field and then discuss their limitations.

[1] discussed different characteristics and types of fake news and stressed the importance of handling them properly. Furthermore, it proposed a fake news detection algorithm for online social media (OSM) networks. The best achieved accuracy was 93% when using bi-LSTM.

[2] compared several machine learning approaches, natural language processing techniques, and social network analysis methods. Furthermore, it provided a thorough survey of the different means used for fake news identification and mitigation.

[3] proposed an intelligent approach to recognize rumors on microblogging websites. It benefits from time series information from social media websites, such as user comments and retweet dynamics, in order to enhance the performance of rumor detection.

[4] tackled the fake news problem from a new perspective by using different propagation characteristics, textual features, and social features. Next, it evaluated several machine learning methods according to their performance in detecting fake news based on these features.

[5] handled the issue of domain bias when detecting fake news. Under domain bias, a trained classifier does not work well and fails to detect fake news when the domain is not known. Therefore, they proposed a solution for cross-domain detection and suggested the use of paired news to improve the accuracy in this scenario.

[6] also proposed a solution for multi-domain fake news detection. As part of their work, they used a history news environment perception framework, which played an important role in improving the accuracy.

[7] proposed a model named Modality and Event Adversarial Networks to detect fake news. This work concentrated on the multi-modal case and how to learn efficiently when text, images, and other modalities exist together.

[8] provided a survey of recently proposed methods for detecting fake news using machine learning. They also proposed a hybrid method composed of Naïve Bayes and LSTM for this purpose.

[9] surveyed the recent works in this domain and discussed the challenges and the opportunities for improvement.

[10] proposed the use of multi-layer Bi-LSTM for fake news detection.

[11] adopted sequential models, specifically a recurrent neural network (RNN) architecture, to detect rumors on microblogging platforms by capturing temporal dependencies in the data.

[12] proposed the use of a convolutional neural network (CNN) and argued for its importance and efficiency in the domain of fake news detection. As part of their work, they used both content-based features and user-based features.

As for Arabic fake news detection, a few works have been proposed. For example, [13] proposed multi-modal fake news detection in the Arabic language. They used MARBERTv2 for textual feature extraction and a combination of VGG-19 and ResNet50 for visual feature extraction. In their experimental results, textual features proved effective for this task.

[14] proposed a novel rumor detection method for the Arabic language. They compared AraBERT and MARBERT for this task.

[15] introduced a novel dataset for fake news training. The dataset was related to COVID-19 and was extracted from Facebook and Twitter. In their work, they also compared the performance of two pretrained models, BERT and AraBERT. The experimental results showed that AraBERT outperformed BERT.

[16] analyzed the challenges facing this issue in the Arabic language. They found that the most studied platform in the Arabic language was Twitter and recommended more studies on other platforms. Moreover, they recommended more work on dialects.

[17] provided a dataset composed of real and fake news about COVID-19 for training purposes. They argued for the importance of detecting such fake news due to its negative effect on public opinion.

[18] proposed the use of a BERT-based method for detecting fake news. The proposed method proved effective in detecting fake news in the Arabic language.

[19] proposed the first large dataset in this domain. The dataset was composed of around 600,000 articles. They used both traditional machine learning and deep learning methods.

[20] proposed the use of text analysis to detect fake news.

III. METHODOLOGY

The literature review made clear that, although several works have addressed English fake news detection, only a few works have been proposed to improve Arabic fake news detection. Furthermore, these works paid little attention to the optimization of feature selection. Some works, such as [14][15], showed the superiority of deep learning models; however, this performance has its own cost. Deep learning models are very complex and require much more training time and computational power. This motivates our work to find an efficient feature selection method for Arabic fake news detection.

In this work, we optimize one of the well-known feature selection methods, namely chi-square. The classification performance of various classifiers is tested after applying the optimized feature selection. These results are compared with the classification results obtained without feature selection.

In general, the preprocessing is composed of several tasks, including the following:

1- Tokenization: splits the text into words.

2- Stopword Removal: removes non-important terms.

3- Normalization: unifies the different written forms of the same word, for example by removing Harakat or Hamza.

4- Stemming: finds the stem of each term.

5- Feature selection: selects part of the terms instead of using all the terms.
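
The first three tasks in the list above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the function names are hypothetical, the stopword list is a tiny placeholder, and normalization is assumed here to mean stripping the harakat (U+064B to U+0652) and mapping hamza-carrying forms of alef to a bare alef. Stemming is omitted because it typically relies on a dedicated Arabic stemmer.

```python
import re

# Harakat (short-vowel diacritics), U+064B through U+0652.
HARAKAT = re.compile(r"[\u064B-\u0652]")
# Assumed hamza handling: map alef variants carrying hamza/madda to bare alef.
HAMZA_FORMS = str.maketrans({"\u0623": "\u0627",   # alef with hamza above
                             "\u0625": "\u0627",   # alef with hamza below
                             "\u0622": "\u0627"})  # alef with madda


def normalize(text):
    """Unify different written forms of the same word."""
    return HARAKAT.sub("", text).translate(HAMZA_FORMS)


# Tiny illustrative stopword list; a real list would be far longer.
STOPWORDS = {normalize(w) for w in ("في", "من", "على", "إلى", "عن")}


def tokenize(text):
    """Split text into word tokens (runs of letters and digits)."""
    return re.findall(r"\w+", text)


def preprocess(text):
    """Normalize, tokenize, then drop stopwords."""
    return [tok for tok in tokenize(normalize(text)) if tok not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess("ذهب الولد إلى المدرسة"))
```

Normalization is applied before tokenization in this sketch because diacritics are not word characters for the regular expression engine and would otherwise split a word into fragments.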

As stated earlier, we compare the performance of various classifiers with and without feature selection. As for the classifiers, we use SVM, Naïve Bayes, Logistic Regression, KNN, Random Forest, and AraBERT. As for the feature selection, we use chi-square. The classification results are provided in the experimental section.
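
To make the chi-square idea concrete, the following sketch scores each term with the classical 2x2 contingency formulation (term present or absent versus target label or other label) and keeps the k highest-scoring terms. This is an illustration under assumptions rather than the exact computation used in this work: the function names are hypothetical, and library implementations may compute the statistic from weighted term values instead of binary presence.

```python
from collections import Counter


def chi_square_scores(docs, labels, target=1):
    """Chi-square score of each term from its 2x2 presence/label table.

    docs is a list of token lists; labels is a parallel list of class labels.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos
    in_pos, in_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count each term once per document (presence, not frequency).
        (in_pos if y == target else in_neg).update(set(tokens))
    scores = {}
    for term in set(in_pos) | set(in_neg):
        a, b = in_pos[term], in_neg[term]  # documents containing the term
        c, d = n_pos - a, n_neg - b        # documents lacking the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k best terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    docs = [["fake", "news"], ["fake", "alert"],
            ["real", "news"], ["real", "report"]]
    labels = [1, 1, 0, 0]
    print(select_top_k(chi_square_scores(docs, labels), k=2))
```

A term that appears equally often under both labels scores zero, while a term confined to one label scores highest, which is why a small k tends to keep the most discriminative terms.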

IV. EXPERIMENTS

A. Data set

The dataset used is from [21]. It is composed of 606,912 posts from 134 different sources. Misbar was used to annotate the data as credible, not credible, or not sure. In our experiments, only the first two labels were used, as these are the cases in which the annotation had high confidence. We used a balanced subset of the dataset composed of 30,000 fake news posts and 30,000 real news posts.

B. Evaluation Measurements

As for the evaluation measurements, we adopted recall, precision, and the F1 measurement. They are defined as follows.

• Recall: the ratio of true positives to the sum of true positives and false negatives. It measures the ability of the classifier to correctly recognize all the records of the target label.

• Precision: the ratio of true positives to the sum of true positives and false positives. It measures how many of the records assigned to the target label actually belong to it.

• F1 Measurement: the harmonic mean of the recall and the precision, given in Equation 1.

$$F1 = \frac{2 \cdot R \cdot P}{R + P} \tag{1}$$
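
As a small worked illustration of these definitions and Equation 1, the sketch below computes the three measurements from raw counts. The function name and the example counts are hypothetical and are not taken from the experiments.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Ratios with an empty denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * recall * precision / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts: 90 correct detections, 10 false alarms, 30 misses.
    p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
    print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```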

C. Experimental Settings

For our experiments, we used an Intel® Core™ i7-8550U 1.8 GHz CPU and 16 GB RAM, running the Microsoft Windows 10 operating system. We used Python in a Jupyter notebook to implement the classifiers.

D. The Compared Classifiers

The following subsections provide more information about the compared classifiers.

• Support Vector Machines

This classifier has been used widely in the literature due to its high accuracy and relatively fast learning time. It uses the support vectors as a discriminative tool to find the best separating hyperplane between the given classes. SVM can be either linear or nonlinear, depending on the kernel used.

• Logistic Regression

This classifier uses the concept of regression to predict the probability of the positive label given the input data. It is considered the basis of the artificial neural network classifier.

• K Nearest Neighbor

This classifier belongs to the lazy classifiers, as it does not learn a model. Instead, whenever a testing record arrives, it assigns the majority label of the k closest training records.

• Random Forest

This classifier has gained wide attention in the literature due to its relatively high accuracy. It uses bootstrapping to generate several datasets from the original one and builds a different tree model from each of them (bagging). Finally, it uses majority voting to reach the final decision.

• Naïve Bayes

This classifier finds the probability of each label given the data. For this purpose, it uses Bayes' theorem, which provides the posterior probability. The classifier assumes that the features are independent, which justifies its name. It is well known for its very fast training time, even when the data size is large.
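
The lazy behavior of K Nearest Neighbor described above can be sketched in a few lines. This is only an illustration: Euclidean distance and the default k = 3 are assumptions made for the example, and the function name is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label a query by majority vote among its k closest training records.

    No model is learned: all of the work happens at prediction time.
    """
    neighbors = sorted(
        (math.dist(vec, query), label)
        for vec, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    vectors = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5)]
    labels = ["real", "real", "real", "fake", "fake"]
    print(knn_predict(vectors, labels, query=(5, 6)))
```

Because every prediction scans the full training set, classification time grows with the size of the data, which is the price of skipping the training step.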

E. Experimental Results and Discussion

As mentioned earlier, we used a balanced dataset of 30,000 records for each of the two labels (fake, real). First, we removed stopwords and normalized the data by removing punctuation, duplicate letters, and all the harakat. This step unifies the appearance of the same term. Next, we applied a TF-IDF vectorizer to find the weight of each term in each post. We kept 10,000 features and removed features that appeared in fewer than three documents. The next step was to optimize the classifiers, as optimization plays a vital role in the results. For this purpose, we used GridSearchCV in Python. We used cross-validation with 10 folds, as it has been widely used in the literature and has proved its effectiveness.
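
A setup of this kind can be sketched with scikit-learn as follows. This is a hedged illustration, not the notebook used in this work: the helper name `build_search`, the parameter grid, and the toy English corpus are assumptions, while the limits of 10,000 features, a minimum of three documents per feature, and 10 folds follow the description above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def build_search(max_features=10_000, min_df=3, folds=10):
    """Grid search over an SVM trained on TF-IDF weights."""
    pipeline = Pipeline([
        # Keep at most `max_features` terms and drop terms that
        # occur in fewer than `min_df` documents.
        ("tfidf", TfidfVectorizer(max_features=max_features, min_df=min_df)),
        ("svm", SVC()),
    ])
    # Hypothetical grid: the paper reports only the selected values.
    param_grid = {
        "svm__C": [0.1, 1, 10],
        "svm__kernel": ["linear", "rbf"],
        "svm__gamma": ["scale", 1],
    }
    return GridSearchCV(pipeline, param_grid, cv=folds, scoring="f1")


if __name__ == "__main__":
    # Toy stand-in corpus (1 = fake, 0 = real), not the AFND data.
    fake = ["breaking shocking secret cure revealed"] * 10
    real = ["ministry publishes official budget report"] * 10
    search = build_search(min_df=1, folds=5)
    search.fit(fake + real, [1] * 10 + [0] * 10)
    print(search.best_params_)
```

Placing the vectorizer inside the pipeline means the vocabulary is refitted on each training fold, so the held-out fold does not leak into feature construction.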

TABLE I. BEST PARAMETERS OF THE COMPARED CLASSIFIERS

| Classifier | Best Parameters |
|---|---|
| SVM | C = 0.1, Kernel = RBF, gamma = 1 |
| Logistic Regression | C = 0.5 |
| Naïve Bayes | N_estimators = 100 |
| KNN | K = 10 |
| AraBERT | Default |

After the optimization process, we compared the classifiers, each using its best parameters, according to their performance in detecting Arabic fake news. The optimized parameters are provided in Table I. The results of applying the classification methods are provided in Table II. Note that these results were obtained without feature selection; they are the baseline results.

TABLE II. ACCURACIES OF VARIOUS CLASSIFIERS WITHOUT FEATURE SELECTION

| Classifier | Recall | Precision | F1 |
|---|---|---|---|
| SVM | 0.94 | 0.96 | 0.95 |
| Logistic Regression | 0.93 | | |
| Naïve Bayes | 0.86 | 0.93 | |
| KNN | 0.88 | | |
| AraBERT | 0.96 | 0.98 | 0.97 |

From Table II, it can be noted that the deep learning model AraBERT outperformed the traditional machine learning methods, reaching an F1 of 0.97. The best performance among the traditional machine learning methods was achieved by SVM. Both results were expected, as deep learning and SVM have proved their high performance in the literature. The worst performances were those of KNN and Naïve Bayes, as the former does not learn a model and the latter relies solely on a simple probability model.

Next, we conducted a set of experiments to optimize the chi-square feature selection method by finding the best number of features. It is well known that different numbers of features can lead to different F1 scores in the classification phase. Therefore, we used several values for the number of features, ranging from 200 to 1200. For each value, we performed feature selection and classification using SVM and computed the F1 score. The results are provided in Table III.

TABLE III. OPTIMIZING THE SELECTED NUMBER OF FEATURES FOR CHI SQUARE BY FINDING F1 FOR SVM CLASSIFIER USING DIFFERENT VALUES

| Feature Selection Method | 200 | 400 | 600 | 800 | 1000 | 1200 |
|---|---|---|---|---|---|---|
| Chi Squared | 0.871 | 0.9 | 0.929 | 0.946 | 0.958 | 0.953 |

From Table III, it can be noted that the F1 score increases steadily at first and grows more slowly as the number of features becomes larger. At 1000 features, the performance exceeded that of the baseline SVM, with an F1 score of 0.958, and it started to degrade beyond that point. The improvement over the baseline can be attributed to the elimination of noisy features that were present in the baseline SVM; with these eliminated, the best performance was attained. However, when more features were added, which tended to be noisier, the performance started to degrade. Table IV compares the baseline SVM with both the SVM after feature selection and AraBERT.

TABLE IV. COMPARING THE BASELINE PERFORMANCE WITH THE OPTIMIZED PERFORMANCE USING FEATURE SELECTION

| Classifier | Recall | Precision | F1 |
|---|---|---|---|
| AraBERT | 0.96 | 0.98 | 0.97 |
| SVM Baseline | 0.94 | 0.96 | 0.95 |
| SVM + Chi | 0.943 | 0.969 | 0.958 |

It is clear from Table IV that the best performance was achieved by the AraBERT deep learning method. However, this method has a high computational cost and a long training time. The difference in F1 score between the optimized SVM after feature selection and AraBERT was only about one percentage point (0.958 versus 0.97 in Table IV). However, this difference is not the only factor that must be considered. Although the accuracy of the optimized SVM is lower than that of deep learning, the optimized SVM has a much shorter training time than AraBERT. Therefore, the choice depends on the application domain. If accuracy is needed regardless of model complexity and training time, deep learning would be the first choice. If model complexity and training time are key factors for the domain, the optimized SVM with feature selection offers a large improvement in both at the cost of a slight decrease in accuracy. Therefore, despite the importance of further improving deep learning methods, it is equally important to shed more light on optimizing simple models to get the best performance out of them.

V. CONCLUSIONS AND FUTURE WORKS

In this work, we proposed an optimized fake news classification method for Arabic text. Experimental work showed that optimizing feature selection can improve the performance of fake news classification in comparison with no feature selection, and that this performance can be close to that of deep learning methods with much lower model complexity and training time.

Future work can optimize other parts of the preprocessing phase. Furthermore, more studies are needed to provide more Arabic fake news datasets and to direct more work toward the detection of such news.

REFERENCES

[1] X. Jose, S. M. Kumar, & P. Chandran, “Characterization, Classification and Detection of Fake News in Online Social Media Networks,” In 2021 IEEE Mysore Sub Section International Conference (MysuruCon), pp. 759-765, 2021.

[2] K. Sharma, F. Qian, H. Jiang, N. Ruchansky, M. Zhang, & Y. Liu, “Combating fake news: A survey on identification and mitigation techniques,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 10, no. 3, pp. 1-42, 2019.

[3] J. Ma, W. Gao, Z. Wei, Y. Lu, & K. F. Wong, “Detect rumors using time series of social context information on microblogging websites,” In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 1751-1754, 2015.

[4] K. Shu, A. Sliva, S. Wang, J. Tang, & H. Liu, “Fake news detection on social media: A data mining perspective,” ACM SIGKDD Explorations Newsletter, vol. 19, no. 1, pp. 22-36, 2017.

[5] S. Kato, L. Yang, & D. Ikeda, “Domain Bias in Fake News Datasets Consisting of Fake and Real News Pairs,” In 2022 12th International Congress on Advanced Applied Informatics (IIAI-AAI), pp. 101-106, 2022.

[6] W. Yu, J. Ge, Z. Yang, Y. Dong, Y. Zheng, & H. Dai, “Multi-domain Fake News Detection for History News Environment Perception,” In 2022 IEEE 17th Conference on Industrial Electronics and Applications (ICIEA), pp. 428-433, 2022.

[7] P. Wei, F. Wu, Y. Sun, H. Zhou, & X. Y. Jing, “Modality and Event Adversarial Networks for Multi-Modal Fake News Detection,” IEEE Signal Processing Letters, vol. 29, pp. 1382-1386, 2022.

[8] D. Rohera, et al., “A taxonomy of fake news classification techniques: Survey and implementation aspects,” IEEE Access, vol. 10, pp. 30367-30394, 2022.

[9] X. Zhou & R. Zafarani, “A survey of fake news: Fundamental theories, detection methods, and opportunities,” ACM Computing Surveys (CSUR), vol. 53, no. 5, pp. 1-40, 2020.

[10] A. R. Merryton & M. G. Augasta, “A Novel Framework for Fake News Detection using Double Layer BI-LSTM,” In 2023 5th International Conference on Smart Systems and Inventive Technology (ICSSIT), pp. 1689-1696, 2023.

[11] J. Ma, W. Gao, P. Mitra, S. Kwon, B. J. Jansen, K. F. Wong, & M. Cha, “Detecting rumors from microblogs with recurrent neural networks,” p. 3818, 2016.

[12] Y. Yang, L. Zheng, J. Zhang, Q. Cui, Z. Li, & P. S. Yu, “TI-CNN: Convolutional neural networks for fake news detection,” arXiv preprint arXiv:1806.00749, 2018.

[13] R. M. Albalawi, A. T. Jamal, A. O. Khadidos, & A. M. Alhothali, “Multimodal Arabic Rumors Detection,” IEEE Access, vol. 11, pp. 9716-9730, 2023.

[14] N. O. Bahurmuz, G. A. Amoudi, F. Baothman, A. T. Jamal, H. S. Alghamdi, & A. M. Alhothali, “Arabic Rumor Detection Using Contextual Deep Bidirectional Language Modeling,” IEEE Access, vol. 10, pp. 114907-114918, 2022.

[15] S. B. Ali, Z. Kechaou, & A. Wali, “Arabic fake news detection in social media Based on AraBERT,” In 2022 IEEE 21st International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC), pp. 214-220, 2022.

[16] H. Rahab, A. Zitouni, & M. Djoudi, “Arabic Fake News and Spam Handling: Methods, Resources and Opportunities,” In 2021 International Conference on Artificial Intelligence for Cyber Security Systems and Privacy (AI-CSP), pp. 1-7, 2021.

[17] D. Mohdeb, M. Laifa, & M. Naidja, “An Arabic Corpus for Covid-19 related Fake News,” In 2021 International Conference on Recent Advances in Mathematics and Informatics (ICRAMI), pp. 1-5, 2021.

[18] W. Shishah, “JointBert for Detecting Arabic Fake News,” IEEE Access, vol. 10, pp. 71951-71960, 2022.

[19] A. Khalil, M. Jarrah, M. Aldwairi, & Y. Jararweh, “Detecting arabic fake news using machine learning,” In 2021 Second International Conference on Intelligent Data Science Technologies and Applications (IDSTA), pp. 171-17, 2021.

[20] H. T. Himdi & F. Y. Assiri, “Development of Classification Model based on Arabic Textual Analysis to Detect Fake News: Case Studies,” In 2023 1st International Conference on Advanced Innovations in Smart Cities (ICAISC), pp. 1-6, 2023.

[21] A. Khalil, M. Jarrah, & M. Aldwairi, “Arabic Fake News Dataset (AFND),” Accessed May 2023.

Improving Prediction of Arabic Fake News Using ELMO’s Features Based Tri-Ensemble Model and LIME XAI

Article

Full-text available

Jan 2024

Turki Aljrees

The proliferation of fake news poses a substantial and persistent threat to information integrity, necessitating the development of robust detection mechanisms. In response to this challenge, this research specifically focuses on the detection of Arabic fake news, employing a sophisticated approach that leverages textual features and a powerful stacking classifier. The proposed model ingeniously combines bagging, boosting, and baseline classifiers, strategically harnessing the unique strengths of each to create a resilient ensemble. Through a series of extensive experiments and the integration of Embeddings from Language Models (ELMO) word embedding, the proposed approach achieves remarkable results in the realm of Arabic fake news detection. The model’s effectiveness is further heightened by the utilization of advanced stacking techniques, coupled with meticulous textual feature extraction. This capability enables the model to effectively distinguish between real and fake news in Arabic, highlighting its potential to enhance the accuracy of information. The findings of this study hold significant implications for the field of fake news detection, especially within the context of the Arabic language. The proposed model emerges as a valuable tool, contributing to the enhancement of information veracity and fostering a more informed public discourse in the face of misinformation challenges. Furthermore, the excellence of the proposed model is substantiated by its outstanding performance metrics, boasting a 99% accuracy, precision, recall, and F-score. This substantiation is underscored through a comprehensive performance comparison with other state-of-the-art models, affirming the model’s reliability in the domain of Arabic fake news detection.

Performance analysis of semantic veracity enhance (SVE) classifier for fake news detection and demystifying the online user behaviour in social media using sentiment analysis

Article

Full-text available

Jan 2024

The increased propagation of fake news is the significant concern in the digital era. Identification of fake news from social media platforms is critical to strengthen public trust and ensure social stability. This research presents an effective and accurate framework for identifying fake news that combines different steps of natural language processing (NLP) technique along with a neural network architecture. A novel semantic veracity enhancement (SVE) classifier is designed and implemented in this work for detecting fake news. The proposed approach leverages the effectiveness of sentiment analysis for identifying misleading or deceptive content and its subsequent implications on the sentiment and behaviour of social media users. A BERT model is used in this research for analysing the sentiments and classifying the texts from the social media platform. By examining the sentiments, the SVE classifier differentiates between real news and fabricated content. To achieve this, three different datasets comprising both actual content and fabricated (tweaked) tweets are employed for training the SVE classifier. The potentiality of the SVE classifier is evaluated and compared with different optimization techniques. The outcome of the experimental analysis shows that the proposed approach exhibits an excellent performance in terms of classifying misinformation from the original information with an outstanding accuracy of 99% compared to other state of art methods.

Deep Neural Networks in Social Media Forensics: Unveiling Suspicious Patterns and Advancing Investigations on Twitter

Conference Paper

Oct 2023

Arabic Fake News Detection in Social Media Context Using Word Embeddings and Pre-trained Transformers

Article

Apr 2024
ARAB J SCI ENG

The quick spread of fake news in different languages on social platforms has become a global scourge threatening societal security and the government. Fake news is usually written to deceive readers and convince them that this false information is correct; therefore, stopping the spread of this false information becomes a priority of governments and societies. Building fake news detection models for the Arabic language comes with its own set of challenges and limitations. Some of the main limitations include 1) lack of annotated data, 2) dialectal variations where each dialect can vary significantly in terms of vocabulary, grammar, and syntax, 3) morphological complexity with complex word formations and root-and-pattern morphology, 4) semantic ambiguity that make models fail to accurately discern the intent and context of a given piece of information, 5) cultural context and 6) diacrasy. The objective of this paper is twofold: first, we design a large corpus of annotated fake new data for the Arabic language from multiple sources. The corpus is collected from multiple sources to include different dialects and cultures. Second, we build fake detection by building machine learning models as model head over the fine-tuned large language models. These large language models were trained on Arabic language, such as ARBERT, AraBERT, CAMeLBERT, and the popular word embedding technique AraVec. The results showed that the text representations produced by the CAMeLBERT transformer are the most accurate because all models have outstanding evaluation results. We found that using the built deep learning classifiers with the transformer is generally better than classical machine learning classifiers. Finally, we could not find a stable conclusion concerning which model works well with each text representation method because each evaluation measure has a different favored model.

Context-Aware and Click Session-Based Graph Pattern Mining With Recommendations for Smart EMS Through AI

Article

Full-text available

Jun 2023

In the field of Artificial Intelligence (AI), Smart Enterprise Management Systems (Smart EMS) and big data analytics are the most prominent computing technologies. A key component of the Smart EMS system is E-commerce, especially Session-based Recommender systems (SRS), which are typically utilized to enhance the user experience by providing recommendations analyzing user behavior encoded in browser sessions. Also the work of the recommender is to predict users’ next actions (click on an item) using the sequence of actions in the current session. Current developments in session-based recommendation have primarily focused on mining more information accessible within the current session. On the other hand, those approaches ignored sessions with identical context for the current session that includes a wealth of collaborative data. Therefore this paper proposed Context-aware and Click session-based graph pattern mining with recommendations for Smart EMS through AI. It employs a novel Triple Attentive Neural Network (TANN) for SRS. Specifically, TANN contains three main components, i.e., Enhanced Sqrt-Cosine Similarity based Neighborhood Sessions Discovery (NSD), Frequent Subgraph Mining (FSM) using Neighborhood Click session-based graph pattern mining and Top-K possible Next-clicked Items Discovery (TNID). The NSD module uses a session-level attention mechanism to find m most similar sessions of the query session, and the FSM module also extracts the frequent subgraphs from the already discovered m most similar sessions of the query session via item-level attention. Then, TNID module is used to discover the top-K possible next-clicked items using the NSD and FSM module via a target-level attention. Finally, we perform comprehensive experiments on one big dataset, DIGINETICA, to verify the effectiveness of the TANN model, and the results of this experiment clearly illustrate the performance of TANN.

Arabic fake news detection in social media Based on AraBERT

Conference Paper

Dec 2022

Multimodal Arabic Rumors Detection

Article

Jan 2023

Recently, the use of social media platforms has increased because of their ease of use and fast accessibility, making such platforms a place of rumor proliferation owing to a lack of posting constraints and content authentication. Therefore, there is a need to leverage artificial intelligence techniques to detect rumors on social media platforms and prevent their adverse effects on society and individuals. Most existing works that detect rumors in Arabic target only the textual features of the tweet content. Nevertheless, tweets contain different types of content, such as text, images, videos, and URLs, and the visual features of tweets play an essential role in rumor diffusion. This study proposes an Arabic rumor detection model that detects rumors on Twitter from textual and visual image features through two types of multimodal fusion: early and late fusion. In addition, we leveraged transfer learning from pre-trained language and vision models. Different experiments were conducted to select the best textual and visual feature extractors for building a multimodal model. MARBERTv2 was used as the textual feature extractor, whereas an ensemble of VGG-19 and ResNet50 was used as the visual feature extractor. Subsequently, the single-modality language and vision models were used as baselines against which the multimodal models were compared. Finally, the experimental results demonstrate the effectiveness of textual features in rumor detection tasks compared with multimodal models.

Arabic Rumor Detection Using Contextual Deep Bidirectional Language Modeling

Article

Jan 2022

In today’s world, news outlets have changed dramatically; newspapers are obsolete, and radio is no longer in the picture. People look for news online and on social media, such as Twitter and Facebook. Social media contributors share information and trending stories before verifying their truthfulness, thus spreading rumors. Early identification of rumors on social media has attracted many researchers. However, relatively few studies have focused on languages other than English, such as Arabic. In this study, an Arabic rumor detection model is proposed. The model was built using a transformer-based deep learning architecture. According to the literature, transformers are neural networks with outstanding performance in natural language processing tasks. Two transformer-based models, AraBERT and MARBERT, were employed, tested, and evaluated using three recently developed Arabic datasets. These models are extensions of BERT (Bidirectional Encoder Representations from Transformers), a deep learning model that uses the transformer architecture to learn text representations and leverages the attention mechanism. We have also mitigated the challenges introduced by the imbalanced training datasets by employing two sampling techniques. In the experiments, our proposed approaches achieved an accuracy of 0.97. This result demonstrates the effectiveness of the proposed method, which outperformed other existing Arabic rumor detection methods.

JointBert for Detecting Arabic Fake News

Article

Jan 2022

Wesam Shishah

The rapid rise in the use of social media platforms has resulted in a recent surge of fake rumours, particularly in Arab countries. Such false information could be detrimental to individuals and society. Detecting and blocking the spread of fraudulent news in Arabic is critical. Many artificial intelligence algorithms, including contemporary transformer models such as BERT, have been employed in the past to detect fake news. This paper implements a combined BERT architecture for detecting fake news in Arabic. Extensive experiments were conducted to test the technique on real-world Arabic fake news datasets. In two of the fake news datasets, covid19fakes and Satirical, the suggested technique had a higher accuracy score than the current state-of-the-art Arabic fake news model. A comparable result can be achieved in other datasets; however, the proposed strategy fails to do so. All datasets except AraNews show an average F1 score improvement of 10% when the proposed strategy is applied. The proposed method was found to be effective and superior to numerous other baseline Arabic fake news detection models.

Development of Classification Model based on Arabic Textual Analysis to Detect Fake News: Case Studies

Conference Paper

Jan 2023

A Novel Framework for Fake News Detection using Double Layer BI-LSTM

Conference Paper

Jan 2023

Fake news is information that has been carefully manipulated to mislead readers by using false facts and figures. Since the introduction of the Internet and social media, fake news has grown to be a significant problem. Identifying fake news has become an important area of research in Natural Language Processing (NLP). The key challenge is determining the veracity of news stories. It is increasingly difficult to study and design a technological strategy that combats fake news without compromising speed and collaborative access to high-quality information. Although various technologies have been developed to assist in the detection of false news, and despite significant breakthroughs, the identification of fake news remains ineffective. In this research, a new framework is proposed that uses the Porter Stemmer and a TF-IDF vectorizer for pre-processing and a double-layer Bi-LSTM to extract refined features for better learning. In this model, the summarized input vector is first formed by concatenating the most relevant text attributes, such as headlines and news text, for further processing. The proposed model has been evaluated on three experimental datasets, namely Kaggle fake_real_news, Liar and Politifact Fake_Real.

Multi-domain Fake News Detection for History News Environment Perception

Conference Paper

Dec 2022

Domain Bias in Fake News Datasets Consisting of Fake and Real News Pairs

Conference Paper

Jul 2022

Modality and Event Adversarial Networks for Multi-Modal Fake News Detection

Article

Jan 2022

With the popularity of news on social media, fake news has become an important issue for the public and government. There exist some fake news detection methods that focus on information exploration and utilization from multiple modalities, e.g., text and image. However, how to effectively learn both modality-invariant and event-invariant discriminant features is still a challenge. In this paper, we propose a novel approach named Modality and Event Adversarial Networks (MEAN) for fake news detection. It contains two parts: a multi-modal generator and a dual discriminator. The multi-modal generator extracts latent discriminant feature representations of text and image modalities. A decoder is adopted to reduce information loss in the generation process for each modality. The dual discriminator includes a modality discriminator and an event discriminator. The discriminator learns to classify the event or the modality of features, and network training is guided by the adversarial scheme. Experiments on two widely used datasets show that MEAN can perform better than state-of-the-art related multi-modal fake news detection methods.
