Figure: Top ten internet-using languages (available from Social Network Analysis and Mining)

Source publication
Article
Full-text available
Recently, the world has witnessed an exponential growth of social networks, which has opened an avenue for online users to express and share their opinions on different aspects of life. Sentiment analysis has become a trending research topic in the field of natural language processing due to its significant role in analyzing the public’s opinion and d...

Similar publications

Article
The rapid proliferation of user-generated content has given rise to large volumes of text corpora. Increasingly, scholars, researchers, and organizations employ text classification to mine novel insights for high-impact applications. Despite their prevalence, conventional text classification methods rely on labor-intensive feature engineering effor...

Citations

... The highest accuracy of 96% and recall of 100% are achieved by GBM. Recent analyses show that deep learning models tend to outperform traditional sentiment analysis models [24][25][26][27][28][29]. Dang et al. [8] experimented with deep NN, CNN, and LSTM models, alternately using word embedding and TF-IDF features, on eight different datasets; the highest accuracy of 90.45% was obtained using word embedding and RNN on the Tweets Airline dataset. ...
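To make the traditional GBM baseline mentioned in this snippet concrete, the following is a minimal sketch of a gradient boosting sentiment classifier over TF-IDF features. The toy texts, labels, and parameters are illustrative assumptions, not the datasets or settings used in the cited studies.

```python
# Hedged sketch: TF-IDF features fed to a gradient boosting (GBM) classifier.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

texts = ["the flight was great", "worst airline experience",
         "loved the crew", "delayed again, terrible service"]   # toy samples
labels = [1, 0, 1, 0]                                            # 1 = positive, 0 = negative

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=0)

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),              # unigram + bigram features
    ("gbm", GradientBoostingClassifier(n_estimators=200, learning_rate=0.1)),
])
model.fit(X_train, y_train)
pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred), "recall:", recall_score(y_test, pred))
```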
Preprint
Full-text available
In the contemporary era, social media platforms amass an extensive volume of social data contributed by their users. In order to promptly grasp the opinions and emotional inclinations of individuals regarding a product or event, it becomes imperative to perform sentiment analysis on the user-generated content. Microblog comments often encompass both lengthy and concise text entries, presenting a complex scenario. This complexity is particularly pronounced in extensive textual content due to its rich content and intricate word interrelations compared to shorter text entries. Sentiment analysis of public opinion shared on social networking websites such as Facebook or Twitter has evolved and found diverse applications. However, several challenges remain to be tackled in this field. Hybrid methodologies have emerged as promising models for mitigating sentiment analysis errors, particularly when dealing with progressively intricate training data. In this article, to investigate COVID-19 vaccination hesitancy, we propose eight different hybrid deep learning models for sentiment classification with the aim of improving the overall accuracy of the model. Sentiment prediction is achieved using embeddings, deep learning models, and a grid search algorithm on a Twitter COVID-19 dataset. According to the study, public sentiment towards COVID-19 immunization appears to be improving with time, as evidenced by the gradual decline in vaccine reluctance. Through extensive evaluation, the proposed model reported an increased accuracy of 98.86%, outperforming other models. Specifically, the combination of BERT, CNN, and GS yields the highest accuracy, while the combination of GloVe, BiLSTM, CNN, and GS follows closely behind with an accuracy of 98.17%. In addition, an increase in accuracy in the range of 2.11% to 14.46% is reported by the proposed model in comparison with existing works.
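As an illustration of this family of hybrids, below is a minimal sketch of one variant (embedding, BiLSTM, convolution, dense head) with a small manual grid search. It is not the authors' implementation: the vocabulary size, sequence length, random stand-in data, and the tiny search grid are assumptions made only for the example.

```python
# Hedged sketch: embedding -> BiLSTM -> CNN -> dense head, with a manual grid search.
import itertools
import numpy as np
import tensorflow as tf

VOCAB, SEQ_LEN = 20000, 60
x = np.random.randint(1, VOCAB, size=(256, SEQ_LEN))   # stand-in token ids
y = np.random.randint(0, 2, size=(256,))                # stand-in sentiment labels

def build_model(lstm_units, dropout):
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(VOCAB, 128),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(lstm_units, return_sequences=True)),
        tf.keras.layers.Conv1D(64, 3, activation="relu"),
        tf.keras.layers.GlobalMaxPooling1D(),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

best = None
for units, drop in itertools.product([64, 128], [0.3, 0.5]):    # illustrative grid
    model = build_model(units, drop)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    hist = model.fit(x, y, validation_split=0.2, epochs=2, batch_size=32, verbose=0)
    acc = hist.history["val_accuracy"][-1]
    if best is None or acc > best[0]:
        best = (acc, units, drop)
print("best (val_acc, lstm_units, dropout):", best)
```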
... Alayba et al. (2018) investigated the integration of CNN and LSTM networks for ASA and observed a rise in the precision of ASA when analysing multiple datasets. Furthermore, Ombabi et al. (2020) proposed a deep learning model for Arabic Sentiment Analysis (ASA) that utilizes a single-layer CNN for extracting local features and a two-layer LSTM for preserving long-term dependencies. The outputs of the CNN and LSTM are fed into an SVM classifier, which produces the final classification result. ...
Article
Full-text available
In the last few years, various topics, including sports, have seen social media platforms emerge as significant sources of information and viewpoints. Football fans use social media to express their opinions and sentiments about their favourite teams and players. Analysing these opinions can provide valuable information on the satisfaction of football fans with their teams. In this article, we present Soutcom, a scalable real‐time system that estimates the satisfaction of football fans with their teams. Our approach leverages the power of social media platforms to gather real‐time opinions and emotions of football fans and applies state‐of‐the‐art machine learning‐based sentiment analysis techniques to accurately predict the sentiment of Arabic posts. Soutcom is designed as a cloud‐based scalable system integrated with the X (formerly known as Twitter) API and a football data service to retrieve live posts and match data. The Arabic posts are analysed using our proposed bidirectional LSTM (biLSTM) model, which we trained on a custom dataset specifically tailored for the sports domain. Our evaluation shows that the proposed model outperforms other machine learning models such as Random Forest, XGBoost and Convolutional Neural Networks (CNNs) in terms of accuracy and F1‐score with values of 0.83 and 0.82, respectively. Furthermore, we analyse the inference time of our proposed model and suggest that there is a trade‐off between performance and efficiency when selecting a model for sentiment analysis on Arabic posts.
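To illustrate the kind of model and latency trade-off discussed here, the sketch below builds a BiLSTM classifier and takes a rough per-post inference-time measurement. It is not the Soutcom biLSTM model; the vocabulary, layer sizes, and random input batch are placeholders.

```python
# Hedged sketch: BiLSTM sentiment classifier plus a rough per-post latency estimate.
import time
import numpy as np
import tensorflow as tf

VOCAB, SEQ_LEN = 30000, 50
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB, 200),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

batch = np.random.randint(1, VOCAB, size=(64, SEQ_LEN))   # stand-in tokenized posts
start = time.perf_counter()
_ = model.predict(batch, verbose=0)
print(f"latency per post: {(time.perf_counter() - start) / len(batch) * 1000:.2f} ms")
```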
... An integrated FastText over SVC and LR classifiers (Altowayan, 2017). A joint model based on CNN, LSTM, and SVM classifier (Ombabi et al., 2020). A Multi-Channel Embedding Convolutional Neural Network Model (Dahou et al., 2019). ...
... As detailed in Table 9 and Fig. 6, Deep Conv-ABiLSTM consistently outperformed all the other baseline methods. Notably, Deep Conv-ABiLSTM achieved a significant improvement, with 93.73% accuracy, over the CNN architecture reported in Alayba et al. (2018b), which achieved 88.10% accuracy, and the CNN-LSTM architecture reported in Ombabi et al. (2020), which achieved 88.52%. ...
... Using the ASTD benchmark, Deep Conv-ABiLSTM achieved 92.21% accuracy, which provides a +12.96% accuracy improvement. Notably, Deep Conv-ABiLSTM outperformed the CNN-LSTM-based models reported in Dahou et al. (2019) and Ombabi et al. (2020) with +7.68% and +2.49% accuracy improvements respectively; this comparison demonstrates the significant influence of the Bi-LSTM and attention layers on contextual information processing. Meanwhile, Deep Conv-ABiLSTM achieved the best performance among the approaches on this dataset. ...
Article
Full-text available
With the enormous growth of social data in recent years, sentiment analysis has gained increasing research attention and has been widely explored in various languages. The nature of the Arabic language imposes several challenges, such as its complicated morphological structure and limited resources; thus, the current state-of-the-art methods for sentiment analysis remain to be enhanced. This inspired us to explore the application of emerging deep-learning architectures to Arabic text classification. In this paper, we present an ensemble model which integrates a convolutional neural network, bidirectional long short-term memory (Bi-LSTM), and an attention mechanism to predict the sentiment orientation of Arabic sentences. The convolutional layer is used for feature extraction from the higher-level sentence representation layer, and the BiLSTM is integrated to further capture contextual information from the produced set of features. Two attention mechanism units are incorporated to highlight the critical information in the contextual feature vectors produced by the Bi-LSTM hidden layers. The context-related vectors generated by the attention mechanism layers are then concatenated and passed into a classifier to predict the final label. To disentangle the influence of these components, the proposed model is validated as three variant architectures on a multi-domain corpus as well as four benchmarks. Experimental results show that incorporating Bi-LSTM and the attention mechanism improves the model’s performance, yielding 96.08% accuracy. Consequently, this architecture consistently outperforms the other state-of-the-art approaches with up to +14.47%, +20.38%, and +18.45% improvements in accuracy, precision, and recall respectively. These results demonstrate the strengths of this model in addressing the challenges of text classification tasks.
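A compact sketch of the pipeline this abstract describes (convolution for local features, BiLSTM for context, attention over the hidden states, then a dense classifier) is given below. It is an illustrative approximation rather than the authors' Deep Conv-ABiLSTM: a single generic dot-product attention layer stands in for the paper's two attention units, and the vocabulary, sequence length, and layer sizes are assumptions.

```python
# Hedged sketch: CNN -> BiLSTM -> attention -> dense classifier.
import tensorflow as tf

VOCAB, SEQ_LEN = 40000, 80
inp = tf.keras.Input(shape=(SEQ_LEN,), dtype="int32")
x = tf.keras.layers.Embedding(VOCAB, 300)(inp)
x = tf.keras.layers.Conv1D(128, 3, padding="same", activation="relu")(x)   # local n-gram features
h = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True))(x)
att = tf.keras.layers.Attention()([h, h])             # dot-product attention over hidden states
ctx = tf.keras.layers.GlobalAveragePooling1D()(att)   # pooled context vector
out = tf.keras.layers.Dense(1, activation="sigmoid")(ctx)

model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```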
... Completely automatic deep learning has revolutionized the domain of computer vision and image processing by enabling the efficient processing of vast quantities of visual data. These systems [21,31] have become state-of-the-art technology for tasks such as image acquisition, processing, and analysis. Thanks to their ability to process large numbers of images rapidly, they have found applications in a variety of fields, from medical diagnostics to autonomous vehicles. ...
Article
Full-text available
Detecting traffic signs in natural scenes requires a great deal of data, owing to the variations such scenes undergo over time, for the output of the detection model to be effective and relevant. The amount of annotated data needed can range from a few hundred to thousands or even millions of examples; the availability of labeled data is often a critical bottleneck in the development and deployment of deep learning models, and acquiring high-quality annotations can be time-consuming and costly. Unlabeled data is easy to acquire but expensive to annotate. Several works focus on active learning as an approach to reduce the cost of annotation and the computation time. To address these difficulties, we propose a Semi-Automatic Deep Image Annotation system using a new Evolutionary Strategy for Reinforcement Learning (SADIA-ESRL). Our experiments demonstrate remarkable efficiency on the Natural Scene Traffic Sign and panel guide Arabic-Latin Text Dataset (NaSTSArLaT). The annotation approach studied allows annotating only 1/4 of the images without compromising the model’s efficiency. This reduction in annotation effort is accompanied by significant time savings, with the labeling process now taking as little as 1/5 of the initial time. Furthermore, this strategy grants us the capability to selectively annotate instances, ensuring optimal performance of the detection model used.
... Multilingual BERT was used in [5,13,24]. As for the MARBERT model, it was employed in [13,24,38,41]. Furthermore, the AraBERT model was investigated by [13,24,33,37,42,43], as mentioned in Table 2. ...
... To give a clearer overview, Table 3 presents all the reviewed models and techniques with their best reported performance: Glove [2,9,40]: 94.80% (Accuracy); MUSE [31]: 70% (Accuracy); ArWordVec [29]: 88.56% (F1 score); Sense2Vec [26]: 89.4% (Accuracy); AraBERT [13,24,33,37,43]: 93.8% (Accuracy); QARIB [24,28,31]: 95.4% (Accuracy); MARBERT [13,24,38,41]: 95.2% (Accuracy); mBERT [13,24,40]: 95.7% (Accuracy); CAMeLBERT [24]: 95.6% (Accuracy); XLM [24]: 95.1% (Accuracy); AraELECTRA [24]: 95.2% (Accuracy); ALBERT [24]: 94.5% (Accuracy); Flair [37]: 54.15% (Accuracy) ...
Article
Full-text available
Feature extraction has transformed the field of Natural Language Processing (NLP) by providing an effective way to represent linguistic features. Various techniques are utilised for feature extraction, such as word embedding. The latter has emerged as a powerful technique for semantic feature extraction in Arabic Natural Language Processing (ANLP). Notably, research on feature extraction in the Arabic language remains relatively limited compared to English. In this paper, we present a review of recent studies focusing on word embedding as a semantic feature extraction technique applied in Arabic NLP. The review primarily includes studies on word embedding techniques applied to Arabic corpora. We collected and analysed a selection of journal papers published between 2018 and 2023 in this field. Through our analysis, we categorised the different feature extraction techniques, identified the Machine Learning (ML) and/or Deep Learning (DL) algorithms employed, and assessed the performance metrics utilised in these studies. We demonstrate the superiority of word embeddings as a semantic feature representation in ANLP. We compare their performance with other feature extraction techniques, highlighting the ability of word embeddings to capture semantic similarities, detect contextual associations, and facilitate a better understanding of Arabic text. Consequently, this article provides valuable insights into the current state of research in word embedding for Arabic NLP.
... Ombabi, Abubakr H., et al. [24] introduced a deep learning approach for Arabic sentiment classification built upon a one-layer CNN model for local feature extraction and a two-layer LSTM architecture for maintaining long-term dependencies. An SVM classifier produces the final category from the feature maps learned by the CNN and LSTM. ...
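The following is a hedged sketch of the design as described in this citation: one convolution layer for local features, two stacked LSTM layers whose final state serves as the feature vector, and an SVM for the final decision. It is not the authors' implementation; the trainable embedding stands in for pre-trained FastText vectors, end-to-end training of the extractor is omitted, and all sizes and data are placeholders.

```python
# Hedged sketch: CNN + two-layer LSTM feature extractor, SVM for the final label.
import numpy as np
import tensorflow as tf
from sklearn.svm import SVC

VOCAB, SEQ_LEN = 50000, 100
inp = tf.keras.Input(shape=(SEQ_LEN,), dtype="int32")
x = tf.keras.layers.Embedding(VOCAB, 300)(inp)             # stand-in for FastText embeddings
x = tf.keras.layers.Conv1D(128, 5, activation="relu")(x)   # single convolution layer
x = tf.keras.layers.MaxPooling1D(2)(x)
x = tf.keras.layers.LSTM(128, return_sequences=True)(x)    # LSTM layer 1
feat = tf.keras.layers.LSTM(128)(x)                        # LSTM layer 2 -> feature vector
extractor = tf.keras.Model(inp, feat)                      # (training omitted for brevity)

tokens = np.random.randint(1, VOCAB, size=(200, SEQ_LEN))  # stand-in tokenized sentences
labels = np.random.randint(0, 2, size=(200,))
features = extractor.predict(tokens, verbose=0)

svm = SVC(kernel="linear").fit(features, labels)           # SVM produces the final category
print("train accuracy:", svm.score(features, labels))
```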
Article
Full-text available
Aspect-oriented extraction involves the identification and extraction of specific aspects, features, or entities within a piece of text. Traditional methods often struggled with the complexity and variability of language, leading to the exploration of advanced deep learning approaches. In the realm of sentiment analysis, the conventional approaches often fall short when it comes to providing a nuanced understanding of sentiments expressed in textual data. Traditional sentiment analysis models often overlook the specific aspects or entities within the text that contribute to the overall sentiment. This limitation poses a significant challenge for businesses and organizations aiming to gain detailed insights into customer opinions, product reviews, and other forms of user-generated content. In this research, we propose an innovative approach for aspect-oriented extraction and sentiment analysis leveraging optimized hybrid deep learning techniques. Our methodology integrates the powerful capabilities of deep learning models with the efficiency of Reptile Search Optimization. Furthermore, we introduce an advanced sentiment analysis framework employing the state-of-the-art Extreme Gradient Boosting Algorithm. The fusion of these techniques aims to enhance the precision and interpretability of aspect-oriented sentiment analysis. The proposed approach first utilizes deep learning architectures to extract and comprehend diverse aspects within textual data. Through the incorporation of Reptile Search Optimization, we optimize the learning process, ensuring adaptability and improved model generalization across various datasets. Subsequently, the sentiment analysis phase employs the robust Extreme Gradient Boosting Algorithm, known for its effectiveness in handling complex relationships and patterns within data. Our experiments, conducted on diverse datasets, demonstrate the superior performance of the proposed methodology in comparison to traditional approaches. The optimized hybrid deep learning approach, coupled with Reptile Search Optimization and the Extreme Gradient Boosting Algorithm, shows promising results in accurately capturing nuanced sentiments associated with different aspects. This research contributes to the advancement of aspect-oriented sentiment analysis techniques, offering a comprehensive and efficient solution for understanding sentiment nuances in textual data across various domains. The ResNet 50 and EfficientNet B7 architectures of the modified pre-trained model are proposed for the aspect extraction function. The Reptile Search Optimization based Extreme Gradient Boosting Algorithm (RSO-EGBA) is proposed to analyze and predict customer sentiments. The study is implemented using Python. It has been observed that the overall accuracy of our proposed method is 99.8%, an increment of 9–16% over that of the state-of-the-art methods.
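As a rough illustration of the sentiment-classification stage only, the sketch below trains an Extreme Gradient Boosting classifier over TF-IDF features, with scikit-learn's RandomizedSearchCV standing in for the Reptile Search Optimization step, which is not reproduced here. The texts, labels, and parameter grid are illustrative assumptions.

```python
# Hedged sketch: XGBoost sentiment classifier with a randomized hyperparameter search.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

texts = ["battery life is excellent", "screen cracked in a week",
         "camera quality is superb", "support never replied"]   # toy reviews
labels = [1, 0, 1, 0]

X = TfidfVectorizer().fit_transform(texts)
search = RandomizedSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_distributions={"n_estimators": [100, 300], "max_depth": [3, 6],
                         "learning_rate": [0.05, 0.1]},
    n_iter=4, cv=2, random_state=0,
)
search.fit(X, labels)
print(search.best_params_, search.best_score_)
```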
... SN platforms are becoming more crucial than ever for the dissemination of ideas about everything [2,3]. On SNs, members can utilize a variety of social data formats to share and express their thoughts and experiences. ...
... These forms include textual data (like reviews, tweets, and comments), visual data (like liked and shared photos), and multimedia data (like sounds and movies) [3][4][5]. Sentiment analysis (SA) seeks to identify a group of people's views about a certain issue on one or more social media platforms [5]. For decision-makers, corporate executives, and others, understanding public opinions and concerns voiced on these many platforms is a vital issue. ...
... Arabic language is among the languages widely utilized on SNs [3]. Around 422 million people speak Arabic natively or in one of its many dialects. ...
... Additionally, the researchers employed an SVM classifier for the final classification stage. The FastText word embedding model was utilized to further support their proposed model [33]. ...
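A minimal sketch of that combination (FastText word vectors averaged into sentence vectors, then an SVM for the final classification stage) is shown below. The toy Arabic corpus, labels, and vector dimensions are placeholders; this is not the cited model.

```python
# Hedged sketch: FastText sentence vectors (mean of word vectors) classified by an SVM.
import numpy as np
from gensim.models import FastText
from sklearn.svm import SVC

corpus = [["الخدمة", "ممتازة", "جدا"], ["تجربة", "سيئة", "للغاية"],
          ["المنتج", "رائع"], ["لن", "اشتري", "مرة", "اخرى"]]   # toy tokenized sentences
labels = [1, 0, 1, 0]

ft = FastText(sentences=corpus, vector_size=100, window=3, min_count=1, epochs=20)
sent_vecs = np.array([np.mean([ft.wv[w] for w in doc], axis=0) for doc in corpus])

clf = SVC(kernel="rbf").fit(sent_vecs, labels)   # SVM handles the final classification
print(clf.predict(sent_vecs))
```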
Article
Full-text available
The rapid growth of social networking and micro-blogging websites has motivated researchers to analyze published content to identify and predict human behavior. With the ongoing growth in data volume, the efficient and effective extraction of valuable information has become crucial. Researchers are tackling this problem using big data analytics. However, most studies have focused on the English language and fewer research efforts have been devoted to the Arabic language. This paper covers the recent Arabic sentiment analysis research, highlighting the most important studies that analyzed content from social media to predict human behavior and the different approaches used. Sentiment analysis studies varied between lexicon-based, traditional machine learning-based, deep learning, and hybrid approaches, in addition to employing swarm intelligence to optimize the performance of text classification algorithms. The reviews showed that Naïve Bayes (NB) and Support Vector Machine (SVM) were the most widely used algorithms among traditional machine learning algorithms. The results also showed that using deep learning approaches achieves better accuracy than other approaches. Also, the use of optimization algorithms based on swarm intelligence had a significant impact on increasing the accuracy of text clustering.
... compared to standalone CNN and LSTM models. This significant performance can be related to the hybrid CNN-LSTM architecture, where the convolution layer extracts local features which are passed to LSTMs for sequence prediction [67]. The BiLSTM has the best performance among the baseline classifiers used in the experiment analysis with accuracy, precision, recall, and F1 score values of 76.31%, 76.62%, 75.79%, and 76.21% respectively. ...
Article
Full-text available
Sarcasm has a significant role in human communication, especially on social media platforms where users express their sentiments through humor, satire, and criticism. The identification of sarcasm is crucial in comprehending the sentiment and the communication context on platforms like Twitter. The ambiguous nature of such expressions makes sarcasm detection a considerable challenge in natural language processing (NLP). The importance and the challenges increase further in languages like Urdu, where NLP resources are limited. Traditional rule-based approaches lack the desired performance because of the subtle and context-dependent nature of sarcasm. However, recent advancements in NLP, particularly transformer-based large language models (LLMs) like BERT, offer promising solutions. In this research, we have utilized a newly created Urdu sarcasm dataset comprising 12,910 tweets manually re-annotated into sarcastic and non-sarcastic classes. These tweets were derived from the public Urdu tweet dataset consisting of 19,995 tweets. We have established baseline results using deep learning classifiers comprising CNN, LSTM, GRU, BiLSTM, and CNN-LSTM. To comprehensively capture the contextual information, we propose a novel hybrid model architecture that integrates multilingual BERT (mBERT) embeddings with BiLSTM and multi-head attention (MHA) for Urdu sarcasm detection. The proposed mBERT-BiLSTM-MHA model demonstrates superior performance by achieving an accuracy of 79.51% and an F1 score of 80.04%, outperforming deep learning classifiers trained with fastText word embeddings.
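To make the described hybrid concrete, below is a hedged PyTorch sketch combining frozen mBERT embeddings, a BiLSTM, and multi-head self-attention with a binary head. It is not the paper's mBERT-BiLSTM-MHA configuration: hidden sizes, number of heads, the frozen encoder, and the sample input are assumptions.

```python
# Hedged sketch: mBERT embeddings -> BiLSTM -> multi-head self-attention -> binary head.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MBertBiLstmMHA(nn.Module):
    def __init__(self, hidden=128, heads=4):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-multilingual-cased")
        self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.mha = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():                       # keep mBERT frozen in this sketch
            emb = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        h, _ = self.lstm(emb)                       # contextual BiLSTM states
        ctx, _ = self.mha(h, h, h)                  # self-attention over LSTM states
        return self.head(ctx.mean(dim=1)).squeeze(-1)

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
batch = tok(["واہ کیا بات ہے"], return_tensors="pt", padding=True)   # toy Urdu input
model = MBertBiLstmMHA()
print(torch.sigmoid(model(batch["input_ids"], batch["attention_mask"])))
```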