Comparison of accuracy, precision, recall, and F1 score for all learning models using the TF-IDF technique.
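
For reference, the four metrics named in this caption can be derived from confusion counts. The following is a minimal sketch for the binary case only; the function name `binary_metrics`, the label encoding, and the toy labels are illustrative assumptions and are not taken from the source publication, which may use multi-class averaging instead.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class.

    Hypothetical helper for illustration; undefined ratios fall back to 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (made up for this sketch): 3 TP, 1 FN, 1 FP, 3 TN.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

With these toy labels all four metrics come out to 0.75, which is a coincidence of the chosen counts rather than a general property.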

Source publication

FIGURE 5: Methodology diagram: Green color represents the data flow...

FIGURE 6: In this figure, confusion matrices of RF, LR, and AC are...

FIGURE 7: In this figure, confusion matrices of RF, LR, and AC are...

Classification of Shopify App User Reviews Using Novel Multi Text Features

Article

Feb 2020

App stores usually allow users to give reviews and ratings that are used by developers to resolve issues and make plans for their apps. In this way, these app stores collect large amounts of data for analysis. However, there are several challenges that must first be addressed, related to redundancy and the volume of data, by using machine learning....

The implemental procedure of proposed method

An example of project content description in Kickstarter.com

The key successful factors of video and mobile game crowdfunding projects using a lexicon-based feature selection approach

Article

Mar 2021

The emergence of crowdfunding has given many capital demanders a new fund-raising channel, but the overall project success rate is very low. Many scholars have begun to discover key successful factors of crowdfunding projects. Previous studies have used questionnaire surveys to identify important project features. In addition to requiring a lot of...

Schematic block diagram of the proposed approach

C² search based learning algorithm for feature selection

B-HkNN algorithm for optimal classification results

Performance analysis of various attacks of CICIDS2017 dataset

Performance analysis of various attacks of ADFA-LD dataset

An Efficient Feature Selection for Intrusion Detection System Using B-HKNN and C2 Search Based Learning Model

Article

May 2022

With the emergence of the big data era, the dimensionality of data has grown exponentially, and handling high-dimensional information has become a difficult task in sectors such as text mining, machine learning, and data analysis. Redundant and inappropriate features add to this dimensional complexity, which in turn results in poor performance....

Stacked Ensemble Feature Selection Method For Kannada Documents Categorization

Conference Paper

Nov 2023

In document-level text mining, feature selection is crucial for lowering ambiguity, which in turn enhances classifier performance. Selecting the vital features is especially important for the classification of documents in the morphologically rich Indian regional language Kannada. In this regard, the paper proposes Stacked Ensemble Feature Sel...

A survey on text classification and its applications

Article

Sep 2020

Text classification (a.k.a. text categorisation) is an effective and efficient technology for information organisation and management. As the volume of information resources on the Web and corporate intranets continues to increase, it has become more and more important and has attracted wide attention from many different research fields....

Swarm Intelligence-Based Feature Selection for Multi-Label Classification: A Review

Article

Jun 2021

Multi-label classification is the process of specifying more than one class label for each instance. The high-dimensional data in various multi-label classification tasks have a direct impact on reducing the efficiency of traditional multi-label classifiers. To tackle this problem, feature selection is used as an effective approach to retain relevant...

A Web Application Fingerprint Recognition Method Based on Machine Learning

Article

Apr 2024
Comput Model Eng Sci

Web application fingerprint recognition is an effective security technology designed to identify and classify web applications, thereby enhancing the detection of potential threats and attacks. Traditional fingerprint recognition methods, which rely on preannotated feature matching, face inherent limitations due to the ever-evolving nature and diverse landscape of web applications. In response to these challenges, this work proposes an innovative web application fingerprint recognition method founded on clustering techniques. The method involves extensive data collection from the Tranco List, employing adjusted feature selection built upon Wappalyzer and noise reduction through truncated SVD dimensionality reduction. The core of the methodology lies in the application of the unsupervised OPTICS clustering algorithm, eliminating the need for preannotated labels. By transforming web applications into feature vectors and leveraging clustering algorithms, our approach accurately categorizes diverse web applications, providing comprehensive and precise fingerprint recognition. The experimental results, which are obtained on a dataset featuring various web application types, affirm the efficacy of the method, demonstrating its ability to achieve high accuracy and broad coverage. This novel approach not only distinguishes between different web application types effectively but also demonstrates superiority in terms of classification accuracy and coverage, offering a robust solution to the challenges of web application fingerprint recognition.

Product Helpfulness Detection With Novel Transformer Based BERT Embedding and Class Probability Features

Article

Apr 2024

Nowadays global market products are readily accessible worldwide, and a vast array of reviews across numerous platforms are posted daily in several categories, making it challenging for customers to stay informed about their product interests. To make informed decisions regarding product quality, users require access to reviews and ratings. Owners and managers must analyze customer ratings and the underlying emotional content of reviews to enhance the product’s quality, cost, customer service, and environmental impact. The primary aim of our proposed research is to accurately predict product helpfulness through customer reviews using the Large Language Model (LLM), thereby assisting customers in saving time and money. We employed a benchmark dataset, the Amazon Fine Food Reviews, to develop numerous advanced machine-learning techniques. We introduced a novel transformer approach BERF (BERT Random Forest) for feature engineering to enhance the value of user evaluations for Amazon’s gourmet food products. The BERF method utilizes BERT embeddings and class probability features derived from product helpfulness online reviews textual data. We have balanced the dataset using the Synthetic Minority Over-sampling TEchnique (SMOTE) approach. Our comprehensive study results demonstrated that the Light Gradient Boosting Machine (LGBM) strategy outperformed existing state-of-the-art approaches, achieving an accuracy of 98%. The performance of each method is confirmed using a k-fold method and further improved through hyperparameter optimization. Our innovative study employing a transformer model has significantly enhanced the utility of customer reviews, substantially reducing online product scams and preventing wasted time and money.

Detecting Thyroid Disease Using Optimized Machine Learning Model Based on Differential Evolution

Article

Jan 2024
INT J COMPUT INT SYS

Thyroid disease has been on the rise during the past few years. Owing to its importance in metabolism, early detection of thyroid disease is a task of critical importance. Despite several existing works on thyroid disease detection, the problem of class imbalance is not investigated very well. In addition, existing studies predominantly focus on the binary-class problem. This study aims to solve these issues by the proposed approach where ten types of thyroid diseases are considered. The proposed approach uses a differential evolution (DE)-based optimization algorithm to fine-tune the parameters of machine learning models. Moreover, conditional generative adversarial networks are used for data augmentation. Several sets of experiments are carried out to analyze the performance of the proposed approach with and without model optimization. Results suggest that a 0.998 accuracy score can be obtained using AdaBoost with DE optimization which is better than existing state-of-the-art models.

"To Stand with Ukraine is to stand with Humanity": Sentiment Analysis using Machine Learning with NLP

Article

Dec 2023

Dr. Yasmin Shaikh

Social media platforms and microblogging sites can be used to gather public opinion and sentiment on a range of topics, including the current state of affairs in war-torn countries. During a crisis, Online Social Networks (OSNs) play a critical role in information sharing. The information gathered during such a crisis can reflect public opinion and sentiment on a large scale. Twitter, in particular, contains a significant quantity of geotagged tweets, allowing for sentiment analysis over time and geography. The primary goal of this research study is to harness the power of social media to monitor, examine, and analyze public opinion on the recent "Russia's Invasion on Ukraine", as public opinion is crucial in forming government policy. By delving deeper into social media, one may readily study people's behavior on a variety of subjects and policies, which would be impossible to do using traditional sources. This research paper aims to classify viewpoints as Positive, Negative, or Neutral by using machine learning techniques (lexicon-based) with Natural Language Processing (NLP). The findings of this study can assist various organizations and stockholders in improving their political strategies and commercial decision-making for current and future intents by utilizing social media networks as a valuable source of knowledge.

Research on Time-Aware Group Query Method with Exclusion Keywords

Article

Oct 2023
ISPRS

To address the fact that existing spatial keyword group queries do not consider query requirements with exclusion keywords and time attributes, a time-aware group query problem with exclusion keywords (TEGSKQ) is proposed for the first time. To solve this problem effectively, this paper proposes a query method based on the EKTIR-Tree index and dominating group (EKTDG). This method first proposes the EKTIR-tree index, which incorporates Huffman coding and integrates Bloom filters to deal with excluded keywords in order to improve the hit rate of keyword queries, significantly improving the query efficiency and reducing the storage occupancy. Then, the Candidate algorithm is proposed based on the EKTIR-tree index to filter out the spatial–textual objects that meet the query’s keywords and time requirements, greatly narrowing the search space for subsequent queries. To address the low efficiency of existing algorithms based on spatial distance queries, a distance-dominating group is defined and a pruning algorithm based on the spatial distance-dominating group is proposed, which refines the query results and greatly improves search efficiency. Theoretical and experimental studies show that the proposed method can better handle time-aware group queries with exclusion keywords.

Wireless Capsule Endoscopy Bleeding Images Classification using CNN Based Model

Conference Paper

Oct 2023

Practice on Framework for Product Quality Analysis Based on User Feedback Data

Preprint

Oct 2023

Online products generate vast amounts of user feedback data, which has become crucial for companies to improve product quality and customer satisfaction. This paper proposes the FPQA-UFD (framework to analyze product quality based on user feedback data) using data mining algorithms, natural language processing, multi-classification methods, and statistical analysis, providing detailed data support for product development teams' decision-making. The framework effectively extracts information from user feedback, accurately dividing 305,311 user feedback data into 44 effective topics and extracting explanatory keywords. A multi-classification experiment achieved a classification accuracy and recall rate of 83%. This study offers valuable insights for businesses and academia to enhance decision-making and software development through user feedback analysis.

Incorporating Word Embedding and Hybrid Model Random Forest Softmax Regression for Predicting News Categories

Article

Sep 2023
MULTIMED TOOLS APPL

Online media has reshaped the news industry, leading to information richness, timely dissemination, and immense diversity. In addition, recent technological advancements enable on-the-spot, prompt, and frequent reporting which can be viewed on smartphones, personal computers, and mobile devices. These recent developments have enhanced the importance of news categorization. Accurate news categorization has become an important element in increasing user satisfaction by providing users with news of their interest and desired category. Despite the available approaches for news categorization, such approaches lack the desired accuracy and require further research to improve their performance. For this purpose, this research proposes a hybrid model that comprises random forest (RF) and SoftMax regression (SMR). To further increase the accuracy, special emphasis is placed on preprocessing steps to remove the noise from the textual data. Moreover, term frequency-inverse document frequency (TF-IDF) and bag of words (BoW) approaches are leveraged for the proposed model due to their reported efficacy for the task at hand. Experimental results indicate that the proposed model achieves 98.1% accuracy and outperforms individual machine learning classifiers regarding accuracy, precision, recall, and F1 score. Hybrid approaches of RF and SMR tend to show better results than both individual and state-of-the-art approaches.
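
The TF-IDF weighting named in the abstract above can be illustrated with a short sketch. It assumes the textbook form tf(t, d) * log(N / df(t)) with whitespace tokenization; the function name `tf_idf` and the sample headlines are hypothetical, and the cited paper's exact variant and preprocessing are not stated here. Library implementations typically apply smoothing and normalization, so their values will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumption: unsmoothed weighting tf(t, d) * log(N / df(t)),
    where tf is the term count divided by the document length.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up headlines for illustration only.
docs = [
    "markets rally on earnings",
    "team wins final match",
    "markets fall on fears",
]
scores = tf_idf(docs)
```

In this toy corpus, terms shared by two headlines ("markets", "on") receive lower weights than terms unique to one headline, which is the property that makes TF-IDF useful for separating categories.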

Analyzing Sentiments Regarding ChatGPT Using Novel BERT Model

Article

Aug 2023

Chatbots are AI-powered programs designed to replicate human conversation. They are capable of performing a wide range of tasks, including answering questions, offering directions, controlling smart home thermostats, and playing music, among other functions. ChatGPT is a popular AI-based chatbot that generates meaningful responses to queries, aiding people in learning. While some individuals support ChatGPT, others view it as a disruptive tool in the field of education. Discussions about this tool can be found across different social media platforms. Analyzing the sentiment of such social media data, which comprises people’s opinions, is crucial for assessing public sentiment regarding the success and shortcomings of such tools. This study performs sentiment analysis and topic modeling on ChatGPT-based tweets. The ChatGPT-based tweets were extracted by the authors from Twitter using ChatGPT hashtags; in these tweets, users share their reviews and opinions about ChatGPT. The Latent Dirichlet Allocation (LDA) approach is employed to identify the most frequently discussed topics in ChatGPT tweets. For the sentiment analysis, a deep transformer-based Bidirectional Encoder Representations from Transformers (BERT) model with three dense layers of neural networks is proposed. Additionally, machine and deep learning models with fine-tuned parameters are utilized for a comparative analysis. Experimental results demonstrate the superior performance of the proposed BERT model, achieving an accuracy of 96.49%.

Text Mining – A Comparative Review of Twitter Sentiments Analysis

Article

Jul 2023

Background: Text mining derives information and patterns from textual data. Online social media platforms, which have recently acquired great interest, generate vast text data about human behaviors based on their interactions. This data is generally ambiguous and unstructured. The data includes typing errors and errors in grammar that cause lexical, syntactic, and semantic uncertainties. This results in incorrect pattern detection and analysis. Researchers are employing various text mining techniques that can aid in Topic Modeling, the detection of Trending Topics, the identification of Hate Speeches, and the growth of communities in online social media networks.
Objective: This review paper compares the performance of ten machine learning classification techniques on a Twitter data set for analyzing users' sentiments on posts related to airline usage.
Methods: Review and comparative analysis of Gaussian Naive Bayes, Random Forest, Multinomial Naive Bayes, Multinomial Naive Bayes with Bagging, Adaptive Boosting (AdaBoost), Optimized AdaBoost, Support Vector Machine (SVM), Optimized SVM, Logistic Regression, and Long-Short Term Memory (LSTM) for sentiment analysis.
Results: The results of the experimental study showed that the Optimized SVM performed better than the other classifiers, with a training accuracy of 99.73% and a testing accuracy of 89.74%.
Conclusion: Optimized SVM uses the RBF kernel function and nonlinear hyperplanes to split the dataset into classes, correctly classifying the dataset into distinct polarities. This, together with Feature Engineering utilizing Forward Trigrams and Weighted TF-IDF, has improved Optimized SVM classifier performance regarding train and test accuracy. Therefore, the train and test accuracy of Optimized SVM are 99.73% and 89.74%, respectively. When compared to Random Forest, a marginal performance improvement of 0.09% and 1.73% is observed in train and test accuracy, respectively, and an improvement of 1.29% (train accuracy) and 3.63% (test accuracy) is observed when compared with LSTM. Likewise, Optimized SVM gave more than a 10% improvement in train accuracy when compared with Gaussian Naïve Bayes, Multinomial Naïve Bayes, Multinomial Naïve Bayes with Bagging, and Logistic Regression, and a similar improvement is observed over AdaBoost and Optimized AdaBoost, which are ensemble models, during the experimental process. Optimized SVM also outperformed all the classification models in terms of AUC-ROC train and test scores.
