Figure - available from: SN Computer Science
An example of a random forest structure considering multiple decision trees

Source publication
Article
Full-text available
In the current age of the Fourth Industrial Revolution (4IR or Industry 4.0), the digital world has a wealth of data, such as Internet of Things (IoT) data, cybersecurity data, mobile data, business data, social media data, health data, etc. To intelligently analyze these data and develop the corresponding smart and automated applications, the know...

Similar publications

Article
Full-text available
The digital world has a wealth of data, such as internet of things (IoT) data, business data, health data, mobile data, urban data, security data, and many more, in the current age of the Fourth Industrial Revolution (Industry 4.0 or 4IR). Extracting knowledge or useful insights from these data can be used for smart decision-making in various appli...
Preprint
Full-text available
Internet of Things (IoT) devices are becoming ubiquitous in our lives, with applications spanning from the consumer domain to commercial and industrial systems. The steep growth and vast adoption of IoT devices reinforce the importance of sound and robust cybersecurity practices during the device development life-cycles. IoT-related vulnerabilities...
Article
Full-text available
Modern industrial systems now, more than ever, require secure and efficient ways of communication. The trend of making connected, smart architectures is beginning to show in various fields of the industry such as manufacturing and logistics. The number of IoT (Internet of Things) devices used in such systems is naturally increasing and industry lea...
Article
Full-text available
The smart city vision has driven the rapid development and advancement of interconnected technologies using the Internet of Things (IoT) and cyber-physical systems (CPS). In this paper, various aspects of IoT and CPS in recent years (from 2013 to May 2023) are surveyed. It first begins with industry standards which ensure cost-effective solutions a...
Article
Full-text available
The smart manufacturing ecosystem enhances the end-to-end efficiency of the mine-to-market lifecycle to create the value chain using the big data generated rapidly by edge computing devices, third-party technologies, and various stakeholders connected via the industrial Internet of things. In this context, smart manufacturing faces two serious chal...

Citations

... The most widely used criteria for splitting are "gini" for the Gini impurity and "entropy" for the information gain, defined mathematically as $\mathrm{Gini}(x) = 1 - \sum_{i=1}^{c} P(x_i)^2$ and $\mathrm{Entropy}(x) = -\sum_{i=1}^{c} P(x_i)\log_2 P(x_i)$, where $i$ ranges over the possible values of the random variable $x$, $P(x_i)$ is the probability of each value of $x$, and $c$ is the number of classes in the dataset (Eqs. 5 and 6) [22]. ...
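The two criteria quoted above are straightforward to compute. Below is a minimal NumPy sketch, assuming a vector of class probabilities as input; the function names are illustrative, not from the cited work.

```python
import numpy as np

def gini_impurity(probs):
    """Gini impurity: 1 - sum_i P(x_i)^2 over the class probabilities."""
    probs = np.asarray(probs, dtype=float)
    return 1.0 - np.sum(probs ** 2)

def entropy(probs):
    """Shannon entropy: -sum_i P(x_i) * log2(P(x_i)), skipping zero-probability classes."""
    probs = np.asarray(probs, dtype=float)
    nonzero = probs[probs > 0]
    return -np.sum(nonzero * np.log2(nonzero))

# A pure node (one class) scores 0 under both criteria;
# a 50/50 two-class split maximizes both.
print(gini_impurity([0.5, 0.5]))  # 0.5
print(entropy([0.5, 0.5]))        # 1.0
```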
... Prioritizing the flood-influencing factors involved the RF feature selection technique, which ranks factors by the mean and standard deviation of the accumulated impurity decrease within each tree [22]. The risk of floods increases with drainage density. High flood peaks, steep slopes, and river streams that carry a lot of silt are generally associated with high drainage densities. ...
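For context, the mean-decrease-in-impurity ranking described here is exposed directly by scikit-learn's random forest. The sketch below is a hedged illustration on synthetic data; the factor names are hypothetical stand-ins for the paper's flood-influencing factors.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical factor names standing in for the study's real predictors.
feature_names = ["drainage_density", "slope", "curvature", "rainfall"]
X, y = make_classification(n_samples=500, n_features=4, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ is each feature's impurity decrease averaged across
# trees; the per-tree values give the standard deviation of that estimate.
std = np.std([tree.feature_importances_ for tree in rf.estimators_], axis=0)
for name, mean_imp, s in zip(feature_names, rf.feature_importances_, std):
    print(f"{name}: {mean_imp:.3f} +/- {s:.3f}")
```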
Article
Full-text available
Abstract. Background: The world economy is significantly impacted by floods. Identifying flood risk is essential to flood mitigation techniques. Aim: The primary goal of this study is to create a geographic information system (GIS)-based flood susceptibility map for the study area. Methods: Ten flood-influencing factors from a geospatial database were taken into account when mapping the flood-prone areas. Every factor demonstrated a robust relationship with the probability of flooding. Results: The factors contributing most to the flood disaster in the study region were drainage density, distance, and curvature. The flood susceptibility models' performance was validated using standard statistical measures and the AUC. The ROC curves demonstrated that all ensemble models performed well on the validation data sets (AUC > 0.97), with high accuracy scores of 0.80. Based on the flood susceptibility maps, most of the northwest regions of the study area are more likely to flood because of low-lying land, lower-gradient slopes, linear and concave curvature, high drainage density with high rainfall, more "water bodies," "crop land," and "built areas," abundant sea and surface water, and Quaternary soil types, among other factors. The very high flood susceptibility class accounts for 18.2% of the study area according to the RF-embedding model, whereas the high, moderate, low, and very low susceptibility classes cover about 20.0%, 24.6%, 24.3%, and 12.9%, respectively. Conclusion: In comparison with other commonly applied approaches, this research presents a novel modeling approach for flood susceptibility that integrates machine learning and geospatial data. It has been found to be stronger, more efficient, highly accurate, with good prediction performance and less bias. Overall, our research into machine-learning-based solutions points in a positive direction technologically and can serve as a reference for future research and applications by academic specialists and decision-makers.
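The AUC validation reported in this abstract corresponds to a standard ROC analysis. A minimal sketch with scikit-learn, assuming binary flood/no-flood labels and model-predicted susceptibility scores (the numbers below are made up for illustration):

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                    # observed flood / no-flood
y_score = [0.1, 0.3, 0.8, 0.9, 0.7, 0.2, 0.95, 0.4]  # predicted susceptibility
print("AUC:", roc_auc_score(y_true, y_score))        # 1.0 for this toy data
```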
... ML is a subset of AI that uses computational models to learn from large, complex data and generate useful predictive outputs without any explicit programming. The fundamental principles of ML are based on intricate statistical and mathematical optimization using numerical data [5]. Decision tree learning, support vector machines (SVMs), and artificial neural networks (ANNs) are three common ML models. ...
... Organizations utilize real-time systems to offer recommendations and furnish consumers with valuable information by leveraging the extensive data gathered from these systems (Lyu et al., 2021). The comprehensive data gathering enables AI and machine learning algorithms to extract valuable information and conduct advanced analytics (Sarker, 2021). Furthermore, the utilization of sophisticated analytics has the potential to enhance precision and outcomes across various domains. ...
Article
Full-text available
    Organizations are embracing innovation and strategic development driven by three powerful forces, technology, big data, and corporate entrepreneurship (CE), to achieve revenue growth, sustainability, and global leadership. The three levels of an organization—individual, group, and organizational—are interconnected and mutually impact one another, ultimately determining the organization’s outcomes in terms of efficiency, performance, and sustainability. A study employing multiple research methods was carried out to understand this complex situation. A total of 450 professionals from various industries participated in an online survey, sharing their experiences and viewpoints. The analysis, conducted using both SPSS and Microsoft Excel, revealed a number of significant findings: Approximately 30% of individuals come from the telecommunications industry. 42.5% consider CE the initial step for strategic initiatives, while technology is regarded as the dominant force (70.45%). Surprisingly, although only 42% utilize all three levels, 40% believe their combined effort leads to the best possible outcomes. This study illustrates a complex ecosystem where big data, technology, and CE must collaborate to achieve organizational sustainability and ensure the success of executive initiatives. By utilizing the combined strength of this triumvirate, organizations can ascend to the highest position on the competitive hierarchy.
... Moreover, XGBoost can be interpreted quickly and copes effectively with large datasets. Thanks to these properties, XGBoost is established in the literature as a widely used method for solving complex data-analysis problems (Sarker, 2021). The key factors behind XGBoost's success are presented as follows (Chen and Guestrin, 2016): ...
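A minimal sketch of fitting an XGBoost classifier with the xgboost Python package; the hyperparameters and the synthetic dataset are illustrative choices, not the settings of the citing study.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = XGBClassifier(
    n_estimators=200,   # number of boosted trees
    max_depth=4,        # depth of each tree
    learning_rate=0.1,  # shrinkage applied to each tree's contribution
)
model.fit(X_tr, y_tr)
print("accuracy:", model.score(X_te, y_te))
```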
... Machine learning and artificial intelligence draw on probability theory, statistics, and linear algebra, which are used to develop machine-learning algorithms and artificial neural networks. This enables computers to 'learn' from data and make intelligent decisions or predictions (Sarker, 2021; Taye, 2023). ...
... Nowadays, researchers apply various machine learning (ML) models to detect cancer at an early stage. These models use previous data such as genomic results, medical test results, X-ray results, and MRI results as input and predict whether the patient is suffering from cancer or not [5][6][7]. To build an ML-based model for cancer diagnosis, two types of data must be present: biopsy data and microarray data [8]. ...
... These algorithms are computationally efficient and have advantages for handling large numbers of genes and feature spaces with large dimensions [15]. The NB model's probability calculations indicate crucial features for classification, while LR's regularization techniques facilitate feature selection and are less prone to overfitting [5]. SVMs are suitable for high-dimensional spaces, while MLP captures complex, nonlinear relationships within high-dimensional datasets. ...
... SVMs are suitable for high-dimensional spaces, while MLP captures complex, nonlinear relationships within high-dimensional datasets. KNN classifies samples by their k-nearest neighbors' majority classes, effectively capturing local data patterns [5,31]. DT handles nonlinear relationships and inherently performs feature selection by identifying the most informative genes at each split, aiding in identifying relevant biomarkers in microarray datasets [6]. ...
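These trade-offs are easy to probe empirically. Below is a hedged sketch comparing the named classifier families on a synthetic many-features/few-samples dataset (a rough stand-in for microarray data); scikit-learn defaults replace the tuned models of the citing work.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Many features, few samples: a rough stand-in for microarray data.
X, y = make_classification(n_samples=200, n_features=500,
                           n_informative=20, random_state=0)

classifiers = {
    "NB": GaussianNB(),
    "LR": LogisticRegression(max_iter=5000),  # L2-regularized by default
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=2000, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "DT": DecisionTreeClassifier(random_state=0),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f}")
```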
    Article
    Full-text available
Cancer is one of the leading causes of death across the globe. There is a need for early diagnosis to improve the chance of successful treatment and reduce the mortality associated with cancer. Due to the availability of highly specialized cancer datasets, machine learning and deep learning, which are part of artificial intelligence (AI) techniques, are used together with molecular classification of cancer by gene expression to detect the disease. The application of several classification and feature selection methods on microarray gene expression datasets helps learn models that are able to predict a given disease. However, the tremendous dimensionality of the microarray cancer dataset is the greatest challenge in interpreting the data. In this work, the optimal feature subsets are selected by combining the correlation-based feature selection (CFS) technique with five distinct meta-heuristic search methods: evolutionary search (ES), particle swarm optimization search (PSOS), genetic search (GS), harmony search (HS), and multiobject evolutionary search (MOES). Furthermore, a CFS-MOES (correlation-based feature selection—multiobject evolutionary search) ensemble model is proposed based on a majority voting mechanism to improve the classification performance. Six microarray cancer datasets are considered, and seven traditional classifiers are evaluated on those datasets. Three classifiers, namely, K-nearest neighbour (KNN), multilayer perceptron (MLP), and random forest (RF), were chosen as the base classifiers based on their F-measure score. The features chosen by our proposed CFS-MOES method significantly improve the accuracy of the proposed model. Moreover, the proposed model has also been compared with the other ensemble models generated using the CFS-ES (correlation-based feature selection—evolutionary search), CFS-PSOS (correlation-based feature selection—particle swarm optimization search), CFS-GS (correlation-based feature selection—genetic search), and CFS-HS (correlation-based feature selection—harmony search) feature selection methods, ensuring better classification accuracy with a reduced feature subset. The model is also evaluated using significant parameters such as precision, recall, F-measure, accuracy, Matthews correlation coefficient (MCC), and mean absolute error (MAE). According to the experimental results, our proposed model achieves a remarkable accuracy of 98.83% for breast cancer and 98.79% for cervical cancer.
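The majority-voting mechanism at the core of the proposed ensemble can be sketched with scikit-learn's VotingClassifier over the three base classifiers named in the abstract (KNN, MLP, RF); the feature-selection stage (CFS plus meta-heuristic search) is deliberately abstracted away here, and the synthetic data is illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("mlp", MLPClassifier(max_iter=2000, random_state=0)),
        ("rf", RandomForestClassifier(random_state=0)),
    ],
    voting="hard",  # each base classifier casts one vote per sample
)
ensemble.fit(X_tr, y_tr)
print("accuracy:", ensemble.score(X_te, y_te))
```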
... Currently, AI literacy is mainly represented in the technical literature of computer science (Sarker, 2021; De Silva & Alahakoon, 2022) and information engineering (Bawack et al., 2019), while remaining marginal in design research. Recently, some attempts have been made to formalise AI structured knowledge useful to design students in understanding how to integrate AI within the design practice. ...
... To collect a preliminary list of comprehensive verbs for the capabilities of AI systems, we first conducted a literature review as a primary search to look for existing taxonomies of AI capabilities, using the keywords "AI", "capabilities", "functionalities", "ML", "framework", "AI systems classification", and "AI design taxonomy". We considered and analysed both the more technical taxonomies (Sarker, 2021; De Silva & Alahakoon, 2022) and those that aim to develop an AI literacy suitable for designers (Liao et al., 2020; Figoli et al., 2022; Jansen & Colombo, 2023). To expand the search further, a secondary search was conducted using ChatGPT, elaborating and providing diverse prompts to validate the information retrieved during the conversation with the tool. ...
... There are four major types of ML models: supervised, semi-supervised, unsupervised, and reinforcement learning models, with supervised models having found the most use in healthcare [4]. While traditional statistical analyses merely infer relationships between variables, ML models aim to make accurate predictions [5]. So, by integrating the systems biology approach, combining different -omics data with supported bioinformatics analysis from ML models, we can predict, treat and monitor these diseases, translating to better clinical outcomes and lower overall damage accrual. ...
    Article
    Full-text available
Autoimmune rheumatic diseases are often characterised by heterogeneity in presentation. The traditional approach to diseases guided by their phenotype may be suboptimal with the advent of precision medicine. Precision medicine is the integration and application of multiomics to predict the best-performing drug and its toxicity profile to derive optimal benefits. With novel drug discoveries and an expanding therapeutic armamentarium, it potentially aids in clinical and therapeutic decision-making, while saving time and averting adverse events. However, multiomics comes with 'big data', and owing to the costs, the sample size is usually small. Machine learning (ML) plays an important role in these scenarios where conventional statistics fall short. So, by integrating clinical data with the data from -omics, ML models can be built, which can accurately predict the clinical factors or even novel biomarkers that predict response. This approach has a potential for great benefit as valuable time or the 'therapeutic window of opportunity' would be saved, with fewer adverse events, eventually translating to lower damage accrual and better outcomes. Most of the evidence for the use of ML in precision rheumatology comes from rheumatoid arthritis and the factors predicting response to various drugs, including tumour necrosis factor inhibitors. This approach also has its limitations, such as the lack of generalizability and the current scarcity of longitudinal data. These models must be tested in larger cohorts and population-based studies for validation, failing which there is a risk of apparent identification of multiple 'novel' biomarkers that may or may not be mechanistic.
    ... As mentioned in Section 4.2, regarding AI technologies and techniques, the majority of articles center around ML. Therefore, we will focus on the data requirements of different ML algorithms to emphasize different considerations when handling data issues. According to a prominent article, ML algorithms can be primarily categorized into four types: Supervised learning, Unsupervised learning, Semi-supervised learning, and Reinforcement learning (Sarker, 2021), as depicted in Figure 5. We outline the specific data requirements based on the classification of ML algorithms. ...
... Categories of machine learning algorithms. Adapted from (Sarker, 2021; Zhang & Yu, 2020). Supervised learning: Supervised learning utilizes labeled training data and a set of training examples to deduce a function. The primary tasks in supervised learning are classification, which segregates the data, and regression, which fits the data. ...
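The classification/regression distinction drawn here is concrete in code. A minimal sketch on synthetic data, purely for illustration:

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: learn a function from features to discrete class labels.
Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(Xc, yc)
print("predicted classes:", clf.predict(Xc[:3]))

# Regression: learn a function from features to a continuous target.
Xr, yr = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
reg = LinearRegression().fit(Xr, yr)
print("predicted values:", reg.predict(Xr[:3]))
```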
    Preprint
    Full-text available
    In the era of Industry 4.0, artificial intelligence (AI) is assuming an increasingly pivotal role within industrial systems. Despite the recent trend within various industries to adopt AI, the actual adoption of AI is not as developed as perceived. A significant factor contributing to this lag is the data issues in AI implementation. How to address these data issues stands as a significant concern confronting both industry and academia. To address data issues, the first step involves mapping out these issues. Therefore, this study conducts a meta-review to explore data issues and methods within the implementation of industrial AI. Seventy-two data issues are identified and categorized into various stages of the data lifecycle, including data source and collection, data access and storage, data integration and interoperation, data pre-processing, data processing, data security and privacy, and AI technology adoption. Subsequently, the study analyzes the data requirements of various AI algorithms. Building on the aforementioned analyses, it proposes a data management framework, addressing how data issues can be systematically resolved at every stage of the data lifecycle. Finally, the study highlights future research directions. In doing so, this study enriches the existing body of knowledge and provides guidelines for professionals navigating the complex landscape of achieving data usability and usefulness in industrial AI.
... Lastly, Sarker [20] integrates data mining algorithms into a bank's database system to predict customer behavior and improve marketing strategies. This study utilizes unsupervised ML algorithms, such as k-means clustering, as well as supervised algorithms, including multi-class decision forest, multi-class logistic regression, etc. ...
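The unsupervised segmentation step named here is typically a k-means pass over customer features. A hedged sketch with hypothetical features and k=3; nothing below reproduces the cited bank study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical customer features: [annual_spend, transactions_per_month],
# drawn from three synthetic behavior groups.
centers = [(500.0, 5.0), (2000.0, 20.0), (8000.0, 2.0)]
X = np.vstack([rng.normal(c, (300.0, 3.0), size=(100, 2)) for c in centers])

X_scaled = StandardScaler().fit_transform(X)  # put features on one scale
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
print("segment sizes:", np.bincount(kmeans.labels_))
```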
    Article
    Full-text available
    The integration of machine learning (ML) techniques into marketing strategies has become increasingly relevant in modern business. Utilizing scientific manuscripts indexed in the Scopus database, this article explores how this integration is being carried out. Initially, a focused search is undertaken for academic articles containing both the terms “machine learning” and “marketing” in their titles, which yields a pool of papers. These papers have been processed using the Supabase platform. The process has included steps like text refinement and feature extraction. In addition, our study uses two key ML methodologies: topic modeling through NMF and a comparative analysis utilizing the k-means clustering algorithm. Through this analysis, three distinct clusters emerged, thus clarifying how ML techniques are influencing marketing strategies, from enhancing customer segmentation practices to optimizing the effectiveness of advertising campaigns.