Fault Tolerant Techniques

"A Comprehensive Survey of Fault Tolerance Techniques in Cloud Computing" [8] discusses various fault-tolerance techniques, which are deployed according to their policies and applications. The paper also describes a full taxonomy of faults, errors, and failures. "Fault Tolerance in Cloud Computing: A Review" (IJCST) [9] presents a review of fault tolerance in cloud computing and discusses a reliability-assessment algorithm and its impact analysis.

Source publication
Research
Cloud computing has developed as a successful new paradigm in the IT industry. In simple terms, cloud computing can be defined as the organization and provision of resources, information, software, and applications as services over the cloud that are dynamically scalable. The dynamic settings of the cloud are more or less susceptible to failure. It is the ad...

Context in source publication

Context 1
... While the application is being migrated, its state is first saved and then migration to a different node takes place. Figure 2 lists all the fault-tolerance techniques in practice. ...
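
The save-state-then-migrate sequence described in this context can be sketched in a few lines. The following is a minimal illustration, assuming a pickle-based checkpoint and a stubbed node transfer; the function names and state layout are hypothetical, not any surveyed system's implementation.

```python
import pickle

def migrate(app_state: dict, source: str, target: str, checkpoint_path: str) -> None:
    """Save the application's state first, then move it to another node."""
    # Step 1: persist the current state (the checkpoint).
    with open(checkpoint_path, "wb") as f:
        pickle.dump(app_state, f)
    # Step 2: transfer to the target node (stubbed here) and resume there.
    print(f"state saved; migrating from {source} to {target}")

def resume(checkpoint_path: str) -> dict:
    """Restore the saved state on the new node."""
    with open(checkpoint_path, "rb") as f:
        return pickle.load(f)

if __name__ == "__main__":
    migrate({"progress": 0.42}, "node-1", "node-2", "/tmp/app.ckpt")
    print(resume("/tmp/app.ckpt"))  # {'progress': 0.42}
```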

Similar publications

Article
Recently, cloud computing applications have been growing very fast. With the increasing number of organizations resorting to using or storing resources in the Cloud, several challenges have been identified. Security is one of the most challenging aspects of cloud computing. Access control offers strong security for data and is considered a major r...
Article
Cloud computing has become a widely used environment for database querying. In this context, the goal of a query optimizer is to satisfy the needs of tenants and maximize the provider’s benefit. Resource allocation is an important step toward achieving this goal. Allocation methods are based on analytical formulas and statistics collected from a ca...
Chapter
Web Cloud systems are very popular today. One of the main problems in cloud computing is making better use of distributed resources to achieve higher throughput. To solve these problems, load distribution mechanisms are implemented. A two-level decision HTTP request distribution strategy working in a one-layer architecture is pr...
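
As a rough illustration of a two-level decision distribution strategy, the sketch below first picks the least-loaded server group, then the least-loaded server within it. The group names, load metric, and tie-breaking are assumptions for illustration, not the strategy proposed in the chapter.

```python
# Level 1: choose the least-loaded server group.
# Level 2: choose the least-loaded server inside that group.
clusters = {
    "group-a": {"s1": 3, "s2": 7},   # server -> active requests (toy data)
    "group-b": {"s3": 1, "s4": 2},
}

def dispatch() -> str:
    group = min(clusters, key=lambda g: sum(clusters[g].values()))  # level 1
    server = min(clusters[group], key=clusters[group].get)          # level 2
    clusters[group][server] += 1                                    # track load
    return server

print([dispatch() for _ in range(4)])
```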
Article
Cloud manufacturing is emerging as a new manufacturing paradigm and an integrated technology. To adapt to the increasing challenges of the traditional manufacturing industry transforming toward service-oriented and innovative manufacturing, this paper proposes a product platform architecture based on cloud manufacturing. Firstly, a framework for th...
Article
A failure detector (FD) is an inherent component of atomic broadcast and consensus protocols. Failures are broadly categorized into two types: crash and Byzantine. Crash failures simply halt the operation of a system, whereas Byzantine failures reflect malicious behavior during ongoing communication. The problem of detecting a failure becomes more ch...
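
A crash failure detector of the kind described here is commonly built on heartbeats: a node is suspected once its heartbeat is overdue. The sketch below is a minimal, timeout-based illustration; it does not address Byzantine behavior, which requires checking the content of messages rather than liveness alone.

```python
import time

class HeartbeatFD:
    """Crash failure detector: suspect a node whose heartbeat is overdue."""
    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        self.last_seen[node] = time.monotonic()

    def suspected(self, node: str) -> bool:
        last = self.last_seen.get(node)
        return last is None or time.monotonic() - last > self.timeout

fd = HeartbeatFD(timeout=0.05)
fd.heartbeat("n1")
time.sleep(0.1)            # n1 stays silent longer than the timeout
print(fd.suspected("n1"))  # True: crash suspected
```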

Citations

... Kochhar et al. [13] describe the NB classifier as one of the most useful machine learning algorithms. The NB classifier is based on Bayes' theorem, which assumes strong (naïve) independence between qualities or features (predictors). ...
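
To make the independence assumption concrete, here is a minimal Naïve Bayes sketch using scikit-learn's GaussianNB on toy node metrics. The features and labels are invented for illustration; they are not the features used in the cited work.

```python
from sklearn.naive_bayes import GaussianNB

# Toy node metrics: [cpu_load, mem_load]; label 1 = fault-prone, 0 = healthy.
X = [[0.9, 0.8], [0.2, 0.1], [0.85, 0.9], [0.3, 0.2]]
y = [1, 0, 1, 0]

# GaussianNB treats each feature as independent given the class (the "naive" part).
clf = GaussianNB().fit(X, y)
print(clf.predict([[0.88, 0.7]]))  # -> [1]: node classified as fault-prone
```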
Article
The benefits and opportunities offered by cloud computing place it among the fastest-growing technologies in the computer industry. The research also addresses the difficulties and issues that affect users' willingness to accept and use the technology. The proposed research compares machine learning (ML) algorithms, namely Naïve Bayes (NB), Library Support Vector Machine (LibSVM), Multinomial Logistic Regression (MLR), Sequential Minimal Optimization (SMO), K-Nearest Neighbor (KNN), and Random Forest (RF), to determine which classifier gives better accuracy and lower fault-prediction error. In this research, the secondary data results show that the NB classifier gives the highest accuracy and lowest fault-prediction error on CPU-Mem Mono in terms of 80/20 (77.01%), 70/30 (76.05%), and 5-fold cross-validation (74.88%), and on CPU-Mem Multi in terms of 80/20 (89.72%), 70/30 (90.28%), and 5-fold cross-validation (92.83%). Furthermore, on HDD Mono the SMO classifier gives the highest accuracy and lowest fault-prediction error in terms of 80/20 (87.72%), 70/30 (89.41%), and 5-fold cross-validation (88.38%), and on HDD Multi in terms of 80/20 (93.64%), 70/30 (90.91%), and 5-fold cross-validation (88.20%). The primary data results found that the RF classifier gives the highest accuracy and lowest fault-prediction error in terms of 80/20 (97.14%), 70/30 (96.19%), and 5-fold cross-validation (95.85%), but its algorithm complexity (0.17 seconds) is not good. SMO has the second-highest accuracy and lowest fault-prediction error in terms of 80/20 (95.71%), 70/30 (95.71%), and 5-fold cross-validation (95.71%), but its algorithm complexity is good (0.3 seconds). The difference in accuracy and fault prediction between RF and SMO is only 0.13%, and the difference in time complexity is 14 seconds. We therefore decided to modify SMO. Finally, the Modified Sequential Minimal Optimization (MSMO) algorithm is proposed to achieve the highest accuracy and lowest fault-prediction error in terms of 80/20 (96.42%), 70/30 (96.42%), and 5-fold cross-validation (96.50%).
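
The evaluation protocol in this abstract (80/20 and 70/30 splits plus 5-fold cross-validation over several classifiers) can be reproduced generically with scikit-learn. The sketch below uses synthetic data and sklearn's SVC in place of a dedicated SMO trainer (the libsvm solver behind SVC is of the SMO family); none of this is the study's actual pipeline or data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)  # stand-in data

models = {"NB": GaussianNB(),
          "RF": RandomForestClassifier(random_state=0),
          "SMO-style SVM": SVC()}

for name, model in models.items():
    for label, test_size in (("80/20", 0.2), ("70/30", 0.3)):
        Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=test_size,
                                              random_state=0)
        acc = model.fit(Xtr, ytr).score(Xte, yte)
        print(f"{name} {label}: {acc:.3f}")
    cv = cross_val_score(model, X, y, cv=5).mean()   # 5-fold cross-validation
    print(f"{name} 5-fold CV: {cv:.3f}")
```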
... Kochhar et al. [21] suggested that the NB classifier is one of the most useful ML algorithms. The NB classifier is based on Bayes' theorem, which assumes strong (naïve) independence between qualities or features (predictors). ...
... [21] Deepak Kochhar et al. (2017): a proactive fault-tolerance technique is used in this article, and the authors propose using the NB classifier to classify the nodes. ...
... In the equations, (19) represents the VM's reliability, and (20) to (21) represent the host's reliability, where MM_i is the available memory ratio, CP_i is the available MIPS ratio, BW_i is the available bandwidth ratio, and R_i is the reliability of the i-th VM. ...
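
Equations (19) to (21) themselves are not reproduced in this snippet, so only the quoted ratio definitions can be illustrated. The sketch below computes the available-resource ratios MM_i, CP_i, and BW_i from hypothetical capacities; how they combine into the reliability R_i is specified by the cited equations, not here.

```python
def availability_ratio(available: float, total: float) -> float:
    """Generic available-resource ratio, as used for MM_i, CP_i, and BW_i."""
    return available / total

# Hypothetical VM capacities; values are illustrative only.
MM_i = availability_ratio(available=2048, total=4096)   # memory (MB)
CP_i = availability_ratio(available=1500, total=2500)   # compute (MIPS)
BW_i = availability_ratio(available=600, total=1000)    # bandwidth (Mbps)
print(MM_i, CP_i, BW_i)  # combining these into R_i is given by Eqs. (19)-(21)
```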
Article
Cloud computing (CC) is among the fastest-growing technologies in the computer industry, owing to the benefits and opportunities it offers. Cloud computing's challenges include resource allocation, security, quality of service, availability, privacy, data management, performance compatibility, and fault tolerance. Fault tolerance (FT) refers to a system's ability to continue performing its intended task in the presence of defects. Fault-tolerance challenges include heterogeneity and a lack of standards, the need for automation, cloud downtime reliability, and consideration for recovery point objectives, recovery time objectives, and cloud workload. The proposed research includes machine learning (ML) algorithms such as naïve Bayes (NB), library support vector machine (LibSVM), multinomial logistic regression (MLR), sequential minimal optimization (SMO), K-nearest neighbor (KNN), and random forest (RF), as well as a fault-tolerance method known as delta-checkpointing, to achieve higher accuracy, lower fault-prediction error, and reliability. Furthermore, the secondary data were collected from the homonymous, experimental high-performance computing (HPC) system at the Swiss Federal Institute of Technology (ETH), Zurich, and the primary data were generated using virtual machines (VMs) to select the best machine learning classifier. In this article, the secondary and primary data were divided into two split ratios of 80/20 and 70/30, respectively, and cross-validation (5-fold) was used to identify higher accuracy and lower fault-prediction error in terms of true, false, repair, and failure states of virtual machines. Secondary data results show that naïve Bayes performed exceptionally well on CPU-Mem mono and multi blocks, and sequential minimal optimization performed very well on HDD mono and multi blocks, in terms of accuracy and fault prediction. Primary data results revealed that random forest performed very well in terms of accuracy and fault prediction, but not with good time complexity. Sequential minimal optimization has good time complexity, with only minor differences from random forest in accuracy and fault prediction. We decided to modify sequential minimal optimization. Finally, the modified sequential minimal optimization (MSMO) algorithm with the fault-tolerance delta-checkpointing (D-CP) method is proposed to improve accuracy, fault-prediction error, and reliability in cloud computing.
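
Delta-checkpointing, as named here, generally means persisting only the state that changed since the previous checkpoint and rebuilding the full state from a base checkpoint plus the deltas. The following is a minimal dictionary-based sketch of that general idea, not the paper's D-CP implementation.

```python
def delta_checkpoint(prev: dict, current: dict) -> dict:
    """Record only the keys whose values changed since the last checkpoint."""
    return {k: v for k, v in current.items() if prev.get(k) != v}

def restore(base: dict, deltas: list[dict]) -> dict:
    """Rebuild the latest state from a full base checkpoint plus deltas."""
    state = dict(base)
    for d in deltas:
        state.update(d)
    return state

base = {"step": 0, "acc": 0.0}
s1 = {"step": 100, "acc": 0.81}
s2 = {"step": 200, "acc": 0.86}
deltas = [delta_checkpoint(base, s1), delta_checkpoint(s1, s2)]
print(restore(base, deltas))  # {'step': 200, 'acc': 0.86}
```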
... According to Yanchao Zhu et al. [93], replication is frequently employed to assess computational correctness; however, it necessitates replacing damaged instruments with redundant materials to handle node-failure defects. Placement capability and fault tolerance are two aspects [94] that are inextricably intertwined. According to Setlur et al. [95], replication makes an exact clone of a task. ...
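
The replication idea quoted here (cloning a task so that at least one copy finishes correctly) can be sketched with redundant submissions where the first successful replica's result is accepted. The node names and injected failure below are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_task(node: str) -> str:
    if node == "node-2":
        raise RuntimeError("node failure")   # one replica fails
    return f"result from {node}"

# Submit the same task to several nodes; accept the first correct result.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(run_task, n) for n in ("node-1", "node-2", "node-3")]
    for fut in as_completed(futures):
        try:
            print(fut.result())
            break                             # first successful replica wins
        except RuntimeError:
            continue                          # ignore the failed replica
```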
Article
Cloud computing has brought the accessibility of several software platforms under a single roof. It has transformed resources into scalable, on-demand services and provides the only solution to high resource requirements. All cloud service providers usually offer all types of services in the cloud computing environment, while also handling security-related challenges like reliability, availability, and throughput. One of the most decisive challenges in the cloud is handling faults. High fault tolerance in the cloud is a must to attain high performance, and the defects must be investigated and examined for future guidance. The principal target of this paper is to gain insight into the fault-tolerance techniques that are available to us and the challenges that are required to be overcome. During our survey, we concluded that there is always a relation between faults and energy consumption. If there is a high potential to tolerate a fault, there will be a need for more infrastructure and devices to fix those faults, which further leads to more power consumption. In this paper, 129 research papers published through February 2022 were considered and further classified. This paper critically reviews techniques to tolerate faults in cloud computing systems and discusses the taxonomy of errors, faults, and failures. Furthermore, this paper aims to investigate several critical research topics and advanced techniques, such as artificial intelligence, deep learning, the Internet of Things, and machine learning, that may be employed as an intelligent fault-tolerance strategy in the cloud environment.
... This replication increases the likelihood that at least one copy of the task will be completed correctly. The second technique is the re-submission of tasks [11]. When a task fails, it is rerun on the same node or with a different resource. ...
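
Task re-submission as described in [11] amounts to a retry loop that may switch resources between attempts. A minimal sketch follows, with a seeded random fault injected for determinism; the node names and retry policy are assumptions.

```python
import random

def resubmit(task, nodes: list[str], max_attempts: int = 3):
    """Rerun a failed task, possibly on a different resource each time."""
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]   # same or different node per attempt
        try:
            return task(node)
        except RuntimeError as exc:
            print(f"attempt {attempt + 1} on {node} failed: {exc}")
    raise RuntimeError("task failed on all attempts")

def flaky_task(node: str) -> str:
    if random.random() < 0.5:                # injected transient fault
        raise RuntimeError("transient fault")
    return f"done on {node}"

random.seed(1)                               # deterministic demo
print(resubmit(flaky_task, ["node-a", "node-b"]))
```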
Article
Cloud failure is one of the critical issues since it can cost millions of dollars to cloud service providers, in addition to the loss of productivity suffered by industrial users. Fault tolerance management is the key approach to address this issue, and failure prediction is one of the techniques to prevent the occurrence of a failure. One of the main challenges in performing failure prediction is to produce a highly accurate predictive model. Although some work on failure prediction models has been proposed, there is still a lack of a comprehensive evaluation of models based on different types of machine learning algorithms. Therefore, in this paper, we propose a comprehensive comparison and model evaluation for predictive models for job and task failure. These models are built and trained using five traditional machine learning algorithms and three variants of deep learning algorithms. We use a benchmark dataset, called Google Cloud Traces, for training and testing the models. We evaluated the performance of models using multiple metrics and determined their important features, as well as measured their scalability. Our analysis resulted in the following findings. Firstly, in the case of job failure prediction, we found that Extreme Gradient Boosting produces the best model where the disk space request and CPU request are the most important features that influence the prediction. Second, for task failure prediction, we found that Decision Tree and Random Forest produce the best models where the priority of the task is the most important feature for both models. Our scalability analysis has determined that the Logistic Regression model is the most scalable as compared to others.
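
The job-failure finding above (gradient-boosted trees with disk space request and CPU request as the most influential features) can be illustrated generically. The sketch below uses sklearn's GradientBoostingClassifier on synthetic stand-ins for those two features, rather than the actual Google Cloud Traces data or the XGBoost library.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
# Synthetic stand-ins for the trace features named in the study.
disk_request = rng.random(500)
cpu_request = rng.random(500)
X = np.column_stack([disk_request, cpu_request])
# Synthetic failure label, deliberately driven mostly by disk_request.
y = (0.6 * disk_request + 0.4 * cpu_request + 0.1 * rng.random(500) > 0.6).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)
for name, imp in zip(["disk_request", "cpu_request"], model.feature_importances_):
    print(f"{name}: importance {imp:.2f}")   # disk_request should dominate
```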
... They have proposed Energy-efficient Checkpointing Restore and Backup with Classification (ECRBC) and Energy-efficient Checkpointing and Load Balancing (ECLB), where ECRBC is a scheduling algorithm that can tolerate the existence of possible faults in fog, and ECLB optimizes ECRBC by distributing modules among fog nodes. The authors in [19] mention that fault tolerance is dealt with in two ways: reactive and proactive. Reactive fault-tolerance techniques come into play after the faults have already occurred, while proactive ones are pre-strategic techniques that operate in advance of fault occurrence. ...
... A context-aware placement policy is a class of offloading mechanisms that takes into account the system's state at a particular instant and adapts its performance accordingly [18]. Fault-tolerance techniques are the policies employed to minimize fault manifestation in computing systems [19]. Middleware in this scenario refers to the programs that help establish a connection between the diversified components of the fog environment [17]. ...
... The system can endure the instability that persists due to the occurrence of faults. Fault tolerance can be further split into two categories, viz. proactive fault tolerance and reactive fault tolerance [19]. Proactive fault tolerance involves strategies that predict faults and errors before their occurrence and prepare a remedy in advance. ...
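
The proactive/reactive split these snippets keep returning to can be stated in two functions: one acts on a predicted failure probability before anything breaks, the other recovers after an observed failure. A toy sketch, with an arbitrary 0.7 prediction threshold:

```python
def proactive(predicted_failure_prob: float, threshold: float = 0.7) -> str:
    """Act before the fault: migrate work off a node predicted to fail."""
    return "migrate preemptively" if predicted_failure_prob > threshold else "keep running"

def reactive(task_failed: bool) -> str:
    """Act after the fault: recover once the failure has occurred."""
    return "checkpoint-restart / resubmit" if task_failed else "no action"

print(proactive(0.85))   # proactive: prediction triggers early migration
print(reactive(True))    # reactive: recovery follows the observed failure
```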
Article
The need for real-time analysis of smart data gave birth to the idea of fog computing. On one hand, the introduction of the fog layer to the cloud-IoT ecosystem provides faster response, mobility, and location awareness; on the other hand, it increases the attack surface for adversaries. User data becomes highly likely to fall prey to attackers, as it is now processed near the end devices. Under these circumstances, the development of a trustworthy network is very important. Trust management in a fog computing network involves different factors, 'dependability' being one of them. In this paper, the authors present a transitive interpretation to manage dependability in the said scenario. As per the proposed interpretation, load balancing may be deployed for a dependable fog system. Therefore, in the given research work, the authors present HBI-LB, a dependable fault-tolerant load balancing technique using a nature-inspired approach. The proposed approach is simulated using the CloudSim 3.0.3-based Cloud Analyst tool. The obtained results are compared to traditional and state-of-the-art approaches. The comparison is done based on average response time versus the number of tasks and executable instruction length per task.
... These machine learning techniques have also been used to develop fault-tolerance methods to enhance service reliability. In particular, machine learning is employed to develop proactive fault-tolerance methods, where failures are predicted before they occur in the system based on the system's previous data (Leam, xxxx; Kochhar and Hilda, 2017; Li-qun et al., 2011). Some machine learning based algorithms are described in Table 6. ...
Article
Cloud computing has brought about a transformation in the delivery model of information technology from a product to a service. It has enabled the availability of various software, platforms and infrastructural resources as scalable services on demand over the internet. However, the performance of cloud computing services is hampered due to their inherent vulnerability to failures owing to the scale at which they operate. It is possible to utilize cloud computing services to their maximum potential only if the performance related issues of reliability, availability, and throughput are handled effectively by cloud service providers. Therefore, fault tolerance becomes a critical requirement for achieving high performance in cloud computing. This paper presents a comprehensive overview of fault tolerance-related issues in cloud computing; emphasizing upon the significant concepts, architectural details, and the state-of-art techniques and methods. The objective is to provide insights into the existing fault tolerance approaches as well as challenges yet required to be overcome. The survey enumerates a few promising techniques that may be used for efficient solutions and also, identifies important research directions in this area.
Article
Enhancing the fault tolerance of cloud systems and accurately forecasting cloud performance are pivotal concerns in cloud computing research. This research addresses these critical concerns by enhancing fault tolerance and forecasting cloud performance using machine learning models. Leveraging the Google trace dataset, with 10,000 cloud-environment records encompassing diverse metrics, we have systematically employed machine learning algorithms, including linear regression, decision trees, and gradient boosting, to construct predictive models. These models have outperformed baseline methods, with C5.0 and XGBoost showing exceptional accuracy, precision, and reliability in forecasting cloud behavior. Feature importance analysis has identified the ten most influential factors affecting cloud system performance. This work significantly advances cloud optimization and reliability, enabling proactive monitoring, early performance-issue detection, and improved fault tolerance. Future research can further refine these predictive models, enhancing cloud resource management and ultimately improving service delivery in cloud computing.
Chapter
In the distributed cloud environment, each computing server (CS) is configured with a Local Resource Monitor (LRM), which runs independently and performs Virtual Machine (VM) migrations to nearby servers. Predictive VM migration that considers peer servers' CPU usage and setting up rotative decision-making capacity among the peer servers are some approaches the authors proposed for the decentralized cloud and edge computing environment during their study. Decentralized cloud and edge computing environments suffer from overutilization caused by multiple VM placements by peer servers on the same server. This work proposes adaptive predictive VM placement using blockchain with two thresholds for the decentralized cloud and edge computing environment. In this work, each server in the framework considers its own and peer servers' current and future CPU utilization before it takes a decision on VM migration. Experimental results reveal that the proposed dynamic-threshold-based predictive approach has better results compared with randomized peer-to-peer VM placement. The use of blockchain during VM placement allows the identified server to maintain its current and future utilization below the upper threshold usage limit and also ensures tamper-proof communication among peer servers during VM placement. Keywords: Blockchain, Edge computing, Decentralized cloud
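
A two-threshold placement check of the general kind described in this chapter can be sketched as follows. The threshold values, the max-of-current-and-predicted rule, and the action strings are assumptions for illustration, not the proposed blockchain-backed protocol.

```python
def placement_decision(current_util: float, predicted_util: float,
                       lower: float = 0.2, upper: float = 0.8) -> str:
    """Two-threshold check combining current and predicted CPU utilization."""
    peak = max(current_util, predicted_util)   # consider current AND future load
    if peak > upper:
        return "overloaded: migrate a VM away"
    if peak < lower:
        return "underloaded: consolidate VMs and power down"
    return "within thresholds: accept VM placements"

print(placement_decision(current_util=0.65, predicted_util=0.9))   # overloaded
print(placement_decision(current_util=0.10, predicted_util=0.15))  # underloaded
```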