Fault Tolerant Techniques

"A Comprehensive Survey of Fault Tolerance Techniques in Cloud Computing" [8] discusses various fault-tolerance techniques, which are deployed according to their policies and applications. The paper also describes a full taxonomy of faults, errors, and failures. "Fault Tolerance in Cloud Computing: A Review" (IJCST) [9] presents a review of fault tolerance in cloud computing and discusses a reliability-assessment algorithm and its impact analysis.

Source publication
Research
Cloud computing has developed as a successful new paradigm in the IT industry. In simple terms, cloud computing can be defined as the organization and provision of resources, information, software, and applications as services over the cloud that are dynamically scalable. The dynamic settings of the cloud are more or less susceptible to failure. It is the ad...

Context in source publication

Context 1
... While the application is being migrated, its state is first saved and then migration to a different node takes place. Figure 2 lists all the fault-tolerance techniques in practice. ...
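
The save-state-then-migrate sequence described in this context can be sketched in a few lines. The following is a minimal illustration, assuming a pickle-based checkpoint and a stubbed node transfer; the function names and state layout are hypothetical, not any surveyed system's implementation.

```python
import pickle

def migrate(app_state: dict, source: str, target: str, checkpoint_path: str) -> None:
    """Save the application's state first, then move it to another node."""
    # Step 1: persist the current state (the checkpoint).
    with open(checkpoint_path, "wb") as f:
        pickle.dump(app_state, f)
    # Step 2: transfer to the target node (stubbed here) and resume there.
    print(f"state saved; migrating from {source} to {target}")

def resume(checkpoint_path: str) -> dict:
    """Restore the saved state on the new node."""
    with open(checkpoint_path, "rb") as f:
        return pickle.load(f)

if __name__ == "__main__":
    migrate({"progress": 0.42}, "node-1", "node-2", "/tmp/app.ckpt")
    print(resume("/tmp/app.ckpt"))  # {'progress': 0.42}
```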

Similar publications

Article
Recently, cloud computing applications have been growing very fast. With the increasing number of organizations resorting to using or storing resources in the Cloud, several challenges have been identified. Security is one of the most challenging aspects of cloud computing. Access control offers strong security for data and is considered a major r...
Article
Cloud computing has become a widely used environment for database querying. In this context, the goal of a query optimizer is to satisfy the needs of tenants and maximize the provider’s benefit. Resource allocation is an important step toward achieving this goal. Allocation methods are based on analytical formulas and statistics collected from a ca...
Chapter
Web Cloud systems are very popular today. One of the main problems in cloud computing is making better use of distributed resources to achieve higher throughput. To solve these problems, load distribution mechanisms are implemented. A two-level decision HTTP request distribution strategy working in a one-layer architecture is pr...
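
As a rough illustration of a two-level decision distribution strategy, the sketch below first picks the least-loaded server group, then the least-loaded server within it. The group names, load metric, and tie-breaking are assumptions for illustration, not the strategy proposed in the chapter.

```python
# Level 1: choose the least-loaded server group.
# Level 2: choose the least-loaded server inside that group.
clusters = {
    "group-a": {"s1": 3, "s2": 7},   # server -> active requests (toy data)
    "group-b": {"s3": 1, "s4": 2},
}

def dispatch() -> str:
    group = min(clusters, key=lambda g: sum(clusters[g].values()))  # level 1
    server = min(clusters[group], key=clusters[group].get)          # level 2
    clusters[group][server] += 1                                    # track load
    return server

print([dispatch() for _ in range(4)])
```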
Article
Cloud manufacturing is emerging as a new manufacturing paradigm and an integrated technology. To adapt to the increasing challenges of the traditional manufacturing industry transforming toward service-oriented and innovative manufacturing, this paper proposes a product platform architecture based on cloud manufacturing. Firstly, a framework for th...
Article
A failure detector (FD) is an inherent component of atomic broadcast and consensus protocols. Failures are broadly categorized into two types: crash and Byzantine. Crash failures simply halt the operation of a system, whereas Byzantine failures reflect malicious behavior during ongoing communication. The problem of detecting a failure becomes more ch...
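
A crash failure detector of the kind described here is commonly built on heartbeats: a node is suspected once its heartbeat is overdue. The sketch below is a minimal, timeout-based illustration; it does not address Byzantine behavior, which requires checking the content of messages rather than liveness alone.

```python
import time

class HeartbeatFD:
    """Crash failure detector: suspect a node whose heartbeat is overdue."""
    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str) -> None:
        self.last_seen[node] = time.monotonic()

    def suspected(self, node: str) -> bool:
        last = self.last_seen.get(node)
        return last is None or time.monotonic() - last > self.timeout

fd = HeartbeatFD(timeout=0.05)
fd.heartbeat("n1")
time.sleep(0.1)            # n1 stays silent longer than the timeout
print(fd.suspected("n1"))  # True: crash suspected
```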

Citations

... Kochhar et al. [13] describe the NB classifier as one of the most useful machine learning algorithms. The NB classifier is based on Bayes' theorem, which assumes strong (naïve) independence between qualities or features (predictors). ...
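
To make the independence assumption concrete, here is a minimal Naïve Bayes sketch using scikit-learn's GaussianNB on toy node metrics. The features and labels are invented for illustration; they are not the features used in the cited work.

```python
from sklearn.naive_bayes import GaussianNB

# Toy node metrics: [cpu_load, mem_load]; label 1 = fault-prone, 0 = healthy.
X = [[0.9, 0.8], [0.2, 0.1], [0.85, 0.9], [0.3, 0.2]]
y = [1, 0, 1, 0]

# GaussianNB treats each feature as independent given the class (the "naive" part).
clf = GaussianNB().fit(X, y)
print(clf.predict([[0.88, 0.7]]))  # -> [1]: node classified as fault-prone
```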
Article
The benefits and opportunities offered by cloud computing place it among the fastest-growing technologies in the computer industry. The research also addresses the difficulties and issues that affect users' willingness to accept and use the technology. The proposed research compares machine learning (ML) algorithms, namely Naïve Bayes (NB), Library Support Vector Machine (LibSVM), Multinomial Logistic Regression (MLR), Sequential Minimal Optimization (SMO), K-Nearest Neighbor (KNN), and Random Forest (RF), to determine which classifier gives better accuracy and lower fault-prediction error. In this research, the secondary data results show that the NB classifier gives the highest accuracy and lowest fault-prediction error on CPU-Mem Mono in terms of 80/20 (77.01%), 70/30 (76.05%), and 5-fold cross-validation (74.88%), and on CPU-Mem Multi in terms of 80/20 (89.72%), 70/30 (90.28%), and 5-fold cross-validation (92.83%). Furthermore, on HDD Mono the SMO classifier gives the highest accuracy and lowest fault-prediction error in terms of 80/20 (87.72%), 70/30 (89.41%), and 5-fold cross-validation (88.38%), and on HDD Multi in terms of 80/20 (93.64%), 70/30 (90.91%), and 5-fold cross-validation (88.20%). The primary data results found that the RF classifier gives the highest accuracy and lowest fault-prediction error in terms of 80/20 (97.14%), 70/30 (96.19%), and 5-fold cross-validation (95.85%), but its algorithm complexity (0.17 seconds) is not good. SMO has the second-highest accuracy and lowest fault-prediction error in terms of 80/20 (95.71%), 70/30 (95.71%), and 5-fold cross-validation (95.71%), but its algorithm complexity is good (0.3 seconds). The difference in accuracy and fault prediction between RF and SMO is only 0.13%, and the difference in time complexity is 14 seconds. We therefore decided to modify SMO. Finally, the Modified Sequential Minimal Optimization (MSMO) algorithm is proposed to achieve the highest accuracy and lowest fault-prediction error in terms of 80/20 (96.42%), 70/30 (96.42%), and 5-fold cross-validation (96.50%).
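
The evaluation protocol in this abstract (80/20 and 70/30 splits plus 5-fold cross-validation over several classifiers) can be reproduced generically with scikit-learn. The sketch below uses synthetic data and sklearn's SVC in place of a dedicated SMO trainer (the libsvm solver behind SVC is of the SMO family); none of this is the study's actual pipeline or data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)  # stand-in data

models = {"NB": GaussianNB(),
          "RF": RandomForestClassifier(random_state=0),
          "SMO-style SVM": SVC()}

for name, model in models.items():
    for label, test_size in (("80/20", 0.2), ("70/30", 0.3)):
        Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=test_size,
                                              random_state=0)
        acc = model.fit(Xtr, ytr).score(Xte, yte)
        print(f"{name} {label}: {acc:.3f}")
    cv = cross_val_score(model, X, y, cv=5).mean()   # 5-fold cross-validation
    print(f"{name} 5-fold CV: {cv:.3f}")
```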
... Kochhar et al. [21] suggested that the NB classifier is one of the most useful ML algorithms. The NB classifier is based on Bayes' theorem, which assumes strong (naïve) independence between qualities or features (predictors). ...
... [21] Deepak Kochhar et al. (2017): a proactive fault-tolerance technique is used in this article, and the authors propose using the NB classifier to classify the nodes. ...
... In the equations, (19) represents the VM's reliability, and (20) to (21) represent the host's reliability, where MM_i is the available memory ratio, CP_i is the available MIPS ratio, BW_i is the available bandwidth ratio, and R_i is the reliability of the i-th VM. ...
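
Equations (19) to (21) themselves are not reproduced in this snippet, so only the quoted ratio definitions can be illustrated. The sketch below computes the available-resource ratios MM_i, CP_i, and BW_i from hypothetical capacities; how they combine into the reliability R_i is specified by the cited equations, not here.

```python
def availability_ratio(available: float, total: float) -> float:
    """Generic available-resource ratio, as used for MM_i, CP_i, and BW_i."""
    return available / total

# Hypothetical VM capacities; values are illustrative only.
MM_i = availability_ratio(available=2048, total=4096)   # memory (MB)
CP_i = availability_ratio(available=1500, total=2500)   # compute (MIPS)
BW_i = availability_ratio(available=600, total=1000)    # bandwidth (Mbps)
print(MM_i, CP_i, BW_i)  # combining these into R_i is given by Eqs. (19)-(21)
```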
Article
Cloud computing (CC) is among the fastest-growing technologies in the computer industry, owing to the benefits and opportunities it offers. Cloud computing's challenges include resource allocation, security, quality of service, availability, privacy, data management, performance compatibility, and fault tolerance. Fault tolerance (FT) refers to a system's ability to continue performing its intended task in the presence of defects. Fault-tolerance challenges include heterogeneity and a lack of standards, the need for automation, cloud downtime reliability, and consideration for recovery point objectives, recovery time objectives, and cloud workload. The proposed research includes machine learning (ML) algorithms such as naïve Bayes (NB), library support vector machine (LibSVM), multinomial logistic regression (MLR), sequential minimal optimization (SMO), K-nearest neighbor (KNN), and random forest (RF), as well as a fault-tolerance method known as delta-checkpointing, to achieve higher accuracy, lower fault-prediction error, and reliability. Furthermore, the secondary data were collected from the homonymous, experimental high-performance computing (HPC) system at the Swiss Federal Institute of Technology (ETH), Zurich, and the primary data were generated using virtual machines (VMs) to select the best machine learning classifier. In this article, the secondary and primary data were divided into two split ratios of 80/20 and 70/30, respectively, and cross-validation (5-fold) was used to identify higher accuracy and lower fault-prediction error in terms of true, false, repair, and failure states of virtual machines. Secondary data results show that naïve Bayes performed exceptionally well on CPU-Mem mono and multi blocks, and sequential minimal optimization performed very well on HDD mono and multi blocks, in terms of accuracy and fault prediction. Primary data results revealed that random forest performed very well in terms of accuracy and fault prediction, but not with good time complexity. Sequential minimal optimization has good time complexity, with only minor differences from random forest in accuracy and fault prediction. We decided to modify sequential minimal optimization. Finally, the modified sequential minimal optimization (MSMO) algorithm with the fault-tolerance delta-checkpointing (D-CP) method is proposed to improve accuracy, fault-prediction error, and reliability in cloud computing.
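
Delta-checkpointing, as named here, generally means persisting only the state that changed since the previous checkpoint and rebuilding the full state from a base checkpoint plus the deltas. The following is a minimal dictionary-based sketch of that general idea, not the paper's D-CP implementation.

```python
def delta_checkpoint(prev: dict, current: dict) -> dict:
    """Record only the keys whose values changed since the last checkpoint."""
    return {k: v for k, v in current.items() if prev.get(k) != v}

def restore(base: dict, deltas: list[dict]) -> dict:
    """Rebuild the latest state from a full base checkpoint plus deltas."""
    state = dict(base)
    for d in deltas:
        state.update(d)
    return state

base = {"step": 0, "acc": 0.0}
s1 = {"step": 100, "acc": 0.81}
s2 = {"step": 200, "acc": 0.86}
deltas = [delta_checkpoint(base, s1), delta_checkpoint(s1, s2)]
print(restore(base, deltas))  # {'step': 200, 'acc': 0.86}
```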
... According to Yanchao Zhu et al. [93], replication is frequently employed to assess computational correctness; however, it necessitates replacing damaged instruments with redundant materials to handle node-failure defects. Placement capability and fault tolerance are two aspects [94] that are inextricably intertwined. According to Setlur et al. [95], replication makes an exact clone of a task. ...
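
The replication idea quoted here (cloning a task so that at least one copy finishes correctly) can be sketched with redundant submissions where the first successful replica's result is accepted. The node names and injected failure below are illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_task(node: str) -> str:
    if node == "node-2":
        raise RuntimeError("node failure")   # one replica fails
    return f"result from {node}"

# Submit the same task to several nodes; accept the first correct result.
with ThreadPoolExecutor() as pool:
    futures = [pool.submit(run_task, n) for n in ("node-1", "node-2", "node-3")]
    for fut in as_completed(futures):
        try:
            print(fut.result())
            break                             # first successful replica wins
        except RuntimeError:
            continue                          # ignore the failed replica
```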
Article
Cloud computing has brought the accessibility of several software platforms under a single roof. It has transformed resources into scalable, on-demand services and provides the only solution to high resource requirements. All cloud service providers usually offer all types of services in the cloud computing environment, while also handling security-related challenges like reliability, availability, and throughput. One of the most decisive challenges in the cloud is handling faults. High fault tolerance in the cloud is a must to attain high performance, and the defects must be investigated and examined for future guidance. The principal target of this paper is to gain insight into the fault-tolerance techniques that are available to us and the challenges that are required to be overcome. During our survey, we concluded that there is always a relation between faults and energy consumption. If there is a high potential to tolerate a fault, there will be a need for more infrastructure and devices to fix those faults, which further leads to more power consumption. In this paper, 129 research papers published through February 2022 were considered and further classified. This paper critically reviews techniques to tolerate faults in cloud computing systems and discusses the taxonomy of errors, faults, and failures. Furthermore, this paper aims to investigate several critical research topics and advanced techniques, such as artificial intelligence, deep learning, the Internet of Things, and machine learning, that may be employed as an intelligent fault-tolerance strategy in the cloud environment.
... This replication increases the likelihood that at least one copy of the task will be completed correctly. The second technique is the re-submission of tasks [11]. When a task fails, it is rerun on the same node or with a different resource. ...
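
Task re-submission as described in [11] amounts to a retry loop that may switch resources between attempts. A minimal sketch follows, with a seeded random fault injected for determinism; the node names and retry policy are assumptions.

```python
import random

def resubmit(task, nodes: list[str], max_attempts: int = 3):
    """Rerun a failed task, possibly on a different resource each time."""
    for attempt in range(max_attempts):
        node = nodes[attempt % len(nodes)]   # same or different node per attempt
        try:
            return task(node)
        except RuntimeError as exc:
            print(f"attempt {attempt + 1} on {node} failed: {exc}")
    raise RuntimeError("task failed on all attempts")

def flaky_task(node: str) -> str:
    if random.random() < 0.5:                # injected transient fault
        raise RuntimeError("transient fault")
    return f"done on {node}"

random.seed(1)                               # deterministic demo
print(resubmit(flaky_task, ["node-a", "node-b"]))
```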
Article
Cloud failure is one of the critical issues since it can cost millions of dollars to cloud service providers, in addition to the loss of productivity suffered by industrial users. Fault tolerance management is the key approach to address this issue, and failure prediction is one of the techniques to prevent the occurrence of a failure. One of the main challenges in performing failure prediction is to produce a highly accurate predictive model. Although some work on failure prediction models has been proposed, there is still a lack of a comprehensive evaluation of models based on different types of machine learning algorithms. Therefore, in this paper, we propose a comprehensive comparison and model evaluation for predictive models for job and task failure. These models are built and trained using five traditional machine learning algorithms and three variants of deep learning algorithms. We use a benchmark dataset, called Google Cloud Traces, for training and testing the models. We evaluated the performance of models using multiple metrics and determined their important features, as well as measured their scalability. Our analysis resulted in the following findings. Firstly, in the case of job failure prediction, we found that Extreme Gradient Boosting produces the best model where the disk space request and CPU request are the most important features that influence the prediction. Second, for task failure prediction, we found that Decision Tree and Random Forest produce the best models where the priority of the task is the most important feature for both models. Our scalability analysis has determined that the Logistic Regression model is the most scalable as compared to others.
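
The job-failure finding above (gradient-boosted trees with disk space request and CPU request as the most influential features) can be illustrated generically. The sketch below uses sklearn's GradientBoostingClassifier on synthetic stand-ins for those two features, rather than the actual Google Cloud Traces data or the XGBoost library.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
# Synthetic stand-ins for the trace features named in the study.
disk_request = rng.random(500)
cpu_request = rng.random(500)
X = np.column_stack([disk_request, cpu_request])
# Synthetic failure label, deliberately driven mostly by disk_request.
y = (0.6 * disk_request + 0.4 * cpu_request + 0.1 * rng.random(500) > 0.6).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)
for name, imp in zip(["disk_request", "cpu_request"], model.feature_importances_):
    print(f"{name}: importance {imp:.2f}")   # disk_request should dominate
```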
... They have proposed Energy-efficient Checkpointing Restore and Backup with Classification (ECRBC) and Energy-efficient Checkpointing and Load Balancing (ECLB), where ECRBC is a scheduling algorithm that can tolerate the existence of possible faults in fog, and ECLB optimizes ECRBC by distributing modules among fog nodes. The authors in [19] mention that fault tolerance is dealt with in two ways: reactive and proactive. Reactive fault-tolerance techniques come into play after the faults have already occurred, while proactive ones are pre-strategic techniques that operate in advance of fault occurrence. ...
... A context-aware placement policy is a class of offloading mechanisms that takes into account the system's state at a particular instant and adapts its performance accordingly [18]. Fault-tolerance techniques are the policies employed to minimize fault manifestation in computing systems [19]. Middleware in this scenario refers to the programs that help establish a connection between the diversified components of the fog environment [17]. ...
... The system can endure the instability that persists due to the occurrence of faults. Fault tolerance can be further split into two categories, viz. proactive fault tolerance and reactive fault tolerance [19]. Proactive fault tolerance involves strategies that predict faults and errors before their occurrence and prepare a remedy in advance. ...
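
The proactive/reactive split these snippets keep returning to can be stated in two functions: one acts on a predicted failure probability before anything breaks, the other recovers after an observed failure. A toy sketch, with an arbitrary 0.7 prediction threshold:

```python
def proactive(predicted_failure_prob: float, threshold: float = 0.7) -> str:
    """Act before the fault: migrate work off a node predicted to fail."""
    return "migrate preemptively" if predicted_failure_prob > threshold else "keep running"

def reactive(task_failed: bool) -> str:
    """Act after the fault: recover once the failure has occurred."""
    return "checkpoint-restart / resubmit" if task_failed else "no action"

print(proactive(0.85))   # proactive: prediction triggers early migration
print(reactive(True))    # reactive: recovery follows the observed failure
```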
Article
The need for real-time analysis of smart data gave birth to the idea of fog computing. On one hand, the introduction of the fog layer to the cloud-IoT ecosystem provides faster response, mobility, and location awareness; on the other hand, it increases the attack surface for adversaries. User data becomes highly likely to fall prey to attackers, as it is now processed near the end devices. Under these circumstances, the development of a trustworthy network is very important. Trust management in a fog computing network involves different factors, 'dependability' being one of them. In this paper, the authors present a transitive interpretation to manage dependability in the said scenario. As per the proposed interpretation, load balancing may be deployed for a dependable fog system. Therefore, in the given research work, the authors present HBI-LB, a dependable fault-tolerant load balancing technique using a nature-inspired approach. The proposed approach is simulated using the CloudSim 3.0.3-based Cloud Analyst tool. The obtained results are compared to traditional and state-of-the-art approaches. The comparison is done based on average response time versus the number of tasks and executable instruction length per task.
... These machine learning techniques have also been used to develop fault-tolerance methods to enhance service reliability. In particular, machine learning is employed to develop proactive fault-tolerance methods, where failures are predicted before they occur in the system based on the system's previous data (Leam, xxxx; Kochhar and Hilda, 2017; Li-qun et al., 2011). Some machine learning based algorithms are described in Table 6. ...
Article
Cloud computing has brought about a transformation in the delivery model of information technology from a product to a service. It has enabled the availability of various software, platforms and infrastructural resources as scalable services on demand over the internet. However, the performance of cloud computing services is hampered due to their inherent vulnerability to failures owing to the scale at which they operate. It is possible to utilize cloud computing services to their maximum potential only if the performance related issues of reliability, availability, and throughput are handled effectively by cloud service providers. Therefore, fault tolerance becomes a critical requirement for achieving high performance in cloud computing. This paper presents a comprehensive overview of fault tolerance-related issues in cloud computing; emphasizing upon the significant concepts, architectural details, and the state-of-art techniques and methods. The objective is to provide insights into the existing fault tolerance approaches as well as challenges yet required to be overcome. The survey enumerates a few promising techniques that may be used for efficient solutions and also, identifies important research directions in this area.
Article
Enhancing the fault tolerance of cloud systems and accurately forecasting cloud performance are pivotal concerns in cloud computing research. This research addresses these critical concerns by enhancing fault tolerance and forecasting cloud performance using machine learning models. Leveraging the Google trace dataset, with 10,000 cloud-environment records encompassing diverse metrics, we have systematically employed machine learning algorithms, including linear regression, decision trees, and gradient boosting, to construct predictive models. These models have outperformed baseline methods, with C5.0 and XGBoost showing exceptional accuracy, precision, and reliability in forecasting cloud behavior. Feature importance analysis has identified the ten most influential factors affecting cloud system performance. This work significantly advances cloud optimization and reliability, enabling proactive monitoring, early performance-issue detection, and improved fault tolerance. Future research can further refine these predictive models, enhancing cloud resource management and ultimately improving service delivery in cloud computing.
Chapter
In the distributed cloud environment, each computing server (CS) is configured with a Local Resource Monitor (LRM), which runs independently and performs Virtual Machine (VM) migrations to nearby servers. Predictive VM migration that considers peer servers' CPU usage and setting up rotative decision-making capacity among the peer servers are some approaches the authors proposed for the decentralized cloud and edge computing environment during their study. Decentralized cloud and edge computing environments suffer from overutilization caused by multiple VM placements by peer servers on the same server. This work proposes adaptive predictive VM placement using blockchain with two thresholds for the decentralized cloud and edge computing environment. In this work, each server in the framework considers its own and peer servers' current and future CPU utilization before it takes a decision on VM migration. Experimental results reveal that the proposed dynamic-threshold-based predictive approach has better results compared with randomized peer-to-peer VM placement. The use of blockchain during VM placement allows the identified server to maintain its current and future utilization below the upper threshold usage limit and also ensures tamper-proof communication among peer servers during VM placement. Keywords: Blockchain, Edge computing, Decentralized cloud
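
A two-threshold placement check of the general kind described in this chapter can be sketched as follows. The threshold values, the max-of-current-and-predicted rule, and the action strings are assumptions for illustration, not the proposed blockchain-backed protocol.

```python
def placement_decision(current_util: float, predicted_util: float,
                       lower: float = 0.2, upper: float = 0.8) -> str:
    """Two-threshold check combining current and predicted CPU utilization."""
    peak = max(current_util, predicted_util)   # consider current AND future load
    if peak > upper:
        return "overloaded: migrate a VM away"
    if peak < lower:
        return "underloaded: consolidate VMs and power down"
    return "within thresholds: accept VM placements"

print(placement_decision(current_util=0.65, predicted_util=0.9))   # overloaded
print(placement_decision(current_util=0.10, predicted_util=0.15))  # underloaded
```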