Cloud System Architecture

Source publication
Conference Paper
Full-text available
Performance and availability are key aspects to evaluate the quality of cloud computing systems. The assessment of these systems should consider the effects of queuing and failure/recovery behavior of data center subsystems and disaster occurrences. Additionally, penalties may be applied if the defined quality level of SLA contracts is not satisfied...
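The failure/recovery and SLA-penalty relationship mentioned in the abstract can be illustrated with a short sketch. The Python snippet below is a minimal illustration, not the paper's model: it computes steady-state availability from assumed MTTF/MTTR values and estimates a penalty for downtime exceeding a hypothetical SLA target.

```python
# Minimal sketch (illustrative only): steady-state availability and an
# SLA-penalty estimate from hypothetical failure/recovery parameters.

HOURS_PER_YEAR = 8760

def availability(mttf_h: float, mttr_h: float) -> float:
    """Steady-state availability: A = MTTF / (MTTF + MTTR)."""
    return mttf_h / (mttf_h + mttr_h)

def annual_downtime_h(a: float) -> float:
    """Expected downtime per year for availability a."""
    return (1.0 - a) * HOURS_PER_YEAR

def sla_penalty(a: float, sla_target: float, penalty_per_h: float) -> float:
    """Penalty applies only to downtime exceeding what the SLA allows."""
    excess_h = max(0.0, annual_downtime_h(a) - annual_downtime_h(sla_target))
    return excess_h * penalty_per_h

a = availability(mttf_h=1200.0, mttr_h=2.0)          # hypothetical values
print(f"A = {a:.6f}, downtime = {annual_downtime_h(a):.2f} h/year")
print(f"penalty = ${sla_penalty(a, 0.999, penalty_per_h=500.0):.2f}")
```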

Context in source publication

Context 1
... section presents a case study to illustrate the importance of software tools for helping cloud computing designers to estimate performability metrics. Figure 8 presents a cloud architecture located in Brazil, composed of a data center in Recife (Data Center 1), another in Rio de Janeiro (Data Center 2), and a backup server in São Paulo. Each data center consists of two physical machines, and each machine is capable of running up to two virtual machines. ...
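As a rough illustration of how such a topology maps onto a reliability block diagram, the sketch below combines component availabilities in series and parallel. The case study itself uses SPN/RBD tooling and measured parameters; the availability values here are hypothetical placeholders.

```python
# RBD-style sketch of the case-study topology (illustrative only; the
# component availabilities below are hypothetical, not the paper's data).
from math import prod

def parallel(avails):
    """Redundant components: the system is down only if all are down."""
    return 1.0 - prod(1.0 - a for a in avails)

def series(avails):
    """Components that must all work: availabilities multiply."""
    return prod(avails)

A_PM = 0.995   # assumed physical-machine availability
A_VM = 0.999   # assumed virtual-machine availability

# A PM delivers service if it is up and at least one of its two VMs is up.
a_pm_stack = series([A_PM, parallel([A_VM, A_VM])])
# Each data center has two such PMs in parallel; the two DCs are redundant too.
a_dc = parallel([a_pm_stack, a_pm_stack])
a_cloud = parallel([a_dc, a_dc])

print(f"per-DC availability: {a_dc:.8f}")
print(f"two-DC availability: {a_cloud:.12f}")
```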

Similar publications

Conference Paper
Full-text available
Concurrent Kleene algebras support equational reasoning about computing systems with concurrent behaviours. Their natural semantics is given by series(-parallel) rational pomset languages, a standard true concurrency semantics, which is often associated with processes of Petri nets. We use constructions on Petri nets to provide two decision procedu...
Article
Full-text available
Petri nets and B-Method represent a pair of formal methods for computer systems engineering with interesting complementary features. Petri nets have a nice graphical representation and valuable analytical properties, and can express concurrency. B-Method supports verified software development. To benefit from this complementarity, a mapping from Petri nets t...
Article
Full-text available
This dissertation presents theoretical and practical results on computer system reliability and availability growth modeling. Two kinds of system behavior characterizations are considered: first, with respect to time, and second, with respect to the number of executions performed. The dissertation is centered on two reliability growth models: the c...
Article
Full-text available
Modern port terminals are equipped with various local transport systems, whose main task is to transport cargo between local storehouses and transport resources (ships, trains, trucks) in the fastest and most efficient way, and at the lowest possible cost. These local transport systems consist of fully automated transport units (AGV- automati...
Article
Full-text available
Performance evaluation of cloud computing systems studies the relationships among system configuration, system load, and performance indicators. However, such evaluation is not feasible through measurement or simulation methods, due to properties of cloud computing such as large scale, diversity, and dynamics. To overcome those chal...

Citations

... Similarly, Silva et al. (2014) developed a tool that adopts Reliability Block Diagrams (RBDs) and SPNs to analyse geographically distributed DCs. ...
Article
System outages can have disastrous effects on businesses, such as data loss, customer dissatisfaction, and subsequent revenue loss. Disaster recovery (DR) solutions have been adopted by companies to minimise the effects of these outages. However, the selection of an optimal DR solution is difficult, since there is no single solution that suits the requirements of every company (e.g., availability and costs). In this paper, we propose an integrated model-experiment approach to evaluate DR solutions. We perform experiments on different real-world DR solutions and propose analytic models to evaluate these solutions regarding DR key-metrics: steady-state availability, recovery time objective (RTO), recovery point objective (RPO), downtime, and costs. The results reveal that DR solutions can significantly improve availability and minimise costs. Also, a sensitivity analysis identifies the parameters that most affect the RPO and RTO of the adopted DR solutions.
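To make the availability/cost trade-off concrete, the sketch below estimates how a DR solution might change disaster-related downtime and whether its cost pays off. This is not the paper's analytic model; the disaster frequency, recovery times, and prices are hypothetical placeholders.

```python
# Sketch: effect of a DR solution on disaster-related downtime and cost
# (frequency, recovery times, and prices below are hypothetical).

disasters_per_year = 0.1        # assumed long-run disaster frequency

def annual_disaster_downtime_h(recovery_h: float) -> float:
    return disasters_per_year * recovery_h

without_dr_h = annual_disaster_downtime_h(recovery_h=72.0)  # rebuild from scratch
with_dr_h    = annual_disaster_downtime_h(recovery_h=1.0)   # failover within RTO

dr_cost_per_year = 12_000.0     # hypothetical DR subscription cost
revenue_loss_per_h = 4_000.0    # hypothetical cost of one hour of outage

saving = (without_dr_h - with_dr_h) * revenue_loss_per_h - dr_cost_per_year
print(f"downtime without DR: {without_dr_h:.1f} h/year, with DR: {with_dr_h:.2f} h/year")
print(f"net annual benefit of adopting DR: ${saving:,.2f}")
```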
... It is worth mentioning that to define this architecture, we use the work of Silva et al. [30]. However, unlike the authors, we consider the network link flow as an essential factor in the data transfer time for distributed redundancy purposes. ...
... To accomplish the first step of this research methodology (Get Component's MTTF and MTTR metrics), we used the data obtained in the work of Silva et al. [30], according to Table 2. ...
Article
Full-text available
The increasing number of companies migrating their IT infrastructure to cloud environments has motivated many studies on distributed backup strategies to improve the availability of these companies' systems. In this scenario, it is essential to study mechanisms that evaluate network conditions in order to minimize transmission time and thus improve system availability. The goal of this study is to build models to evaluate the availability of services running in a cloud data center infrastructure, emphasizing the impact of throughput variation on data redundancy and, consequently, on service availability. Based on this, the research proposes smart models that can be deployed in each data center of a distributed arrangement of data centers and help the system administrator choose the best data center to restore the services of a faulty one. To analyze the impact of network throughput on service availability, we gathered the MTTF and MTTR metrics of the data center's components and services, generated a reliability block diagram to obtain the MTTF of the system as a whole, and developed a formalism to model the network component. Based on the results, we built an SPN model to represent the system and obtain its availability under many network conditions. We then analyzed the system's availability to discuss the impact of network conditions on it. A plot of the annual downtime over a year makes the considerable impact of network conditions on the system's availability evident. Using the models developed to study system availability, we developed smart agents capable of predicting the transfer time of a bulk of data and, with it, choosing the data center with the best network conditions to restore the services of a faulty one.
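The data-center selection step described above reduces to estimating a bulk-transfer time per candidate and picking the minimum. The sketch below shows that idea only; the data-center names, throughput figures, and backup size are hypothetical, not the paper's agents or measurements.

```python
# Sketch: pick the surviving data center whose measured network throughput
# minimizes the predicted time to transfer the backup data (all values
# below are hypothetical placeholders).

def transfer_time_s(data_gb: float, throughput_mbps: float) -> float:
    """Bulk-transfer estimate: size divided by effective throughput."""
    return (data_gb * 8_000.0) / throughput_mbps  # GB -> megabits

candidates_mbps = {            # hypothetical measured throughputs (Mbps)
    "dc-recife": 850.0,
    "dc-rio": 430.0,
    "dc-saopaulo": 610.0,
}
data_to_restore_gb = 120.0     # hypothetical size of the VM backups

for dc, mbps in candidates_mbps.items():
    print(f"{dc}: {transfer_time_s(data_to_restore_gb, mbps) / 60:.1f} min")

best = min(candidates_mbps, key=lambda dc: transfer_time_s(data_to_restore_gb,
                                                           candidates_mbps[dc]))
print("restore target:", best)
```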
... Araujo et al. [3] presented an approach for analyzing cloud infrastructures based on stochastic models and used a multiple-criteria method to compute and rank dependability-related metrics (availability, capacity-oriented availability, and costs). Silva et al. [4] and Nguyen et al. [5] presented works evaluating Infrastructure-as-a-Service (IaaS) systems deployed in geographically distributed data centers (DCs). These studies analyzed the transmission of virtual machines (VMs) across DCs, using stochastic modeling as a strategy to improve applications' performability and availability. ...
... Model-based evaluation techniques such as Markov chains and Petri nets have been employed to analyze different characteristics of systems. For instance, Petri nets [11] are a widespread formalism for evaluating systems with a focus on dependability, concurrency, or performance [5,12,4]. Petri net models are based on states (places) and activities (transitions). ...
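A minimal example of the formalism: a single component modeled as an SPN with two places (Up, Down) and two exponentially timed transitions (fail, repair), evaluated by a simple token-game simulation. The rates below are hypothetical, and real tools solve such nets analytically or with far more elaborate simulators.

```python
# Token-game sketch of a two-place failure/repair SPN with exponential
# transition delays (rates are hypothetical placeholders).
import random

def simulate_availability(fail_rate, repair_rate, horizon_h, seed=42):
    rng = random.Random(seed)
    t, up_time, marking = 0.0, 0.0, "Up"          # one token in place Up
    while t < horizon_h:
        if marking == "Up":
            delay = rng.expovariate(fail_rate)    # timed transition 'fail'
            up_time += min(delay, horizon_h - t)
            marking = "Down"
        else:
            delay = rng.expovariate(repair_rate)  # timed transition 'repair'
            marking = "Up"
        t += delay
    return up_time / horizon_h

mttf_h, mttr_h = 1200.0, 2.0
print(f"simulated A ≈ {simulate_availability(1/mttf_h, 1/mttr_h, 1e7):.6f}")
print(f"analytic  A = {mttf_h / (mttf_h + mttr_h):.6f}")   # sanity check
```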
Conference Paper
The consequences for a company of losing its data or having its IT system disrupted are severe and can negatively impact business operations. It can also cause customer dissatisfaction and subsequent revenue loss. In a competitive global market, companies have been adopting disaster recovery (DR) strategies as an attempt to keep IT systems operational, prevent data loss, and ensure business continuity. However, there is no single DR strategy that meets the requirements of every business (e.g., availability and cost). Besides, most of the time, these requirements are conflicting. Therefore, efficient and accurate analysis of DR strategies before their deployment is crucial to choose the strategy that best suits a company's needs and budget. In this paper, we propose the adoption of a multiple-criteria decision-making (MCDM) method and stochastic models to evaluate and rank DR strategies for IT infrastructures. The stochastic models are used for quantitatively assessing distinct DR strategies regarding five DR key metrics: availability, downtime, Recovery Time Objective (RTO), Recovery Point Objective (RPO), and cost. We also use an MCDM method to rank the strategies according to multiple criteria (e.g., availability maximization and cost minimization). A case study demonstrates the feasibility and usefulness of the proposed approach for finding the best DR strategies according to multiple criteria.
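One common way to realize such an MCDM ranking is to min-max normalize each metric and order the strategies by Euclidean distance to the ideal point. The sketch below shows that mechanic only; the three strategies, their metric values, and the unweighted distance are hypothetical illustrations, not the paper's case-study data or its exact method.

```python
# Sketch: rank DR strategies by Euclidean distance to the ideal point after
# min-max normalization (all strategies and values are hypothetical).
from math import sqrt

strategies = {
    "hot-standby":  {"availability": 0.99999, "downtime_h": 0.09, "rto_h": 0.2,
                     "rpo_h": 0.0, "cost": 90.0},
    "warm-standby": {"availability": 0.9999,  "downtime_h": 0.88, "rto_h": 1.5,
                     "rpo_h": 0.5, "cost": 45.0},
    "cold-standby": {"availability": 0.999,   "downtime_h": 8.76, "rto_h": 12.0,
                     "rpo_h": 24.0, "cost": 15.0},
}
higher_is_better = {"availability": True, "downtime_h": False,
                    "rto_h": False, "rpo_h": False, "cost": False}

def normalized(metric, value):
    vals = [s[metric] for s in strategies.values()]
    lo, hi = min(vals), max(vals)
    x = (value - lo) / (hi - lo) if hi > lo else 1.0
    return x if higher_is_better[metric] else 1.0 - x   # 1.0 is always best

def distance_to_ideal(s):
    return sqrt(sum((1.0 - normalized(m, s[m])) ** 2 for m in higher_is_better))

for name, s in sorted(strategies.items(), key=lambda kv: distance_to_ideal(kv[1])):
    print(f"{name}: d = {distance_to_ideal(s):.4f}")
```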
... The authors in [6] present performance models for evaluating cloud computing systems distributed across several data centers, taking disaster occurrence into account. That work presents an approach to evaluate performability in IaaS systems deployed in geographically distributed data centers. ...
... Considering the Euclidean distance results for both maintenance policies, the lowest value, 0.8259, is obtained for SLA 3 with one corrective maintenance team. It is worth noting that the level-three contract with only one preventive maintenance team presents very close results, with a distance of 0.8481. ...
Article
Full-text available
Due to the growth of cloud computing, the data center environment has grown in importance and use. Data centers are responsible for maintaining and processing several critical applications. Therefore, data center infrastructures must be evaluated in order to achieve the high availability and reliability demanded of such environments. This work adopts Stochastic Petri Nets (SPN) to evaluate the impact of maintenance policies on data center dependability. The main goal is to analyze maintenance policies associated with SLA contracts and to propose improvements. To accomplish this, an optimization strategy based on Euclidean distance is adopted to indicate the most appropriate solution under conflicting requirements (e.g., cost and availability). To illustrate the applicability of the proposed models and approach, this work presents case studies comparing different SLA contracts and maintenance policies (preventive and corrective) applied to data center electrical infrastructures.
... Numerical analysis indicated that by using redundant components, the system reduces the probability of failure and the number of lost requests. Nguyen et al. [8] and Silva et al. [12] presented studies on disaster-tolerant data centers (DCs). In the proposed solutions, they adopted an automatic backup mechanism to store the virtual machines' (VMs) data and, in case of a disaster, a mechanism to restore the VMs in another DC that remains operational. ...
Conference Paper
Systems unavailability may produce severe consequences for modern business, such as data loss, customer dissatisfaction, and subsequent revenue loss. Disaster recovery (DR) solutions have been adopted by many organizations as an attempt to prevent data loss and ensure business continuity. With the expansion of cloud computing, different cloud providers have been offering low-cost solutions for DR purposes, such as Backup-as-a-Service (BaaS). Therefore, in this paper, we present an integrated model-experiment approach to evaluate a BaaS environment for DR purposes. We use analytic models and fault-injection experiments to evaluate DR key-metrics such as availability, downtime, Recovery Time Objective (RTO), and Recovery Point Objective (RPO) in a real-world BaaS environment. The results revealed that the environment availability can vary according to the amount of data to be backed up and restored. Besides, a sensitivity analysis shows that the RTO and RPO are mainly influenced by the mean time to recover from a disaster and the backup interval, respectively.
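The sensitivity result quoted above has a simple back-of-the-envelope form: worst-case RPO tracks the backup interval, while RTO is dominated by detection time plus the bulk-restore transfer. The sketch below encodes those two relations with hypothetical parameter values; it is not the paper's analytic model.

```python
# Sketch: first-order estimates of the two DR key metrics
# (all parameter values below are hypothetical placeholders).

def worst_case_rpo_h(backup_interval_h: float) -> float:
    """At worst, all data written since the last completed backup is lost."""
    return backup_interval_h

def estimated_rto_h(data_gb: float, restore_mbps: float,
                    detect_and_switch_h: float) -> float:
    """RTO ~ detection/decision time plus the bulk-restore transfer time."""
    restore_h = (data_gb * 8_000.0) / restore_mbps / 3600.0  # GB -> Mb -> h
    return detect_and_switch_h + restore_h

for interval_h in (1, 6, 24):
    print(f"backup every {interval_h:>2} h -> worst-case RPO = {worst_case_rpo_h(interval_h)} h")
print(f"RTO ≈ {estimated_rto_h(data_gb=200, restore_mbps=400, detect_and_switch_h=0.5):.2f} h")
```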
... However, nowadays they are vital to ensure business operation [35]. Moreover, losing data is expensive for companies because it can severely impact company revenue, for instance, through interruption of business transactions, SLA violations, or loss of confidence [33,36,39,40]. In this context, one key challenge is to define a low-cost and yet effective business continuity strategy [49]. ...
... However, this strategy tends to be costly due to the secondary infrastructure necessary to keep IT systems operating in case of a disaster. For this reason, some papers have discussed the adoption of warm-standby and cold-standby models for this strategy as a way to reduce cost [35,39,70,74]. Considering the three site types, hot-standby tends to have a higher cost than warm-standby and cold-standby, but it can present better results in case of a disaster [74]. ...
... Stochastic Petri nets (SPNs) and Markov chains were each used in three primary studies. Both are state-based models widely employed to evaluate different types of complex systems, including IT infrastructures [34,39]. In the primary studies, these approaches were adopted to propose models that evaluate DR solutions regarding RTO, RPO, availability, performability, and costs. ...
Article
Context: Organizations are spending an unprecedented amount of money on keeping Information Technology (IT) systems operational. Hence, these systems need to be designed using effective fault-tolerant techniques like Disaster Recovery (DR) solutions. Even though research has been done in the DR field, it is necessary to assess the current state of research and practice to provide practitioners with evidence that helps foster its further development. Objective: This paper has the following goals: to investigate state-of-the-art solutions for DR, to systematically analyze the current published research, and to identify the different strategies available in the literature. Method: A systematic mapping study was conducted, in which 49 studies, dated from 2007 to 2017, were evaluated. Results: Various DR practices are being investigated. The results identified a number of relevant issues, including reasons to adopt DR solutions, strategies used to implement DR solutions, approaches employed to analyze DR solutions, and metrics considered during the analyses of DR solutions. Conclusion: The number of strategies and reasons for adopting DR solutions is overwhelming. Hence, there was a need to provide a consolidated view of the field. The results can also help direct future research efforts in this critical area.
... This section presents a high-level model to represent the proposed architecture. • Vp represents the finite set of VMs assigned to the PM at cloud system start-up; ...
Article
Because of the dependence on Internet‐based services, many efforts have been conceived to mitigate the impact of disasters on service provision. In this context, cloud computing has become an interesting alternative for implementing disaster tolerant services due to its resource on‐demand and pay‐as‐you‐go models. This paper proposes a sensitivity analysis approach to assess the parameters that most impact the availability of cloud data centers, taking into account disaster occurrence, hardware and software failures, and disaster recovery mechanisms for cloud systems. The analysis adopts continuous‐time Markov chains, and the results indicate that disaster issues should not be neglected. Hardware failure rate and time for migration of virtual machines (VMs) are the critical factors pointed out for the system modeled in our analysis. Moreover, the location where data centers are placed has a significant impact on system availability, due to the time for migrating VMs from a backup server.
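The CTMC machinery behind such an analysis fits in a few lines: build the infinitesimal generator Q, solve πQ = 0 with Σπ = 1, and read availability off the UP-state probability. The three-state chain and all rates below are hypothetical placeholders, not the chains or parameters used in the paper.

```python
# Sketch: steady-state availability of a tiny CTMC with states
# {UP, FAILED, DISASTER}; the generator and rates are hypothetical.
import numpy as np

lam   = 1 / 1000.0    # failure rate (per hour)
mu    = 1 / 2.0       # repair rate
delta = 1 / 87600.0   # disaster rate (~once per ten years)
rho   = 1 / 48.0      # disaster-recovery (VM migration/restore) rate

# States: 0 = UP, 1 = FAILED, 2 = DISASTER
Q = np.array([
    [-(lam + delta), lam,  delta],
    [mu,             -mu,  0.0 ],
    [rho,            0.0,  -rho],
])

# Solve pi @ Q = 0 with sum(pi) = 1 by replacing one balance equation.
A = np.vstack([Q.T[:-1], np.ones(3)])
b = np.array([0.0, 0.0, 1.0])
pi = np.linalg.solve(A, b)

print(f"steady-state availability: {pi[0]:.8f}")
print(f"expected downtime: {(1 - pi[0]) * 8760:.2f} h/year")
```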
... The proposed environment offers useful features that are not easily found in other tools, such as: more than 25 probability distributions supported in SPN simulation, solution of RBDs through simulation, sensitivity analysis of CTMC and RBD models, computation of reliability importance indices, and moment matching of empirical data. The tool's web page [1] lists the papers (e.g., [6]) that adopted Mercury as their evaluation engine. ...
... The proposed environment has been adopted by the MODCS Research Group. The tool's web page [1] lists the papers (e.g., [6]) that adopted Mercury as their evaluation engine. ...
Presentation
Full-text available
The evaluation of the dependability or performance of general systems is not a trivial task. Therefore, the assistance of software tools to obtain the desired metrics is of utmost importance. This paper introduces the Mercury environment, an integrated software tool that enables creating and evaluating Reliability Block Diagrams, Stochastic Petri Nets, Continuous Time Markov Chains, and Energy Flow Models. Mercury provides a graphical user interface for these modeling formalisms and a script language that allows using it through a command-line interface and integrating it with external applications. The set of features available in the Mercury tool makes it helpful for the dependability and performance evaluation of various systems in both academic and industrial scenarios.