Fig 1: Experimental testbed and methodology

Contexts in source publication

Context 1
... this section we describe our experimental setup and methodology, outlined in Figure 1. We implement three different cluster (1-, 3-, and 5-node) setups for each of four machine types allocated for experiments from the iMinds Virtual Wall infrastructure [42] through jFed [43]. Our workload generator is the Yahoo! Cloud Serving Benchmark (YCSB) [44], a configurable benchmark offering a variety of predefined workloads. We used the latest versions of MongoDB (3.2.9) and YCSB (0.10.0) available at the time of ...
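A minimal sketch of how such a YCSB run against MongoDB could be driven, assuming a standard YCSB 0.10.0 installation; the install path, endpoint URL, workload file, record count, and thread count are placeholders, since the excerpt does not specify them.

    # Hypothetical driver for a YCSB load/run cycle against MongoDB.
    # YCSB_HOME, MONGO_URL, and the workload parameters are assumptions,
    # not values taken from the paper.
    import subprocess

    YCSB_HOME = "/opt/ycsb-0.10.0"
    MONGO_URL = "mongodb://node1:27017/ycsb"

    def ycsb(phase, workload="workloads/workloada", threads=16, records=1_000_000):
        """Invoke the YCSB CLI for a 'load' or 'run' phase against MongoDB."""
        cmd = [
            f"{YCSB_HOME}/bin/ycsb", phase, "mongodb", "-s",
            "-P", f"{YCSB_HOME}/{workload}",
            "-p", f"mongodb.url={MONGO_URL}",
            "-p", f"recordcount={records}",
            "-p", f"operationcount={records}",
            "-threads", str(threads),
        ]
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        ycsb("load")  # populate the YCSB table
        ycsb("run")   # execute the measured operation mix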
Context 2
... first setup is a 1-node installation of the data store on a single machine. This setup, typically practiced by standard users with minimal requirements, uses node 1 in Figure 1. The second setup employs sharding over 3 nodes (nodes 1-3 in Figure 1), each hosting a shard (a horizontal partition of the YCSB table). Each shard is replicated three times, with replicas placed on different nodes. Each of the 3 nodes that comprise the data store cluster runs three instances of the data store server: the first acts as a primary replica for one shard and the other two as secondary replicas of other shards (whose primaries are stored on other machines). The third setup consists of 5 nodes (nodes 1-5 in Figure 1), each hosting a shard, using the same replication method (each machine stores one primary and two secondary replicas). These setups allow us to study the scaling behavior of the NoSQL system. The YCSB server is hosted on a dedicated machine with sufficient resources to produce the targeted load for each ...
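A hypothetical sketch, using pymongo, of how the 3-node sharded, triple-replicated layout described above could be assembled through a mongos router. Host names, ports, and shard names are illustrative rather than taken from the paper, and in practice primary placement is steered through replica-set member priorities when each replica set is initiated (not shown here).

    # Sketch: register three replica-set shards with mongos and shard the
    # YCSB table. Each replica set spans nodes 1-3, so every node ends up
    # holding one primary and two secondaries once priorities are set.
    from pymongo import MongoClient

    mongos = MongoClient("mongodb://mongos-host:27017")  # assumed router address
    admin = mongos.admin

    shards = {
        "shard1": "shard1/node1:27018,node2:27018,node3:27018",
        "shard2": "shard2/node2:27019,node3:27019,node1:27019",
        "shard3": "shard3/node3:27020,node1:27020,node2:27020",
    }
    for name, seed in shards.items():
        admin.command("addShard", seed, name=name)

    # Shard the YCSB table (database "ycsb", collection "usertable").
    admin.command("enableSharding", "ycsb")
    admin.command("shardCollection", "ycsb.usertable", key={"_id": "hashed"})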
Context 3
... experiments are conducted over the following four types of physical machines (denoted as "pcgenX" in Figure 1):
• A: AMD Opteron 2212 @2.00GHz, 4 cores, 4GB RAM
• B: Intel Xeon E5620 @2.40GHz, 16 cores, 12GB RAM
• C: Intel Xeon E5645 @2.40GHz, 24 cores, 24GB RAM
• D: Intel Xeon E3-1220 v3 @3.10GHz, 4 cores, 16GB RAM
The SPECint CPU2006 scores of the machines are 9.34, 19.6, 21.9, and 34.6, respectively. Each of the 1-, 3-, and 5-node setups was fitted uniformly with machines of the same type (Figure 1). Each machine runs 64-bit Ubuntu Server 14.04 LTS and has a 16GB local disk. ...

Citations

... The reported results show that, depending on the use-case scenario, deployment conditions, current workload, and database settings, any NoSQL database can outperform the others. Other recent related works, such as [11,12,13], investigate measurement-based performance prediction of NoSQL data stores. However, the studies mentioned above do not analyse the interdependency between consistency and performance that is in the very nature of such distributed database systems, nor do they study how consistency settings affect database latency. ...
... System developers performing predictive modelling and forecasting of system performance will need to find equations that theoretically fit their own benchmarking results using a variety of tools and APIs. If needed, more sophisticated regression techniques, such as multivariate adaptive regression splines, support vector regression, or artificial neural networks, can also be applied [12]. ...
Chapter
This experience report analyses the performance of the Cassandra NoSQL database and studies the fundamental trade-off between data consistency and delays in distributed data stores. The primary focus is on investigating the interplay between Cassandra performance (response time) and its consistency settings. The paper reports the results of read and write performance benchmarking for a replicated Cassandra cluster deployed in the Amazon EC2 Cloud. We present quantitative results showing how different consistency settings affect Cassandra performance under different workloads. One of our main findings is that it is possible to minimize Cassandra delays and still guarantee strong data consistency by optimally coordinating the consistency settings for both read and write requests. Our experiments show that (i) strong consistency costs up to 25% of performance and (ii) the best setting for strong consistency depends on the ratio of read and write operations. Finally, we generalize our experience by proposing a benchmarking-based methodology for run-time optimization of consistency settings to achieve maximum Cassandra performance while still guaranteeing strong data consistency under mixed workloads.
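To illustrate the trade-off this abstract describes (this is not the authors' code), a small sketch with the DataStax Python driver that issues both reads and writes at QUORUM, so that R + W > RF holds on a keyspace with replication factor 3; the contact point, keyspace, and table names are hypothetical.

    # Sketch: per-request consistency levels chosen so that R + W > RF,
    # giving strong consistency on a 3-replica keyspace.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["10.0.0.1"])          # hypothetical contact point
    session = cluster.connect("ycsb")        # hypothetical keyspace, RF = 3

    # QUORUM writes (W = 2) + QUORUM reads (R = 2): R + W = 4 > RF = 3.
    write = SimpleStatement(
        "INSERT INTO usertable (y_id, field0) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM)
    read = SimpleStatement(
        "SELECT field0 FROM usertable WHERE y_id = %s",
        consistency_level=ConsistencyLevel.QUORUM)

    session.execute(write, ("user1", "value0"))
    row = session.execute(read, ("user1",)).one()

Under a read-heavy mix, a ONE-read/ALL-write combination also satisfies R + W > RF while shifting the coordination cost to the rarer writes, which is consistent with the abstract's observation that the best strong-consistency setting depends on the read/write ratio.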
... MongoDB supports horizontal data partitioning across nodes and uses B-Trees to index data within shards. Each shard may be replicated for high availability and failure recovery [16]. Chalkiadaki [17] supports a primary-backup scheme using a shadow QoS controller to replicate the state of the primary for high availability in Cassandra [18]. ...
... Karniavoura [16] has proposed a way to predict NoSQL performance in a measurement-based manner with high accuracy, but this does not guarantee high availability in NoSQL. In reality, the data distribution of some data sets is not uniform, and some have great randomness. ...
Article
The dependability and elasticity of various NoSQL stores in critical applications are still worth studying. Currently, cluster and backup technologies are commonly used for improving NoSQL availability, but these approaches do not consider the availability reduction when NoSQL stores encounter performance bottlenecks. In order to enhance the availability of Riak TS effectively, a resource-aware mechanism is proposed. Firstly, the data table is sampled according to time, the correspondence between time and data is acquired, and the real-time resource consumption is recorded by Prometheus. Based on the sampling results, a polynomial curve-fitting algorithm is used to construct a prediction curve. Then the resources required for the upcoming operation are predicted from the time interval in the SQL statement, and the operation is evaluated by comparing the prediction with the remaining resources. Using a real hydrological sensor dataset as experimental data, the effectiveness of the mechanism is evaluated in terms of sensitivity and specificity. The results show that, through the availability enhancement mechanism, the average specificity is 80.55% and the sensitivity is 76.31% using the initial sampling dataset. As training datasets grow, the specificity increases from 80.55% to 92.42%, and the sensitivity increases from 76.31% to 87.90%. Besides, the availability increases from 40.33% to 89.15% in hydrological application scenarios. Experimental results show that this resource-aware mechanism can effectively prevent potential availability problems and enhance the availability of Riak TS. Moreover, as the number of users and the size of the collected data grow, our method will become more accurate.
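A minimal sketch of the general technique named in this abstract, polynomial curve fitting over sampled resource measurements, using synthetic numbers rather than the cited study's data.

    # Sketch: fit a polynomial to sampled resource consumption and use it to
    # decide whether an upcoming operation fits in the remaining resources.
    import numpy as np

    t = np.array([0, 10, 20, 30, 40, 50])            # sample timestamps (s)
    mem = np.array([1.1, 1.4, 2.0, 2.9, 4.1, 5.6])   # observed memory use (GB)

    coeffs = np.polyfit(t, mem, deg=2)               # quadratic prediction curve
    predict = np.poly1d(coeffs)

    needed = predict(70)                             # estimated demand at t = 70 s
    remaining = 8.0                                  # remaining memory (GB), assumed
    print("admit" if needed <= remaining else "defer", round(float(needed), 2))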
... Regression techniques applied in the domain of distributed storage systems include Linear Regression [94], Classification and Regression Trees (CART) [95], Artificial Neural Networks [96] and others [97]. Some studies analyze the efficiency of a particular regression method on the task of storage performance prediction [98], [99], while others compare the efficiency of different regression techniques [97], [100], [101]. Interpolation [102] is another well-known mathematical technique used to calculate values under previously untested circumstances, using data derived from past, similar ones. ...
Article
Distributed storage systems designed to offer explicit performance quality-of-service (QoS) guarantees must regulate the allocation and use of resources to achieve a user-specified level of service. QoS-driven systems employ decision-making techniques to decide on appropriate actions to take during initial deployment or under variations in workload and/or system configuration. In this survey we cover both traditional approaches to decision-making for explicit performance QoS (control theory, multi-dimensional constrained optimization, policy-based techniques) and more recent approaches based on machine learning, offering a broad perspective on the state of the art in the field. As performance prediction is a central concept in decision-making, we also summarize research on performance prediction techniques used in this context.
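As a generic illustration of the prediction techniques listed in the citing context above (regression and interpolation), a short sketch on synthetic benchmark measurements that estimates latency at a previously untested load level; the numbers are made up.

    # Sketch: interpolation vs. a fitted regression model for predicting
    # latency at a throughput level that was never benchmarked directly.
    import numpy as np
    from scipy.interpolate import interp1d
    from sklearn.linear_model import LinearRegression

    tput = np.array([1000, 2000, 4000, 8000])   # measured throughput (ops/s)
    lat = np.array([1.2, 1.5, 2.3, 4.9])        # measured mean latency (ms)

    interp = interp1d(tput, lat)                            # within measured range
    reg = LinearRegression().fit(tput.reshape(-1, 1), lat)  # simple linear model

    target = 6000                                # untested load level
    print("interpolated:", float(interp(target)))
    print("regression  :", float(reg.predict([[target]])[0]))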
... Incremental elasticity is an instance of the latter. Recent work [19] proposed a measurement-based prediction approach [29] to achieving data store performance SLA by means of elasticity actions over the Cassandra NoSQL store. Here we improve on this work by reducing the impact of elasticity actions. ...
Article
Distributed replicated NoSQL data stores such as Cassandra, HBase, and MongoDB have been proposed to effectively manage Big Data sets whose volume, velocity and variability are difficult to deal with using traditional Relational Database Management Systems. Trade-offs between consistency, availability, partition tolerance and latency are intrinsic to such systems. Although relations between these properties have been previously identified by the well-known CAP and PACELC theorems in qualitative terms, it is still necessary to quantify how different consistency settings, deployment patterns and other properties affect system performance. This experience report analyses the performance of a Cassandra NoSQL database cluster and studies the trade-off between data consistency guarantees and performance in distributed data stores. The primary focus is on investigating the quantitative interplay between Cassandra response time, throughput and its consistency settings, considering different single- and multi-region deployment scenarios. The study uses the YCSB benchmarking framework and reports the results of read and write performance tests of a three-replica Cassandra cluster deployed in Amazon AWS. In this paper, we also put forward a notation which can be used to formally describe the distributed deployment of a Cassandra cluster and its nodes relative to each other and to a client application. We present quantitative results showing how different consistency settings and deployment patterns affect Cassandra performance under different workloads. In particular, our experiments show that strong consistency costs up to 22% of performance in the case of a centralized Cassandra cluster deployment and can cause a 600% increase for read/write requests if Cassandra replicas and their clients are globally distributed across different AWS Regions.
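A brief sketch of the deployment-locality point made in this abstract (again illustrative, not the authors' code): with the DataStax Python driver, a datacenter-aware load-balancing policy combined with LOCAL_QUORUM keeps read/write coordination within one region instead of crossing region boundaries on every request; the datacenter name, contact point, and keyspace are hypothetical.

    # Sketch: route requests to the local datacenter and require only a
    # local quorum, avoiding cross-region round trips per operation.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy

    profile = ExecutionProfile(
        load_balancing_policy=DCAwareRoundRobinPolicy(local_dc="eu-west"),
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )
    cluster = Cluster(["10.0.1.1"],
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    session = cluster.connect("ycsb")  # hypothetical keyspace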
Book
This book constitutes refereed proceedings of the Workshops of the 16th European Dependable Computing Conference, EDCC: Workshop on Artificial Intelligence for Railways, AI4RAILS 2020, Workshop on Dynamic Risk Management for Autonomous Systems, DREAMS 2020, Workshop on Dependable Solutions for Intelligent Electricity Distribution Grids, DSOGRI 2020, and Workshop on Software Engineering for Resilient Systems, SERENE 2020, held in September 2020. Due to the COVID-19 pandemic the workshops were held virtually. The 12 full papers and 4 short papers were thoroughly reviewed and selected from 35 submissions. The workshop papers complement the main conference topics by addressing dependability or security issues in specific application domains or by focusing on specialized topics, such as system resilience.