Fig 1: Experimental testbed and methodology

Contexts in source publication

Context 1
... this section we describe our experimental setup and methodology, outlined in Figure 1. We implement three different cluster (1-, 3-, and 5-node) setups for each of four machine types allocated for experiments from the iMinds Virtual Wall infrastructure [42] through jFed [43]. Our workload generator is the Yahoo! Cloud Serving Benchmark (YCSB) [44], a configurable benchmark offering a variety of predefined workloads. We used the latest versions of MongoDB (3.2.9) and YCSB (0.10.0) available at the time of ...
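A minimal sketch of how such a YCSB run against MongoDB could be driven, assuming a standard YCSB 0.10.0 installation; the install path, endpoint URL, workload file, record count, and thread count are placeholders, since the excerpt does not specify them.

    # Hypothetical driver for a YCSB load/run cycle against MongoDB.
    # YCSB_HOME, MONGO_URL, and the workload parameters are assumptions,
    # not values taken from the paper.
    import subprocess

    YCSB_HOME = "/opt/ycsb-0.10.0"
    MONGO_URL = "mongodb://node1:27017/ycsb"

    def ycsb(phase, workload="workloads/workloada", threads=16, records=1_000_000):
        """Invoke the YCSB CLI for a 'load' or 'run' phase against MongoDB."""
        cmd = [
            f"{YCSB_HOME}/bin/ycsb", phase, "mongodb", "-s",
            "-P", f"{YCSB_HOME}/{workload}",
            "-p", f"mongodb.url={MONGO_URL}",
            "-p", f"recordcount={records}",
            "-p", f"operationcount={records}",
            "-threads", str(threads),
        ]
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        ycsb("load")  # populate the YCSB table
        ycsb("run")   # execute the measured operation mix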
Context 2
... first setup is a 1-node installation of the data store on a single machine. This setup, typically practiced by standard users with minimal requirements, uses node 1 in Figure 1. The second setup employs sharding over 3 nodes (nodes 1-3 in Figure 1), each hosting a shard (a horizontal partition of the YCSB table). Each shard is replicated three times, with replicas placed on different nodes. Each of the 3 nodes that comprise the data store cluster runs three instances of the data store server: the first acts as a primary replica for one shard and the other two as secondary replicas of other shards (whose primaries are stored on other machines). The third setup consists of 5 nodes (nodes 1-5 in Figure 1), each hosting a shard, using the same replication method (each machine stores one primary and two secondary replicas). These setups allow us to study the scaling behavior of the NoSQL system. The YCSB server is hosted on a dedicated machine with sufficient resources to produce the targeted load for each ...
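A hypothetical sketch, using pymongo, of how the 3-node sharded, triple-replicated layout described above could be assembled through a mongos router. Host names, ports, and shard names are illustrative rather than taken from the paper, and in practice primary placement is steered through replica-set member priorities when each replica set is initiated (not shown here).

    # Sketch: register three replica-set shards with mongos and shard the
    # YCSB table. Each replica set spans nodes 1-3, so every node ends up
    # holding one primary and two secondaries once priorities are set.
    from pymongo import MongoClient

    mongos = MongoClient("mongodb://mongos-host:27017")  # assumed router address
    admin = mongos.admin

    shards = {
        "shard1": "shard1/node1:27018,node2:27018,node3:27018",
        "shard2": "shard2/node2:27019,node3:27019,node1:27019",
        "shard3": "shard3/node3:27020,node1:27020,node2:27020",
    }
    for name, seed in shards.items():
        admin.command("addShard", seed, name=name)

    # Shard the YCSB table (database "ycsb", collection "usertable").
    admin.command("enableSharding", "ycsb")
    admin.command("shardCollection", "ycsb.usertable", key={"_id": "hashed"})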
Context 3
... experiments are conducted over the following four types of physical machines (denoted as "pcgenX" in Figure 1):
• A: AMD Opteron 2212 @2.00GHz, 4 cores, 4GB RAM
• B: Intel Xeon E5620 @2.40GHz, 16 cores, 12GB RAM
• C: Intel Xeon E5645 @2.40GHz, 24 cores, 24GB RAM
• D: Intel Xeon E3-1220 v3 @3.10GHz, 4 cores, 16GB RAM
The SPECint CPU2006 scores of the machines are 9.34, 19.6, 21.9, and 34.6, respectively. Each of the 1-, 3-, and 5-node setups was fitted uniformly with machines of the same type (Figure 1). Each machine runs 64-bit Ubuntu Server 14.04 LTS and has a 16GB local disk. ...

Citations

... The reported results show that, depending on the use-case scenario, deployment conditions, current workload, and database settings, any NoSQL database can outperform the others. Other recent related works, such as [11,12,13], investigate measurement-based performance prediction of NoSQL data stores. However, the studies mentioned above do not analyse the interdependency between consistency and performance that is in the very nature of such distributed database systems, nor do they study how consistency settings affect database latency. ...
... System developers performing predictive modelling and forecasting of system performance will need to find equations that theoretically fit their own benchmarking results using a variety of tools and APIs. If needed, more sophisticated regression techniques, such as multivariate adaptive regression splines, support vector regression, or artificial neural networks, can also be applied [12]. ...
Chapter
This experience report analyses the performance of the Cassandra NoSQL database and studies the fundamental trade-off between data consistency and delays in distributed data stores. The primary focus is on investigating the interplay between Cassandra performance (response time) and its consistency settings. The paper reports the results of read and write performance benchmarking for a replicated Cassandra cluster deployed in the Amazon EC2 Cloud. We present quantitative results showing how different consistency settings affect Cassandra performance under different workloads. One of our main findings is that it is possible to minimize Cassandra delays and still guarantee strong data consistency by optimally coordinating the consistency settings for both read and write requests. Our experiments show that (i) strong consistency costs up to 25% of performance and (ii) the best setting for strong consistency depends on the ratio of read and write operations. Finally, we generalize our experience by proposing a benchmarking-based methodology for run-time optimization of consistency settings to achieve maximum Cassandra performance while still guaranteeing strong data consistency under mixed workloads.
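To illustrate the trade-off this abstract describes (this is not the authors' code), a small sketch with the DataStax Python driver that issues both reads and writes at QUORUM, so that R + W > RF holds on a keyspace with replication factor 3; the contact point, keyspace, and table names are hypothetical.

    # Sketch: per-request consistency levels chosen so that R + W > RF,
    # giving strong consistency on a 3-replica keyspace.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["10.0.0.1"])          # hypothetical contact point
    session = cluster.connect("ycsb")        # hypothetical keyspace, RF = 3

    # QUORUM writes (W = 2) + QUORUM reads (R = 2): R + W = 4 > RF = 3.
    write = SimpleStatement(
        "INSERT INTO usertable (y_id, field0) VALUES (%s, %s)",
        consistency_level=ConsistencyLevel.QUORUM)
    read = SimpleStatement(
        "SELECT field0 FROM usertable WHERE y_id = %s",
        consistency_level=ConsistencyLevel.QUORUM)

    session.execute(write, ("user1", "value0"))
    row = session.execute(read, ("user1",)).one()

Under a read-heavy mix, a ONE-read/ALL-write combination also satisfies R + W > RF while shifting the coordination cost to the rarer writes, which is consistent with the abstract's observation that the best strong-consistency setting depends on the read/write ratio.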
... MongoDB supports horizontal data partitioning across nodes and uses B-Trees to index data within shards. Each shard may be replicated for high availability and failure recovery [16]. Chalkiadaki [17] supports a primary-backup scheme using a shadow QoS controller to replicate the state of the primary for high availability in Cassandra [18]. ...
... Karniavoura [16] has proposed a way to predict NoSQL performance in a measurement-based manner with high accuracy, but this does not guarantee high availability in NoSQL. In reality, the data distribution of some data sets is not uniform, and some have great randomness. ...
Article
The dependability and elasticity of various NoSQL stores in critical applications are still worth studying. Currently, cluster and backup technologies are commonly used for improving NoSQL availability, but these approaches do not consider the availability reduction when NoSQL stores encounter performance bottlenecks. In order to enhance the availability of Riak TS effectively, a resource-aware mechanism is proposed. Firstly, the data table is sampled according to time, the correspondence between time and data is acquired, and the real-time resource consumption is recorded by Prometheus. Based on the sampling results, a polynomial curve-fitting algorithm is used to construct a prediction curve. Then the resources required for the upcoming operation are predicted from the time interval in the SQL statement, and the operation is evaluated by comparing the prediction with the remaining resources. Using a real hydrological sensor dataset as experimental data, the effectiveness of the mechanism is evaluated in terms of sensitivity and specificity. The results show that, through the availability enhancement mechanism, the average specificity is 80.55% and the sensitivity is 76.31% using the initial sampling dataset. As training datasets grow, the specificity increases from 80.55% to 92.42%, and the sensitivity increases from 76.31% to 87.90%. Besides, the availability increases from 40.33% to 89.15% in hydrological application scenarios. Experimental results show that this resource-aware mechanism can effectively prevent potential availability problems and enhance the availability of Riak TS. Moreover, as the number of users and the size of the collected data grow, our method will become more accurate.
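A minimal sketch of the general technique named in this abstract, polynomial curve fitting over sampled resource measurements, using synthetic numbers rather than the cited study's data.

    # Sketch: fit a polynomial to sampled resource consumption and use it to
    # decide whether an upcoming operation fits in the remaining resources.
    import numpy as np

    t = np.array([0, 10, 20, 30, 40, 50])            # sample timestamps (s)
    mem = np.array([1.1, 1.4, 2.0, 2.9, 4.1, 5.6])   # observed memory use (GB)

    coeffs = np.polyfit(t, mem, deg=2)               # quadratic prediction curve
    predict = np.poly1d(coeffs)

    needed = predict(70)                             # estimated demand at t = 70 s
    remaining = 8.0                                  # remaining memory (GB), assumed
    print("admit" if needed <= remaining else "defer", round(float(needed), 2))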
... Regression techniques applied in the domain of distributed storage systems include Linear Regression [94], Classification and Regression Trees (CART) [95], Artificial Neural Networks [96] and others [97]. Some studies analyze the efficiency of a particular regression method on the task of storage performance prediction [98], [99], while others compare the efficiency of different regression techniques [97], [100], [101]. Interpolation [102] is another well-known mathematical technique used to calculate values under previously untested circumstances, using data derived from past, similar ones. ...
Article
Distributed storage systems designed to offer explicit performance quality-of-service (QoS) guarantees must regulate the allocation and use of resources to achieve a user-specified level of service. QoS-driven systems employ decision-making techniques to decide on appropriate actions to take during initial deployment or under variations in workload and/or system configuration. In this survey we cover both traditional approaches to decision-making for explicit performance QoS (control theory, multi-dimensional constrained optimization, policy-based techniques) and more recent approaches based on machine learning, offering a broad perspective on the state of the art in the field. As performance prediction is a central concept in decision-making, we also summarize research on performance prediction techniques used in this context.
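As a generic illustration of the prediction techniques listed in the citing context above (regression and interpolation), a short sketch on synthetic benchmark measurements that estimates latency at a previously untested load level; the numbers are made up.

    # Sketch: interpolation vs. a fitted regression model for predicting
    # latency at a throughput level that was never benchmarked directly.
    import numpy as np
    from scipy.interpolate import interp1d
    from sklearn.linear_model import LinearRegression

    tput = np.array([1000, 2000, 4000, 8000])   # measured throughput (ops/s)
    lat = np.array([1.2, 1.5, 2.3, 4.9])        # measured mean latency (ms)

    interp = interp1d(tput, lat)                            # within measured range
    reg = LinearRegression().fit(tput.reshape(-1, 1), lat)  # simple linear model

    target = 6000                                # untested load level
    print("interpolated:", float(interp(target)))
    print("regression  :", float(reg.predict([[target]])[0]))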
... Incremental elasticity is an instance of the latter. Recent work [19] proposed a measurement-based prediction approach [29] to achieving data store performance SLA by means of elasticity actions over the Cassandra NoSQL store. Here we improve on this work by reducing the impact of elasticity actions. ...
Article
Distributed replicated NoSQL data stores such as Cassandra, HBase, and MongoDB have been proposed to effectively manage Big Data sets whose volume, velocity and variability are difficult to deal with using traditional Relational Database Management Systems. Trade-offs between consistency, availability, partition tolerance and latency are intrinsic to such systems. Although relations between these properties have been previously identified by the well-known CAP and PACELC theorems in qualitative terms, it is still necessary to quantify how different consistency settings, deployment patterns and other properties affect system performance. This experience report analyses the performance of a Cassandra NoSQL database cluster and studies the trade-off between data consistency guarantees and performance in distributed data stores. The primary focus is on investigating the quantitative interplay between Cassandra response time, throughput and its consistency settings, considering different single- and multi-region deployment scenarios. The study uses the YCSB benchmarking framework and reports the results of read and write performance tests of a three-replica Cassandra cluster deployed in Amazon AWS. In this paper, we also put forward a notation which can be used to formally describe the distributed deployment of a Cassandra cluster and its nodes relative to each other and to a client application. We present quantitative results showing how different consistency settings and deployment patterns affect Cassandra performance under different workloads. In particular, our experiments show that strong consistency costs up to 22% of performance in the case of a centralized Cassandra cluster deployment and can cause a 600% increase for read/write requests if Cassandra replicas and their clients are globally distributed across different AWS Regions.
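A brief sketch of the deployment-locality point made in this abstract (again illustrative, not the authors' code): with the DataStax Python driver, a datacenter-aware load-balancing policy combined with LOCAL_QUORUM keeps read/write coordination within one region instead of crossing region boundaries on every request; the datacenter name, contact point, and keyspace are hypothetical.

    # Sketch: route requests to the local datacenter and require only a
    # local quorum, avoiding cross-region round trips per operation.
    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy

    profile = ExecutionProfile(
        load_balancing_policy=DCAwareRoundRobinPolicy(local_dc="eu-west"),
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )
    cluster = Cluster(["10.0.1.1"],
                      execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    session = cluster.connect("ycsb")  # hypothetical keyspace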
Book
This book constitutes refereed proceedings of the Workshops of the 16th European Dependable Computing Conference, EDCC: Workshop on Artificial Intelligence for Railways, AI4RAILS 2020, Workshop on Dynamic Risk Management for Autonomous Systems, DREAMS 2020, Workshop on Dependable Solutions for Intelligent Electricity Distribution Grids, DSOGRI 2020, and Workshop on Software Engineering for Resilient Systems, SERENE 2020, held in September 2020. Due to the COVID-19 pandemic the workshops were held virtually. The 12 full papers and 4 short papers were thoroughly reviewed and selected from 35 submissions. The workshop papers complement the main conference topics by addressing dependability or security issues in specific application domains or by focusing on specialized topics, such as system resilience.