Response time of RDBMS + Data Cache, CAIDC, RDBMS + NoSQL and RDBMS +...

Challenges and Solutions for Processing Real-Time Big Data Stream: A Systematic Literature Review

Article

Full-text available

Jun 2020

Contribution: Recently, real-time data warehousing (DWH) and big data streaming have become ubiquitous due to the fact that a number of business organizations are gearing up to gain competitive advantage. The capability of organizing big data in efficient manner to reach a business decision empowers data warehousing in terms of real-time stream processing. A systematic literature review for real-time stream processing systems is presented in this paper which rigorously look at the recent developments and challenges of real-time stream processing systems and can serve as a guide for the implementation of real-time stream processing framework for all shapes of data streams. Background: Published surveys and reviews either cover papers focusing on stream analysis in applications other than real-time DWH or focusing on extraction, transformation, loading (ETL) challenges for traditional DWH. This systematic review attempts to answer four specific research questions. Research Questions: 1)Which are the relevant publication channels for real-time stream processing research? 2) Which challenges have been faced during implementation of real-time stream processing? 3) Which approaches/tools have been reported to address challenges introduced at ETL stage while processing real-time stream for real-time DWH? 4) What evidence have been reported while addressing different challenges for processing real-time stream? Methodology: A systematic literature was conducted to compile studies related to publication channels targeting real-time stream processing/joins challenges and developments. Following a formal protocol, semi-automatic and manual searches were performed for work from 2011 to 2020 excluding research in traditional data warehousing. Of 679,547 papers selected for data extraction, 74 were retained after quality assessment. Findings: This systematic literature highlights implementation challenges along with developed approaches for real-time DWH and big data stream processing systems and provides their comparisons. This study found that there exists various algorithms for implementing real-time join processing at ETL stage for structured data whereas less work for un-structured data is found in this subject matter.

Разработка концепции миграции данных между реляционными и нереляционными системами БДDevelopment of the concept of data migration between relational and non-relational database systems

Article

Feb 2019

A Dependable Time Series Analytic Framework for Cyber-Physical Systems of IoT-based Smart Grid

Article

Aug 2018

With the emergence of cyber-physical systems (CPS), we are now at the brink of next computing revolution. The Smart Grid (SG) built on top of IoT (Internet of Things) is one of the foundations of this CPS revolution, which involves a large number of smart objects connected by networks. The volume of time series of SG equipment is tremendous and the raw time series are very likely to contain missing values because of undependable network transferring. The problem of storing a tremendous volume of raw time series thereby providing a solid support for precise time series analytics now becomes tricky. In this article, we propose a dependable time series analytics (DTSA) framework for IoT-based SG. Our proposed DTSA framework is capable of providing a dependable data transforming from CPS to the target database with an extraction engine to preliminary refining raw data and further cleansing the data with a correction engine built on top of a sensor-network-regularization-based matrix factorization method. The experimental results reveal that our proposed DTSA framework is capable of effectively increasing the dependability of raw time series transforming between CPS and the target database system through the online lightweight extraction engine and the offline correction engine. Our proposed DTSA framework would be useful for other industrial big data practices.

Dynamic multi-variant relational scheme-based intelligent ETL framework for healthcare management

Article

Full-text available

Mar 2022
SOFT COMPUT

The growth of information technology has opened the gate for the organizations to maintain their data in various forms and at various volumes. This increases the volume and dimension of data being maintained. However, they store their data in their data servers or in cloud environment. Such data have been used to generate various intelligence to support various problems. To support such analysis process, different data have been used and the big data comes to play in this part. Optimizing techniques to help improve the process of ETL could greatly help in real-time analysis of data. ETL optimization could be achieved through several factors simplest being increasing the frequency of the process. Other ways to achieve optimization are through the use of various architectures, programming models, intelligence in transformation and security. To improve the performance of ETL, an efficient dynamic multi-variant relational intelligent ETL framework has been presented in this article. The distributed approach maintains various ontology’s and data dictionaries which have been dynamically updated by different threads of ETL process. Initially, the process is start by applying the extraction process which extracts the data from different sources and finds set of dimensions and their characteristics. Such data extracted have been verified over the data dictionary. Further, the relational score has been measured for each data source with the existing one. Similarly, the method computes the value of multi-variant relational similarity (MVRS) for the data obtained from a single source. This will be performed by different threads of ETL process. According to the value of MVRS, the method performs map reduce and merging of data. According to the value of MVRS the method selects the data node and merges the data to store in the data warehouse. The threads of ETL are capable of reading the changes in data dictionaries and ontology’s to iterate the process of transformation and loading. The method improves the performance of ETL with least time complexity and higher performance.

ROLAP DW transformation proposal for OLAP architecture in NoSQL database

Conference Paper

Nov 2020

RSCVC: Row-based semantic cache with incremental versioning consistency

Article

Mar 2020
CONCURR COMP-PRACT E

In the mobile computing environment, how to make the data access more efficient is a challenge due to the narrow communication bandwidth, the frequent disconnections of network, and the limited resources. Therefore, it is necessary to cache data on the client side. Besides, a good cache consistency method is essential to ensure the correctness. In this article, a row‐based semantic cache with incremental versioning consistency (RSCVC) is proposed. In RSCVC, we designed a semantic cache algorithm, a query trimming and optimizing algorithm, and a version‐based consistency strategy. This RSCVC cache mainly has two advantages. On one hand, it can obviously improve the response time of query and the hit ratio of the cache. On the other hand, the version‐based consistency enhances the stability of the system especially in high‐concurrency situations. Experiments demonstrate the efficacy of our proposed method and its superiority to state‐of‐the‐art methods.

Toward an Efficient Cache Management Framework

Conference Paper

Oct 2018

Distributed cache memory data migration strategy based on cloud computing

Article

Oct 2018
CONCURR COMP-PRACT E

In view of the huge scale of cloud computing, cloud computing systems face many problems in terms of scalability and high availability. In view of the distributed data storage in the cloud computing environment, a recursion based N boundary network model is constructed. A data management model is given on the basis of the data center network structure. A replica distribution strategy and a copy selection strategy are designed to improve the data availability and load balance. On the premise of guaranteeing data availability, a data migration algorithm based on coverage set is proposed, which uses node selection strategy to reduce the migration cost as much as possible. Data migration makes more machines dormant and reduces energy consumption. By comparing and analyzing the experimental data, the correctness and effectiveness of the proposed network topology, data management model, and data migration technology are verified. In the edge cloud computing architecture, in order to ensure the response time of the cloud computing service and minimize the data redundancy in the system, this paper proposes and analyzes the data migration strategy and the pre stored data migration strategy for the network transmission, and gives the suitable application scene. On this basis, in order to control the balance of access network transmission and storage flexibly, a data migration strategy based on network performance is proposed, which ensures the real‐time response of the service.

Optimization of stream-based live data migration strategy in the cloud

Article

Aug 2017
CONCURR COMP-PRACT E

Live data migration in the cloud is responsible to migrate blocks of data from one emigration node to several immigration nodes. However, live data migration strategy is a NP-hard problem like task scheduling. Recently, in-stream processing is a new technique to process large-scale data nearly instantaneously. This framework works fast that all decisions are made without a continuous stream of events. In this paper, we explore a real-time live data migration strategy with stream processing paradigm. First, the nonlinear migration cost model and balance model are introduced as the metrics to evaluate the data migration strategy. Subsequently, a live data migration strategy with particle swarm optimization (PSO) is proposed. Two improvement measures called loop context and particle grouping are proposed. As an improvement of stream processing framework, nested loop context structure is a feedback to support iterative optimization algorithm. As an improvement of PSO, grouping particles before in-stream processing are to speed up the convergence rate of PSO. Afterwards, we rebuild stream processing framework to implement these methods. The experimental results show the best performance of our method.

Segment access-aware dynamic semantic cache in cloud computing environment

Article

May 2017
J PARALLEL DISTR COM

In recent years, researches focus on addressing the query bottleneck issue using semantic cache. However, the challenges of this method are how to increase cache hit ratio, decrease the query processing time, and address cache consistency issue. In this paper, we construct segment access-aware dynamic semantic cache for relational databases. Some definitions of semantic segment, probe query, and remainder query are proposed to describe the semantic cache. Then, estimation of the query result is proposed. Next, cache access algorithm of our proposed segment access-aware dynamic semantic cache is presented in case of cache exact hit, cache extended hit, cache partial hit and cache miss. Cache item with effective lifecycle tag is proposed to address cache consistency issue. Finally, experimental results show that this approach performs better than regular semantic cache and decisional semantic cache.

Response time of RDBMS + Data Cache, CAIDC, RDBMS + NoSQL and RDBMS + Query Cache.

Citations