Memory Architecture for Optimized HYBRIDJOIN.

Source publication
Article
Full-text available
As rapid decision making in business organizations gains in popularity, the complexity and adaptability of the extract, transform, and load (ETL) process in near real-time data warehousing have increased dramatically. The most important task of a near real-time data warehouse is feeding in new data from different data sources on a near-real-time basis. However,...
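The excerpt above describes HYBRIDJOIN only at a high level, so the following is a minimal Python sketch of the in-memory structures usually attributed to the algorithm: a hash table of buffered stream tuples, a queue recording arrival order, and a disk buffer holding one partition of the master relation R at a time. It is an illustration under those assumptions, not the optimized memory architecture the article proposes, and `load_partition` is a hypothetical disk-access callback.

```python
from collections import deque

class HybridJoinSketch:
    """Toy model of HYBRIDJOIN's in-memory components: a hash table H
    of buffered stream tuples keyed on the join attribute, a queue Q
    recording their arrival order, and a disk buffer that holds one
    partition of the master relation R at a time."""

    def __init__(self, load_partition):
        self.H = {}                 # join key -> buffered stream tuples
        self.Q = deque()            # join keys in arrival order
        self.load_partition = load_partition   # reads one partition of R

    def add_stream_tuples(self, tuples, key_of):
        for t in tuples:
            k = key_of(t)
            self.H.setdefault(k, []).append(t)
            self.Q.append(k)

    def iterate(self):
        """One iteration: the oldest queued key selects which partition
        of R to load; every R tuple in it then probes H, so a single
        disk read can join many buffered stream tuples at once."""
        while self.Q and self.Q[0] not in self.H:
            self.Q.popleft()        # skip keys already joined earlier
        if not self.Q:
            return []
        oldest = self.Q.popleft()
        output = []
        for r in self.load_partition(oldest):
            for s in self.H.pop(r["key"], []):   # join and evict matches
                output.append((s, r))
        self.H.pop(oldest, None)    # oldest key had no partner in R
        return output
```

Amortizing one partition read over many buffered stream tuples is what lets this family of algorithms keep up with a fast stream despite a slow disk-based R.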

Similar publications

Conference Paper
Full-text available
Extract, Transform, and Load (ETL) pipelines are widely used to ingest data into Enterprise Data Warehouse (EDW) systems. These pipelines can be very complex and often tightly coupled to a given EDW, making it challenging to upgrade from a legacy EDW to a Cloud Data Warehouse (CDW). This paper presents a novel solution for a transparent and fully-a...

Citations

... The ETL process is needed to cleanse data, eliminate null values, replace missing attributes, etc. In the ETL process, before data is loaded into the data warehouse, the transform phase must handle it appropriately by eliminating irrelevant data columns, reducing repeated data in the database, and reconciling data collected in various formats; this is why a normalization process is needed [2]. ...
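As a concrete illustration of the transform steps this excerpt lists, here is a minimal pandas sketch that drops irrelevant columns, removes repeated rows, fills missing attributes, and min-max normalizes numeric columns. The function and column handling are hypothetical, not taken from the cited work.

```python
import pandas as pd

def transform(df: pd.DataFrame, irrelevant: list[str]) -> pd.DataFrame:
    """Illustrative ETL transform phase: drop irrelevant columns,
    remove repeated rows, replace missing attributes, and
    min-max normalize the numeric columns."""
    df = df.drop(columns=irrelevant, errors="ignore")
    df = df.drop_duplicates()
    df = df.fillna(df.mean(numeric_only=True))   # replace missing attributes
    num = df.select_dtypes("number").columns
    span = df[num].max() - df[num].min()
    df[num] = (df[num] - df[num].min()) / span.replace(0, 1)  # guard constants
    return df
```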
... Big data comes with an enormous range of different features. In [2], the author addressed ETL for big data cost-effectively using a data aggregation model; the cloud-ETL model is used to handle big data. ...
... We proposed efficient hybrid optimization of the transformation process in the cloud-based architecture of the data warehouse for data maintenance. The first two processes, extract and transform, have been widely studied [2], but loading complex data without redundancy and handling dimensionality reduction across different data formats remains a challenging research problem. In the transform phase, this paper proposes two things: reducing the original data size through high-dimensionality reduction using the grey-wolf optimizer, a swarm intelligence algorithm. ...
Article
Full-text available
In big data analysis, data is collected from different sources in various formats, then cleansed, customized, and loaded into a data warehouse. Extracting data in diverse formats and transforming it into the required format requires transformation algorithms. This transformation stage suffers from redundancy issues, and data may be stored at arbitrary locations in the data warehouse, which increases computation costs. The main issues in big data ETL are handling high-dimensional data and keeping similar data together for effective data warehouse usage. Extract, Transform, Load (ETL) therefore plays a vital role in extracting meaningful information from the data warehouse and retaining users. This paper proposes a hybrid optimization of swarm intelligence with a tabu search algorithm for handling big data in a cloud-based ETL process. The proposed work overcomes many issues related to complex data storage and retrieval in the data warehouse. Swarm intelligence algorithms can address problems such as high-dimensional data, dynamically changing huge data, and cost optimization in the transformation stage. In this work, a Grey-Wolf Optimizer (GWO) is implemented as the swarm intelligence algorithm to reduce the high dimensionality of the data, and Tabu Search (TS) is used to cluster relevant data into groups, that is, to segregate relevant data accurately from the data warehouse. The cluster size in the ETL process can be optimized by the proposed GWO-TS approach. Therefore, the huge data in the warehouse can be processed within an expected latency.
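To make the GWO step concrete, below is a minimal NumPy sketch of the canonical grey-wolf update rule applied to feature selection, one common way to reduce dimensionality. The fitness callback, the 0.5 selection threshold, and all parameters are illustrative assumptions, not the paper's GWO-TS implementation.

```python
import numpy as np

def gwo_feature_selection(fitness, dim, wolves=10, iters=50, seed=0):
    """Minimal grey-wolf optimizer over [0, 1]^dim; a position is
    thresholded at 0.5 to yield a boolean feature mask. `fitness`
    scores a mask (higher is better), e.g. model accuracy minus a
    small penalty on the number of selected features."""
    rng = np.random.default_rng(seed)
    X = rng.random((wolves, dim))                  # wolf positions
    for t in range(iters):
        scores = np.array([fitness(x > 0.5) for x in X])
        leaders = X[np.argsort(scores)[::-1][:3]]  # alpha, beta, delta
        a = 2 - 2 * t / iters                      # decreases 2 -> 0
        pulls = []
        for leader in leaders:
            A = 2 * a * rng.random((wolves, dim)) - a
            C = 2 * rng.random((wolves, dim))
            D = np.abs(C * leader - X)             # distance to leader
            pulls.append(leader - A * D)
        X = np.clip(sum(pulls) / 3, 0, 1)          # mean of the three pulls
    scores = np.array([fitness(x > 0.5) for x in X])
    return X[np.argmax(scores)] > 0.5              # selected-feature mask
```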
... Apache Hadoop is an open-source Java framework used to run applications in a big data environment [20]. Hadoop is usually deployed on clusters of interconnected computers/nodes, providing distributed storage and parallel computing over large data volumes. ...
... In Aziz et al. (2021), the authors proposed two new optimized algorithms namely Parallel-Hybrid Join (P-HYBRIDJOIN) and Hybrid Join with Queue and Stack (QaS-HYBRIDJOIN). Both of these algorithms were extensions of the existing HYBRIDJOIN algorithm. ...
Article
Full-text available
Semi-stream join is an emerging research problem in the domain of near-real-time data warehousing. A semi-stream join is a join between a fast stream (S) and a slow disk-based relation (R). In the modern era of technology, huge amounts of data are generated swiftly on a daily basis and need to be analyzed instantly to make successful business decisions. With this in mind, a well-known algorithm called CACHEJOIN (Cache Join) was proposed. The limitation of the CACHEJOIN algorithm is that it does not deal efficiently with frequently changing trends in stream data. To overcome this limitation, in this paper we propose TinyLFU-CACHEJOIN, a modified version of the original CACHEJOIN algorithm designed to enhance its performance. TinyLFU-CACHEJOIN employs an intelligent strategy that keeps in the cache only those records of R that have a high hit rate in S. This mechanism allows TinyLFU-CACHEJOIN to deal with sudden and abrupt trend changes in S. We developed a cost model for our TinyLFU-CACHEJOIN algorithm and validated it empirically. We also compared the performance of our proposed TinyLFU-CACHEJOIN algorithm with the existing CACHEJOIN algorithm on a skewed synthetic dataset. The experiments showed that TinyLFU-CACHEJOIN significantly outperforms CACHEJOIN.
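The admission idea behind this family of caches can be sketched as a fixed-size store that admits a record of R only when its key's hit count in S exceeds that of the coldest resident. The sketch below uses exact counts in a plain dictionary for clarity, whereas TinyLFU proper uses compact approximate counters; `lookup_r` and all names are hypothetical.

```python
from collections import Counter

class FrequencyFilteredCache:
    """Fixed-size cache for records of R that admits a key only when
    its stream hit count beats the least-frequent resident (the
    TinyLFU admission idea, with exact counts for simplicity)."""

    def __init__(self, capacity, lookup_r):
        self.capacity = capacity
        self.lookup_r = lookup_r     # fetches a record of R from disk
        self.freq = Counter()        # hit counts of keys seen in S
        self.cache = {}              # key -> cached record of R

    def probe(self, key):
        self.freq[key] += 1
        if key in self.cache:                 # cache hit: no disk access
            return self.cache[key]
        record = self.lookup_r(key)           # cache miss: go to disk
        if len(self.cache) < self.capacity:
            self.cache[key] = record
        else:
            victim = min(self.cache, key=self.freq.__getitem__)
            if self.freq[key] > self.freq[victim]:   # admit only if hotter
                del self.cache[victim]
                self.cache[key] = record
        return record
```

Because admission is gated on observed frequency, a sudden burst of a new hot key quickly out-counts a stale resident, which is how the scheme tracks abrupt trend changes in S.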
... Optimizing supply chain management by applying active RFID technology (Nguyen et al., 2021; Yang et al., 2021). Improved product life cycle (Guo et al., 2019). RFID-based production and distribution management systems for the home appliance industry (Gonzalez et al., 2006; Aziz et al., 2021). An intelligent system for production resources planning (Liu, 2021). Optimizing supply chain waste management through the use of RFID technology (Reyes et al., 2020; Reyes et al., 2021). ...
Article
Full-text available
Supply chain processes are continuously marred by myriad factors, including varying demands, changing routes, major disruptions, and compliance issues. Therefore, supply chains require monitoring and ongoing optimization. Data science uses real-time data to provide analytical insights, leading to automation and improved decision making. RFID is an ideal technology for sourcing big data, particularly in supply chains, because RFID tags are read across supply chain processes, including scanning raw materials, finished products, goods in transit, and stored products, with accuracy and speed. This study carries out a systematic literature review of research articles published during 2000-2021 that discuss the role of RFID technology in developing decision support systems that optimize supply chains in light of Industry 4.0. Furthermore, the study offers recommendations on the operational efficiency of supply chains while reducing the costs of implementing RFID technology. The core contribution of this paper is its analysis and evaluation of various RFID implementation methods in supply chains, with the aim of saving time effectively and achieving cost efficiencies.
... However, data dispersion and heterogeneity make interrogation necessary but difficult, especially if the data is non-indexed. User studies have shown that high user latency drives customers away [8]. Similarly, queries, or calls to stored procedures/user-defined functions, are typically executed numerous times in a relational model, either within a loop in an application program or from the WHERE/SELECT clause of an outer query [9]. ...
Article
Full-text available
The increasing demand for simultaneous transaction and review of data, for either decision making or forecasting, has created a need for faster and better Hybrid Transactional/Analytical Processing (HTAP). This paper emphasizes the speedup of Online Analytical Processing (OLAP) operations in an HTAP environment where analytical queries are mainly repetitive and contain non-indexed keys as their predicates. Zone maps and materialized views are popular approaches adopted by larger databases to address this issue. However, they are absent in in-memory databases because of space constraints. Instead, in-memory databases load the cache with result pages of frequently accessed queries. Increasing the number of such queries can fill the cache and raise the system's overhead. This paper presents Query_Dictionary, a hybrid storage solution that leverages the full capabilities of SQLite by retaining less information about repetitive queries in the cache while efficiently accommodating newly updated data from the end-user. The solution stores page-level metadata for queries with larger result sets and row-level information for smaller result sets. It demonstrates Query_Dictionary's capabilities on three types of representative queries: single-table, binary-join, and transactional queries on non-indexed attributes. In comparison with SQLite, the proposed method performs better.
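The page-level/row-level split the abstract describes can be sketched as a result cache that keeps full rows for small result sets and only page IDs for large ones. The threshold, the `run_query`/`read_pages` callbacks, and the simplified invalidation rule below are illustrative assumptions, not SQLite's or the paper's actual storage format.

```python
PAGE_THRESHOLD = 64   # illustrative cutoff between row- and page-level caching

class QueryDictionarySketch:
    """Caches repetitive query results two ways: small result sets
    as full rows, large ones as the set of page IDs to re-read."""

    def __init__(self, run_query, read_pages):
        self.run_query = run_query    # executes SQL, returns (rows, page_ids)
        self.read_pages = read_pages  # re-reads rows from the given pages
        self.dictionary = {}          # sql -> ("rows", rows) | ("pages", ids)

    def execute(self, sql):
        entry = self.dictionary.get(sql)
        if entry is not None:
            kind, payload = entry
            return payload if kind == "rows" else self.read_pages(payload)
        rows, page_ids = self.run_query(sql)
        if len(rows) <= PAGE_THRESHOLD:
            self.dictionary[sql] = ("rows", rows)       # row-level caching
        else:
            self.dictionary[sql] = ("pages", page_ids)  # page-level metadata
        return rows

    def invalidate(self, touched_page_ids):
        """Drop page-level entries whose pages were updated; a fuller
        version would also track source pages for row-level entries."""
        touched = set(touched_page_ids)
        self.dictionary = {
            sql: entry for sql, entry in self.dictionary.items()
            if entry[0] == "rows" or not touched & set(entry[1])
        }
```

Storing only page IDs for large result sets is what keeps the cache small: the heavy rows stay on disk and are re-read on demand, while small, hot results are served straight from memory.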
... A BC-enabled AI intelligent IoT architecture offers an effective way to merge BC and AI for IoT with existing state-of-the-art techniques, algorithms, and applications [46]. To reduce the optimization time of those algorithms, several hybrid techniques can be used [47], [48]. Hospital records are also a significant contribution, provided confidentiality is not breached. ...
Article
Full-text available
Fifth Generation (5G) technology is the most recent advancement in wireless communication networks, and 5G is increasingly used with diverse data structures. Blockchain (BC) has become a popular choice for decentralized, peer-to-peer, distributed, transparent ledger systems over diverse data structures, and the use of 5G with BC is an emerging trend in communication technology. The elasticity of 5G with BC enables many applications to exchange information, making data transport fast, transparent, consequential, and safe in this smart era. Green computing (GC) is presently the most promising approach for integrating smart technology into a diverse and distributed world of power consumption. This Systematic Mapping Study (SMS) analyzes carefully selected publications from 2016 to 2020 in well-reputed venues. The study reviews advanced research on power consumption solutions for BC-based 5G communication and presents a taxonomy of 5G based on green BC and GC across various areas. Furthermore, green energy renewable communication (GERC) problems are examined by integrating three distinct technologies, namely 5G with green BC and GC, along with smart systems. Lastly, research gaps are presented to provide future directions for researchers in 5G with green BC and GC as a solution for rechargeable data packets.