Figure 2 - uploaded by Daniel Ritter
Basic integration aspects on hardware.

Source publication
Conference Paper
Full-text available
The growing number of (cloud) applications and devices massively increases the communication rate and volume, pushing integration systems to their (throughput) limits. While the usage of modern hardware like Field Programmable Gate Arrays (FPGAs) led to low latency when employed for query and event processing, application integration adds yet unexpl...

Contexts in source publication

Context 1
... Channel on Hardware. The message channels decouple sending and receiving endpoints or processors and denote the communication between them. Thereby, the sending endpoint writes data to the channel, while the receiving endpoint reads the data for further processing. Our message channel definition on hardware is depicted in Fig. 2. We use hardware signals and data lines to represent the control and data flow through a message channel. The channels carry a unique identifier as id, the message length as length, and the body as data in 8-bit chunks from the previously defined message over the data line (data(0..7)). To indicate that a message is sent over the channel, we added a message signal as message, which is set to one (i.e., high) while a message is being sent, even if there is currently no valid data on the data line. The message signal is zero (i.e., low) only between messages (i.e., the channel is ready to receive another message). For the transport of the data to the subsequent processor we define an enable signal as enable, which is high when valid data is on the data line and low when there is no valid data on the data line. The id and length are separate lines, which are constant, when the message line is ...
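The signal-level channel just described can be summarized as a port declaration. The following VHDL fragment is our own illustration, not code published with the paper; the entity name and the generic widths for id and length are assumptions (only the data line is fixed to 8 bits by the excerpt).

library ieee;
use ieee.std_logic_1164.all;

-- Illustrative interface of one channel endpoint (signal names per Fig. 2;
-- id/length widths assumed, since the excerpt leaves them open).
entity message_channel_endpoint is
  generic (
    ID_WIDTH     : natural := 8;   -- assumption
    LENGTH_WIDTH : natural := 16   -- assumption
  );
  port (
    clk       : in  std_logic;
    id        : in  std_logic_vector(ID_WIDTH - 1 downto 0);      -- constant while message is high
    length    : in  std_logic_vector(LENGTH_WIDTH - 1 downto 0);  -- constant while message is high
    data      : in  std_logic_vector(7 downto 0);                 -- body in 8-bit chunks, data(0..7)
    message   : in  std_logic;  -- high for the whole message, low only between messages
    enable    : in  std_logic;  -- high only when data carries a valid chunk
    readReady : out std_logic   -- back-pressure toward the sender (see Context 2)
  );
end entity message_channel_endpoint;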
Context 2
... the FPGA, we define flow control similar to [3], where it is exclusively used for the synchronous communication between remote endpoints. For the back-pressure between message processors (i.e., no TCP support), we cannot reject messages atomically, because the stream might already be processed partially. Therefore, we opted for an approach with small FIFO queues in each processor, which buffer message data that cannot be immediately processed by the subsequent processor and thus ensure that no message data is lost. The receiving processor signals this by setting its readReady to low (cf. Fig. 2). The FIFO queues can be represented on hardware using flip-flops (FFs), Block RAM (BRAM), or built-in FIFOs. Since FFs can only store one bit at a time and are very important for the logic of message processors, we chose BRAM. Although BRAM is a limited resource as well, it can be more easily extended by on-board DRAM to buffer larger messages. If the queue limit is exceeded and the successor processor is not ready yet (i.e., readReady low), the current processor notifies its sender by setting its readReady to ...
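A minimal sketch of the described back-pressure scheme follows. It is illustrative only: the entity name, the depth, and the almost-full threshold of four entries are our own choices, not from the paper. A synchronous FIFO buffers 8-bit chunks in a storage array intended to infer BRAM, and readReady is deasserted shortly before the queue fills so chunks already in flight still find a slot.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity chunk_fifo is
  generic (DEPTH : natural := 512);  -- deep enough that synthesis maps the array to BRAM
  port (
    clk       : in  std_logic;
    rst       : in  std_logic;
    wr_en     : in  std_logic;                     -- predecessor's enable
    wr_data   : in  std_logic_vector(7 downto 0);  -- one 8-bit message chunk
    rd_en     : in  std_logic;                     -- successor consumes a chunk
    rd_data   : out std_logic_vector(7 downto 0);
    rd_valid  : out std_logic;
    readReady : out std_logic                      -- back-pressure toward the sender
  );
end entity;

architecture rtl of chunk_fifo is
  type ram_t is array (0 to DEPTH - 1) of std_logic_vector(7 downto 0);
  signal ram            : ram_t;
  signal wr_ptr, rd_ptr : natural range 0 to DEPTH - 1 := 0;
  signal fill           : natural range 0 to DEPTH := 0;
begin
  process (clk)
    variable do_wr, do_rd : boolean;
  begin
    if rising_edge(clk) then
      rd_valid <= '0';
      if rst = '1' then
        wr_ptr <= 0; rd_ptr <= 0; fill <= 0;
      else
        do_wr := wr_en = '1' and fill < DEPTH;
        do_rd := rd_en = '1' and fill > 0;
        if do_wr then
          ram(wr_ptr) <= wr_data;
          wr_ptr <= (wr_ptr + 1) mod DEPTH;
        end if;
        if do_rd then
          rd_data  <= ram(rd_ptr);  -- registered read, BRAM friendly
          rd_valid <= '1';
          rd_ptr   <= (rd_ptr + 1) mod DEPTH;
        end if;
        -- track the fill level for the almost-full threshold
        if do_wr and not do_rd then
          fill <= fill + 1;
        elsif do_rd and not do_wr then
          fill <= fill - 1;
        end if;
      end if;
    end if;
  end process;

  -- deassert a few entries early so in-flight chunks are not lost
  readReady <= '0' when fill >= DEPTH - 4 else '1';
end architecture;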

Similar publications

Article
Full-text available
Many advances have been made in the field of computer vision. Several recent research trends have focused on mimicking human vision by using stereo vision systems. In multi-camera systems, a calibration process is usually implemented to improve the accuracy of the results. However, these systems generate a large amount of data to be processed; therefore,...

Citations

... Integration developers and experts provide semantically meaningful pattern realizations in expressive specialised languages such as timed DB-nets [6] (1), but such languages are cumbersome for process modelers to use and make automatic identification of improvements difficult. Process modelers, on the other hand, prefer higher-level languages and notations to specify processes as a composition of integration patterns [3,4,5,11] (2). However, the composition ... (Figure 1: End-to-end perspective from integration process modeling to verifiable execution semantics and automatic, correctness-preserving improvements; current gaps or missing aspects in red.) ...
... [Table excerpt, flattened in extraction: instance scheduling, parallelization [24], ordering, materialization, arguments, algebraic [25]; added (n/a): 13; expert knowledge from business process [26], workflow surveys [10,27], data integration [28], distributed applications [29], EAI [4,5,11,30], placement [31,32], resilience [33]; removed (-1): classification only [15]; overall: 616 / 23.] ...
... Process simplification can be achieved by removing redundant patterns, e.g., via Redundant Subprocess Removal (removing one of two identical sub-flows), Combine Sibling Patterns (removing one of two identical patterns), or Unnecessary Conditional Fork (removing redundant branching). As far as we know, the only practical study of combining sibling patterns can be found in Ritter et al. [11], showing moderate throughput improvements. These simplifications require a formalization of patterns as a control graph structure, which helps to identify and deal with the structural changes. ...
Preprint
Full-text available
Enterprise Application Integration deals with the problem of connecting heterogeneous applications and is the centerpiece of current on-premise, cloud and device integration scenarios. For integration scenarios, structurally correct composition of patterns into processes and improvements of integration processes are crucial. In order to achieve this, we formalize compositions of integration patterns based on their characteristics, and describe optimization strategies that help to reduce the model complexity and improve the process execution efficiency using design-time techniques. Using the formalism of timed DB-nets, a refinement of Petri nets, we model integration logic features such as control and data flow, transactional data storage, compensation and exception handling, and time aspects that are present in recurring solutions as separate integration patterns. We then propose a realization of optimization strategies using graph rewriting, and prove that the optimizations we consider preserve both structural and functional correctness. We evaluate the improvements on a real-world catalog of pattern compositions, containing over 900 integration processes, and illustrate the correctness properties in case studies based on two of these processes.
... Such minimization is achieved by reducing the intensity of data exchanges, especially between remote transmitters and receivers [1]. For example, the data exchanges between the processor and memory during the execution of programs can account for up to 75% of the energy consumed, due to wire heating and high-current buffer switching [2]. ...
Article
The use of lossless compression in application-specific computers provides advantages such as a minimized amount of memory, increased interface bandwidth, reduced energy consumption, and improved self-testing systems. The article discusses known lossless compression algorithms with the aim of choosing the most suitable one for implementation in a hardware-software decompressor. Among them, the Lempel-Ziv-Welch (LZW) algorithm makes it possible to realize the associative memory of the decompressor dictionary in the simplest way, by sequentially reading the symbols of the decompressed word. An analysis of existing hardware decompressor implementations showed that their main design goal was to increase bandwidth at the expense of higher hardware costs and limited functionality. It is proposed to implement the LZW decompressor in a hardware module based on a microprocessor core with a specialized instruction set. For this, a processor core with a stack architecture was selected, which the authors developed for file grammar analysis tasks. An additional memory block for storing the dictionary and an input buffer, which converts the byte stream of the packed file into a sequence of unpacked codes, are added to it. The processor core instruction set is adjusted both to speed up decompression and to reduce hardware costs. The decompressor is described in the Very High Speed Integrated Circuit Hardware Description Language (VHDL) and is implemented in a field-programmable gate array (FPGA). At a clock frequency of up to two hundred megahertz, the average throughput of the decompressor is more than ten megabytes per second. As a result of the hardware-software implementation, an LZW decompressor is developed that has approximately the same hardware costs as a purely hardware decompressor but lower bandwidth, traded for the flexibility and multifunctionality provided by the processor core software. In particular, a decompressor for Graphics Interchange Format (GIF) files is implemented on the basis of this device in an FPGA for dynamic visualization of patterns on an embedded system display.
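As an illustration of the input-buffer stage mentioned in this abstract, the following VHDL sketch repacks an 8-bit byte stream into fixed-width LZW codes. It is our own minimal example, not the authors' implementation: the entity name, ports, and the fixed CODE_WIDTH are all assumptions (classic LZW actually grows its code width from 9 to 12 bits as the dictionary fills).

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lzw_code_unpacker is
  generic (CODE_WIDTH : natural := 12);  -- fixed width assumed; must be >= 8
  port (
    clk        : in  std_logic;
    rst        : in  std_logic;
    byte_in    : in  std_logic_vector(7 downto 0);  -- packed file stream
    byte_valid : in  std_logic;
    code_out   : out std_logic_vector(CODE_WIDTH - 1 downto 0);
    code_valid : out std_logic
  );
end entity;

architecture rtl of lzw_code_unpacker is
  -- holds a partial code (< CODE_WIDTH bits) plus one incoming byte
  signal buf : unsigned(CODE_WIDTH + 7 downto 0) := (others => '0');
  signal cnt : natural range 0 to CODE_WIDTH + 8 := 0;
begin
  process (clk)
    variable nbuf : unsigned(buf'range);
    variable ncnt : natural range 0 to CODE_WIDTH + 8;
  begin
    if rising_edge(clk) then
      code_valid <= '0';
      if rst = '1' then
        buf <= (others => '0');
        cnt <= 0;
      elsif byte_valid = '1' then
        -- shift the new byte in below the bits already collected
        nbuf := shift_left(buf, 8) or resize(unsigned(byte_in), buf'length);
        ncnt := cnt + 8;
        if ncnt >= CODE_WIDTH then
          -- the CODE_WIDTH most significant collected bits form the next code
          code_out   <= std_logic_vector(resize(
                          shift_right(nbuf, ncnt - CODE_WIDTH), CODE_WIDTH));
          code_valid <= '1';
          ncnt := ncnt - CODE_WIDTH;
        end if;
        buf <= nbuf;
        cnt <= ncnt;
      end if;
    end if;
  end process;
end architecture;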
... With the advent of cloud computing, many applications that used to operate on-premises in companies have been offered as cloud services. In this scenario, software ecosystems have become even more heterogeneous, increasing the need for integration amongst them so that they work synchronously and support business processes. However, many of these applications still need to be adapted to operate in a cloud computing context, so they can keep or even improve the performance they once achieved by running locally (Harman et al., 2013; Linthicum, 2017; Ritter et al., 2017). The main user requirement on the software is performance. ...
Article
Full-text available
Companies' software ecosystems, composed of local applications and cloud computing services, are built by connecting integration platforms and applications. Run-time systems are arguably the most significant components for integration platform performance. Our literature review identified that most integration run-time systems adopt a global thread pool configuration. However, it is possible to configure local thread pools to increase the performance of run-time systems. This article compares two thread pool configurations by simulating the execution of a real integration problem. Results show that the execution performance of the local pool configuration exceeds that of the global pool in high-workload scenarios. These results were validated by rigorous statistical analysis.
... If the database system is network-attached, we choose a SmartNIC. This placement was also found to be efficient in related data processing domains like data-intensive messaging [103,104]. If the database system is only part of a larger architecture and not network-attached, we choose the near-data approach. ...
Preprint
Full-text available
Non-relational database systems (NRDS), such as graph, document, key-value, and wide-column stores, have gained much attention in various trending (business) application domains like smart logistics, social network analysis, and medical applications, due to their data model variety and scalability. The broad data variety and sheer size of datasets pose unique challenges for system design and runtime (incl. power consumption). While CPU performance scaling becomes increasingly difficult, we argue that NRDS can benefit from adding field-programmable gate arrays (FPGAs) as accelerators. However, FPGA-accelerated NRDS have not been systematically studied yet. To facilitate understanding of this emerging domain, we explore the fit of FPGA acceleration for NRDS with a focus on data model variety. We define the term NRDS class as a group of non-relational database systems supporting the same data model. This survey describes and categorizes the inherent differences and non-trivial trade-offs of relevant NRDS classes, as well as their commonalities, in the context of common design decisions when building such a system with FPGAs. For example, we found in the literature that for key-value stores the FPGA should be placed into the system as a smart network interface card (SmartNIC) to benefit from direct access of the FPGA to the network. However, more complex data models and processing of other classes (e.g., graph and document) commonly require more elaborate near-data or socket accelerator placements, where the FPGA respectively has sole or shared access to main memory. Across the different classes, FPGAs can be used as a communication layer or for the acceleration of operators and data access. We close with open research and engineering challenges to outline the future of FPGA-accelerated NRDS.
... In this section we collect and discuss EAI optimization objectives in the context of classical EAI [21,15] and emerging application integration scenarios [31]. The latter results in new EAI challenges and solutions, which are represented in this work by our studies on "data-aware" message processing solution spaces: dealing with high velocity and increasing message volume through table-centric processing [?,28] and streaming on dataflow (hardware) architectures [29], as well as new message format variety aspects in terms of multimedia integration [33]. Figure 2 gives a high-level view of the classical system architecture (based on [30]), evolved by new components for multimedia integration (from [33]). ...
... The work on vectorized integration patterns [?,28] illustrates the trade-off between immense message throughput gains when processing sets of messages and a reduced overall latency (throughput → Vectorization). Furthermore, the message throughput can be increased by processing messages in multiple parallel sub-processes, e.g., on separate hardware resources [29] ( → Parallelization). The message stream [29] and multimedia integration [33] studies showed decreasing message throughput for increasing message sizes. ...
... While message indexing reached its limits for increasing multimedia data [33], keeping message sizes smaller helped throughout the experiments ( → Data Reduction). ...
Preprint
Full-text available
The discipline of Enterprise Application Integration (EAI) is the centrepiece of current on-premise, cloud and device integration scenarios. However, the building blocks of integration scenarios, i.e., essentially compositions of Enterprise Integration Patterns (EIPs), are only informally described, and thus their composition takes place in an informal, ad-hoc manner. This leads to several issues, including a currently missing optimization of application integration scenarios. In this work, we collect and briefly explain the usage of process optimizations from the literature for integration scenario processes as a catalog.
... [Table excerpt, flattened in extraction: instance scheduling, parallelization [46], ordering, materialization, arguments, algebraic [19]; added (n/a): 8; expert knowledge from business process [45], workflow surveys [27,28], data integration [12], distributed applications [8,9], EAI [35,36]; removed (-1): classification only [44]; overall: 616 / 18.] ... integration scenarios. With our approach, we can show that 81% of the original scenarios from 2015 and still up to 52% of the current SAP CPI content from 2017 could be improved through a parallelization of scenario parts. ...
... Process simplification can be achieved by removing redundant patterns, e.g., via Redundant Subprocess Removal (removing one of two identical sub-flows), Combine Sibling Patterns (removing one of two identical patterns), or Unnecessary Conditional Fork (removing redundant branching). As far as we know, the only practical study of combining sibling patterns can be found in Ritter et al. [36], showing moderate throughput improvements. These simplifications require a formalization of patterns as a control graph structure (R1), which helps to identify and deal with the structural change representation. ...
... Previous work targeting process simplification includes Böhm et al. [11] and Habib, Anjum and Rana [22]. [Table excerpt, flattened in extraction: process simplification strategies — Redundant Sub-process Removal [11], Combine Sibling Patterns [11,22] (cf. [36]), Unnecessary Conditional Fork [11,45]; OS-2 Data Reduction — Early-Filter [11,19,22,31,45] (cf. [36]), Early-Mapping [11,19,22] (cf. [36,39]), Early-Aggregation [11,19,22] (cf. [39]), Claim Check [11,19], Early-Split [36] (cf. [36,39]); with qualitative +/- ratings per strategy.] ...
Conference Paper
Full-text available
Enterprise Application Integration is the centerpiece of current on-premise, cloud and device integration scenarios. We describe optimization strategies that help reduce the model complexity, and improve the process execution using design time techniques. In order to achieve this, we formalize compositions of Enterprise Integration Patterns based on their characteristics, and propose a realization of optimization strategies using graph rewriting. The framework is successfully evaluated on a real-world catalog of pattern compositions, containing over 900 integration scenarios.
... of message throughput for route branching, as well as the degree of distribution and the resource/energy consumption, have been discovered [5,7]. ...
... Similar to the work in related domains (e.g., database management [3], complex event processing [9]), we studied the idea of moving EAI processing to re-configurable hardware (e.g., FPGAs), embodying a dataflow architecture [1], closer to the network [5]. In this talk we summarize and discuss the resulting challenges and opportunities, e.g., ...
... request-reply [2] vs. (a)synchronous streaming [5]), non-functional aspects (e.g., security, exception handling, multi-tenancy [6]) and optimization
• Message Endpoints: the impact of hardware-accelerated EAI on "conventional" process integration endpoints (e.g., business applications).
• (Cloud) Operations: the impact on data center (blueprints), cloud architectures and operations (e.g., hardware virtualization) of a shift to re-configurable hardware dataflow architectures in the context of application integration. ...
Conference Paper
In this talk we set the emerging domain of application integration into the context of recent hardware advances.