Table 1 - uploaded by Praveen K. Murthy
Buffer sizes on practical examples 

Source publication
Article
There has been a proliferation of block-diagram environments for specifying and prototyping digital signal processing (DSP) systems. These include tools from academia such as Ptolemy and commercial tools such as DSPCanvas from Angeles Design Systems, the Signal Processing Worksystem (SPW) from Cadence, and COSSAP from Synopsys. The block diagram langu...

Citations

... For instance, the Viola-Jones technique may struggle to identify facial features that are occluded, turned, or in poor lighting scenarios [48]. Similarly, the HOG method might fall short in recognizing pedestrians who are partly veiled or in complicated surroundings [50,57]. ...
... Implementation I is expressed as a three-tuple operating at the recipient's edge [18,57] (refer to Eq. 10). I_id is an application's unique identifier. ...
... This layer is in charge of the fundamental data refining, computation, and processing that results in the fog layer [35]. Fog nodes seek to increase the efficiency of IoT applications; consequently, the fog layer can lower the quantity of data transferred to the cloud layer and shorten the request-response time for IoT applications [54,57]. This is frequently necessary to improve QoS, such as lowering latency and increasing network bandwidth [43]. ...
Article
Train surfing is an extremely dangerous practice that involves riding on the roof of a moving train. Every year, many people, especially youths, lose their lives to this illegal practice. To bring it under control, authorities must apprehend train surfers before they can even reach the top of a train, which calls for artificial-intelligence-based real-time monitoring of trains. In this paper, we present an artificial-intelligence-inspired IoT-Fog-based framework for detecting unsafe ways of traveling on trains from surveillance videos. The proposed framework consists of feature extraction, feature expression, and assessment criteria for identifying train surfing. It is not constrained by camera angle and includes guidelines for determining unsafe status. It can quickly and accurately identify vulnerable passengers during travel and send early warnings to the concerned authorities. A comparative analysis shows that the proposed framework performs better than most state-of-the-art algorithms, with a precision score of 95%. The framework would help authorities apprehend the actual culprits and ensure safer rail transport.
... In particular, we incorporate a new FIFO abstract data type (ADT) implementation in LIDE-C, called shared FIFO , that enables multiple dataflow edges in a graph to be implemented through FIFO ADT instances that share the same region of memory. Such buffer sharing in dataflow implementations has been investigated in different forms for various contexts of automated scheduling and software synthesis (e.g., see [34][35][36] ). In STMCM, we make it easy for the system designer to apply buffer sharing explicitly within her or his implementation rather than depending on its implicit support through the toolset that is used. ...
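The buffer-sharing idea behind such a shared FIFO ADT can be sketched as follows. This is a hypothetical Python illustration, not LIDE-C's actual API: several logical FIFOs are constructed as views into one shared memory region, which is safe when the live ranges of the corresponding dataflow edges do not overlap.

```python
# Sketch of buffer sharing across dataflow edges: several logical FIFOs
# backed by one shared memory region. All names are hypothetical; the
# actual LIDE-C shared FIFO API differs.

class SharedRegion:
    def __init__(self, size):
        self.mem = [None] * size

class SharedFifo:
    """A ring-buffer FIFO that lives in a slice of a shared region."""
    def __init__(self, region, offset, capacity):
        self.region, self.offset, self.capacity = region, offset, capacity
        self.head = self.tail = self.count = 0

    def write(self, token):
        assert self.count < self.capacity, "FIFO full"
        self.region.mem[self.offset + self.tail] = token
        self.tail = (self.tail + 1) % self.capacity
        self.count += 1

    def read(self):
        assert self.count > 0, "FIFO empty"
        token = self.region.mem[self.offset + self.head]
        self.head = (self.head + 1) % self.capacity
        self.count -= 1
        return token

# Two edges whose live ranges do not overlap can map to the same offset.
region = SharedRegion(4)
ab = SharedFifo(region, offset=0, capacity=4)   # edge A->B
bc = SharedFifo(region, offset=0, capacity=4)   # edge B->C reuses the memory

ab.write(1); ab.write(2)
assert ab.read() == 1 and ab.read() == 2        # edge A->B fully drained
bc.write(10)
assert bc.read() == 10
```

Making the sharing explicit, as STMCM advocates, amounts to the designer choosing the offsets here, rather than relying on a synthesis tool to compute them.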
Article
This paper presents a new methodology for design and implementation of signal processing systems on system-on-chip (SoC) platforms. The methodology is centered on the use of lightweight application programming interfaces for applying principles of dataflow design at different layers of abstraction. The development processes integrated in our approach are software implementation, hardware implementation, hardware-software co-design, and optimized application mapping. The proposed methodology facilitates development and integration of signal processing hardware and software modules that involve heterogeneous programming languages and platforms. As a demonstration of the proposed design framework, we present a dataflow-based deep neural network (DNN) implementation for vehicle classification that is streamlined for real-time operation on embedded SoC devices. Using the proposed methodology, we apply and integrate a variety of dataflow graph optimizations that are important for efficient mapping of the DNN system into a resource constrained implementation that involves cooperating multicore CPUs and field-programmable gate array subsystems. Through experiments, we demonstrate the flexibility and effectiveness with which different design transformations can be applied and integrated across multiple scales of the targeted computing system.
... Several studies attempt to reduce SAS memory by scheduling [Bhattacharyya et al. 1997]. Murthy and Bhattacharyya [2001] minimize the maximum number of live tokens under a particular buffer lifetime model. Sung and Ha [2000] and Zitzler et al. [1999] relax the assumption by also considering non-SAS schedules for data memory reduction. ...
... The SDF graphs for the first benchmark suite are shown in Fig. 10. The first is the filter bank [Murthy and Bhattacharyya 2001]. We also have three SDF graphs from our own wireless sensor applications, including (1) transmitting data from a triaxial accelerometer to RF, (2) a wireless receiver, and (3) a wireless data logger to an SD card and displaying on an LCD. ...
... The SDF graphs for the second benchmark suite are shown in Fig. 13. They are taken from the following sources: (a) from [Teich et al. 1998], (b)-(e) from [Murthy and Bhattacharyya 2001], (f) from [Bhattacharyya et al. 1999b], (g,h) from [Murthy and Bhattacharyya 2004], (i) from [Bhattacharyya et al. 1999b], (j,k) from [Liu et al. 2009], (l) from [Bhattacharyya 1999], and (m) from [Bhattacharyya et al. 1999b]. These were chosen because either their schedules were also available or they published buffer optimization results. ...
Article
This article presents a buffer minimization scheme with low dispatching overhead for embedded software processes. To accomplish this, we exploit behavioral transparency in the model of computation. In such a model (e.g., synchronous dataflow), the state of buffer requirements is determined completely by the firing sequence of the actors, without requiring functional simulation of the actors. Fine-grained buffer allocation incurs high code and pointer overhead, while coarse-grained allocation suffers from memory fragmentation. Instead, we propose a medium-grained, “access-contiguous” buffer allocation scheme that minimizes both the total buffer space and the pointer overhead. We formulate the buffer allocation problem as the placement of 2D tiles that represent the lifetimes of the buffers, minimizing their memory occupation both spatially and temporally. Experimental results show that our scheme uses 26% less data memory than existing techniques on average, and up to 57% less in the best case. Our technique retains code modularity for dynamic configuration and, more importantly, enables many more applications that otherwise would not fit if implemented using previous state-of-the-art techniques.
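The 2D-tile view of buffer allocation can be illustrated with a small sketch. This uses hypothetical toy data and a simple first-fit placement, not the article's actual algorithm: each buffer is a rectangle spanning its lifetime on the time axis and its size on the memory axis, and two rectangles may overlap in memory only if their lifetimes are disjoint.

```python
# Sketch: buffers as 2D tiles (time interval x memory span). Each buffer
# is placed at the lowest offset where it does not collide with any
# already-placed buffer whose lifetime intersects its own.

def overlaps(a, b):
    """True if half-open time intervals [a0, a1) and [b0, b1) intersect."""
    return a[0] < b[1] and b[0] < a[1]

def place(buffers):
    """buffers: list of (name, (t_start, t_end), size).
    Returns a dict of memory offsets and the total memory needed."""
    placed = []            # (interval, offset, size) of placed tiles
    offsets = {}
    for name, ival, size in buffers:
        off = 0
        while True:
            clash = [p for p in placed
                     if overlaps(ival, p[0])            # lifetimes intersect
                     and off < p[1] + p[2]              # memory spans
                     and p[1] < off + size]             #   intersect too
            if not clash:
                break
            off = max(p[1] + p[2] for p in clash)       # jump past conflicts
        placed.append((ival, off, size))
        offsets[name] = off
    total = max(o + s for _, o, s in placed)
    return offsets, total

# "ab" and "bc" are simultaneously live, so they get disjoint memory;
# "cd" is live only afterwards and can reuse offset 0.
offs, total = place([("ab", (0, 2), 3), ("bc", (1, 3), 2), ("cd", (3, 4), 4)])
assert offs == {"ab": 0, "bc": 3, "cd": 0}
assert total == 5
```

The medium-grained, access-contiguous scheme in the article refines this picture by choosing tile shapes that keep each buffer's accesses contiguous while still packing tightly in both dimensions.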
... The techniques of Murthy et al. [21], [22], [23], [24], Teich et al. [25], and Geilen et al. [26] are closest to ours. They describe several algorithms for merging buffers in signal processing systems that use synchronous data flow models [27]. ...
Article
Most compilers focus on optimizing performance, often at the expense of memory, but efficient memory use can be just as important in constrained environments such as embedded systems. This paper presents a memory reduction technique for rendezvous communication, which is applied to the deterministic concurrent programming language SHIM. It focuses on reducing memory consumption by sharing communication buffers among tasks. It determines pairs of buffers that can never be in use simultaneously and uses a shared region of memory for each pair. The technique produces a static abstraction of a SHIM program's dynamic behavior, which is then analyzed to find buffers that are never occupied simultaneously. Experiments show the technique runs quickly on modest-sized programs and can sometimes reduce memory requirements by half.
... The changes to the capacity of a buffer occur on execution of producer and consumer tasks of the corresponding channel. Such temporal change, however, can be captured at different levels of granularity [12]. The highest resolution temporal view of a buffer's storage requirement would need to follow the execution at the granularity of firing individual actors. ...
... In this scheme, buffers are assumed to have maximum capacity throughout their live range in the schedule. Therefore, two buffers conflict if they are live at even one common point in time; in that case they cannot share any physical memory location and must be allocated in distinct memory spaces [12]. For example, M_T(A→B, S) = 6, and under the coarse-grain analysis model six memory cells have to be allocated during the buffer's entire lifetime (three time steps) to implement it. ...
... We report the worst and the best buffer size that we observed in 10 runs. The coarse-grain analysis is done according to the buffer lifetime analysis principle, developed by Murthy and Bhattacharyya [12]. The first fit heuristic is used to allocate the buffers in the shared buffer, under the same SA schedule. ...
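A minimal sketch of coarse-grain buffer lifetime analysis, assuming a toy SDF chain A→B→C rather than any of the cited benchmarks: simulating the firing sequence yields each edge's maximum token count (its required capacity under the coarse-grain model) and its live range in the schedule; edges whose live ranges overlap cannot share memory.

```python
# Sketch of coarse-grain buffer lifetime analysis over a firing sequence.
# The graph, rates, and schedule below are illustrative toy data.

def analyze(edges, rates, schedule):
    """edges: edge names; rates[(actor, edge)] = tokens produced (+) or
    consumed (-) per firing; schedule: list of actor firings.
    Returns per-edge max token count and live range [first, last] step."""
    tokens = {e: 0 for e in edges}
    max_tok = {e: 0 for e in edges}
    live = {e: [None, None] for e in edges}
    for step, actor in enumerate(schedule):
        for e in edges:
            delta = rates.get((actor, e), 0)
            if delta:
                tokens[e] += delta
                max_tok[e] = max(max_tok[e], tokens[e])
                if live[e][0] is None:
                    live[e][0] = step
                live[e][1] = step
    return max_tok, live

# Chain A -e1-> B -e2-> C: A produces 2 on e1, B consumes 2 and produces 1
# on e2, C consumes 1 from e2.
edges = ["e1", "e2"]
rates = {("A", "e1"): 2, ("B", "e1"): -2, ("B", "e2"): 1, ("C", "e2"): -1}
max_tok, live = analyze(edges, rates, ["A", "A", "B", "B", "C", "C"])

assert max_tok == {"e1": 4, "e2": 2}
# The live ranges overlap at steps 2-3, so under the coarse-grain model the
# two buffers conflict and 4 + 2 = 6 memory cells are needed in total.
assert live == {"e1": [0, 3], "e2": [2, 5]}
```

A first-fit allocator, as used in the experiments above, would then scan edges in some order and assign each one the lowest offset not occupied by a conflicting edge.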
Conference Paper
Many embedded applications demand processing of a seemingly endless stream of input data in real-time. Productive development of such applications is typically carried out by synthesizing software from high-level specifications, such as data-flow graphs. In this context, we study the problem of inter-actor buffer allocation, which is a critical step during compilation of streaming applications. We argue that fine-grain analysis of buffers' spatio-temporal characteristics, as opposed to conventional live range analysis, enables dramatic improvements in buffer sharing. Improved sharing translates to reduction of the compiled binary memory footprint, which is of prime concern in many embedded systems. We transform the buffer allocation problem to two-dimensional packing using complex polygons. We develop an evolutionary packing algorithm, which readily yields buffer allocations. Experimental results show an average of over 7X and 2X improvement in total buffer size, compared to baseline and conventional live range analysis schemes, respectively.
... • Theoretical studies have considered memory sharing between different logic buffers in Chapter 5, similar as in [37,43,71,72]. Compared with disjoint partitioning, it shows a great reduction in memory size. ...
... There are many research papers on finding an optimized SDFG schedule subject to one or more criteria [4], [5], [20], [21], [29], [16], [11], [22], [14]. [4] proposes Single Appearance Schedules (SAS), which are specific to single processor platforms and aim to minimize code size. [5] minimizes buffer size for SAS without buffer sharing. [20], [21], [16] allow sharing memory between channels to reduce the total memory usage. However, SAS are not necessarily optimal when other objectives than code size are to be optimized. For multi-processor platforms, where the schedule length does not necessarily lead to extra code size, non-SAS schedules can be better than SAS. [29] relax ...
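The code-size/buffer-size tension around single appearance schedules can be seen on the classic two-actor example. A minimal sketch, assuming the SDF edge A -(produce 2, consume 3)-> B with repetitions vector q = (3, 2): the SAS (3A)(2B) keeps code size minimal (each actor appears once, inside loops) but needs a larger buffer than an interleaved non-SAS schedule.

```python
# Sketch: compare the buffer requirement of a single appearance schedule
# (SAS) with an interleaved schedule for the SDF edge A -(2,3)-> B.

def buffer_needed(schedule, produce=2, consume=3):
    """Peak token count on the edge while executing the firing sequence."""
    tok = peak = 0
    for actor in schedule:
        tok += produce if actor == "A" else -consume
        assert tok >= 0, "schedule fires B without enough tokens"
        peak = max(peak, tok)
    return peak

sas        = ["A", "A", "A", "B", "B"]        # (3A)(2B) flattened
interleave = ["A", "A", "B", "A", "B"]        # B appears twice in the code

assert buffer_needed(sas) == 6        # all of A's output buffered at once
assert buffer_needed(interleave) == 4 # smaller buffer, larger code size
```

This is the trade-off the snippet describes: on a single processor, SAS minimizes code size, but once other objectives (buffer memory, schedule length on multiprocessors) dominate, non-SAS schedules can win.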
Conference Paper
Synchronous dataflow graphs (SDFGs) are widely used to model streaming applications such as signal processing and multimedia applications. These are often implemented on resource-constrained embedded platforms ranging from PDAs and cell phones to automobile equipment and printing systems. Trade-off analysis between resource usage and performance is critical in the life cycle of those products, from tailoring platforms to target applications at design time to resource management at runtime. We present a trade-off analysis method for SDFGs based on model-checking techniques and leveraging knowledge from the dataflow domain. We develop results to prune the state space of an SDFG for multi-objective model checking without losing optimality. To achieve scalability to large state spaces, we combine these pruning techniques with pragmatic heuristics. We evaluate our techniques with two sets of experiments. One set shows we can now do throughput-storage trade-off analysis for shared memory architectures, showing reductions in memory usage of 10-50% compared to existing distributed memory based analysis. A second set of experiments shows how our techniques support design-space exploration for the digital datapath of a professional printer system. Analysis times range from less than a second to at most several minutes.
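One point of such a throughput-storage trade-off can be sketched on a toy example. This is a greedy simulation of a single two-actor SDF edge, far simpler than the model-checking analysis in the paper: find the smallest buffer capacity under which one iteration of A -(2,3)-> B, with repetitions vector (3, 2), completes without deadlock.

```python
# Sketch: smallest storage bound under which one iteration of the SDF edge
# A -(2,3)-> B (repetitions vector q = (3, 2)) can complete. Greedy
# execution suffices for this acyclic toy example; the paper's method
# explores the full multi-objective state space instead.

def completes(cap, qa=3, qb=2, produce=2, consume=3):
    """Fire any enabled actor until the iteration finishes or we deadlock."""
    tok = fired_a = fired_b = 0
    while fired_a < qa or fired_b < qb:
        if fired_b < qb and tok >= consume:
            tok -= consume; fired_b += 1          # consume when possible
        elif fired_a < qa and tok + produce <= cap:
            tok += produce; fired_a += 1          # produce if it still fits
        else:
            return False                          # deadlock: nothing can fire
    return True

min_cap = next(c for c in range(1, 10) if completes(c))
assert min_cap == 4   # capacity 3 deadlocks: A cannot fit, B cannot fire
```

Sweeping the capacity while also measuring iteration latency would trace out the throughput-storage Pareto points that the paper's analysis computes exactly.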
... Our algorithm does not add scheduling constraints to the problem: it reduces the total buffer size without affecting the schedule, and thereby without affecting the overall speed. The work of Murthy and Bhattacharyya (2000, 2001, 2004, 2006) and Teich et al. (1998) is closest to ours. They describe several algorithms for merging buffers in signal processing systems that use synchronous dataflow models (Lee and Messerschmitt 1987). ...
Conference Paper
Most compilers focus on optimizing performance, often at the expense of memory, but efficient memory use can be just as important in constrained environments such as embedded systems. In this paper, we present a memory reduction technique for the deterministic concurrent programming language SHIM. We focus on reducing memory consumption by sharing buffers among the tasks, which use them to communicate using CSP-style rendezvous. We determine pairs of buffers that can never be in use simultaneously and use a shared region of memory for each pair. Our technique produces a static abstraction of a SHIM program's dynamic behavior, which we then analyze to find buffers that can share memory. Experimentally, we find our technique runs quickly on modest-sized programs and often reduces memory requirements by half.
... In generating multithread software code from the Simulink algorithm model, we apply buffer memory optimization techniques, which are enabled by raising the abstraction level from the transaction-level model to the algorithm-level model. Several previous studies addressed buffer sharing [28,40] and scheduling techniques for maximizing buffer sharing [41,42] in software generation from dataflow specification. However, they did not address buffer memory minimization for high-level specification with explicit conditionals; our multithread code generator takes the conditionals into consideration [14]. ...
Article
As a solution for dealing with the design complexity of multiprocessor SoC architectures, we present a joint Simulink-SystemC design flow that enables mixed hardware/software refinement and simulation in the early design process. First, we introduce the Simulink combined algorithm/architecture model (CAAM) unifying the algorithm and the abstract target architecture. From the Simulink CAAM, a hardware architecture generator produces architecture models at three different abstract levels, enabling a trade-off between simulation time and accuracy. A multithread code generator produces memory-efficient multithreaded programs to be executed on the architecture models. To show the applicability of the proposed design flow, we present experimental results on two real video applications.
... Of the various possible scheduling strategies, the focus here is on looped schedules [8] and parameterized looped schedules [7], [43], which construct schedules in terms of static and dynamic looping constructs, respectively. These types of schedules combine the advantages of efficient looping facilities in programmable digital signal processors [46], low-complexity storage and manipulation of schedule information [43], [54], and potential for extensive analysis and optimization [54]. Dataflow graph transformation is an effective technique to produce high-performance DSP software as well as hardware/software solutions. ...
Chapter
Computer vision has emerged as one of the most popular domains of embedded applications. The applications in this domain are characterized by complex, intensive computations along with very large memory requirements. Parallelization and multiprocessor implementations have become increasingly important for this domain, and various powerful new embedded platforms to support these applications have emerged in recent years. However, the problem of efficient design methodology for optimized implementation of such systems remains vastly unexplored. In this chapter, we look into the main research problems faced in this area and how they vary from other embedded design methodologies in light of key application characteristics in the embedded computer vision domain.We also provide discussion on emerging solutions to these various problems.