Table 1 - uploaded by Praveen K. Murthy
Buffer sizes on practical examples 

Source publication
Article
There has been a proliferation of block-diagram environments for specifying and prototyping digital signal processing (DSP) systems. These include tools from academia such as Ptolemy and commercial tools such as DSPCanvas from Angeles Design Systems, the Signal Processing Worksystem (SPW) from Cadence, and COSSAP from Synopsys. The block diagram langu...

Citations

... For instance, the Viola-Jones technique may struggle to identify facial features that are occluded, turned, or in poor lighting scenarios [48]. Similarly, the HOG method might fall short in recognizing pedestrians who are partly veiled or in complicated surroundings [50,57]. ...
... Implementation I is expressed as a three-tuple operating at the recipient's edge [18,57] (refer to Eq. 10). I_id is an application's unique identifier. ...
... This layer is in charge of the fundamental data refining, computation, and processing that results in the fog layer [35]. Fog nodes seek to increase the efficiency of IoT applications; consequently, the fog layer can lower the quantity of data transferred to the cloud layer and shorten the request-response time for IoT applications [54,57]. This is frequently necessary to improve QoS, such as lowering latency and increasing network bandwidth [43]. ...
Article
Train surfing is an extremely dangerous practice that involves riding on the roof of a moving train. Every year, many people, especially youths, lose their lives to this illegal practice. To bring it under control, authorities must apprehend train surfers before they can even reach the top of a train, which calls for artificial-intelligence-based real-time monitoring of trains. In this paper, we present an artificial-intelligence-inspired IoT-Fog-based framework for detecting unsafe ways of traveling on trains from surveillance videos. The proposed framework consists of feature extraction, feature expression, and assessment criteria for identifying train surfing. It is not constrained by camera angle and includes guidelines for determining unsafe status. It can quickly and accurately identify vulnerable passengers during travel and send early warnings to the concerned authorities. A comparative analysis shows that the proposed framework performs better than most state-of-the-art algorithms, with a precision score of 95%. The framework would help authorities apprehend the actual culprits and ensure safer rail transport.
... In particular, we incorporate a new FIFO abstract data type (ADT) implementation in LIDE-C, called shared FIFO , that enables multiple dataflow edges in a graph to be implemented through FIFO ADT instances that share the same region of memory. Such buffer sharing in dataflow implementations has been investigated in different forms for various contexts of automated scheduling and software synthesis (e.g., see [34][35][36] ). In STMCM, we make it easy for the system designer to apply buffer sharing explicitly within her or his implementation rather than depending on its implicit support through the toolset that is used. ...
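The buffer-sharing idea behind such a shared FIFO ADT can be sketched as follows. This is a hypothetical Python illustration, not LIDE-C's actual API: several logical FIFOs are constructed as views into one shared memory region, which is safe when the live ranges of the corresponding dataflow edges do not overlap.

```python
# Sketch of buffer sharing across dataflow edges: several logical FIFOs
# backed by one shared memory region. All names are hypothetical; the
# actual LIDE-C shared FIFO API differs.

class SharedRegion:
    def __init__(self, size):
        self.mem = [None] * size

class SharedFifo:
    """A ring-buffer FIFO that lives in a slice of a shared region."""
    def __init__(self, region, offset, capacity):
        self.region, self.offset, self.capacity = region, offset, capacity
        self.head = self.tail = self.count = 0

    def write(self, token):
        assert self.count < self.capacity, "FIFO full"
        self.region.mem[self.offset + self.tail] = token
        self.tail = (self.tail + 1) % self.capacity
        self.count += 1

    def read(self):
        assert self.count > 0, "FIFO empty"
        token = self.region.mem[self.offset + self.head]
        self.head = (self.head + 1) % self.capacity
        self.count -= 1
        return token

# Two edges whose live ranges do not overlap can map to the same offset.
region = SharedRegion(4)
ab = SharedFifo(region, offset=0, capacity=4)   # edge A->B
bc = SharedFifo(region, offset=0, capacity=4)   # edge B->C reuses the memory

ab.write(1); ab.write(2)
assert ab.read() == 1 and ab.read() == 2        # edge A->B fully drained
bc.write(10)
assert bc.read() == 10
```

Making the sharing explicit, as STMCM advocates, amounts to the designer choosing the offsets here, rather than relying on a synthesis tool to compute them.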
Article
This paper presents a new methodology for design and implementation of signal processing systems on system-on-chip (SoC) platforms. The methodology is centered on the use of lightweight application programming interfaces for applying principles of dataflow design at different layers of abstraction. The development processes integrated in our approach are software implementation, hardware implementation, hardware-software co-design, and optimized application mapping. The proposed methodology facilitates development and integration of signal processing hardware and software modules that involve heterogeneous programming languages and platforms. As a demonstration of the proposed design framework, we present a dataflow-based deep neural network (DNN) implementation for vehicle classification that is streamlined for real-time operation on embedded SoC devices. Using the proposed methodology, we apply and integrate a variety of dataflow graph optimizations that are important for efficient mapping of the DNN system into a resource constrained implementation that involves cooperating multicore CPUs and field-programmable gate array subsystems. Through experiments, we demonstrate the flexibility and effectiveness with which different design transformations can be applied and integrated across multiple scales of the targeted computing system.
... Several studies attempt to reduce SAS memory by scheduling [Bhattacharyya et al. 1997]. Murthy and Bhattacharyya [2001] minimize the maximum number of live tokens under a particular buffer lifetime model. Sung and Ha [2000] and Zitzler et al. [1999] relax the assumption by also considering non-SAS schedules for data memory reduction. ...
... The SDF graphs for the first benchmark suite are shown in Fig. 10. The first is the filter bank [Murthy and Bhattacharyya 2001]. We also have three SDF graphs from our own wireless sensor applications, including (1) transmitting data from a triaxial accelerometer to RF, (2) a wireless receiver, and (3) a wireless data logger to an SD card and displaying on an LCD. ...
... The SDF graphs for the second benchmark suite are shown in Fig. 13. They are taken from the following sources: (a) from [Teich et al. 1998], (b)-(e) from [Murthy and Bhattacharyya 2001], (f) from [Bhattacharyya et al. 1999b], (g,h) from [Murthy and Bhattacharyya 2004], (i) from [Bhattacharyya et al. 1999b], (j,k) from [Liu et al. 2009], (l) from [Bhattacharyya 1999], and (m) from [Bhattacharyya et al. 1999b]. These were chosen because either their schedules were also available or they published buffer optimization results. ...
Article
This article presents a buffer minimization scheme with low dispatching overhead for embedded software processes. To accomplish this, we exploit behavioral transparency in the model of computation. In such a model (e.g., synchronous dataflow), the state of buffer requirements is determined completely by the firing sequence of the actors, without requiring functional simulation of the actors. Fine-grained buffer allocation incurs high code and pointer overhead, while coarse-grained allocation suffers from memory fragmentation. Instead, we propose a medium-grained, “access-contiguous” buffer allocation scheme that minimizes both the total buffer space and the pointer overhead. We formulate the buffer allocation problem as the placement of 2D tiles that represent the lifetimes of the buffers, minimizing their memory occupation both spatially and temporally. Experimental results show that our scheme uses 26% less data memory than existing techniques on average, and up to 57% less in the best case. Our technique retains code modularity for dynamic configuration and, more importantly, enables many more applications that otherwise would not fit if implemented using previous state-of-the-art techniques.
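The 2D-tile view of buffer allocation can be illustrated with a small sketch. This uses hypothetical toy data and a simple first-fit placement, not the article's actual algorithm: each buffer is a rectangle spanning its lifetime on the time axis and its size on the memory axis, and two rectangles may overlap in memory only if their lifetimes are disjoint.

```python
# Sketch: buffers as 2D tiles (time interval x memory span). Each buffer
# is placed at the lowest offset where it does not collide with any
# already-placed buffer whose lifetime intersects its own.

def overlaps(a, b):
    """True if half-open time intervals [a0, a1) and [b0, b1) intersect."""
    return a[0] < b[1] and b[0] < a[1]

def place(buffers):
    """buffers: list of (name, (t_start, t_end), size).
    Returns a dict of memory offsets and the total memory needed."""
    placed = []            # (interval, offset, size) of placed tiles
    offsets = {}
    for name, ival, size in buffers:
        off = 0
        while True:
            clash = [p for p in placed
                     if overlaps(ival, p[0])            # lifetimes intersect
                     and off < p[1] + p[2]              # memory spans
                     and p[1] < off + size]             #   intersect too
            if not clash:
                break
            off = max(p[1] + p[2] for p in clash)       # jump past conflicts
        placed.append((ival, off, size))
        offsets[name] = off
    total = max(o + s for _, o, s in placed)
    return offsets, total

# "ab" and "bc" are simultaneously live, so they get disjoint memory;
# "cd" is live only afterwards and can reuse offset 0.
offs, total = place([("ab", (0, 2), 3), ("bc", (1, 3), 2), ("cd", (3, 4), 4)])
assert offs == {"ab": 0, "bc": 3, "cd": 0}
assert total == 5
```

The medium-grained, access-contiguous scheme in the article refines this picture by choosing tile shapes that keep each buffer's accesses contiguous while still packing tightly in both dimensions.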
... The techniques of Murthy et al. [21], [22], [23], [24], Teich et al. [25], and Geilen et al. [26] are closest to ours. They describe several algorithms for merging buffers in signal processing systems that use synchronous data flow models [27]. ...
Article
Most compilers focus on optimizing performance, often at the expense of memory, but efficient memory use can be just as important in constrained environments such as embedded systems. This paper presents a memory reduction technique for rendezvous communication, which is applied to the deterministic concurrent programming language SHIM. It focuses on reducing memory consumption by sharing communication buffers among tasks. It determines pairs of buffers that can never be in use simultaneously and uses a shared region of memory for each pair. The technique produces a static abstraction of a SHIM program's dynamic behavior, which is then analyzed to find buffers that are never occupied simultaneously. Experiments show the technique runs quickly on modest-sized programs and can sometimes reduce memory requirements by half.
... The changes to the capacity of a buffer occur on execution of producer and consumer tasks of the corresponding channel. Such temporal change, however, can be captured at different levels of granularity [12]. The highest resolution temporal view of a buffer's storage requirement would need to follow the execution at the granularity of firing individual actors. ...
... In this scheme, buffers are assumed to have maximum capacity throughout their live range in the schedule. Therefore, two buffers conflict if they are live at even one common point in time; in that case they cannot share any physical memory location and must be allocated in distinct memory spaces [12]. For example, M_T(A→B, S) = 6, and under the coarse-grain analysis model six memory cells have to be allocated during the buffer's entire lifetime (three time steps) to implement it. ...
... We report the worst and the best buffer size that we observed in 10 runs. The coarse-grain analysis is done according to the buffer lifetime analysis principle, developed by Murthy and Bhattacharyya [12]. The first fit heuristic is used to allocate the buffers in the shared buffer, under the same SA schedule. ...
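A minimal sketch of coarse-grain buffer lifetime analysis, assuming a toy SDF chain A→B→C rather than any of the cited benchmarks: simulating the firing sequence yields each edge's maximum token count (its required capacity under the coarse-grain model) and its live range in the schedule; edges whose live ranges overlap cannot share memory.

```python
# Sketch of coarse-grain buffer lifetime analysis over a firing sequence.
# The graph, rates, and schedule below are illustrative toy data.

def analyze(edges, rates, schedule):
    """edges: edge names; rates[(actor, edge)] = tokens produced (+) or
    consumed (-) per firing; schedule: list of actor firings.
    Returns per-edge max token count and live range [first, last] step."""
    tokens = {e: 0 for e in edges}
    max_tok = {e: 0 for e in edges}
    live = {e: [None, None] for e in edges}
    for step, actor in enumerate(schedule):
        for e in edges:
            delta = rates.get((actor, e), 0)
            if delta:
                tokens[e] += delta
                max_tok[e] = max(max_tok[e], tokens[e])
                if live[e][0] is None:
                    live[e][0] = step
                live[e][1] = step
    return max_tok, live

# Chain A -e1-> B -e2-> C: A produces 2 on e1, B consumes 2 and produces 1
# on e2, C consumes 1 from e2.
edges = ["e1", "e2"]
rates = {("A", "e1"): 2, ("B", "e1"): -2, ("B", "e2"): 1, ("C", "e2"): -1}
max_tok, live = analyze(edges, rates, ["A", "A", "B", "B", "C", "C"])

assert max_tok == {"e1": 4, "e2": 2}
# The live ranges overlap at steps 2-3, so under the coarse-grain model the
# two buffers conflict and 4 + 2 = 6 memory cells are needed in total.
assert live == {"e1": [0, 3], "e2": [2, 5]}
```

A first-fit allocator, as used in the experiments above, would then scan edges in some order and assign each one the lowest offset not occupied by a conflicting edge.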
Conference Paper
Many embedded applications demand processing of a seemingly endless stream of input data in real-time. Productive development of such applications is typically carried out by synthesizing software from high-level specifications, such as data-flow graphs. In this context, we study the problem of inter-actor buffer allocation, which is a critical step during compilation of streaming applications. We argue that fine-grain analysis of buffers' spatio-temporal characteristics, as opposed to conventional live range analysis, enables dramatic improvements in buffer sharing. Improved sharing translates to reduction of the compiled binary memory footprint, which is of prime concern in many embedded systems. We transform the buffer allocation problem to two-dimensional packing using complex polygons. We develop an evolutionary packing algorithm, which readily yields buffer allocations. Experimental results show an average of over 7X and 2X improvement in total buffer size, compared to baseline and conventional live range analysis schemes, respectively.
... • Theoretical studies have considered memory sharing between different logic buffers in Chapter 5, similar as in [37,43,71,72]. Compared with disjoint partitioning, it shows a great reduction in memory size. ...
... There are many research papers on finding an optimized SDFG schedule subject to one or more criteria [4], [5], [20], [21], [29], [16], [11], [22], [14]. [4] proposes Single Appearance Schedules (SAS), which are specific to single processor platforms and aim to minimize code size. [5] minimizes buffer size for SAS without buffer sharing. [20], [21], [16] allow sharing memory between channels to reduce the total memory usage. However, SAS are not necessarily optimal when other objectives than code size are to be optimized. For multi-processor platforms, where the schedule length does not necessarily lead to extra code size, non-SAS schedules can be better than SAS. [29] relax ...
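The code-size/buffer-size tension around single appearance schedules can be seen on the classic two-actor example. A minimal sketch, assuming the SDF edge A -(produce 2, consume 3)-> B with repetitions vector q = (3, 2): the SAS (3A)(2B) keeps code size minimal (each actor appears once, inside loops) but needs a larger buffer than an interleaved non-SAS schedule.

```python
# Sketch: compare the buffer requirement of a single appearance schedule
# (SAS) with an interleaved schedule for the SDF edge A -(2,3)-> B.

def buffer_needed(schedule, produce=2, consume=3):
    """Peak token count on the edge while executing the firing sequence."""
    tok = peak = 0
    for actor in schedule:
        tok += produce if actor == "A" else -consume
        assert tok >= 0, "schedule fires B without enough tokens"
        peak = max(peak, tok)
    return peak

sas        = ["A", "A", "A", "B", "B"]        # (3A)(2B) flattened
interleave = ["A", "A", "B", "A", "B"]        # B appears twice in the code

assert buffer_needed(sas) == 6        # all of A's output buffered at once
assert buffer_needed(interleave) == 4 # smaller buffer, larger code size
```

This is the trade-off the snippet describes: on a single processor, SAS minimizes code size, but once other objectives (buffer memory, schedule length on multiprocessors) dominate, non-SAS schedules can win.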
Conference Paper
Synchronous dataflow graphs (SDFGs) are widely used to model streaming applications such as signal processing and multimedia applications. These are often implemented on resource-constrained embedded platforms ranging from PDAs and cell phones to automobile equipment and printing systems. Trade-off analysis between resource usage and performance is critical in the life cycle of those products, from tailoring platforms to target applications at design time to resource management at runtime. We present a trade-off analysis method for SDFGs based on model-checking techniques and leveraging knowledge from the dataflow domain. We develop results to prune the state space of an SDFG for multi-objective model checking without losing optimality. To achieve scalability to large state spaces, we combine these pruning techniques with pragmatic heuristics. We evaluate our techniques with two sets of experiments. One set shows we can now do throughput-storage trade-off analysis for shared memory architectures, showing reductions in memory usage of 10-50% compared to existing distributed memory based analysis. A second set of experiments shows how our techniques support design-space exploration for the digital datapath of a professional printer system. Analysis times range from less than a second to at most several minutes.
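One point of such a throughput-storage trade-off can be sketched on a toy example. This is a greedy simulation of a single two-actor SDF edge, far simpler than the model-checking analysis in the paper: find the smallest buffer capacity under which one iteration of A -(2,3)-> B, with repetitions vector (3, 2), completes without deadlock.

```python
# Sketch: smallest storage bound under which one iteration of the SDF edge
# A -(2,3)-> B (repetitions vector q = (3, 2)) can complete. Greedy
# execution suffices for this acyclic toy example; the paper's method
# explores the full multi-objective state space instead.

def completes(cap, qa=3, qb=2, produce=2, consume=3):
    """Fire any enabled actor until the iteration finishes or we deadlock."""
    tok = fired_a = fired_b = 0
    while fired_a < qa or fired_b < qb:
        if fired_b < qb and tok >= consume:
            tok -= consume; fired_b += 1          # consume when possible
        elif fired_a < qa and tok + produce <= cap:
            tok += produce; fired_a += 1          # produce if it still fits
        else:
            return False                          # deadlock: nothing can fire
    return True

min_cap = next(c for c in range(1, 10) if completes(c))
assert min_cap == 4   # capacity 3 deadlocks: A cannot fit, B cannot fire
```

Sweeping the capacity while also measuring iteration latency would trace out the throughput-storage Pareto points that the paper's analysis computes exactly.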
... Our algorithm does not add scheduling constraints to the problem: it reduces the total buffer size without affecting the schedule, and thereby without affecting the overall speed. The work of Murthy and Bhattacharyya (2000, 2001, 2004, 2006) and Teich et al. (1998) is closest to ours. They describe several algorithms for merging buffers in signal processing systems that use synchronous dataflow models (Lee and Messerschmitt 1987). ...
Conference Paper
Most compilers focus on optimizing performance, often at the expense of memory, but efficient memory use can be just as important in constrained environments such as embedded systems. In this paper, we present a memory reduction technique for the deterministic concurrent programming language SHIM. We focus on reducing memory consumption by sharing buffers among the tasks, which use them to communicate using CSP-style rendezvous. We determine pairs of buffers that can never be in use simultaneously and use a shared region of memory for each pair. Our technique produces a static abstraction of a SHIM program's dynamic behavior, which we then analyze to find buffers that can share memory. Experimentally, we find our technique runs quickly on modest-sized programs and often reduces memory requirements by half.
... In generating multithread software code from the Simulink algorithm model, we apply buffer memory optimization techniques, which are enabled by raising the abstraction level from the transaction-level model to the algorithm-level model. Several previous studies addressed buffer sharing [28,40] and scheduling techniques for maximizing buffer sharing [41,42] in software generation from dataflow specification. However, they did not address buffer memory minimization for high-level specification with explicit conditionals; our multithread code generator takes the conditionals into consideration [14]. ...
Article
As a solution for dealing with the design complexity of multiprocessor SoC architectures, we present a joint Simulink-SystemC design flow that enables mixed hardware/software refinement and simulation in the early design process. First, we introduce the Simulink combined algorithm/architecture model (CAAM) unifying the algorithm and the abstract target architecture. From the Simulink CAAM, a hardware architecture generator produces architecture models at three different abstract levels, enabling a trade-off between simulation time and accuracy. A multithread code generator produces memory-efficient multithreaded programs to be executed on the architecture models. To show the applicability of the proposed design flow, we present experimental results on two real video applications.
... Of the various possible scheduling strategies, the focus here is on looped schedules [8] and parameterized looped schedules [7], [43], which construct schedules in terms of static and dynamic looping constructs, respectively. These types of schedules combine the advantages of efficient looping facilities in programmable digital signal processors [46], low-complexity storage and manipulation of schedule information [43], [54], and potential for extensive analysis and optimization [54]. Dataflow graph transformation is an effective technique to produce high-performance DSP software as well as hardware/software solutions. ...
Chapter
Computer vision has emerged as one of the most popular domains of embedded applications. The applications in this domain are characterized by complex, intensive computations along with very large memory requirements. Parallelization and multiprocessor implementations have become increasingly important for this domain, and various powerful new embedded platforms to support these applications have emerged in recent years. However, the problem of efficient design methodology for optimized implementation of such systems remains vastly unexplored. In this chapter, we look into the main research problems faced in this area and how they vary from other embedded design methodologies in light of key application characteristics in the embedded computer vision domain.We also provide discussion on emerging solutions to these various problems.