Figure 9. A multi-threaded custom co-processor.

Source publication
Article
Full-text available
Over the past several years there has been a great deal of interest in the design of mixed hardware/software systems, sometimes referred to as hardware/software co-design or hardware/software co-synthesis. However, although many new design methodologies have taken the name hardware/software co-design, they often do not seem to share much in common...

Context in source publication

Context 1
... slight generalization of the custom co-processor arrangement is one in which the custom co-processor is understood to comprise more than one controller and data path and, consequently, is able to implement concurrent threads of control. Figure 9 shows the hardware/software boundary for multi-threaded co-processor systems. In this case the hardware/software partitioning problem is further complicated by the opportunity to exploit parallelism both between hardware and software components and among hardware components. ...
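As a rough illustration of this arrangement (not taken from the source publication), the sketch below models the software component as the main thread and each controller/data-path pair as an independent worker thread; the names hw_thread, in_q, and out_q are purely hypothetical.

```python
# A minimal software model of the multi-threaded co-processor arrangement
# above (illustrative sketch only; all names are hypothetical).
import threading
import queue

def hw_thread(name, in_q, out_q):
    """Stands in for one controller/data-path pair in the custom co-processor."""
    while True:
        item = in_q.get()
        if item is None:                   # shutdown signal
            break
        out_q.put((name, item * item))     # placeholder "hardware" computation

in_q, out_q = queue.Queue(), queue.Queue()

# Two concurrent hardware threads of control, as in Figure 9.
workers = [threading.Thread(target=hw_thread, args=(f"hw{i}", in_q, out_q))
           for i in range(2)]
for w in workers:
    w.start()

# The software side keeps doing its own work while the hardware threads run,
# so parallelism exists both between HW and SW and among the HW components.
for x in range(8):
    in_q.put(x)
host_result = sum(range(8))                # placeholder software-side work

for _ in workers:
    in_q.put(None)
for w in workers:
    w.join()

hw_results = [out_q.get() for _ in range(8)]
print(host_result, sorted(hw_results))
```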

Similar publications

Article
Full-text available
As one of the most promising energy-efficient emerging paradigms for designing digital systems, approximate computing has attracted significant attention in recent years. Applications utilizing approximate computing (AxC) can tolerate some loss of quality in the computed results in order to attain high performance. Approximate arithmetic circuits have...

Citations

... PL-based implementation significantly improves the execution speed at the cost of logical resource consumption. Thus, the system functions are modularized according to the performance requirements, implementation costs, modifiability, and computation for binocular stereo vision [1]. The system architecture (Fig. 7) includes the PS (a dual-core ARM A9 processor) and PL (a complete data stream processing link). ...
Article
Full-text available
Binocular stereo vision is a commonly applied computer vision technique with a wide range of applications in 3D scene perception. However, binocular stereo matching algorithms are computationally intensive and complicated. In addition, some traditional platforms are unable to meet the dual requirements of real-time operation and energy efficiency. In this paper, we propose a hardware/software co-design FPGA (Field Programmable Gate Array) approach to overcome these limitations. Based on the characteristics of binocular stereo vision, we modularize the system functions to achieve the hardware/software partitioning. This accelerates the data processing on the FPGA, while simultaneously performing data control on the ARM (Advanced RISC Machine) cores. The parallelism of the FPGA allows for a full-pipeline design that is synchronized with an identical system clock for the simultaneous running of multiple stereo processing components, thus improving the processing speed. Furthermore, to minimize hardware costs, the collected images and data are compressed prior to matching, while the precision is subsequently enhanced during post-processing. The proposed system was evaluated on the PYNQ-Z2 development board, with experimental results revealing its high real-time performance and low power consumption at a 100 MHz clock frequency. Compared with existing designs, the simple yet flexible system demonstrated a higher image processing speed and less hardware resource overhead (and thus lower power consumption). The average error rate of the BM matching algorithm was also improved, particularly with the limited PYNQ-Z2 hardware resources. The proposed system has been open-sourced on GitHub.
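For context, a block-matching (BM) disparity search of the kind such a pipeline accelerates typically minimizes a sum of absolute differences (SAD) over a fixed window. The sketch below is a minimal software illustration of that computation, not the paper's implementation; the function bm_disparity and its parameters are assumptions.

```python
# Minimal single-pixel block-matching (BM) sketch using sum of absolute
# differences (SAD); illustrative only, assumes the block and disparity
# range stay inside the image.
import numpy as np

def bm_disparity(left, right, y, x, window=4, max_disp=16):
    """Return the disparity minimizing SAD for the block centred at (y, x)."""
    h, w = left.shape
    y0, y1 = max(y - window, 0), min(y + window + 1, h)
    best_d, best_cost = 0, float("inf")
    for d in range(min(max_disp, x - window) + 1):
        lx0, lx1 = x - window, x + window + 1
        cost = np.abs(left[y0:y1, lx0:lx1].astype(int) -
                      right[y0:y1, lx0 - d:lx1 - d].astype(int)).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

rng = np.random.default_rng(0)
right_img = rng.integers(0, 255, (64, 64))
left_img = np.roll(right_img, 5, axis=1)          # synthetic shift of 5 pixels
print(bm_disparity(left_img, right_img, 32, 40))  # expected to report 5
```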
... To do so, the system specification is partitioned into hardware and software parts for concurrent development. After that, each part is implemented and then integrated for co-simulation [4][5]. The co-modeling methodology is analogous to HW/SW co-design and the process is shown in Figure 1 (co-modeling methodology using UML and DEVS [3]). At first, we design the simulator architecture from the requirements and specification of the system to be simulated. ...
Conference Paper
Full-text available
Modeling and simulation (M&S) engineering is one of the most challenging areas, as it has to deal with problems from multiple domains. Hence, in the M&S field, various domain experts and M&S experts often work together to build a simulator. Yet, in some domains, such as the military, such cooperation has been limited because of domain security policies. Therefore, the domain experts in such fields are required to have M&S knowledge on top of their domain knowledge in order to build simulation models by themselves. This paper describes our experience of developing a simple warship simulator and helping such domain experts acquire M&S knowledge through the Warship Simulator Project. From this experience, we found that the DEVS formalism is easy to learn and that a simulator can be developed easily with an implementation of the formalism.
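To give a flavour of the DEVS formalism mentioned above, the sketch below hand-rolls a single atomic model (a processor that is busy for a fixed service time after receiving a job). It is illustrative only, not the warship simulator described in the paper, and is not tied to any particular DEVS library.

```python
# Hand-rolled DEVS atomic model sketch: state S, time advance ta, external
# and internal transition functions, and an output function lambda.
INFINITY = float("inf")

class Processor:
    def __init__(self, service_time=3.0):
        self.service_time = service_time
        self.phase = "idle"                 # state S
        self.job = None

    def time_advance(self):                 # ta(s)
        return self.service_time if self.phase == "busy" else INFINITY

    def ext_transition(self, elapsed, job): # delta_ext
        if self.phase == "idle":
            self.phase, self.job = "busy", job

    def int_transition(self):               # delta_int
        self.phase = "idle"

    def output(self):                       # lambda(s)
        return ("done", self.job)

# A tiny abstract-simulator loop: one model, one external event at t = 1.
m, t = Processor(), 0.0
m.ext_transition(1.0, "job-1")
t = 1.0
t_next = t + m.time_advance()               # internal event scheduled at t = 4
print("output at", t_next, ":", m.output())
m.int_transition()
```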
... 349]. Accordingly, hardware/software co-design is the result of trying to find, through the integration of hardware and software design techniques, a unified system design methodology [ADA96]. Advantages of having a single methodology include shorter development times and an environment that facilitates evaluating the tradeoffs between implementing a function in hardware or in software [ADA96]. ...
... Accordingly, hardware/software co-design is the result of trying to find, through the integration of hardware and software design techniques, a unified system design methodology [ADA96]. Advantages of having a single methodology include shorter development times and an environment that facilitates evaluating the tradeoffs between implementing a function in hardware or in software [ADA96]. At the same time, it is important to realize the scope of hardware/software co-design; it often requires knowledge of a number of different subjects such as computer architectures, embedded systems, real-time systems, etc. [WOL03]. ...
... Due to the varying nature of the problems, where different problems might require different approaches, a large number of methodologies have been generated. In order to compare the different methodologies, Adams and Thomas (1996, [ADA96]) suggest examining how they handle different aspects of hardware/software co-design. These aspects are described briefly in the list below. ...
... To do so, the system specification is partitioned into hardware and software parts for concurrent development. After that, each part is implemented and then integrated for co-simulation [6][7]. The co-modeling methodology is analogous to HW/SW co-design and the process is shown in Figure 2. At first, we design the simulator architecture from the requirements and specification of the system to be simulated. ...
Conference Paper
In a specific domain such as wargames, simulator developers may not fully understand the domain knowledge that domain experts have. In such a case, the developers may leave detailed domain knowledge within simulation models as a black box to be filled in by domain experts. Thus, a simulator can be synthesized by filling the black box with algorithms for domain-specific objects. This paper proposes a methodology for automatic synthesis of wargame simulators developed with the DEVS (discrete event systems specification) framework. For the synthesis, the co-modeling methodology is employed in the specification and implementation of discrete event models.
... Synthesis systems that optimize power [17], testability [15], and fault-tolerance [31], [39] have been developed. System level synthesis has become an active research topic [22], [30]. Examples include hardware/software cosynthesis techniques targeting microcontroller design [20] and hardware/software interface generation techniques [25], [29]. ...
Article
Task preemption is a critical enabling mechanism in multitask very large scale integration (VLSI) systems. On preemption, data in the register files must be preserved for the task to be resumed. This entails extra memory to preserve the context and additional clock cycles to save and restore the context. In this paper, techniques and algorithms to incorporate micropreemption constraints during multitask VLSI system synthesis are presented. Specifically, the following have been developed: algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints; techniques to minimize the context-switch overhead by considering both the dedicated registers required to save the state of a task on preemption and the shared registers required to save the remaining values in the tasks; and a controller-based scheme that precludes preemption-related performance degradation by 1) partitioning the states of a task into critical sections, 2) executing the critical sections atomically, and 3) preserving atomicity by rolling forward to the end of a critical section on preemption. The effectiveness of all approaches, algorithms, and software implementations is demonstrated on real examples. Validation of all the results is complete in the sense that functional simulation is conducted down to the complete layout implementation.
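As a much-simplified illustration of one of these steps (an assumption for exposition, not the paper's algorithm), the sketch below greedily inserts preemption points into a linear schedule of control steps so that the worst-case preemption latency never exceeds a given bound.

```python
# Greedy preemption-point insertion sketch: assumes a linear schedule and
# that no single control step exceeds the latency bound.
def insert_preemption_points(step_durations, latency_bound):
    """Return indices after which a preemption point is inserted."""
    points, elapsed = [], 0.0
    for i, d in enumerate(step_durations):
        if elapsed + d > latency_bound:
            points.append(i - 1)     # preempt before this step would overrun
            elapsed = 0.0
        elapsed += d
    return points

# Control-step durations (in cycles) of a scheduled task and a latency bound.
steps = [2, 3, 1, 4, 2, 2, 5, 1]
print(insert_preemption_points(steps, latency_bound=6))   # e.g. [2, 4, 5]
```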
... An approach that generates new logic capabilities for a processor dynamically has been developed for an adaptive machine architecture in [12]. The work in [13] places ASIP synthesis in the context of hardware-software cosynthesis. It argues that, since the customized instructions added to an existing instruction set are implemented in hardware, whereas the original instructions are run on the basic processor core, ASIP synthesis is a variant of hardware-software partitioning. ...
Article
Efficiency and flexibility are critical, but often conflicting, design goals in embedded system design. The recent emergence of extensible processors promises a favorable tradeoff between efficiency and flexibility, while keeping design turnaround times short. Current extensible processor design flows automate several tedious tasks, but typically require designers to manually select the parts of the program that are to be implemented as custom instructions. In this work, we describe an automatic methodology to select custom instructions to augment an extensible processor, in order to maximize its efficiency for a given application program. We demonstrate that the number of custom instruction candidates grows rapidly with program size, leading to a large design space, and that the quality (speedup) of custom instructions varies significantly across this space, motivating the need for the proposed flow. Our methodology features cost functions to guide the custom instruction selection process, as well as static and dynamic pruning techniques to eliminate inferior parts of the design space from consideration. Furthermore, we employ a two-stage process, wherein a limited number of promising instruction candidates are first short-listed using efficient selection criteria, and then evaluated in more detail through cycle-accurate instruction set simulation and synthesis of the corresponding hardware, to identify the custom instruction combinations that result in the highest program speedup or maximize speedup under a given area constraint. We have evaluated the proposed techniques using a state-of-the-art extensible processor platform, in the context of a commercial design flow. Experiments with several benchmark programs indicate that custom processors synthesized using automatic custom instruction selection can result in large improvements in performance (up to 5.4×, an average of 3.4×), energy (up to 4.5×, an average of 3.2×), and energy-delay products (up to 24.2×, an average of 12.6×), while speeding up the design process significantly.
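The two-stage idea described above can be pictured with a toy sketch; the cost function, the candidate data, and the greedy selection under an area budget are illustrative assumptions, not the paper's actual flow.

```python
# Two-stage custom-instruction selection sketch (illustrative assumptions).
def select_custom_instructions(candidates, area_budget, shortlist_size=4):
    # Stage 1: shortlist candidates using an inexpensive merit estimate.
    shortlist = sorted(candidates,
                       key=lambda c: c["est_speedup"] / c["area"],
                       reverse=True)[:shortlist_size]
    # Stage 2: use the "detailed" speedup figure (standing in for simulation
    # and synthesis results) and pick a combination under the area budget.
    chosen, used = [], 0.0
    for c in sorted(shortlist, key=lambda c: c["speedup"], reverse=True):
        if used + c["area"] <= area_budget:
            chosen.append(c["name"])
            used += c["area"]
    return chosen

candidates = [
    {"name": "MAC",   "area": 2.0, "est_speedup": 1.8, "speedup": 1.6},
    {"name": "SAD4",  "area": 3.0, "est_speedup": 2.5, "speedup": 2.4},
    {"name": "CRC8",  "area": 1.0, "est_speedup": 1.2, "speedup": 1.1},
    {"name": "FFTbf", "area": 4.0, "est_speedup": 2.0, "speedup": 2.2},
]
print(select_custom_instructions(candidates, area_budget=5.0))  # ['SAD4', 'MAC']
```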
... If the specification (or the IDR) is too operational, i.e. influenced by the current technology, it will bias the design towards an architecture which might not be favourable for solving the formulated problem in the best way. The outcome of the synthesis process is a final implementation of the embedded system, i.e. a mixed hardware/software system [1, 38] serving to fulfil the specification requirements. This consists of a combination of standard and custom hardware. ...
Article
Full-text available
MPhil/PhD Transfer Report, Faculty of Engineering, Electronics and Computer Science Department: Mixed Control/Data-Flow Representation for Modelling and Verification of Embedded Systems, by Mauricio Varea. Embedded system design issues become critical as implementation technologies evolve. The interaction between the control and data flow of an embedded system specification is an important consideration and, in order to cope with this aspect, a new internal design representation called Dual Flow Net (DFN) is introduced and further analysed in this thesis. One of the key features of this internal representation is its tight control and data flow interaction, which is achieved by means of two new concepts. Firstly, the structure of the new DFN model is formulated employing a tripartite graph as a basis, which turns out to be advantageous for modelling heterogeneous systems. Secondly, a complex domain marking scheme is used to describe the behaviour of the system, leading to better results in terms of modelling the dynamics of the embedded system specification. Structural definitions, behavioural rules, and the graphical representation of the new DFN model are presented in this work.
... This paper first briefly discusses current design methodologies, in particular the specify-explore-refine paradigm. Presented here is our system-level methodology based on the specify-explore-refine paradigm [1]. An analog/digital embedded subsystem application was designed using this methodology. ...
... This system-level methodology is strong in hardware/software codesign, cost, and performance techniques. Additional citations of design methodologies on hardware/software codesign are in [1], [2], [5], [6], [11]. However, none of these consider analog as part of the design methodology, or the ability to identify cores, in particular analog cores, in the system design exploration process. ...
Conference Paper
Full-text available
With the growth of System on a Chip (SoC), the functionality of analog components must also be considered in the design process. This paper describes some of the design implementation partitioning issues and experiences using analog and digital techniques for embedded systems. To achieve a quick turnaround for new embedded system development, a design methodology was extended for analog codesign based on the specify-explore-refine paradigm and system-level design methodology. Many system-level issues were addressed, including hardware/software codesign trade-offs.
... The hardware/software codesign scene has introduced a number of hardware/software partitioning approaches to speed up performance, optimize hardware/software trade-offs, and reduce total design time [6], [3], [11], [4], [12], [8], [7], [1], among others. The introduced approaches perform their techniques on different partitioning granularities, ranging from fine-grain [6], over medium-grain [3], [11], to coarse-grain [12], [7] granularities. ...
Conference Paper
Full-text available
In this contribution we present a new system-level hardware/software partitioning approach (HiPART), which runs within the framework of an integrated hardware/software design methodology for embedded system design. The benefits of the approach result from a hierarchical partitioning algorithm consisting of three phases of constructive and iterative methods. The main advantage of the system is a freely selectable degree of user interaction and manual partitioning. Continuous observation of timing-constraint violations during partitioning guarantees applicability to real-time systems.
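A loose sketch of the constructive-then-iterative idea is shown below, assuming a purely sequential execution-time model; the task data and the move heuristic are illustrative assumptions, not the actual HiPART phases.

```python
# Constructive start (everything in software) followed by iterative moves to
# hardware while a timing constraint is violated; timing is re-checked after
# every move. Illustrative sketch only.
def partition(tasks, deadline):
    hw, sw = set(), set(tasks)

    def exec_time():
        return sum(tasks[t]["sw"] for t in sw) + sum(tasks[t]["hw"] for t in hw)

    while exec_time() > deadline:
        movable = [t for t in sw if tasks[t]["sw"] > tasks[t]["hw"]]
        if not movable:
            break                                   # constraint cannot be met
        best = max(movable, key=lambda t: tasks[t]["sw"] - tasks[t]["hw"])
        sw.remove(best)
        hw.add(best)
    return hw, sw, exec_time()

tasks = {"fft": {"sw": 40, "hw": 8}, "ctrl": {"sw": 5, "hw": 6},
         "filt": {"sw": 25, "hw": 10}, "io": {"sw": 10, "hw": 9}}
print(partition(tasks, deadline=50))   # moves "fft" to hardware, time 48
```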
... Then, the testability, roughly defined as the testing effort, is related to the number of test vectors needed to test the functionality of the system. In [13], the authors present a tutorial which describes a set of criteria that can be used for HW/SW partitioning, such as performance requirements, implementation costs, modifiability, nature of computation, concurrency, and communication. However, none of the above strategies suggests how to optimize system design towards reliability. ...
Article
This work presents an innovative approach to system reliability verification based on an adaptation of the weak mutation analysis technique. This technique was originally proposed for software testing as a means of verifying the adequacy of a test vector set for a given program. We also present a case study in order to illustrate the proposed approach.
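A much-simplified sketch of the underlying idea follows; the component, the mutant, and the test vectors are illustrative assumptions. A test-vector set is judged adequate for a mutant if at least one vector makes the mutated component's immediate output differ from the original's.

```python
# Simplified weak-mutation sketch: compare the mutated component's immediate
# output against the original's for each test vector.
def original_cmp(a, b):
    return a > b                     # component under verification

def mutant_cmp(a, b):
    return a >= b                    # mutant: ">" replaced by ">="

def weakly_kills(vectors, orig, mut):
    return any(orig(*v) != mut(*v) for v in vectors)

vectors_weak = [(3, 1), (0, 5)]                  # never exercises a == b
vectors_good = [(3, 1), (0, 5), (4, 4)]          # includes the boundary case
print(weakly_kills(vectors_weak, original_cmp, mutant_cmp))   # False: inadequate
print(weakly_kills(vectors_good, original_cmp, mutant_cmp))   # True: mutant killed
```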