Figure 9. A multi-threaded custom co-processor.

Source publication
Article
Full-text available
Over the past several years there has been a great deal of interest in the design of mixed hardware/software systems, sometimes referred to as hardware/software co-design or hardware/software co-synthesis. However, although many new design methodologies have taken the name hardware/software co-design, they often do not seem to share much in common...

Context in source publication

Context 1
... slight generalization of the custom co-processor arrangement is one in which the custom co-processor is understood to comprise more than one controller and data path and, consequently, is able to implement concurrent threads of control. Figure 9 shows the hardware/software boundary for multi-threaded co-processor systems. In this case the hardware/software partitioning problem is further complicated by the opportunity to exploit parallelism both between hardware and software components and among hardware components. ...
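As a rough illustration of this arrangement (not taken from the source publication), the sketch below models the software component as the main thread and each controller/data-path pair as an independent worker thread; the names hw_thread, in_q, and out_q are purely hypothetical.

```python
# A minimal software model of the multi-threaded co-processor arrangement
# above (illustrative sketch only; all names are hypothetical).
import threading
import queue

def hw_thread(name, in_q, out_q):
    """Stands in for one controller/data-path pair in the custom co-processor."""
    while True:
        item = in_q.get()
        if item is None:                   # shutdown signal
            break
        out_q.put((name, item * item))     # placeholder "hardware" computation

in_q, out_q = queue.Queue(), queue.Queue()

# Two concurrent hardware threads of control, as in Figure 9.
workers = [threading.Thread(target=hw_thread, args=(f"hw{i}", in_q, out_q))
           for i in range(2)]
for w in workers:
    w.start()

# The software side keeps doing its own work while the hardware threads run,
# so parallelism exists both between HW and SW and among the HW components.
for x in range(8):
    in_q.put(x)
host_result = sum(range(8))                # placeholder software-side work

for _ in workers:
    in_q.put(None)
for w in workers:
    w.join()

hw_results = [out_q.get() for _ in range(8)]
print(host_result, sorted(hw_results))
```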

Similar publications

Article
Full-text available
As one of the most promising energy-efficient emerging paradigms for designing digital systems, approximate computing has attracted significant attention in recent years. Applications utilizing approximate computing (AxC) can tolerate some loss of quality in the computed results in order to attain high performance. Approximate arithmetic circuits have...

Citations

... PL-based implementation significantly improves the execution speed at the cost of logical resource consumption. Thus, the system functions are modularized according to the performance requirements, implementation costs, modifiability, and computation for binocular stereo vision [1]. The system architecture (Fig. 7) includes the PS (a dual-core ARM A9 processor) and PL (a complete data stream processing link). ...
Article
Full-text available
Binocular stereo vision is a commonly applied computer vision technique with a wide range of applications in 3D scene perception. However, binocular stereo matching algorithms are computationally intensive and complicated. In addition, some traditional platforms are unable to meet the dual requirements of real-time operation and energy efficiency. In this paper, we propose a hardware/software co-design FPGA (Field Programmable Gate Array) approach to overcome these limitations. Based on the characteristics of binocular stereo vision, we modularize the system functions to achieve the hardware/software partitioning. This accelerates the data processing on the FPGA, while simultaneously performing data control on the ARM (Advanced RISC Machine) cores. The parallelism of the FPGA allows for a full-pipeline design that is synchronized with an identical system clock for the simultaneous running of multiple stereo processing components, thus improving the processing speed. Furthermore, to minimize hardware costs, the collected images and data are compressed prior to matching, while the precision is subsequently enhanced during post-processing. The proposed system was evaluated on the PYNQ-Z2 development board, with experimental results revealing its high real-time performance and low power consumption at a 100 MHz clock frequency. Compared with existing designs, the simple yet flexible system demonstrated a higher image processing speed and less hardware resource overhead (and thus lower power consumption). The average error rate of the BM matching algorithm was also improved, particularly with the limited PYNQ-Z2 hardware resources. The proposed system has been open-sourced on GitHub.
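For context, a block-matching (BM) disparity search of the kind such a pipeline accelerates typically minimizes a sum of absolute differences (SAD) over a fixed window. The sketch below is a minimal software illustration of that computation, not the paper's implementation; the function bm_disparity and its parameters are assumptions.

```python
# Minimal single-pixel block-matching (BM) sketch using sum of absolute
# differences (SAD); illustrative only, assumes the block and disparity
# range stay inside the image.
import numpy as np

def bm_disparity(left, right, y, x, window=4, max_disp=16):
    """Return the disparity minimizing SAD for the block centred at (y, x)."""
    h, w = left.shape
    y0, y1 = max(y - window, 0), min(y + window + 1, h)
    best_d, best_cost = 0, float("inf")
    for d in range(min(max_disp, x - window) + 1):
        lx0, lx1 = x - window, x + window + 1
        cost = np.abs(left[y0:y1, lx0:lx1].astype(int) -
                      right[y0:y1, lx0 - d:lx1 - d].astype(int)).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

rng = np.random.default_rng(0)
right_img = rng.integers(0, 255, (64, 64))
left_img = np.roll(right_img, 5, axis=1)          # synthetic shift of 5 pixels
print(bm_disparity(left_img, right_img, 32, 40))  # expected to report 5
```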
... To do so, the system specification is partitioned into hardware and software parts for concurrent development. After that, each part is implemented and then integrated for co-simulation [4][5]. The co-modeling methodology is analogous to HW/SW co-design and the process is shown in Figure 1 (co-modeling methodology using UML and DEVS [3]). At first, we design the simulator architecture from the requirements and specification of the system to be simulated. ...
Conference Paper
Full-text available
Modeling and simulation (M&S) engineering is one of the most challenging areas, as it has to deal with problems from multiple domains. Hence, in the M&S field, various domain experts and M&S experts often work together to build a simulator. Yet, in some domains, such as the military, such cooperation has been limited because of domain security policies. Therefore, the domain experts in such fields are required to have M&S knowledge on top of their domain knowledge in order to build simulation models by themselves. This paper describes our experience of developing a simple warship simulator and helping such domain experts acquire M&S knowledge through the Warship Simulator Project. From this experience, we found that the DEVS formalism is easy to learn and that a simulator can be developed easily with an implementation of the formalism.
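To give a flavour of the DEVS formalism mentioned above, the sketch below hand-rolls a single atomic model (a processor that is busy for a fixed service time after receiving a job). It is illustrative only, not the warship simulator described in the paper, and is not tied to any particular DEVS library.

```python
# Hand-rolled DEVS atomic model sketch: state S, time advance ta, external
# and internal transition functions, and an output function lambda.
INFINITY = float("inf")

class Processor:
    def __init__(self, service_time=3.0):
        self.service_time = service_time
        self.phase = "idle"                 # state S
        self.job = None

    def time_advance(self):                 # ta(s)
        return self.service_time if self.phase == "busy" else INFINITY

    def ext_transition(self, elapsed, job): # delta_ext
        if self.phase == "idle":
            self.phase, self.job = "busy", job

    def int_transition(self):               # delta_int
        self.phase = "idle"

    def output(self):                       # lambda(s)
        return ("done", self.job)

# A tiny abstract-simulator loop: one model, one external event at t = 1.
m, t = Processor(), 0.0
m.ext_transition(1.0, "job-1")
t = 1.0
t_next = t + m.time_advance()               # internal event scheduled at t = 4
print("output at", t_next, ":", m.output())
m.int_transition()
```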
... 349]. Accordingly, hardware/software co-design is the result of trying to find, through the integration of hardware and software design techniques, a unified system design methodology [ADA96]. Advantages of having a single methodology include shorter development times and an environment that facilitates evaluating the tradeoffs between implementing a function in hardware or in software [ADA96]. ...
... Accordingly, hardware/software co-design is the result of trying to find, through the integration of hardware and software design techniques, a unified system design methodology [ADA96]. Advantages of having a single methodology include shorter development times and an environment that facilitates evaluating the tradeoffs between implementing a function in hardware or in software [ADA96]. At the same time, it is important to realize the scope of hardware/software co-design; it often requires knowledge of a number of different subjects such as computer architectures, embedded systems, real-time systems, etc. [WOL03]. ...
... Due to the varying nature of the problems, where different problems might require different approaches, a large number of methodologies have been generated. In order to compare the different methodologies, Adams and Thomas (1996, [ADA96]) suggest examining how they handle different aspects of hardware/software co-design. These aspects are described briefly in the list below. ...
... To do so, the system specification is partitioned into hardware and software parts for concurrent development. After that, each part is implemented and then integrated for co-simulation [6][7]. The co-modeling methodology is analogous to HW/SW co-design and the process is shown in Figure 2. At first, we design the simulator architecture from the requirements and specification of the system to be simulated. ...
Conference Paper
In a specific domain such as wargames, simulator developers may not fully understand the domain knowledge that domain experts have. In such a case, the developers may leave detailed domain knowledge within simulation models as a black box to be filled in by domain experts. Thus, a simulator can be synthesized by filling the black box with algorithms for domain-specific objects. This paper proposes a methodology for automatic synthesis of wargame simulators developed with the DEVS (discrete event systems specification) framework. For the synthesis, the co-modeling methodology is employed in the specification and implementation of discrete event models.
... Synthesis systems that optimize power [17], testability [15], and fault-tolerance [31], [39] have been developed. System level synthesis has become an active research topic [22], [30]. Examples include hardware/software cosynthesis techniques targeting microcontroller design [20] and hardware/software interface generation techniques [25], [29]. ...
Article
Task preemption is a critical enabling mechanism in multitask very large scale integration (VLSI) systems. On preemption, data in the register files must be preserved for the task to be resumed. This entails extra memory to preserve the context and additional clock cycles to save and restore the context. In this paper, techniques and algorithms to incorporate micropreemption constraints during multitask VLSI system synthesis are presented. Specifically, the following have been developed: algorithms to insert and refine preemption points in scheduled task graphs subject to preemption latency constraints; techniques to minimize the context-switch overhead by considering both the dedicated registers required to save the state of a task on preemption and the shared registers required to save the remaining values in the tasks; and a controller-based scheme that precludes preemption-related performance degradation by 1) partitioning the states of a task into critical sections, 2) executing the critical sections atomically, and 3) preserving atomicity by rolling forward to the end of a critical section on preemption. The effectiveness of all approaches, algorithms, and software implementations is demonstrated on real examples. Validation of all the results is complete in the sense that functional simulation is conducted down to the complete layout implementation.
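As a much-simplified illustration of one of these steps (an assumption for exposition, not the paper's algorithm), the sketch below greedily inserts preemption points into a linear schedule of control steps so that the worst-case preemption latency never exceeds a given bound.

```python
# Greedy preemption-point insertion sketch: assumes a linear schedule and
# that no single control step exceeds the latency bound.
def insert_preemption_points(step_durations, latency_bound):
    """Return indices after which a preemption point is inserted."""
    points, elapsed = [], 0.0
    for i, d in enumerate(step_durations):
        if elapsed + d > latency_bound:
            points.append(i - 1)     # preempt before this step would overrun
            elapsed = 0.0
        elapsed += d
    return points

# Control-step durations (in cycles) of a scheduled task and a latency bound.
steps = [2, 3, 1, 4, 2, 2, 5, 1]
print(insert_preemption_points(steps, latency_bound=6))   # e.g. [2, 4, 5]
```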
... An approach that generates new logic capabilities for a processor dynamically has been developed for an adaptive machine architecture in [12]. The work in [13] places ASIP synthesis in the context of hardware-software cosynthesis. It argues that, since the customized instructions added to an existing instruction set are implemented in hardware, whereas the original instructions are run on the basic processor core, ASIP synthesis is a variant of hardware-software partitioning. ...
Article
Efficiency and flexibility are critical, but often conflicting, design goals in embedded system design. The recent emergence of extensible processors promises a favorable tradeoff between efficiency and flexibility, while keeping design turnaround times short. Current extensible processor design flows automate several tedious tasks, but typically require designers to manually select the parts of the program that are to be implemented as custom instructions. In this work, we describe an automatic methodology to select custom instructions to augment an extensible processor, in order to maximize its efficiency for a given application program. We demonstrate that the number of custom instruction candidates grows rapidly with program size, leading to a large design space, and that the quality (speedup) of custom instructions varies significantly across this space, motivating the need for the proposed flow. Our methodology features cost functions to guide the custom instruction selection process, as well as static and dynamic pruning techniques to eliminate inferior parts of the design space from consideration. Furthermore, we employ a two-stage process, wherein a limited number of promising instruction candidates are first short-listed using efficient selection criteria, and then evaluated in more detail through cycle-accurate instruction set simulation and synthesis of the corresponding hardware, to identify the custom instruction combinations that result in the highest program speedup or maximize speedup under a given area constraint. We have evaluated the proposed techniques using a state-of-the-art extensible processor platform, in the context of a commercial design flow. Experiments with several benchmark programs indicate that custom processors synthesized using automatic custom instruction selection can result in large improvements in performance (up to 5.4×, an average of 3.4×), energy (up to 4.5×, an average of 3.2×), and energy-delay products (up to 24.2×, an average of 12.6×), while speeding up the design process significantly.
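The two-stage idea described above can be pictured with a toy sketch; the cost function, the candidate data, and the greedy selection under an area budget are illustrative assumptions, not the paper's actual flow.

```python
# Two-stage custom-instruction selection sketch (illustrative assumptions).
def select_custom_instructions(candidates, area_budget, shortlist_size=4):
    # Stage 1: shortlist candidates using an inexpensive merit estimate.
    shortlist = sorted(candidates,
                       key=lambda c: c["est_speedup"] / c["area"],
                       reverse=True)[:shortlist_size]
    # Stage 2: use the "detailed" speedup figure (standing in for simulation
    # and synthesis results) and pick a combination under the area budget.
    chosen, used = [], 0.0
    for c in sorted(shortlist, key=lambda c: c["speedup"], reverse=True):
        if used + c["area"] <= area_budget:
            chosen.append(c["name"])
            used += c["area"]
    return chosen

candidates = [
    {"name": "MAC",   "area": 2.0, "est_speedup": 1.8, "speedup": 1.6},
    {"name": "SAD4",  "area": 3.0, "est_speedup": 2.5, "speedup": 2.4},
    {"name": "CRC8",  "area": 1.0, "est_speedup": 1.2, "speedup": 1.1},
    {"name": "FFTbf", "area": 4.0, "est_speedup": 2.0, "speedup": 2.2},
]
print(select_custom_instructions(candidates, area_budget=5.0))  # ['SAD4', 'MAC']
```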
... If the specification (or the IDR) is too operational, i.e. influenced by the current technology, it will bias the design towards an architecture which might not be favourable for solving the formulated problem in the best way. The outcome of the synthesis process is a final implementation of the embedded system, i.e. a mixed hardware/software system [1, 38] serving to fulfil the specification requirements. This consists of a combination of standard and custom hardware. ...
Article
Full-text available
MPhil/PhD Transfer Report, Faculty of Engineering, Electronics and Computer Science Department: Mixed Control/Data-Flow Representation for Modelling and Verification of Embedded Systems, by Mauricio Varea. Embedded system design issues become critical as implementation technologies evolve. The interaction between the control and data flow of an embedded system specification is an important consideration and, in order to cope with this aspect, a new internal design representation called Dual Flow Net (DFN) is introduced and further analysed in this thesis. One of the key features of this internal representation is its tight control and data flow interaction, which is achieved by means of two new concepts. Firstly, the structure of the new DFN model is formulated employing a tripartite graph as a basis, which turns out to be advantageous for modelling heterogeneous systems. Secondly, a complex domain marking scheme is used to describe the behaviour of the system, leading to better results in terms of modelling the dynamics of the embedded system specification. Structural definitions, behavioural rules, and the graphical representation of the new DFN model are presented in this work.
... This paper first briefly discusses current design methodologies, in particular the specify-explore-refine paradigm. Presented here is our system-level methodology based on the specify-explore-refine paradigm [1]. An analog/digital embedded subsystem application was designed using this methodology. ...
... This system-level methodology is strong in hardware/software codesign, cost, and performance techniques. Additional citations of design methodologies on hardware/software codesign are in [1], [2], [5], [6], [11]. However, none of these consider analog as part of the design methodology, or the ability to identify cores, in particular analog cores, in the system design exploration process. ...
Conference Paper
Full-text available
With the growth of System on a Chip (SoC), the functionality of analog components must also be considered in the design process. This paper describes some of the design implementation partitioning issues and experiences using analog and digital techniques for embedded systems. To achieve a quick turnaround for new embedded system development, a design methodology was extended for analog codesign based on the specify-explore-refine paradigm and system-level design methodology. Many system-level issues were addressed, including hardware/software codesign trade-offs.
... The hardware/software codesign scene has introduced a number of hardware/software partitioning approaches to speed up performance, optimize hardware/software trade-offs, and reduce total design time [6], [3], [11], [4], [12], [8], [7], [1], among others. The introduced approaches perform their techniques on different partitioning granularities, ranging from fine-grain [6], over medium-grain [3], [11], to coarse-grain [12], [7] granularities. ...
Conference Paper
Full-text available
In this contribution we present a new system-level hardware/software partitioning approach (HiPART), which runs within the framework of an integrated hardware/software design methodology for embedded system design. The benefits of the approach result from a hierarchical partitioning algorithm consisting of three phases of constructive and iterative methods. The main advantage of the system is a freely selectable degree of user interaction and manual partitioning. Continuous observation of timing-constraint violations during partitioning guarantees applicability to real-time systems.
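A loose sketch of the constructive-then-iterative idea is shown below, assuming a purely sequential execution-time model; the task data and the move heuristic are illustrative assumptions, not the actual HiPART phases.

```python
# Constructive start (everything in software) followed by iterative moves to
# hardware while a timing constraint is violated; timing is re-checked after
# every move. Illustrative sketch only.
def partition(tasks, deadline):
    hw, sw = set(), set(tasks)

    def exec_time():
        return sum(tasks[t]["sw"] for t in sw) + sum(tasks[t]["hw"] for t in hw)

    while exec_time() > deadline:
        movable = [t for t in sw if tasks[t]["sw"] > tasks[t]["hw"]]
        if not movable:
            break                                   # constraint cannot be met
        best = max(movable, key=lambda t: tasks[t]["sw"] - tasks[t]["hw"])
        sw.remove(best)
        hw.add(best)
    return hw, sw, exec_time()

tasks = {"fft": {"sw": 40, "hw": 8}, "ctrl": {"sw": 5, "hw": 6},
         "filt": {"sw": 25, "hw": 10}, "io": {"sw": 10, "hw": 9}}
print(partition(tasks, deadline=50))   # moves "fft" to hardware, time 48
```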
... Then, the testability, roughly defined as the testing effort, is related to the number of test vectors needed to test the functionality of the system. In [13], the authors present a tutorial which describes a set of criteria that can be used for HW/SW partitioning, such as performance requirements, implementation costs, modifiability, nature of computation, concurrency, and communication. However, none of the above strategies suggests how to optimize system design towards reliability. ...
Article
This work presents an innovative approach to system reliability verification based on an adaptation of the weak mutation analysis technique. This technique was originally proposed for software testing as a means of verifying the adequacy of a test vector set for a given program. We also present a case study in order to illustrate the proposed approach.
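A much-simplified sketch of the underlying idea follows; the component, the mutant, and the test vectors are illustrative assumptions. A test-vector set is judged adequate for a mutant if at least one vector makes the mutated component's immediate output differ from the original's.

```python
# Simplified weak-mutation sketch: compare the mutated component's immediate
# output against the original's for each test vector.
def original_cmp(a, b):
    return a > b                     # component under verification

def mutant_cmp(a, b):
    return a >= b                    # mutant: ">" replaced by ">="

def weakly_kills(vectors, orig, mut):
    return any(orig(*v) != mut(*v) for v in vectors)

vectors_weak = [(3, 1), (0, 5)]                  # never exercises a == b
vectors_good = [(3, 1), (0, 5), (4, 4)]          # includes the boundary case
print(weakly_kills(vectors_weak, original_cmp, mutant_cmp))   # False: inadequate
print(weakly_kills(vectors_good, original_cmp, mutant_cmp))   # True: mutant killed
```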