Figure 1 - uploaded by Michela Milano
Single chip multi-processor architecture. 

Source publication
Article
ABSTRACT Multi-Processor Systems-on-Chips (MPSoCs) are becoming increasingly complex, and mapping and scheduling of multi-task applications on computational units is key to meeting performance constraints and power budgets. Abstract models of system components and deployment of advanced algorithmic techniques for the optimization problem c...

Context in source publication

Context 1
... show the benefits of validating the model accuracy on a virtual platform for the purpose of defining and refining an effective allocation and scheduling methodology on MPSoCs. The paper is organized as follows: the application is described in section 2 and discussed in section 3. The models developed are introduced in section 4 and simplifying assumptions are motivated in section 5, explaining why they could affect the quality of the overall solution. Experiments are presented in section 6. Related work and a discussion of future perspectives conclude the paper.

Recent advances in very large scale integration (VLSI) of digital electronic circuits have made it possible to integrate more than a billion elementary devices onto a single chip, thereby enabling the development of low-power, low-cost, high-performance single-chip multi-processors. These devices, called multi-processor systems-on-chip (MPSoCs), are finding widespread application in embedded systems (such as cellular phones and automotive control engines), where they are employed as special-purpose computing engines. In other words, once deployed in the field, they always run the same application in a well-characterized context. It is therefore possible to spend a large amount of time finding an optimal allocation and schedule off-line and then deploy it in the field. For this reason, many researchers in digital design automation have explored complete approaches for allocating and scheduling pre-characterized workloads on MPSoCs [18], instead of using on-line, dynamic (sub-optimal) schedulers [5, 4].

The multi-processor system we consider consists of a pre-defined number of distributed computation nodes, as depicted in Figure 1. All nodes are assumed to be homogeneous, each consisting of a processing core and a tightly coupled local memory.
The latter is a low-access-cost scratchpad memory, commonly used both as a hardware extension to support message passing and as storage for frequently accessed computation data and processor instructions. Unfortunately, the scratchpad memory is of limited size, so excess data must be stored externally in a remote on-chip memory accessible via the bus. The bus in state-of-the-art MPSoCs is a shared communication channel, and bus access requests from the processors (the bus masters) must be serialized by a centralized arbitration mechanism. The bus is re-arbitrated on a transaction basis, according to one of several policies (fixed priority, round-robin, latency-driven, etc.).

A task graph representation of the target application is the input to our methodology. For each node/task, the worst-case execution time (WCET) is specified; it plays a critical role whenever application real-time (RT) constraints (expressed here in terms of minimum required throughput) are to be met. In fact, tasks are scheduled on each processor based on a time wheel. The sum of the WCETs of the tasks for one iteration of the time wheel must not exceed the time period RT (i.e., the minimum task scheduling period ensuring that throughput constraints are met), which is the same for each processor since the minimum throughput is an application (not single-processor) requirement. However, since we are now developing an allocation and mapping methodology, we are primarily interested in assessing the accuracy of our problem model. Therefore, we first extracted the average (not the worst) case execution time of the tasks by means of functional simulation, since it is likely to better match the average task durations measured on the validation virtual platform. Once our methodology is stable and used for design purposes, the WCET should be used instead, so as not to violate timing requirements.

Each task also has three kinds of memory requirements.
Program Data: storage locations are required for computation data and for processor instructions. They can be allocated either on the local scratchpad memory or on the remote on-chip memory.
Internal State: when needed, an internal state of the task can be stored either locally or remotely.
Communication queues: the task needs queues to transmit and receive messages to/from other tasks, possibly mapped on different processors. In the class of MPSoCs we are considering, such queues should be allocated only on local memories, in order to implement an efficient inter-processor communication mechanism.

Finally, the communication requirements of each task are automatically determined from the size of the communication data and the physical location of computation data in scratchpad or remote memory.

The methodology proposed in this paper has been applied to a synthetic signal processing pipeline. Functional pipelining is widely used in the domain of multimedia applications. Task parameters have been derived from a real video graphics pipeline processing pixels of a digital image. The proposed allocation and scheduling techniques can be easily extended to all applications using pipelining as workload allocation policy, and aim at providing system designers with an automated methodology to come up with effective solutions and cut down on design time.

In the open literature, approaches to this kind of problem usually make very strong simplifying assumptions, such as an infinite number of processing units, zero-time communication or unlimited memory capacity [16, 17, 1]. In addition, they often do not consider the real implementation of the solution they produce. We make simplifying assumptions to derive a problem model as well, but we also include a validation stage in our framework, in order to assess the impact of such approximations and verify the mismatch between the off-line and on-line solutions.
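
The per-task constraints described above (a common time-wheel period, queues held only in local memory, and a scratchpad of limited size) can be illustrated with a small feasibility check. This is only a sketch under stated assumptions, not the model used in the paper: the names `Task`, `Placement` and `check_allocation`, the choice of units (cycles and bytes) and all numeric values are hypothetical, and bus contention is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    duration: int        # execution time per iteration (average or WCET), in cycles
    program_data: int    # bytes of computation data and instructions
    internal_state: int  # bytes of internal state (0 if the task has none)
    queues: int          # bytes of communication queues


@dataclass(frozen=True)
class Placement:
    processor: int       # index of the node the task runs on
    data_local: bool     # program data in scratchpad (True) or remote memory (False)
    state_local: bool    # internal state in scratchpad (True) or remote memory (False)


def check_allocation(tasks, placement, n_procs, period, scratchpad_size):
    """Return a list of violated constraints; an empty list means feasible."""
    load = [0] * n_procs  # summed task durations per processor
    used = [0] * n_procs  # scratchpad bytes in use per processor
    for task in tasks:
        where = placement[task.name]
        load[where.processor] += task.duration
        # Queues are always placed in the local memory of the task's processor.
        used[where.processor] += task.queues
        if where.data_local:
            used[where.processor] += task.program_data
        if where.state_local:
            used[where.processor] += task.internal_state
    violations = []
    for proc in range(n_procs):
        if load[proc] > period:
            violations.append(
                f"processor {proc}: load {load[proc]} exceeds period {period}")
        if used[proc] > scratchpad_size:
            violations.append(
                f"processor {proc}: scratchpad use {used[proc]} exceeds {scratchpad_size}")
    return violations


# Hypothetical three-stage pipeline mapped onto two processors.
tasks = [Task("t0", 40, 100, 10, 20), Task("t1", 50, 200, 0, 20), Task("t2", 30, 50, 10, 20)]
placement = {
    "t0": Placement(0, data_local=True, state_local=True),
    "t1": Placement(0, data_local=False, state_local=True),
    "t2": Placement(1, data_local=True, state_local=True),
}
print(check_allocation(tasks, placement, n_procs=2, period=100, scratchpad_size=150))  # prints []
```

Because the period is an application-level requirement, the same `period` value is checked against every processor. Moving the program data of `t1` into the scratchpad in this example would overflow the scratchpad of processor 0, which is the kind of trade-off between local and remote placement that the text describes.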
An MPSoC virtual platform, called MP-Arm [14, 8], has been used for the validation of models and theoretical results, i.e. for implementing the on-line solution. There are three main potential causes of the mismatch between the off-line and the on-line ...

Similar publications

Article
The deployment of the next generation computing platform at ExaFlops scale requires solving new technological challenges, mainly related to the very large number (up to 10^6) of compute elements required. This affects system power consumption, in terms of feasibility and costs, as well as system scalability and computing efficiency. In this perspect...
Conference Paper
Typical MPSoC FPGA product design is a rigid waterfall process proceeding one-way from HW to SW design. Any changes to HW trigger the SW project re-creation from the beginning. When several product variations or speculative development time exploration is required, the disk bloats easily with hundreds of Board Support Package (BSP), configuration a...
Article
Multiprocessor SOC platforms have been adopted for a wide range of high-performance applications, like automotive and avionic systems. Task assignment and processing unit allocation are key steps in the design of predictable and efficient embedded systems. Given the execution modes of applications, we propose a methodology to compute a task to proc...
Article
Task mapping has been a hot topic in MPSoC software design for decades. During the mapping process, load balance and communication optimization have been two important performance optimization factors. This paper studies the relations between load balance, inter-processor communications and communication pipeline technique during the mapping proces...

Citations

... In [10], further extensions to [11] In [110], the authors propose an efficient static algorithm that optimizes the energy consumed by the communication tasks of NoC-based systems whose links support voltage scaling. To determine an optimal link speed, the proposed algorithm (based on a genetic formulation) globally explores the system design space, including task assignment, tile placement, routing path allocation and link speed assignment. ...
... 99.73%, the lower bound on the energy consumed by a task is given by Equation 9.11. ...
... In this work, the generic quality factor Q is replaced by Ê from Equation 9. and the threshold factor Q_th by the energy criterion Ê_min from Equation 9.11. The list listToVisit is the list of all nodes, ordered by proximity (from nearest to farthest). ...
Article
With advanced technologies (typically < 32 nm), it is increasingly difficult to control manufacturing variability. This variability affects operating frequency and energy consumption more severely, and induces more failures inside the device. This is particularly true for MPSoCs with a large number of computing cores. With increasing needs (performance, functionality, low power, fault tolerance) and heterogeneous characteristics (frequency, energy, failures), it becomes difficult to map applications onto systems that meet these requirements. This work focuses on addressing these issues for massively parallel MPSoCs based on a 2D mesh topology. This thesis proposes an automated methodology for mapping and scheduling applications on the targeted system. It takes into account variability, energy and computing power. Furthermore, this thesis proposes a fault-tolerant adaptive mapping technique, paired with an original failure recovery strategy. This strategy guarantees the termination of the application in the presence of failures, without requiring checkpoints. The technique has been extended with an adaptive distributed algorithm that takes manufacturing variability into account and aims at reducing the consumed energy.
Article
In this paper, we describe a technique to design UML-based software models for MPSoC architecture, which focuses on the development of the platform specific model of embedded software. To develop the platform specific model, we define a process for the design of UML-based software model and suggest an algorithm with precise actions to map the model to MPSoC architecture. In order to support our design process, we implemented our approach in an integrated tool. Using the tool, we applied our design technique to a target system. We believe that our technique provides several benefits such as improving parallelism of tasks and fast-and-valid mapping of software models to hardware architecture.