Fault-Tolerant Dynamic Task Mapping and Scheduling for Network-on-Chip-Based Multicore Platform

May 2017
ACM Transactions on Embedded Computing Systems 16(4):1-24

DOI:10.1145/3055512

Authors:

Navonil Chatterjee

French National Centre for Scientific Research

Suraj Paul

Indian Institute of Technology Kharagpur

Santanu Chattopadhyay

Indian Institute of Technology Kharagpur

In Network-on-Chip (NoC)-based multicore systems, task allocation and scheduling are known to be important problems, as they affect the performance of applications in terms of energy consumption and timing. Advancement of deep submicron technology has made it possible to scale the transistor feature size to the nanometer range, which has enabled multiple processing elements to be integrated onto a single chip. On the flipside, it has made the integrated entities on the chip more susceptible to different faults. Although a significant amount of work has been done in the domain of fault-tolerant mapping and scheduling, existing algorithms either precompute reconfigured mapping solutions at design time while anticipating fault(s) scenarios or adopt a hybrid approach wherein a part of the fault mitigation strategy relies on the design-time solution. The complexity of the problem rises further for real-time dynamic systems where new applications can arrive in the multicore platform at any time instant. For real-time systems, the validity of computation depends both on the correctness of results and on temporal constraint satisfaction. This article presents an improved fault-tolerant dynamic solution to the integrated problem of application mapping and scheduling for NoC-based multicore platforms. The developed algorithm provides a unified mapping and scheduling method for real-time systems focusing on meeting application deadlines and minimizing communication energy. A predictive model has been used to determine the failure-prone cores in the system for which a fault-tolerant resource allocation with task redundancy has been performed. By selectively using a task replication policy, the reliability of the application, executing on a given NoC platform, is improved. A detailed evaluation of the performance of the proposed algorithm has been conducted for both real and synthetic applications. 
When compared with other fault-tolerant algorithms reported in the literature, the proposed algorithm shows an average reduction of 56.95% in task re-execution time overhead and an average improvement of 31% in communication energy. Further, for time-constrained tasks, the developed algorithm meets deadlines in most of the test cases, whereas the techniques reported in the literature fail to meet the deadline in about 45% of the test cases.

NAVONIL CHATTERJEE, SURAJ PAUL, and SANTANU CHATTOPADHYAY,

Indian Institute of Technology Kharagpur

CCS Concepts: • Networks → Network on chip; • Hardware → Fault tolerance; • Software and its engineering → Scheduling; Embedded software; Real-time schedulability;

Additional Key Words and Phrases: Network-on-chip, dynamic mapping and scheduling, task replication,

communication energy, deadline, fault tolerance

ACM Reference Format:

Navonil Chatterjee, Suraj Paul, and Santanu Chattopadhyay. 2017. Fault-tolerant dynamic task mapping

and scheduling for network-on-chip-based multicore platform. ACM Trans. Embed. Comput. Syst. 16, 4,

Article 108 (May 2017), 24 pages.

DOI: http://dx.doi.org/10.1145/3055512

Authors’ address: N. Chatterjee (corresponding author), S. Paul, and S. Chattopadhyay, Department

of Electronics and Electrical Communication Engineering, Indian Institute of Technology Kharagpur,

West Bengal, India, 721302; emails: navonil@iitkgp.ac.in, Suraj.Paul@alumnimail.iitkgp.ac.in, santanu@ece.iitkgp.ernet.in.

ACM Transactions on Embedded Computing Systems, Vol. 16, No. 4, Article 108, Publication date: May 2017.

108:2 N. Chatterjee et al.

1. INTRODUCTION

Advancements in semiconductor technology have made it possible to integrate a large

number of Intellectual Property (IP) cores, Memory Units, and Processing Elements

(PEs) onto a single chip, resulting in a System-on-Chip (SoC) design. Major challenges

faced by SoC designers include intercore communication, synchronization, parallelism

between the tasks, and so forth. Traditional bus-based SoC designs often exhibit lim-

ited performance, higher latency, and increased power consumption. To overcome these

drawbacks of a bus-based SoC, Network-on-Chip (NoC) has been proposed as a viable

solution [Benini and De Micheli 2002]. In the NoC paradigm, intercore communication

is provided by a set of routers and point-to-point communication links between them. A

core is attached to a router through a Network Interface (NI) module, which interfaces

the computation and communication units and converts the data from the IP cores

into packetized form. Routers route the packet from source node to destination node

depending on the underlying network topology and routing strategy [Kundu and

Chattopadhyay 2014]. With rapidly shrinking transistor geometries and rigorous voltage scaling, the gains in power, packing density, and area have been offset by the reduced dependability of on-chip communication elements and processing entities [Borkar et al. 2004]. This is mainly attributed to the vulnerability

of fabricated NoC components to transient, intermittent, and permanent faults

[Constantinescu 2003]. Transient faults occur when there are one or more bit errors in the transmitted packet, which may be the effect of crosstalk [Kohler et al. 2010] or of alpha and neutron particle strikes [Chou and Marculescu 2011]. Intermittent faults occur in bursts, repeat themselves from time to time, and tend to occur in the same location [Constantinescu 2002]. Transient and intermittent faults can often be

handled at the software level. Permanent faults occur due to device wearout caused

by Negative Bias Temperature Instability (NBTI), electromagnetic interference, and

oxide breakdown [Fick et al. 2009]. If a permanent fault occurs during the operation

of the chip, we need to have additional resources available to replace the faulty

components. The redundancy introduced in the design may help to overcome such a

situation but at the cost of additional components. This leads to consumption of extra

energy, which may even exceed the energy budget for a given application. Therefore, one of the design goals in case of permanent faults is to provide support for tolerating multiple faults without sacrificing solution quality, while still honoring the energy budget of the application. This motivates us to limit our research to permanent faults only and to develop a software-redundancy-based strategy for fault scenarios in which the IP cores may be impaired during their field operation. The present work is confined to failures of processing elements; the network communication elements (i.e., routers and links) are assumed to be fault free.

In this work, we present a uniﬁed mapping and scheduling strategy for dynamic

resource allocation in Fault-Tolerant Dynamic Systems (FTDSs). The goal is to achieve

resource-efﬁcient fault-resilient application execution in many-core systems under

core-to-core reliability variations, where no a priori knowledge of the complete struc-

ture of applications exists. Such variation in reliability can be caused by aging [Das

et al. 2013], device parameter variation, and thermal variations [Hajimiri et al. 2011].

Important issues that need to be addressed while solving the problem of dynamic task

mapping and scheduling for FTDSs are energy consumption and timeliness of task

completion, along with imparting fault tolerance for reliable application execution.

Computation energy is mostly governed by the type of application (e.g., multimedia,

networking, etc.), while communication energy is largely determined by the mapping

solution. Another important issue that needs to be given due attention is the tim-

ing characteristics of the applications, which affect the quality of the solution. An

application is characterized by execution times and deadlines associated with indi-

vidual tasks in it. To improve the quality of the solution and maintain the utility of

the results of computations (depending on the execution time), task deadlines must

be honored for many practical real-time applications. To achieve higher reliability

and deadline performance on multiple cores, in this article, we have introduced a

reliability-driven task mapping and scheduling strategy that involves determination

of task allocation modes (i.e., task allocation with or without redundancy) and task allo-

cation decisions (i.e., mapping the (redundant) tasks on many-cores with heterogeneous

reliability characteristics). The overall contributions of this article are as follows:

—Proposes method to determine allocation mode of tasks for tolerating single and

multiple permanent core failures

—Proposes a communication-energy- and deadline-aware dynamic mapping and

scheduling method for real-time applications

—Presents algorithm for incorporating software-redundancy-based fault tolerance dur-

ing resource allocation of concurrent applications

The rest of this article is organized as follows. In Section 2, we present the literature

review. Section 3 describes the NoC-based system model. Some preliminary deﬁnitions

and the problem statement are given in Section 4. The fault-aware dynamic resource al-

location policy that helps to mitigate the problem of core failures at runtime is presented

in Section 5. A detailed description of the experimental setup and associated results

are given in Section 6. In Section 7, the conclusion and future work are presented.

2. LITERATURE REVIEW

2.1. Hardware Redundancy-Based Fault-Tolerant Techniques

In this section, we present some selected works on fault-tolerant approaches for the

NoC platform. Some works [Shivakumar et al. 2012; Schuchman and Vijaykumar 2005]

propose using microarchitecture-level redundancy to improve system reliability and in-

crease processor lifetime. Shivakumar et al. [2012] utilized the inherent redundancy

present in modern microprocessors to improve yield and enhance graceful degradation

of the system performance in case of failure. It considers granularity from the proces-

sor subcomponent level to the whole processor for proposing fail-in-place capabilities

within a single chip. Schuchman and Vijaykumar [2005] proposed a defect-tolerant

microarchitecture, namely, Rescue. Core-level redundancy has been shown to be more

beneﬁcial compared to microarchitectural-level redundancy for homogeneous multi-

processor platforms.

For many-core systems, as a single core is small and inexpensive compared to the

entire chip, core-level redundancy is considered to be an efﬁcient solution for yield

improvement [Zhang et al. 2008, 2009]. Here, an n-core processor has been used with m redundant cores. Zhang et al. [2009] proposed a core-level redundancy scheme that

effectively tolerates defects in homogeneous multicore systems and increases yield.

Defective cores change the topology, and the programmer is faced with the challenge

to optimize parallel applications. To address this problem, Zhang et al. [2009] have

provided a uniﬁed topology that is isomorphic, regardless of the underlying physical

topologies. In Wang et al. [2013], shift operations—row bi-shift and column shift—have

been used for redistributing fault-free PEs of a processor array in reconﬁguration. This

reduces congestion and latency of the reconﬁgured topology by reducing long links.

The increase in area, throughput, and delay due to the use of spare cores has been

addressed in Ren et al. [2015]. Using the maximum ﬂow algorithm, it optimizes the

use of spare cores with improvement in the repair rate of faulty PEs, having polynomial-

time complexity.

An analytical model for yield and cost estimation for spare core-based multicore sys-

tems has been provided in Shamshiri et al. [2008] and Shamshiri and Cheng [2011].

The yield model for multicore systems takes into account the number of cores, num-

ber of spares, core yield, manufacturing test quality, in-ﬁeld test quality, and burn-in

process. Based on this model, the authors have experimentally investigated the effects

of these factors on the total cost of the chip. It is shown that the overall cost can be

signiﬁcantly reduced by adding a few dynamically reconﬁgurable spares. A fault-aware

method to manage resources in NoC-based multiprocessors, called FARM (Fault-Aware

Resource Management), has been proposed in Chou and Marculescu [2011]. The place-

ment of spare cores is ﬁxed for all applications for migrating the tasks on occurrence

of faults. However, as spare core allocation is not adaptively set based on the struc-

ture of incoming applications, FARM imposes high communication energy and perfor-

mance overhead after failure recovery. Khalili and Zarandi [2013] proposed a method

to adaptively determine spare cores for each application depending on the criticality

of its tasks. It assigns spare cores near to the processing cores for critical tasks. The

proposed mapping technique reduces overall communication energy and performance

overhead [Chou and Marculescu 2011] by reducing the distance between the spare core

and the failed core.

2.2. Software-Redundancy-Based Fault-Tolerant Techniques

A ﬁxed-order Block and Band reconﬁguration technique has been studied in Yang and

Orailoglu [2007]. The initial schedule is statically partitioned into multiple bands, and

regular reassignment capability is embedded into it. A group transfer of a set of depen-

dent tasks to a new PE is performed upon execution-driven resource variation. This mi-

gration to other functional core(s) is determined by the band in which the tasks belong.

Das et al. [2015] and Das and Kumar [2012] present an execution trace-based dy-

namic reconﬁguration of application mapping for different fault scenarios. The pro-

posed technique performs a design-time analysis on the execution trace of an applica-

tion, modeled as synchronous data ﬂow graphs. The process captures the optimal points

with respect to the throughput and migration overhead for multiple fault scenarios.

The results are updated in the database. Out of all captured points, the one with the

highest throughput/energy ratio is selected as the ﬁnal mapping and executed at run-

time for tolerating core faults. In Lee et al. [2010], the technique presented analyzes all

possible processor failure scenarios at compile time and stores resultant task remap-

ping information to use it for tolerating processor failures at runtime. A re-execution

slot-based reconﬁguration mechanism has been studied in Huang et al. [2011]. Normal

and re-execution slots of a task are scheduled at design time using an evolutionary algo-

rithm to minimize certain parameters, like throughput degradation. At runtime, tasks

on a faulty core migrate to their re-execution slot on a different core. However, schedule

length can become unbounded for high-fault-tolerance systems. Moreover, analysis is

based on acyclic graphs and therefore cannot be applied to streaming applications with

cyclic task dependencies. The work reported in Bolanos et al. [2013] also extends the

mapping and scheduling solution by performing dynamic remapping of tasks executing

on processors detected as faulty at runtime. It calculates mapping solutions for some defined failure scenarios and stores them in a database. At runtime, upon detection of a fault, the precomputed mapping solution is retrieved from the database. Das et al. [2014]

discuss an ofﬂine task remapping technique for a heterogeneous multiprocessor system

for all processor fault scenarios that minimizes the communication energy and task

migration overhead. This is a two-step technique, where the ﬁrst step includes the ini-

tial mapping phase (communication-energy-driven design space exploration), and the

second step includes fault tolerance (communication-energy- and migration-overhead-aware task mapping). The latter step is used to generate fault-tolerant mappings that are stored in memory for use at runtime. When one or more PEs fail, the best mapping in

terms of communication energy and migration overhead is fetched and applied. Arnold

and Fettweis [2011] proposed a fault-aware dynamic task scheduling approach that

detects and isolates erroneous PEs and ensures error-free execution of applications.

The interconnects and CoreManager are considered fault free. In case of permanent

PE faults, the processor is considered to be nonfunctional, while the task is re-executed

on a different PE.

From the previous discussion, it can be noted that some of the existing strategies compute the reconfigured mappings necessary to counteract PE failures and store them in memory. This requires complete application task graph information and assumes that the application to be executed is fully known beforehand. However, with an increase in the degree of fault tolerance, the time taken to explore the remappings for all possible sequences of fault occurrence becomes a determining factor in fault-resilient operation of the system. Other strategies adopt an approach that brings down the overhead of exhaustively searching for and storing all remapping scenarios for a given degree of fault tolerance. This is effected by exploiting the fact that the probability of multiple PE failures occurring simultaneously is very low. Thus,

at each stage, task reconﬁguration for only a single PE failure scenario is computed

and stored in memory to be used for fault tolerance. Methods using task migration to

reliable PEs need checkpointing to store the task state and transfer it to different PEs.

This article proposes an improved software-based technique that eliminates the need

for any checkpointing. No additional memory is required to store the PE fault-tolerant

mapping. Also, the energy spent on reconﬁguring the tasks to reliable PEs is reduced

by the proposed technique. In addition, unlike hardware redundancy methods, there is

no additional energy and area consumption.

3. SYSTEM MODEL

3.1. Hardware Architecture Model

We have considered a two-dimensional (2D) mesh-based NoC topology consisting of heterogeneous PEs, as shown in Figure 1(a). A special-purpose PE runs the Real-Time

Operating System (RTOS), while general-purpose PEs are for executing tasks. The

method proposed in this work can be extended to any other topological conﬁguration

of the PEs. The Manager Core, shown in Figure 1(b), is implemented as a software module and uses the runtime mapping/scheduling algorithm in the Task Mapping and Scheduling Unit (TMSU) to allocate the tasks of incoming applications to

appropriate resources. In addition, the Manager Core also monitors the status of every

PE to determine its health/availability. The resource occupancy status of individual

general-purpose PEs is updated in the Resource Manager (RM) at runtime to provide

the Manager Core with the latest information about the resource availability. Thus,

the Manager Core performs dynamic resource allocation by scheduling the tasks in a

ready list and suitably mapping them on available cores. For fault tolerance, the Fault

Mitigation Unit (FMU) in the Manager Core performs task duplication and allocates

the recovery tasks on an alternative PE. The selection of the alternative PE and task

allocation methodology has been detailed in Section 5.

A PE has three parameters $(PE_i, PE_{cap}, PE_{tasks})$. $PE_i$ is the PE identifier that provides a unique identification for a PE. Each PE has a fixed amount of memory. The number of tasks a PE can support is determined by the size of available memory, similar to Maqsood et al. [2015]. In the given NoC system, PEs can support a maximum of $PE_{cap}$ number of tasks. The set $PE_{tasks}$ denotes the set of tasks allocated to the PE. A PE can execute only one task at a time even though multiple tasks might be allocated to it. In

the proposed work, we have considered a PE to exist in two states: (1) idle and (2) busy.

Fig. 1. A Network-on-Chip (NoC) architecture with multiple cores.

If no task is assigned to the PE or it has completed the assigned tasks, then the state of

the PE is deﬁned as idle. On the other hand, if the PE is processing any allotted task,

it is said to be in busy state. Additionally, an assigned task may wait for its input data

packets to arrive on the PE before it is ﬁnally taken up for execution.
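The PE parameters and states described above can be illustrated with a minimal Python sketch. The class and method names are hypothetical and do not come from the article. The sketch also assumes that a PE whose allocated tasks are all still waiting for input data is reported as idle, which is only one possible reading of the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProcessingElement:
    pe_id: int                                       # PE_i: unique identifier
    capacity: int                                    # PE_cap: max tasks the PE memory can hold
    tasks: List[str] = field(default_factory=list)   # PE_tasks: allocated task ids
    running: Optional[str] = None                    # task currently executing, if any

    @property
    def state(self) -> str:
        # Assumption: "busy" means a task is executing right now.
        return "busy" if self.running is not None else "idle"

    def can_accept(self) -> bool:
        return len(self.tasks) < self.capacity

    def allocate(self, task_id: str) -> None:
        if not self.can_accept():
            raise ValueError(f"PE {self.pe_id} already holds {self.capacity} tasks")
        self.tasks.append(task_id)

    def start(self, task_id: str) -> None:
        # A PE executes only one task at a time.
        if self.running is not None:
            raise RuntimeError(f"PE {self.pe_id} is busy with {self.running}")
        if task_id not in self.tasks:
            raise ValueError(f"{task_id} is not allocated to PE {self.pe_id}")
        self.running = task_id

    def finish(self) -> None:
        if self.running is None:
            raise RuntimeError(f"PE {self.pe_id} is not executing any task")
        self.tasks.remove(self.running)
        self.running = None


pe = ProcessingElement(pe_id=0, capacity=2)
pe.allocate("t1")
pe.allocate("t2")
pe.start("t1")
print(pe.state)  # busy
```

Keeping the allocation list separate from the running task reflects the distinction in the text between tasks that are merely assigned to a PE and the single task it executes at any instant.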

3.2. Communication Infrastructure Model

The communication infrastructure consists of a router-based on-chip network that

uses wormhole packet switching. The routers considered in this work have no virtual

channel; that is, one logical channel is associated with each physical I/O port. The FIFO

depth of each input channel is considered as 8, with a data width of 32 bits. The arbiter

supports multiﬂit packets and uses round-robin scheduling. A packet has a maximum

size of 64 ﬂits, each ﬂit having a width of 32 bits. The packet composition is as follows:

(1) header ﬂit, (2) body ﬂit, and (3) tail ﬂit. The bits 0 and 1 of the ﬂit denote the ﬂit

type: Head (01), Body (10), or Tail (11). Invalid ﬂits are represented by 00. Next, for

the source address, bits 2 to 9 are allotted, while for the destination address, bits 10

to 17 are reserved. Here, source and destination addresses have been assumed to be

8-bit, supporting a maximum of 256 PEs. The rest of the bits are used for data payload.

Deterministic XY routing has been used for routing packets from the source node to the destination node through the on-chip communication network. It is a dimension-order routing scheme in which each packet is first routed along the x-axis until it reaches the destination column and is then routed along the y-axis to reach the destination node. The XY routing algorithm uses a minimal path for packet communication and is both deadlock free and livelock free.
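As an illustration only, the header flit layout and XY routing described above might be sketched in Python as follows. The function names are hypothetical, bit 0 is assumed to be the least significant bit of the 32-bit flit, and routers are assumed to be addressed by (x, y) mesh coordinates. None of these details is specified in the article.

```python
# Flit type codes taken from the text: bits 0-1 of the flit.
INVALID, HEAD, BODY, TAIL = 0b00, 0b01, 0b10, 0b11


def pack_header_flit(src: int, dst: int, payload: int = 0) -> int:
    """Build a 32-bit header flit: type (bits 0-1), source (2-9), destination (10-17)."""
    if not (0 <= src < 256 and 0 <= dst < 256):
        raise ValueError("source and destination addresses are 8-bit")
    if not (0 <= payload < (1 << 14)):
        raise ValueError("payload must fit in the remaining 14 bits")
    return HEAD | (src << 2) | (dst << 10) | (payload << 18)


def unpack_flit(word: int) -> dict:
    """Split a 32-bit flit into its fields (assumes the header layout above)."""
    return {
        "type": word & 0b11,
        "src": (word >> 2) & 0xFF,
        "dst": (word >> 10) & 0xFF,
        "payload": (word >> 18) & 0x3FFF,
    }


def xy_route(src, dst):
    """Return the (x, y) routers visited under XY routing, source and destination included."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:          # first travel along the x-axis
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:          # then travel along the y-axis
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(unpack_flit(pack_header_flit(3, 5)))
print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The number of hops on the returned path equals the Manhattan distance between the two routers, which is what makes the route minimal.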

3.3. Fault Model

As noted in Figure 1, the NoC system contains several components, each of which

may be prone to faults. Failure may occur on one or more routers, links, and PEs.

However, this work considers the faulty behaviors of PEs only. A processor may become

permanently faulty due to aging or wear and tear of its elements over time. In such

cases, no new tasks are mapped onto it, which is ensured by the Manager Core. The

Manager Core and the on-chip communication entities have been assumed to be fault

free, as in Chou and Marculescu [2011] and Khalili and Zarandi [2013]. The work of

Chatterjee et al. [2014] has presented schemes to ensure fault tolerance of the NoC

infrastructure. Although we have considered a single Manager Core, having multiple copies of it, when permitted by the system architecture, can further impart fault tolerance to the platform. Since the NoC infrastructure has been assumed to be fault free, even in the

case of failed PEs, the router to which it is associated functions normally. Therefore, in

our case, the mesh architecture remains connected and hence supports the XY routing

scheme for packet traversal. We assume that suitable fault detection mechanisms [Liberato et al. 2000] already exist on the given platform and that the time consumed for detection of faults is negligible.

3.4. Temperature Sensors

Thermal sensors are integrated in a chip for runtime temperature measurement. Such

sensors could be found in Intel Xeon Series [Intel 2008] processors with one embedded

sensor per core and IBM POWER6 [Floyd et al. 2007] processors with 25 thermal

sensors. The measurement inaccuracy of these devices is rectiﬁed by sensor calibration

techniques as in Yao et al. [2011]. In this work, we have concentrated on the problem of

fault-tolerant task scheduling and mapping assisted by the temperature readings of the

thermal sensors embedded in the PEs. The thermal sensors are placed across the chip

with one sensor per PE, as shown in Figure 1. These sensors send the individual core

temperatures to the Manager Core at some periodic interval of time. The temperature

of all the cores is stored in an array, coretemp. The Manager Core uses the reliability

model presented in Equation (9) to generate the reliability of individual processing

cores depending on the temperature values stored in coretemp. Based on the reliability

values of individual PEs, the task allocation mode for individual tasks is determined

at runtime.

4. PROBLEM FORMULATION

In this section, we will ﬁrst introduce necessary deﬁnitions and notations required to

formally state the dynamic mapping and scheduling problem for fault-tolerant resource

allocation.

Application Task Graph: An application is represented as a directed acyclic graph $G = (T, E)$, where $T$ is the set of nodes representing the tasks of the application and $E$ is the set of directed edges showing the dependency and communication between the tasks of the application. A task $t_i \in T$ is represented by four parameters $(id_i, ex_i, dl_i, sl_i)$, where $id_i$ is the task identifier, $ex_i$ is the execution time of $t_i$, and $dl_i$ is the completion deadline. $sl_i$ indicates the slack time of the task, as explained next.

1. The execution time of a task is the time taken by the task to complete its computation

while running uninterrupted on a PE.

2. The completion deadline is the time by which a task must be completed.

3. The slack time is the margin between the time at which a task would complete if it started now and its deadline. This indicates the size of the available scheduling window. It is expressed as

$$sl_i = \text{deadline} - \text{current time} - \text{task execution time}. \tag{1}$$

This is a dynamic attribute of the task as it depends on the runtime situation.
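To make the dynamic nature of Equation (1) concrete, the following minimal Python sketch recomputes the slack of a task at two different time instants. The `Task` class and its field names are hypothetical and simply mirror the parameters $(id_i, ex_i, dl_i)$. The numeric values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    ex: float   # execution time ex_i
    dl: float   # completion deadline dl_i

    def slack(self, now: float) -> float:
        # Equation (1): slack = deadline - current time - execution time
        return self.dl - now - self.ex


t = Task("t1", ex=30, dl=100)
print(t.slack(now=20))  # 50
print(t.slack(now=80))  # -10
```

A negative value means that the task can no longer finish by its deadline even if it starts immediately, which follows directly from the definition.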

An edge $e_{ij} \in E$ represents the communication between the tasks $t_i$ and $t_j$. The weight of edge $e_{ij}$, denoted by $w_{ij}$, represents the communication volume from $t_i$ to $t_j$.

Topology Graph: The NoC topology graph is a directed graph $N = (P, L)$ with each core $PE_i \in P$ representing a PE in the topology. Each PE has an associated router by which it is attached to the NoC. The directed edge $l_{ij} \in L$ represents a direct communication link between the routers for processors $PE_i$ and $PE_j$. The weight of the edge $l_{ij}$, denoted by $bw_{ij}$, indicates the bandwidth available across the edge $l_{ij}$.

Task Mapping: A processor allocation of the task graph $G$ on a finite set of processors $P$ is the processor allocation function $\mathit{map}: T \rightarrow P$ such that if a given task $t_i \in T$ is assigned to $PE_j \in P$, $\mathit{map}(t_i) = PE_j$ determines the spatial allocation of tasks. In dynamic task mapping, processor allocation is invoked when an allocated task requests to communicate with a task that has not yet been mapped/scheduled. Temporal assignment is the determination of starting time of execution of each task. It assumes that allocation of tasks to processors has been completed before.

Task Scheduling: A schedule of a task graph G on a finite set P of processors is the function pair (start, map), where start: T → Q+ is the function giving the start time of a task in G and map: T → P is the processor allocation function. Here, start(t_i) represents the time at which task t_i begins execution on the PE given by map(t_i) [Sinnen 2007].

Thus, the two functions start and map describe the temporal and spatial assignment of tasks, represented by the nodes of the task graph, to the processor set P of a target system. If the time at which a scheduled task is completed is given by finish(), then

$$\mathit{finish}(t_i) = \mathit{start}(t_i) + ex_i. \tag{2}$$

Communication Time: The communication time ct(e_ij) of an edge e_ij is the time the communication volume (represented by w_ij) takes to travel from the origin PE and arrive completely at its destination PE. In other words, ct(e_ij) is the time taken to send the data between the PEs executing tasks t_i and t_j. In the system model considered, no two tasks can execute on the same PE at the same time. Also, the dependencies between the tasks of an application G must be met. The following conditions express this:

Condition 1.1: For any two distinct tasks t_i, t_j ∈ T,

$$\mathit{map}(t_i) = \mathit{map}(t_j) \Rightarrow \mathit{start}(t_i) \geq \mathit{finish}(t_j) \ \text{or} \ \mathit{start}(t_j) \geq \mathit{finish}(t_i). \tag{3}$$

Condition 1.2: For every edge e_ij,

$$\mathit{start}(t_j) \geq \mathit{finish}(t_i) + ct(e_{ij}). \tag{4}$$
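The two conditions can be checked mechanically for a candidate schedule. The sketch below is one possible way to do so; the dictionary-based data layout and the function names are assumptions chosen for illustration, not the authors' implementation.

```python
from itertools import combinations


def finish(task, start, ex):
    """Equation (2): finish(t_i) = start(t_i) + ex_i."""
    return start[task] + ex[task]


def is_valid_schedule(tasks, edges, start, mapping, ex, ct):
    """Check Conditions 1.1 and 1.2 for a candidate schedule.

    tasks   : list of task ids
    edges   : list of (i, j) pairs, meaning t_i must precede t_j
    start   : dict task -> start time
    mapping : dict task -> PE id
    ex      : dict task -> execution time
    ct      : dict (i, j) -> communication time of edge e_ij
    """
    # Condition 1.1: tasks sharing a PE must not overlap in time.
    for a, b in combinations(tasks, 2):
        if mapping[a] == mapping[b]:
            if not (start[a] >= finish(b, start, ex)
                    or start[b] >= finish(a, start, ex)):
                return False
    # Condition 1.2: a child starts only after its parent's data has arrived.
    for i, j in edges:
        if start[j] < finish(i, start, ex) + ct[(i, j)]:
            return False
    return True
```

For example, with ex = {0: 3, 1: 2}, a single edge (0, 1) with communication time 1, and the two tasks on different PEs, task 1 may start at time 4 but not at time 3.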

Earliest Available Time: For a schedule S of a task graph G on P, the earliest time at which a PE_k ∈ P will be available is given by the function EAT, defined as

$$EAT(PE_k) = \max_{\forall t_i \in T \mid PE_k = \mathit{map}(t_i)} \mathit{finish}(t_i). \tag{5}$$

When all the leaf tasks of G finish, the schedule S terminates and the resources used are released, which may then be allocated to a new application.
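Equation (5) reduces to a maximum over the finish times of the tasks mapped to one PE. A short sketch follows, assuming the same dictionary layout as an illustration; returning 0.0 for an idle PE is an assumption of this example, since Equation (5) itself only covers PEs that host at least one task.

```python
def earliest_available_time(pe, mapping, start, ex):
    """Equation (5): latest finish time among the tasks mapped to `pe`.

    mapping : dict task -> PE id
    start   : dict task -> start time
    ex      : dict task -> execution time
    """
    finishes = [start[t] + ex[t] for t, p in mapping.items() if p == pe]
    # An idle PE is treated as available from time 0 (assumption).
    return max(finishes, default=0.0)
```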

Task Allocation Modes: A task t_i has an allocation mode denoted by ψ_i ∈ {EA, FAR}, where EA represents the communication-energy-aware mode, in which task t_i is assigned to a core in a given NoC platform such that the communication energy of the application is reduced. FAR denotes the failure-aware redundancy mode, which is activated for task t_i to tolerate core failures. For a given degree of fault tolerance f, we allocate f redundant copies of a task. For the task set T with N tasks, we use an array ψ = (ψ_1, ψ_2, ..., ψ_N) to indicate the allocation decision taken for all tasks in the given application. Each task belongs to one of the subsets T_EA or T_FAR, with T_EA ∩ T_FAR = ∅, where T_EA and T_FAR denote the sets of tasks allocated in the EA and FAR modes, respectively. It may be noted that for each task marked to be implemented in failure-aware mode, there are duplicate copies of the task. Thus, defining T_mod = T_EA ∪ T_FAR with the duplicate copies included, we have |T_mod| ≥ |T|, depending on the degree of fault tolerance.

Communication Energy: Energy is consumed when data packets are transferred from a source PE to a destination PE. The Manhattan distance between the source and destination PEs is used to obtain the number of physical links N_l and the number of routers N_r traversed for communication between two dependent tasks. Let E_r be the energy consumption in pJ in a router for 1-bit transmission and E_l represent the energy consumption in pJ in a link to transmit 1 bit. The communication energy, E_comm, for allocated applications on a given NoC platform is estimated as

$$\begin{aligned} E_{comm} &= \sum_{\forall G_i} \sum_{\forall e_{ij} \in G_i} (E_r \cdot N_r + E_l \cdot N_l) \cdot w_{ij}, \\ N_r &= HC(\mathit{map}(t_i), \mathit{map}(t_j)) + 1, \\ N_l &= HC(\mathit{map}(t_i), \mathit{map}(t_j)), \end{aligned} \tag{6}$$

where HC represents the hop count and w_ij is the communication volume between tasks t_i and t_j in megabits.
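A sketch of Equation (6) for a single application on a 2D mesh is given below. It assumes row-major PE numbering and takes the communication volume in bits so that the per-bit energies can be supplied in J/bit; both choices, along with the function names, are assumptions of this example.

```python
def hop_count(pe_a, pe_b, cols):
    """Manhattan distance between two PEs numbered in row-major order."""
    row_a, col_a = divmod(pe_a, cols)
    row_b, col_b = divmod(pe_b, cols)
    return abs(row_a - row_b) + abs(col_a - col_b)


def communication_energy(edges, mapping, cols, e_router, e_link):
    """Equation (6) for one application.

    edges    : dict (i, j) -> communication volume w_ij in bits
    mapping  : dict task -> PE id
    e_router : energy per bit per router (J/bit)
    e_link   : energy per bit per link (J/bit)
    """
    total = 0.0
    for (i, j), volume in edges.items():
        n_links = hop_count(mapping[i], mapping[j], cols)  # N_l
        n_routers = n_links + 1                            # N_r
        total += (e_router * n_routers + e_link * n_links) * volume
    return total


# One edge of 1 Mbit between neighboring PEs 4 and 1 on a 3 x 3 mesh.
energy = communication_energy({(0, 1): 1e6}, {0: 4, 1: 1}, cols=3,
                              e_router=5.24e-12, e_link=3.125e-13)
print(energy)
```

Summing this quantity over all mapped applications gives the outer sum of Equation (6).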

4.1. Problem Statement

Given the following as inputs:

(1) A set G of arrived applications, each of which is represented by an application task graph G = (T, E)
(2) A target NoC platform given by N = (P, L)
(3) Timing information of the tasks of the applications
- Execution time, ex_i > 0
- Completion deadline, dl_i ≥ ex_i

Determine a dynamic resource allocation for all tasks of the arrived application to find
- the allocation array, ψ,
- map: T_mod → P | PE_tasks ≤ PE_cap, and
- start(t_i) on the identified PE,

such that, while tolerating f core failures:
(1) finish(t_i) ≤ dl_i, and
(2) E_comm of the application is reduced.

Migration energy for fault-tolerant design adds substantially to the energy consumption of any given application. As faults occur, tasks from faulty core(s) need to be migrated to other functional core(s). Low migration overhead is possible by remapping only the tasks from faulty core(s) and keeping all other task mappings unchanged. On the other hand, for real-time applications, satisfying the deadline demands remapping techniques with low execution overhead for fault recovery mechanisms. Methods such as Das and Kumar [2012] and Das et al. [2013] use an offline one-fault look-ahead policy to obtain a mapping, but this also imposes a penalty on the deadline performance of the application. For real-time applications, we address the aforementioned challenges by taking advantage of a selective task replication policy for failure recovery.

5. THE PROPOSED APPROACH

The first stage of our algorithm honors the task deadline and minimizes the communication energy consumption of the application. Next, the algorithm addresses processor failures using a proactive policy, where tasks allotted to susceptible PEs are replicated to minimize the effect of faulty processors on the executing application. The replicated tasks are mapped onto available PEs with higher reliability. In order to determine the candidate tasks for applying redundancy, a reliability-driven approach is adopted, as outlined in the following subsection.

5.1. Reliability-Driven Selective Task-Redundancy

Before detailing the fault-tolerant resource allocation policy, we ﬁrst determine the

potential tasks for applying the fault-tolerant strategy. This is accomplished as follows.

In the mesh-based NoC topology, we have used thermal sensors to detect the temperature of individual cores. The recorded temperature values are then communicated to the Manager Core. Using the temperature value, the reliability of the corresponding PE, described in Section 5.3, is estimated and stored in the Manager Core. Depending on the reliability value, the PEs are categorized into reliable and unreliable PEs as follows:

$$\mathit{type}(PE) = \begin{cases} \text{reliable}, & \text{if reliability of PE} > \text{critical reliability} \\ \text{unreliable}, & \text{otherwise.} \end{cases} \tag{7}$$

To determine the value of critical reliability for PE categorization, the temperature threshold is selected as in Ahmed et al. [2014]. Although unreliable PEs are available for runtime task allocation, they are more likely to undergo failure during task execution. Thus, the task running on such a PE should have a backup that may be used if the corresponding PE fails while the application is running. This strategy enables tasks of applications to execute uninterrupted despite core failures.

To improve the system reliability, we have proposed a software-redundancy-based

task replication technique. This method generates a cloned task that inherits all the

processing requirements such as execution time, data dependency on other tasks, and

completion deadline from the original task. The cloned/recovery tasks obtained by

replication are available in the system in two forms and are executed depending on the

time of their activation: active replication and passive replication [Eles et al. 2008; Ahn

et al. 1997]. Active replication is a space redundancy technique, where the duplicated

tasks are executed along with their original counterparts, irrespective of the occurrence

of faults. On the other hand, passive replication involves activation of the backup task

only if the fault occurs at the PE executing the original task. It assumes that the

system is equipped with a fault detection mechanism that updates the failure status

of the individual processors to the Manager Core. If the task assigned to a failed PE has a duplicate copy on a different PE, the duplicate task is activated at the instant of failure.

5.2. Fault-Aware Dynamic Task Allocation (FA-DRA)

Algorithm 1 presents the proposed scheme. In order to allocate resources to an application, we initialize the Ready_List with the mature tasks, which are ready to be executed. Mature tasks are tasks with no predecessors or whose predecessors have already finished execution. The function find_unoccupied_PE_list() keeps a record of the set of processors currently free. Based on the availability of free processors and the Task Selection Function TSF(), tasks in the Ready_List are scheduled in a particular sequence. As the time taken to run the algorithm adds to the fault-aware core allocation time, we consider the timing characteristics of tasks to choose TSF():

1. Minimum Processing Time First (Min_exc): TSF(t_i) = Min(ex_i) ∀ t_i ∈ Ready_List
2. Maximum Processing Time First (Max_exc): TSF(t_i) = Max(ex_i) ∀ t_i ∈ Ready_List
3. Least Deadline First (Min_dl): TSF(t_i) = Min(dl_i) ∀ t_i ∈ Ready_List

The Minimum Processing Time First scheme selects the task with the least execution time, while the Maximum Processing Time First scheme selects the task with the maximum execution time for mapping and scheduling. The Least Deadline First heuristic chooses the task having the earliest deadline among the mature tasks to allocate next.
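The three selection policies can be sketched as a single function over the ready list. The dictionary-based task representation and the `policy` argument are assumptions of this example; ties are resolved by list order, which the text does not specify.

```python
def tsf(ready_list, policy="Min_exc"):
    """Pick the next task from the ready list under one of the three policies.

    ready_list : list of dicts with keys "id", "ex" (execution time), "dl" (deadline)
    policy     : "Min_exc", "Max_exc", or "Min_dl"
    """
    if policy == "Min_exc":
        return min(ready_list, key=lambda t: t["ex"])
    if policy == "Max_exc":
        return max(ready_list, key=lambda t: t["ex"])
    if policy == "Min_dl":
        return min(ready_list, key=lambda t: t["dl"])
    raise ValueError(f"unknown policy: {policy}")


ready = [
    {"id": 5, "ex": 3, "dl": 20},
    {"id": 6, "ex": 4, "dl": 12},
]
print(tsf(ready, "Min_exc")["id"])  # shortest execution time
print(tsf(ready, "Min_dl")["id"])   # earliest deadline
```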

ALGORITHM 1: Fault-Aware Dynamic Resource Allocation

Input: G = (T, E); N = (P, L);
Output: fault-aware mapping and scheduling ∀ t_i ∈ T

 1  Ready_List ← mature tasks of application;
 2  ProcAvl = find_unoccupied_PE_list();
 3  ProcBusy = ∅;
 4  PE_rel = ∅;
 5  // PE_unrel is the set of PEs with low reliability
 6  PE_unrel = get_unreliable_PE();
 7  while Ready_List != ∅ do
 8      for t_i ∈ Ready_List do
 9          Sel_task = TSF(t_i);
10          if ProcAvl != ∅ then
11              Target_Parent = most communicating parent of Sel_task;
12              Best_Free_PE = processor assigned to Target_Parent;
13              if Best_Free_PE ∈ ProcAvl AND can support Sel_task then
14                  PE_sel = Best_Free_PE;
15                  update ProcAvl and ProcBusy;
16              else
17                  ProcBusy = find_occupied_PE_list(PE_k) with EAT < slack time of Sel_task;
18                  if ProcBusy != ∅ then
19                      Best_Busy_PE = select a PE ∈ ProcBusy nearest to Target_Parent;
20                      PE_sel = Best_Busy_PE;
21                  else
22                      Target_Parent = next most communicating parent;
23                      go to line 12;
24          // Identify a PE with reliability higher than PE_sel
25          if PE_sel ∈ PE_unrel then
26              Sel_task* = replica of Sel_task;
27              for PE_j ∈ P do
28                  if PE_j ∈ ProcAvl AND PE_j ∉ PE_unrel then
29                      PE_rel = PE_rel ∪ PE_j;
30              PE_recovery = minimum distance PE ∈ PE_rel from PE_sel
31          allocate Sel_task to PE_sel and Sel_task* to PE_recovery;
32          assign start-time of Sel_task on PE_sel and Sel_task* on PE_recovery;
33          update EAT of PE_sel and PE_recovery;
34          delete Sel_task from Ready_List;

A task can depend on multiple predecessors. In case of multiple parents, it is preferable to map the task to the processor of the most communicating parent, provided that processor is available. However, if such a PE is unavailable, the algorithm does not immediately assign the task to an available PE closest to its parent, even if that PE can support the task. This is in contrast with the as-soon-as-possible paradigm of resource allocation. The primary idea is that if the task has enough slack time, a PE advantageous in terms of communication energy consumption can be chosen. The algorithm then uses find_occupied_PE_list() to choose a target PE for execution of the selected task depending on the position and Earliest Available Time (EAT) of the busy PEs. Such a PE can be the one on which the parent was executed, or one in its neighborhood. The currently executing task on such a PE should finish within the slack time margin of the selected task. Assigning such a PE helps to lower the communication energy while satisfying the deadline constraint of the selected task. In case this condition is not met, the selected task is assigned to an available PE that is closest to the PE where the parent task was executing.

Next, we describe the fault-aware dynamic resource allocation scheme. As seen in Algorithm 1, PEs are selected depending on the task deadline and communication energy consumption. The selected PEs are then checked for susceptibility to failure using their corresponding reliability values. If the selected PE is unreliable, the algorithm clones the original task and executes the fault-tolerant scheme presented in lines 25 to 30 of Algorithm 1. It tries to map the recovery task to an available PE in close vicinity of the original task to reduce the communication overhead. The chosen PE should not be present in the set of unreliable PEs, PE_unrel. Such a PE is referred to as a reliable PE and is collected in the set PE_rel. Next, the algorithm allocates the cloned task on PE_recovery, the reliable PE closest to PE_sel. Once the PEs for the selected task and its replica (PE_sel and PE_recovery, respectively) are decided, the start times of execution of the tasks on the identified PEs are assigned depending on the communication delay with their predecessors. After completing the resource allocation for the chosen task, the task is deleted from the Ready_List. Finally, the sets of occupied and free PEs are updated. This process continues until all the tasks of the applications are mapped and scheduled (with/without redundancy).
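The recovery-PE selection in lines 25 to 30 of Algorithm 1 can be sketched as follows. The sketch assumes a mesh with row-major PE numbering and Manhattan distance, and it breaks distance ties by the lowest PE index; the tie-breaking rule and the function names are assumptions of this example, as the text does not state them.

```python
def manhattan(pe_a, pe_b, cols):
    """Manhattan distance between two PEs numbered in row-major order."""
    row_a, col_a = divmod(pe_a, cols)
    row_b, col_b = divmod(pe_b, cols)
    return abs(row_a - row_b) + abs(col_a - col_b)


def select_recovery_pe(pe_sel, available, unreliable, cols):
    """Return the nearest available reliable PE for the replica, or None.

    None is returned when pe_sel is reliable (no replica is needed) or when
    no available reliable PE exists.
    """
    if pe_sel not in unreliable:
        return None
    # PE_rel: available PEs that are not in the unreliable set.
    reliable = [pe for pe in available if pe not in unreliable]
    if not reliable:
        return None
    # Nearest PE first; the lowest index wins on ties (assumption).
    return min(reliable, key=lambda pe: (manhattan(pe, pe_sel, cols), pe))


# 3 x 3 mesh with PE5 unreliable; the availability set is hypothetical.
print(select_recovery_pe(5, available={0, 2, 6, 8}, unreliable={5}, cols=3))
```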

5.3. Reliability Analysis

The proposed task recovery mechanism replicates tasks executing on a PE that is likely to fail and resumes execution in the form of its replica on a different PE. In order to decide the tasks to be replicated, a model to determine the reliability of the processing core is required. The failure rate of each PE present in the network graph is estimated by using the model presented in Chou and Marculescu [2011]. It associates the failure rate of a PE with its temperature and is given as

$$\lambda_p = A e^{-\frac{E}{KT}}, \tag{8}$$

where E is the activation energy (0.9 eV), K is the Boltzmann constant (8.617 × 10^−5 eV K^−1), and T is the absolute temperature in kelvin. A is a constant, and its value is selected such that the failure rate per cycle for each core operating in its useful life is 10^−11 under normal core temperature, that is, 55°C [Chou and Marculescu 2011]. Equation (8) is used for reliability analysis of the PEs in the NoC system. After the PEs have gone through the infant mortality period, their reliability is formulated [Chang et al. 2011] as

$$R(t) = e^{-\lambda_p t}. \tag{9}$$
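To make the model concrete, the sketch below calibrates A from the stated operating point (a failure rate of 10^−11 per cycle at 55°C) and then evaluates Equations (8), (9), and (7). Measuring t in clock cycles and the critical reliability value used in the example call are assumptions of this sketch; the source takes its threshold from Ahmed et al. [2014].

```python
import math

E_A = 0.9               # activation energy in eV (from the text)
K_B = 8.617e-5          # Boltzmann constant in eV/K (from the text)
T_NOMINAL = 328.15      # 55 degrees C expressed in kelvin
LAMBDA_NOMINAL = 1e-11  # failure rate per cycle at the nominal temperature

# Choose A so that Equation (8) yields LAMBDA_NOMINAL at T_NOMINAL.
A = LAMBDA_NOMINAL / math.exp(-E_A / (K_B * T_NOMINAL))


def failure_rate(temp_kelvin):
    """Equation (8): lambda_p = A * exp(-E / (K * T))."""
    return A * math.exp(-E_A / (K_B * temp_kelvin))


def reliability(temp_kelvin, cycles):
    """Equation (9): R(t) = exp(-lambda_p * t), with t counted in cycles."""
    return math.exp(-failure_rate(temp_kelvin) * cycles)


def pe_type(temp_kelvin, cycles, critical_reliability):
    """Equation (7): classify a PE against a critical reliability value."""
    if reliability(temp_kelvin, cycles) > critical_reliability:
        return "reliable"
    return "unreliable"


# Hypothetical threshold of 0.95 after 1e9 cycles.
print(pe_type(328.15, 1e9, 0.95))
print(pe_type(358.15, 1e9, 0.95))
```

Because the failure rate grows exponentially with temperature, a PE that is 30 K hotter falls below the same threshold much sooner.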

Based on the value of the reliability of individual PEs, a subset of them is marked as PE_unrel. This is the set of PEs that are more likely to fail, as their reliability values are low. For a mesh-based NoC system, composed of PEs, routers, and interconnecting links, system reliability can be modeled using a series configuration, where failure of any component results in the failure of the entire system. As router and link reliability computation are beyond the purview of this article, they are not taken into account while formulating reliability for the NoC. Schemes described in Chatterjee et al. [2014] may be applied to address such cases. Thus, the expression for reliability is defined as

$$R_{PE\_total}(t) = \prod_{j=1}^{|P|} R_P^j(t). \tag{10}$$


Fig. 2. Task mapping for application task graph G1using FA-DRA.

Here, R_P^j(t) is the reliability of the jth PE at time t. R_PE_total(t) corresponds to the reliability of all PEs taken together. When incorporating the aforementioned reliability model in fault-tolerant application mapping, the number of PEs executing the tasks of an application should be taken into consideration. Therefore, we obtain the application reliability for a given NoC platform as

$$R_{app}(t) = \prod_{\forall i \mid PE_i \in PE_{map}} R_P^i(t), \quad \text{where } PE_{map} = \{PE_i \mid PE_i = \mathit{map}(t_i), \forall t_i \in T \text{ of } G(T, E)\}. \tag{11}$$

While allocating resources, a task may be mapped onto an available PE in PE_unrel. Since such PEs are less reliable (i.e., they are more susceptible to faults), a fault-tolerant policy is necessary. Toward this end, cloned copies of the original tasks are assigned to PEs with higher reliability. Otherwise, original tasks are considered for mapping onto PEs without any recovery copies. Therefore, the series model of application reliability is transformed into the series-parallel model by incorporating the proposed fault-tolerant approach. As a result, the new application reliability becomes

Rapp(t)=

∀i|PEi∈PEmap

P(t)+

∀j|PEj∈PEunr el

∀k|PEk∈PEunrel

(1 −Rj

P(t))Rk

P(t),(12)

where PEunre l represents the set of processors whose reliability is low.

Rapp(t)=Rapp(t)−Rapp(t) (13)

The increase in application reliability with the fault-tolerant policy of task replication

is shown in Equation (13).
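The effect of replication on reliability can be illustrated numerically. The sketch below implements the series model of Equation (11) and, for comparison, the textbook series-parallel combination in which a replicated task fails only if both of its PEs fail. It is not a term-by-term transcription of Equation (12), and the reliability values are hypothetical.

```python
from math import prod


def series_reliability(pe_reliabilities):
    """Equation (11): product of the reliabilities of the PEs in use."""
    return prod(pe_reliabilities)


def replicated_reliability(reliable, replicated_pairs):
    """Textbook series-parallel reliability.

    reliable         : reliabilities of PEs running tasks without replicas
    replicated_pairs : (r_original, r_recovery) for each replicated task
    """
    # A replicated task fails only if both its original and recovery PE fail.
    pair_terms = [1 - (1 - r_o) * (1 - r_r) for r_o, r_r in replicated_pairs]
    return prod(reliable) * prod(pair_terms)


baseline = series_reliability([0.9, 0.8])
with_replica = replicated_reliability([0.9], [(0.8, 0.95)])
print(baseline, with_replica, with_replica - baseline)
```

The final difference plays the role of the reliability gain in Equation (13) for this toy case.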

5.4. Working of the Proposed Algorithm

In this section, the working of the proposed algorithm is illustrated using application G1 on a 3 × 3 NoC platform, as shown in Figure 2. The details of the tasks and other characteristics, such as execution time and deadline, are given in the figure. We have represented the NoC platform using a grid structure, where each grid cell represents a PE. The PEs are numbered in row-major order, starting from PE0 at the top-left corner of the grid. The shaded cells represent the selected PEs, and the labels inside them indicate the tasks assigned to that PE. For simplicity, we have considered that at most two tasks can be mapped per PE. In the given example, PE5 is assumed to be an unreliable PE. The TSF() function uses the Minimum Processing Time First scheme for task selection from Ready_List.

Let the root task t0 be mapped onto PE4, the PE from which mapping starts. After the task t0 completes, the children tasks of t0 (i.e., t1, t2, t3, and t4) are ready for execution. Applying the task selection function, t1 is the next candidate selected for mapping. As PE4 is the processor where the parent task was executed and is available, it is selected for mapping of t1. Next, t2 is selected by TSF(). As the parent processor, PE4, is now occupied, the neighborhood of PE4 is searched. Four possible positions at one-hop distance are available, all of which are equally suitable. Let t2 be allocated to PE1. The algorithm uses a similar procedure for mapping the remaining tasks t3 and t4. When t1 completes, t5 and t6 are ready for execution. As t5 has a lower execution time than t6, TSF() selects t5 and allocates it to PE1.

Since more than two tasks cannot be mapped onto a single PE, PE1 and PE4 are not available for allocating t6. Next, the algorithm explores the occupied neighboring PEs at one-hop distance and checks for the condition mentioned in step 17 of the algorithm. Since both PE3 and PE7 are busy and their earliest available time exceeds the slack time of t6, task t6 is assigned to PE5. As PE5 is an unreliable PE, it is likely to fail while executing t6. Therefore, a replica of t6 (i.e., t6*) is created and assigned to PE2, which has higher reliability and is a close neighbor of PE5, where the original task is allotted. This policy is carried out by lines 25 to 30 of Algorithm 1. After completion of PE allocation for the duplicate task, the algorithm continues until all remaining tasks in Ready_List are mapped and scheduled. Next, task t7 is ready to be mapped. Algorithm 1 allocates it to the neighborhood of its most communicating parent, that is, t3 (mapped to PE7), instead of t2 (mapped to PE1), which aids in reducing communication energy. Task t8 is mapped to PE8, which is in the vicinity of processors PE5 and PE7, where its parent tasks t6 and t7 are mapped, respectively. The final mapping obtained by the proposed approach is shown in Figure 2(b). It is observed that fewer tasks are allocated to the unreliable PE than to the reliable ones. This reduces the overhead of replicating additional tasks, which would arise if they were assigned to unreliable processors.

5.4.1. Response of FA-DRA to PE Failures. In this section, we describe the behavior of the FA-DRA algorithm on the occurrence of faults in PEs. The system on which the application is mapped may contain unreliable PEs, that is, PEs with low reliability. Let us consider the situation given in Figure 2. Using the FA-DRA algorithm, task t6 is assigned to an unreliable PE, PE5. As PE5 is prone to failure, Algorithm 1 makes a replica of t6, that is, t6*, for safety. Using the replication policy of Algorithm 1, the cloned copy t6* is mapped onto PE2. This cloned task helps to overcome the unavailability of the original PE and reduce its impact on the execution of the application.

We explain a fault scenario in which PE5, where t6 was originally assigned, becomes faulty at a time instant, say, t = 9. Two strategies can be adopted for executing t6* on the allocated processor PE2. Figure 3(a) shows the active replication strategy, where the replica is scheduled for execution in parallel, independent of the occurrence of faults. Under such a scheme, the result of the computation of t6* is available to compensate for the loss of data of the original task t6 due to the failure of PE5. Here, we observe that t6 meets its deadline even though PE5 failed. Figure 3(b) shows a passive replication strategy, where the replica t6* is executed only after failure of the core. t6 starts its execution at t = 6, but owing to the failure of its assigned processor at t = 9, it remains incomplete. In response to this core failure, task t6* starts execution on PE2. Subsequently, task t6* completes at time t = 13, violating its deadline. The finish time of the application extends by three time units compared to that of Figure 3(a).

Fig. 3. Different fault-tolerant scheduling policies.
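The timing of this scenario can be reproduced with a few lines of arithmetic. In the sketch below, the execution time of t6 is inferred from the stated numbers (the replica starts at t = 9 and completes at t = 13) rather than read from Figure 3. It further assumes zero fault-detection latency and that the passive replica restarts from the beginning.

```python
def finish_active(start, ex):
    """Replica runs in parallel with the original, so it finishes with it."""
    return start + ex


def finish_passive(fault_time, ex):
    """Replica starts from scratch only when the fault is detected."""
    return fault_time + ex


EX_T6 = 4  # inferred: the replica starts at t = 9 and completes at t = 13

active = finish_active(start=6, ex=EX_T6)
passive = finish_passive(fault_time=9, ex=EX_T6)
print(active, passive, passive - active)
```

Under these assumptions the passive replica finishes three time units later than the active one, which matches the extension reported for Figure 3(b).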

6. PERFORMANCE EVALUATION

6.1. Test Setup

We have developed a C++-based discrete event-driven simulator for dynamic mapping and scheduling of applications onto an NoC-based MPSoC system. The inputs to the simulator are the list of applications, the network architecture of the platform, the strategies for mapping and scheduling, the routing algorithm, and the fault tolerance policy. The applications to be mapped are maintained as separate input files. The user can select a target platform size along with the degree of multitasking of the constituent PEs (i.e., the number of tasks each PE can support). The routing module of the simulator determines the route, as per the selected routing method, by which packets are communicated between the tasks. Faults are injected randomly into the PEs, and the number of injected faults depends on the maximum number of faults that the system can withstand without failure. The selective task duplication policy described in Section 5 has been used for fault tolerance. The simulator is triggered by the occurrence of an event. An event is the arrival of an application, the departure of a completed application from the target platform, or the failure of a functional PE. Whenever one such event occurs, the allocation algorithm is executed. We have assumed that the centralized Manager Core that controls the dynamic allocation and fault mitigation is fault free and always functional during the course of simulation. Depending on the choice, the corresponding modules for recording the selected performance metrics of interest are enabled for various test cases. In our case, the primary objective consists of honoring the task deadlines and dynamically minimizing the communication energy consumption of all running applications while imparting fault tolerance to the multicore platform.

For each task of an application, its execution time requirement has been assigned randomly. For a task t_i, the corresponding deadline is allotted as dl_i = k + ex_i, where ex_i is the task execution time and k is a simulation parameter with a positive value. The value of k determines the urgency of scheduling a given task. The smaller the value of k, the less slack time is available for scheduling the task. On the other hand, a large value of k indicates higher slack time availability. This is shown in Table I. In the case of the Urgent category, the task to be scheduled has a time window that is at most 30% more than the execution time. Similarly, for the Moderate and Relaxed categories, the corresponding time windows are shown in Table I.

Table I. Test Categories

Category    Description
Urgent      sl_i ≤ 0.3 × ex_i
Moderate    0.3 × ex_i < sl_i ≤ 0.6 × ex_i
Relaxed     sl_i > 0.6 × ex_i

Table II. Simulation Parameters Used in ORION 2.0

Parameter Name                Value
Technology                    90nm
Transistor Type               NVT
Vdd                           1.0V
Router Frequency              500MHz
Flit Width                    32
Number of Virtual Channels    2
Input Buffer Size             16
Number of Pipeline Stages     4
Wire Layer Type               Global
Wire Width and Spacing        DWIDTH_DSPACE

Simulations have been conducted on 1,000 test cases, each composed of 100 different test scenarios. These scenarios are randomly generated and consist of combinations of the applications of the aforementioned test categories with varying grades of scheduling difficulty. In each of these scenarios, random PE faults are injected and distributed across the NoC platform. The test cases are investigated, taking into account both single and multiple PE failures along with the order of their occurrence. The success of a test case is determined by the number of applications successfully scheduled (i.e., those that satisfied their deadline in the presence of PE failures). For brevity, 15 representative test results have been reported.
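The categories of Table I can be expressed as a small classifier on the ratio of slack to execution time. The function name below is an assumption of this sketch. If the current time in Equation (1) is taken as zero when the deadline dl_i = k + ex_i is assigned, the initial slack equals k, so the same thresholds apply directly to k.

```python
def slack_category(sl, ex):
    """Classify a task by its slack relative to its execution time (Table I)."""
    if sl <= 0.3 * ex:
        return "Urgent"
    if sl <= 0.6 * ex:
        return "Moderate"
    return "Relaxed"


for k in (2, 5, 7):
    print(k, slack_category(sl=k, ex=10))
```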

We have conducted experiments for a 2D mesh-based NoC platform with link energy (E_link = 3.125 × 10^−13 J/bit) and router energy (E_router = 5.24 × 10^−12 J/bit) derived from the ORION 2.0 power model [Kahng et al. 2009]. Table II shows the parameter settings for ORION. For simplicity, we have used a generic five-port router for energy evaluation. The model used to determine the communication energy and communication time is based on the Manhattan distance between the source and destination nodes. In case the NoC is heavily loaded (i.e., the traffic is high), the waiting time in the buffers would rise. A more sophisticated delay model would then be required to estimate the communication time and energy spent in packet communication between the PEs. As the proposed algorithm deals with fault tolerance, such cases fall outside the scope of this work. However, the congestion-aware delay estimation models given in Chao et al. [2016] and Carvalho and Moraes [2008] could be used to estimate and mitigate the network congestion problem.

Simulation has been carried out on an Intel i5 processor running at 3.0GHz. A target platform of an 8 × 8 NoC with one Manager Core at the center of the platform has been considered for experiments. We have used XY routing for transmission of data (in flits) between parent and child tasks (original and duplicate) mapped on different PEs. The data packets from the original and recovery task copies are distinguished at the destination PE using the source address. The proposed algorithms have been tested on both real benchmarks and synthetic applications. Real benchmarks used in simulation are 263 decoder Mp3 decoder, 263 encoder Mp3 decoder, MPEG, MWD, and Mp3 encoder Mp3 decoder. Synthetic applications with different topologies and computation-communication behavior were generated using TGFF [Dick et al. 1998]. Each such application has between 5 and 64 tasks. While presenting the results, synthetic applications are named TGFFX, where "X" represents the application number. Each task is analyzed based on its computation-communication behavior. The computation time for each task depends on the PE on which it has been mapped. As we have considered a homogeneous platform, PEs have identical clock frequency, and the computation energy consumed by a task is similar for all PEs in the given NoC platform. In our experiments, the computation time requirement of the tasks has been uniformly distributed between 5 and 300 clock cycles.

Fig. 4. Effect of task replication on communication energy of the mapped applications.

6.2. Evaluation of Fault-Aware Dynamic Resource Allocation Algorithm

In this subsection, we present the performance of the proposed algorithm to mitigate the

effect of unreliable PEs on application execution. In the simulation exercises, we have

assumed the unreliable PEs to be 10% of total PEs supported by the given NoC platform.

We also show the overhead incurred by the fault-tolerant policy on the deadline and

energy consumption of the running applications.

By choosing tasks with closer deadlines, the tasks can be completed within their deadline. However, when tasks with a high volume of data communication demand resources, they are more likely to get suitable free PEs, provided the already mapped tasks on such PEs complete their execution. Therefore, giving preference to tasks with shorter execution times when allocating resources to tasks in Ready_List improves the communication energy consumption. This helps to reduce the hop counts between the communicating PEs, as the highly communicating tasks are mapped to PEs close to one another. Thus, the Min_exc function has been used in the proposed algorithm for fault-tolerant dynamic mapping and scheduling.

6.2.1. Communication Energy. At first, we analyze the behavior of the two variations of task redundancy, active and passive replication, used in the FA-DRA algorithm for task recovery. In Figure 4, it can be observed that Active Replication (AR) consumes more communication energy than Passive Replication (PR). On average, the active replication strategy resulted in 35% more communication energy for the mapped application compared to passive replication. This is because the recovery tasks under the active replication strategy are always operating in the background. This increases the use of network routers and links of the NoC platform, along with the computation energy of the cores on which the replicated tasks are placed. In contrast, cloned tasks under passive replication are activated only when a fault occurs, which results in a smaller increase in communication energy.

Fig. 5. Task deadline satisfaction for different task replication techniques in FA-DRA.

Fig. 6. Change in application reliability with faults.

6.2.2. Deadline Performance.

The deadline performance of the FA-DRA algorithm under

active and passive replication policies is shown in Figure 5. We observe that the number

of tasks ﬁnishing execution within the deadline is, on average, 45.73% more in case

of active replication compared to that of passive replication. This is attributed to the

time of failure of a particular PE, which is not known a priori. If PE_j, which is executing task t_i, fails at some time instant t, and t happens to be very close to the finish time of the task, the passive recovery copy has only a small time margin left to satisfy its deadline when it is activated on detection of the PE fault. As a result, tasks having stringent deadlines fail to complete within

the given time. On the contrary, recovery tasks under active replication are scheduled

in parallel with the original task, which lowers the occurrence of deadline misses.
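The timing argument above can be made concrete with a small worked example. This is a sketch under simplifying assumptions: the passive copy restarts from scratch once the fault is detected (no checkpointing), the active copy starts together with the original task, and all numeric values are hypothetical.

```python
def passive_finish_time(exec_time, fault_time, detection_latency=0.0):
    """Finish time of a passive copy that starts only after fault detection."""
    return fault_time + detection_latency + exec_time


def active_finish_time(exec_time, start_time=0.0):
    """Finish time of an active copy that runs in parallel with the original."""
    return start_time + exec_time


def meets_deadline(finish_time, deadline):
    return finish_time <= deadline


# Hypothetical task: 10 time units of work, deadline at t = 15.
exec_time = 10.0
deadline = 15.0
fault_time = 9.0  # the PE fails just before the original would finish

passive_finish = passive_finish_time(exec_time, fault_time)
active_finish = active_finish_time(exec_time)

passive_ok = meets_deadline(passive_finish, deadline)
active_ok = meets_deadline(active_finish, deadline)
```

With these values the passive copy finishes at t = 19 and misses the deadline, whereas the active copy finishes at t = 10. The later the fault occurs, the less slack the passive copy has.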

6.2.3. Effect on Application Reliability.

The application reliability for a given application

on an NoC platform is given by Equation (11). From Figure 6, we observe that as the

number of unreliable PEs increases, the initial reliability of execution of the applica-

tion drops. The proposed algorithm successfully alleviates the drop in reliability by

incorporating recovery copies for vulnerable tasks, as depicted in Figure 6. The tasks


Fault-Tolerant Dynamic Task Mapping and Scheduling for NoC-Based Multicore Platform 108:19

Fig. 7. Average number of tasks restarted.

executing on the unreliable PEs are duplicated and allotted to PEs having high relia-

bility values. This enhances the reliability by exploiting the parallelism between the

duplicated tasks.
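The effect of duplication on reliability can be sketched numerically. Equation (11) is not reproduced here, so the model below is a stand-in: it assumes that the application succeeds only if every task succeeds (a product of task reliabilities) and that a duplicated task succeeds if at least one copy does. The reliability values are hypothetical.

```python
from math import prod


def task_reliability(primary, replica=None):
    """Reliability of one task, optionally protected by a replica.

    With a replica, the task fails only if both copies fail
    (independent failures are assumed).
    """
    if replica is None:
        return primary
    return 1.0 - (1.0 - primary) * (1.0 - replica)


def application_reliability(tasks):
    """Product of per-task reliabilities (assumed series model)."""
    return prod(task_reliability(p, r) for p, r in tasks)


# Hypothetical three-task application; the second task sits on an
# unreliable PE (0.80) and is duplicated onto a reliable PE (0.99).
without_replica = application_reliability(
    [(0.99, None), (0.80, None), (0.98, None)]
)
with_replica = application_reliability(
    [(0.99, None), (0.80, 0.99), (0.98, None)]
)
```

Under these assumptions the application reliability rises from about 0.776 to about 0.968 when only the vulnerable task is duplicated.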

Thus, it is seen from the analysis that the fault-tolerant policy of the proposed FA-

DRA algorithm can potentially reduce the communication energy of the reconﬁgured

application and also satisﬁes the deadline constraint of the given task set. Moreover,

if the objective of the user is to reduce communication energy, the Manager Core implements passive replication of the tasks placed on unreliable PEs. For deadline-constrained applications, active replication is implemented. As in dynamic scenarios the

application characteristics are not known a priori, the fault-tolerant policy for dynamic

resource allocation is decided by the user. Also, FA-DRA results in improvement in

application reliability.

6.3. Comparison with Existing Works

Next, we compare the quality of the solution achieved by the algorithm proposed in

this work with the solution given by similar algorithms [Das et al. 2015, 2013; Chou

and Marculescu 2011; Khalili and Zarandi 2013] reported in literature. Results on both

real and synthetic benchmarks have been shown to demonstrate the effectiveness of

the solution obtained in this work. The number of faulty PEs varies between two and

six and the results are averaged over all PE fault cases.

6.3.1. Number of Tasks Restarted.

We have compared the impact of the FA-DRA algo-

rithm and the Full Re-map [Das et al. 2015, 2013] policy, henceforth called FR, on

the re-execution of tasks when responding to the failed cores. Figure 7 represents the

average number of tasks restarted when FR and FA-DRA policies are used for each of

the application task graphs. We observe that the FR policy results in a higher number

of tasks needing to be restarted as compared to the policy used in FA-DRA. This is

because, in case of FR policy, all tasks of the given application are re-executed, in-

cluding those being executed by reliable cores, after reconﬁguring the application to

new mapping. On the other hand, task recovery in FA-DRA reduces such restarting

overhead by using cloned tasks. The requirement of restarting the tasks occurs both

in active and passive replication strategies when the tasks running on the failed cores

have no replica. This happens when the failed cores are different from the ones predicted by the reliability model in Section 6. In particular, under passive replication, a task re-executes in the form of its recovery copy when the unreliable PE executing it fails. On

average, the task re-execution overhead in the passive replication strategy is 43.53%

more compared to that of the active replication strategy. But both of these techniques


Fig. 8. Comparison of deadline performance.

when implemented in the FA-DRA algorithm give a 56.95% average reduction in task

re-execution as compared to the FR policy.
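The difference in restart counts can be reproduced with a minimal counting sketch. It assumes a one-to-one task-to-PE mapping, that FR re-executes every task once any mapped PE fails, and that under passive replication the recovery copy counts as a re-execution. The task names, PE indices, and fault scenario are hypothetical.

```python
def restarts_full_remap(mapping, failed_pes):
    """Full Re-map (FR): a failure under any task re-executes every task."""
    hit = any(pe in failed_pes for pe in mapping.values())
    return len(mapping) if hit else 0


def restarts_with_replicas(mapping, failed_pes, replicated, passive=False):
    """Re-executions when only tasks on failed PEs need recovery."""
    count = 0
    for task, pe in mapping.items():
        if pe not in failed_pes:
            continue  # task on a healthy PE keeps running
        if task not in replicated:
            count += 1  # no replica: the task is remapped and restarted
        elif passive:
            count += 1  # passive copy starts executing only after the fault
    return count


# Hypothetical scenario: PEs 2 and 4 were predicted to fail, so t2 and t4
# have replicas, but the PEs that actually fail are 2 and 5.
mapping = {"t0": 0, "t1": 1, "t2": 2, "t3": 3, "t4": 4, "t5": 5}
replicated = {"t2", "t4"}
failed_pes = {2, 5}

fr_restarts = restarts_full_remap(mapping, failed_pes)
active_restarts = restarts_with_replicas(mapping, failed_pes, replicated)
passive_restarts = restarts_with_replicas(
    mapping, failed_pes, replicated, passive=True
)
```

In this scenario FR re-executes all six tasks, passive replication re-executes two (t2 through its recovery copy and the unprotected t5), and active replication re-executes only t5.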

The results of deadline performance are shown in Figure 8. The number of tasks

restarted affects the deadline satisfaction of the application. In the FR policy [Das

et al. 2015, 2013], remapping the tasks of applications to reliable PEs during failure of

currently assigned PEs incurs time overhead. This is attributed to transferring the task

state space and resuming its execution on a different PE. This is reflected in the

results obtained in Section 6.2.1. The Failure-Aware Spare Allocation (FASA) [Khalili

and Zarandi 2013] technique resumes the execution of task(s) affected by PE failure

at the allocated spare core. This lowers the re-execution overhead of other tasks for

the given application. Consequently, the deadline performance improves as compared

to the FR policy. On the other hand, the proposed scheme when implemented with the

AR policy further lowers the number of tasks restarted (depicted in Figure 7). Active

replicas being scheduled in parallel with the original tasks ensures that the tasks

complete execution within their deadline even in the event of PE failure. Experimental

results show that, on average, a 47% improvement in task deadline satisfaction is

achieved with respect to the FASA [Khalili and Zarandi 2013] scheme. When compared

to the FR policy [Das et al. 2015, 2013], the proposed method achieves a best-case improvement in deadline satisfaction of about 82%.

6.3.2. Energy Consumption.

A processor is considered to be faulty if it cannot be used to

execute tasks anymore.

Tasks on failed processors are migrated to an available processor for re-execution.

Thus, Reconfiguration Energy is defined as the cost of migrating a task t_i from the failed processor PE_fail to the recovery processor PE_recover. This is represented in Equation (14) [Maqsood et al. 2015]:

$$E_{reconfig}(t_i) = size(t_i) \times \left( HC_{PE^{t_i}_{fail},\,PE^{t_i}_{recover}} \times E_l + \left( HC_{PE^{t_i}_{fail},\,PE^{t_i}_{recover}} + 1 \right) \times E_r \right). \tag{14}$$

Here, size(t_i) represents the size of task t_i, which includes both code and data. Therefore, the total energy spent in reconfiguring an executing application is given by

$$E^{total}_{reconfig} = \sum_{\forall i \,|\, t_i \in T} E_{reconfig}(t_i). \tag{15}$$
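Equations (14) and (15) can be evaluated with a short sketch. It reads Equation (14) as size(t_i) × (HC × E_l + (HC + 1) × E_r), takes the hop count HC as the Manhattan distance between (x, y) coordinates on a 2D mesh, and sums only over the tasks that are actually migrated. The mesh model, the function names, and the per-bit energy values are assumptions made for illustration.

```python
def hop_count(src, dst):
    """Manhattan distance between two (x, y) mesh coordinates (assumed 2D mesh)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def reconfig_energy(size_bits, pe_fail, pe_recover, e_link, e_router):
    """Equation (14): energy to migrate one task over HC links and HC + 1 routers."""
    hc = hop_count(pe_fail, pe_recover)
    return size_bits * (hc * e_link + (hc + 1) * e_router)


def total_reconfig_energy(migrations, e_link, e_router):
    """Equation (15): sum of Equation (14) over the migrated tasks."""
    return sum(
        reconfig_energy(size, src, dst, e_link, e_router)
        for size, src, dst in migrations
    )


# Hypothetical per-bit energies in arbitrary units.
E_LINK = 1.0
E_ROUTER = 2.0

# (task size in bits, failed PE, recovery PE); values are illustrative.
migrations = [
    (1000, (0, 0), (2, 1)),  # HC = 3 -> 1000 * (3*1 + 4*2) = 11000
    (500, (1, 1), (1, 2)),   # HC = 1 ->  500 * (1*1 + 2*2) =  2500
]
total = total_reconfig_energy(migrations, E_LINK, E_ROUTER)
```

Because every migrated task contributes one term to the sum, a policy that restarts fewer tasks, or recovers them on nearby PEs, lowers the total directly.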

As depicted in Equation (15), the number of tasks restarted by the aforementioned policies affects the total reconfiguration energy. Figure 9 depicts the reconfiguration energy

overhead normalized with respect to the FR strategy. We observe that the proposed

FA-DRA algorithm results in 55.1% reduced reconﬁguration energy as compared to the

FR policy. This is due to the fact that the FR policy recomputes the allocation for all


Fig. 9. Reconﬁguration energy spent in fault-tolerant policies.

Fig. 10. Communication energy comparison with different fault-tolerant policies.

tasks of the running application upon PE failures. As a result, more energy is spent on

migrating the tasks to newly allocated PEs. On the other hand, the FA-DRA policy, due

to the use of task redundancy, avoids such overheads by executing the task replicas to

recover the affected task(s) in the event of PE failures. However, when tasks executing

on a failed PE do not have any replica, the application is reconﬁgured by reallocating

the tasks. Therefore, the requirement of reconﬁguring the entire application is reduced,

which in turn saves the reconﬁguration energy.

Figure 10 shows a comparison of the ﬁnal value of average communication energy

consumption after reconﬁguring the application. The values are normalized with re-

spect to the FARM [Chou and Marculescu 2011] technique. It is observed that FASA

[Khalili and Zarandi 2013] results in, on average, 9.4% lower energy consumption. This

is because FASA determines the placement of spare cores dynamically for each appli-

cation. This is in contrast to the FARM technique, which allocates a ﬁxed number of

spare core(s) at a predeﬁned location on the NoC platform for all applications. As FASA

considers the application characteristics while allocating the spare cores, it results in

improved performance, compared to FARM. When the strategy of Das et al. [2015] is

used with the initial mapping given in Das and Kumar [2012], the resultant mapping

of tasks shows a further reduction in communication energy by 42% as compared to

FASA. This is due to improved mapping of communicating tasks on the same or nearby

PEs. The proposed FA-DRA technique further improves the communication energy by

31% on average by exploiting the timing information of the tasks while allocating them

to PEs. It allocates heavily communicating tasks closer in terms of hop count. Due to

this, the data packets need to traverse fewer intermediate routers and links, giving

additional savings in communication energy compared to the aforementioned policies.
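The link between placement and traffic can be shown with a volume-weighted hop count, used here as a simple proxy for communication energy. The sketch assumes a 2D mesh with Manhattan-distance routing; the task graph, data volumes, and coordinates are hypothetical.

```python
def weighted_hops(edges, mapping):
    """Sum of (data volume x hop count) over all communicating task pairs."""
    total = 0
    for src, dst, volume in edges:
        (x1, y1), (x2, y2) = mapping[src], mapping[dst]
        total += volume * (abs(x1 - x2) + abs(y1 - y2))
    return total


# Hypothetical task graph: a -> b is heavy, b -> c is light.
edges = [("a", "b", 100), ("b", "c", 10)]

# Heavy pair placed far apart versus placed on adjacent PEs.
far_mapping = {"a": (0, 0), "b": (3, 3), "c": (0, 1)}
near_mapping = {"a": (0, 0), "b": (0, 1), "c": (3, 3)}

far_cost = weighted_hops(edges, far_mapping)
near_cost = weighted_hops(edges, near_mapping)
```

Moving the heavily communicating pair onto adjacent PEs cuts the weighted hop count from 650 to 150 in this example, even though the lightly communicating pair stays five hops apart in both mappings.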


Thus, the observations show that for designing an energy-efﬁcient fault-tolerant mul-

tiprocessor system, both the resource allocation strategy and fault-recovery mechanism

affect the postfailure system performance.

7. CONCLUSIONS AND FUTURE WORKS

In this work, we have presented an improved fault-tolerant dynamic resource allocation

policy. The proposed algorithm presents a uniﬁed mapping and scheduling method

for real-time systems focusing on the application deadline and communication energy

while mitigating the effect of failure-prone PEs on application execution. By selectively

using a task replication policy, the reliability of the application executing on the given

NoC platform is improved.

We conducted a detailed evaluation of the performance of the algorithm on both real

and synthetic benchmark applications. On the basis of simulation results, it can be con-

cluded that the proposed algorithm exhibits better performance in terms of deadline

satisfaction, task re-execution overhead, and communication energy when compared with other fault-tolerant algorithms for processor fault-aware task assignment.

Results show that using task slack time while placing the task replicas gives notable

improvements in communication energy, alleviating the effects of faults on the execut-

ing application. In addition, the proposed algorithm is able to honor the deadline for

most of the test cases. For fault-tolerant dynamic systems, where satisfaction of the energy constraint is necessary, the proposed algorithm is attractive for online resource allocation. Future work can be broadly classified along three dimensions: (1) placement and fault tolerance of the manager core, (2) traffic management depending on the workload, and (3) extension of the present work to a heterogeneous multicore platform.

REFERENCES

R. Ahmed, P. Ramanathan, and K. K. Saluja. 2014. Necessary and sufﬁcient conditions for thermal schedu-

lability of periodic real-time tasks. In Proceedings of the 2014 26th Euromicro Conference on Real-Time

Systems (ECRTS’14). 243–252. DOI:http://dx.doi.org/10.1109/ECRTS.2014.15

K. Ahn, J. Kim, and S. Hong. 1997. Fault-tolerant real-time scheduling using passive replicas. In Proceed-

ings of the Paciﬁc Rim International Symposium on Fault-Tolerant Systems, 1997. 98–103. DOI:http://

dx.doi.org/10.1109/PRFTS.1997.640132

O. Arnold and G. Fettweis. 2011. Resilient dynamic task scheduling for unreliable heterogeneous MPSoCs.

2011 Semiconductor Conference Dresden, Dresden, 1–4. DOI:10.1109/SCD.2011.6068747

L. Benini and G. De Micheli. 2002. Networks on chips: A new SoC paradigm. Computer 35, 1 (Jan. 2002),

70–78. DOI:http://dx.doi.org/10.1109/2.976921

F. Bolanos, F. Rivera, J. E. Aedo, and N. Bagherzadeh. 2013. From UML speciﬁcations to mapping and

scheduling of tasks into a NoC, with reliability considerations. J. Syst. Archit. 59, 7 (Aug. 2013), 429–

440.

S. Borkar, T. Karnik, and V. De. 2004. Design and reliability challenges in nanometer technologies. In Proceed-

ings of the 41st Annual Design Automation Conference (DAC’04). ACM, New York, NY, 75–75. DOI:http://

dx.doi.org/10.1145/996566.996588

E. Carvalho and F. Moraes. 2008. Congestion-aware task mapping in heterogeneous MPSoCs. In Proceed-

ings of the International Symposium on System-on-Chip, 2008 (SOC’08). 1–4. DOI:http://dx.doi.org/

10.1109/ISSOC.2008.4694878

Y. C. Chang, C. T. Chiu, S. Y. Lin, and C. K. Liu. 2011. On the design and analysis of fault tolerant NoC archi-

tecture using spare routers. In Proceedings of the 2011 16th Asia and South Paciﬁc Design Automation

Conference (ASP-DAC’11). 431–436. DOI:http://dx.doi.org/10.1109/ASPDAC.2011.5722228

H.-L. Chao, S.-Y. Tung, and P.-A. Hsiung. 2016. Dynamic task mapping with congestion speculation for

reconﬁgurable network-on-chip. ACM Trans. Reconﬁg. Technol. Syst. 10, 1, Article 3 (Sept. 2016), 25

pages. DOI:http://dx.doi.org/10.1145/2892633

N. Chatterjee, N. Prasad, and S. Chattopadhyay. 2014. A spare link based reliable network-on-chip design.

In Proceedings of the 18th International Symposium on VLSI Design and Test. 1–6. DOI:http://dx.doi.org/

10.1109/ISVDAT.2014.6881036


C. L. Chou and R. Marculescu. 2011. FARM: Fault-aware resource management in NoC-based multiprocessor

platforms. In Proceedings of the Design, Automation Test in Europe Conference Exhibition (DATE’11).

1–6. DOI:http://dx.doi.org/10.1109/DATE.2011.5763113

C. Constantinescu. 2002. Impact of deep submicron technology on dependability of VLSI circuits. In Pro-

ceedings of the International Conference on Dependable Systems and Networks, 2002 (DSN’02). 205–209.

DOI:http://dx.doi.org/10.1109/DSN.2002.1028901

C. Constantinescu. 2003. Trends and challenges in VLSI circuit reliability. IEEE Micro 23, 4 (July 2003),

14–19. DOI:http://dx.doi.org/10.1109/MM.2003.1225959

A. Das and A. Kumar. 2012. Fault-aware task re-mapping for throughput constrained multimedia applica-

tions on NoC-based MPSoCs. In Proceedings of the 2012 23rd IEEE International Symposium on Rapid

System Prototyping (RSP’12). 149–155. DOI:http://dx.doi.org/10.1109/RSP.2012.6380704

A. Das, A. Kumar, and B. Veeravalli. 2013. Reliability-driven task mapping for lifetime extension of networks-

on-chip based multiprocessor systems. In Proceedings of the Design, Automation Test in Europe Confer-

ence Exhibition (DATE’13). 689–694. DOI:http://dx.doi.org/10.7873/DATE.2013.149

A. Das, A. Kumar, and B. Veeravalli. 2014. Communication and migration energy aware task map-

ping for reliable multiprocessor systems. Future Generation Comput. Syst. 30 (2014), 216–228.

DOI:http://dx.doi.org/10.1016/j.future.2013.06.016 Special Issue on Extreme Scale Parallel Architectures

and Systems, Cryptography in Cloud Computing and Recent Advances in Parallel and Distributed Sys-

tems, {ICPADS} 2012 Selected Papers.

A. Das, A. K. Singh, and A. Kumar. 2013. Energy-aware dynamic reconﬁguration of communication-centric

applications for reliable MPSoCs. In Proceedings of the 2013 8th International Workshop on Reconﬁg-

urable and Communication-Centric Systems-on-Chip (ReCoSoC’13). 1–7. DOI:http://dx.doi.org/10.1109/

ReCoSoC.2013.6581540

A. Das, A. Kumar Singh, and A. Kumar. 2015. Execution trace–driven energy-reliability optimization for mul-

timedia MPSoCs. ACM Trans. Reconﬁg. Technol. Syst. 8, 3, Article 18 (May 2015), 19 pages. DOI:http://dx.

doi.org/10.1145/2665071

R. P. Dick, D. L. Rhodes, and W. Wolf. 1998. TGFF: Task graphs for free. In Proceedings of the 6th International

Workshop on Hardware/Software Codesign. IEEE Computer Society, 97–101.

P. Eles, V. Izosimov, P. Pop, and Z. Peng. 2008. Synthesis of fault-tolerant embedded systems. In Proceed-

ings of the Design, Automation and Test in Europe, 2008 (DATE’08). 1117–1122. DOI:http://dx.doi.org/

10.1109/DATE.2008.4484825

D. Fick, A. DeOrio, G. Chen, V. Bertacco, D. Sylvester, and D. Blaauw. 2009. A highly resilient routing

algorithm for fault-tolerant NoCs. In Proceedings of the Conference on Design, Automation and Test in

Europe (DATE’09). European Design and Automation Association, Belgium, 21–26. http://dl.acm.org/

citation.cfm?id=1874620.1874628

M. S. Floyd, S. Ghiasi, T. W. Keller, K. Rajamani, F. L. Rawson, J. C. Rubio, and M. S. Ware. 2007. System

power management support in the IBM POWER6 microprocessor. IBM J. Res. Devel. 51, 6 (Nov. 2007),

733–746. DOI:http://dx.doi.org/10.1147/rd.516.0733

H. Hajimiri, S. Paul, A. Ghosh, S. Bhunia, and P. Mishra. 2011. Reliability improvement in multicore

architectures through computing in embedded memory. In Proceedings of the 2011 IEEE 54th Interna-

tional Midwest Symposium on Circuits and Systems (MWSCAS’11). 1–4. DOI:http://dx.doi.org/10.1109/

MWSCAS.2011.6026672

J. Huang, J. O. Blech, A. Raabe, C. Buckl, and A. Knoll. 2011. Analysis and optimization of fault-tolerant

task scheduling on multiprocessor embedded systems. In Proceedings of the 2011 9th

International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS’11).

247–256. DOI:http://dx.doi.org/10.1145/2039370.2039409

Intel. 2008. Quad-Core Intel Xeon Processor 5400 Series. Retrieved from http://www.intel.com/Assets/enUS/

PDF/datasheet/318589.pdf.

A. B. Kahng, B. Li, L.-S. Peh, and K. Samadi. 2009. ORION 2.0: A fast and accurate NoC power and area

model for early-stage design space exploration. In Proceedings of the Design, Automation Test in Europe

Conference Exhibition, 2009 (DATE’09). 423–428. DOI:http://dx.doi.org/10.1109/DATE.2009.5090700

F. Khalili and H. R. Zarandi. 2013. A fault-tolerant core mapping technique in networks-on-chip. IET Comput.

Digital Techniques 7, 6 (Nov. 2013), 238–245. DOI:http://dx.doi.org/10.1049/iet-cdt.2013.0032

A. Kohler, G. Schley, and M. Radetzki. 2010. Fault tolerant network on chip switching with graceful perfor-

mance degradation. IEEE Trans. Comput.-Aided Design Integr. Circuits Syst. 29, 6 (June 2010), 883–896.

DOI:http://dx.doi.org/10.1109/TCAD.2010.2048399

S. Kundu and S. Chattopadhyay. 2014. Network-on-Chip: The Next Generation of System-on-Chip Integration.

CRC Press.


C. Lee, H. Kim, H. W. Park, S. Kim, H. Oh, and S. Ha. 2010. A task remapping technique for reliable

multi-core embedded systems. In Proceedings of the 2010 IEEE/ACM/IFIP International Conference on

Hardware/Software Codesign and System Synthesis (CODES+ISSS’10). 307–316.

F. Liberato, R. Melhem, and D. Mosse. 2000. Tolerance to multiple transient faults for aperiodic tasks

in hard real-time systems. IEEE Trans. Comput. 49, 9 (Sept. 2000), 906–914. DOI:http://dx.doi.org/

10.1109/12.869322

T. Maqsood, S. Ali, S. U. R. Malik, and S. A. Madani. 2015. Dynamic task mapping for network-on-chip based

systems. J. Syst. Architecture 61, 7 (2015), 293–306.

Y. Ren, L. Liu, S. Yin, J. Han, and S. Wei. 2015. Efﬁcient fault-tolerant topology reconﬁguration using a

maximum ﬂow algorithm. ACM Trans. Reconﬁg. Technol. Syst. 8, 3, Article 19 (May 2015), 24 pages.

DOI:http://dx.doi.org/10.1145/2700417

E. Schuchman and T. N. Vijaykumar. 2005. Rescue: A microarchitecture for testability and defect tolerance.

In Proceedings of the 32nd International Symposium on Computer Architecture, 2005 (ISCA’05). 160–171.

DOI:http://dx.doi.org/10.1109/ISCA.2005.44

S. Shamshiri and K. T. Cheng. 2011. Modeling yield, cost, and quality of a spare-enhanced multicore chip.

IEEE Trans. Comput. 60, 9 (Sept. 2011), 1246–1259. DOI:http://dx.doi.org/10.1109/TC.2011.32

S. Shamshiri, P. Lisherness, S. J. Pan, and K. T. Cheng. 2008. A cost analysis framework for multi-core

systems with spares. In Proceedings of the IEEE International Test Conference, 2008 (ITC’08). 1–8.

DOI:http://dx.doi.org/10.1109/TEST.2008.4700562

P. Shivakumar, S. W. Keckler, C. R. Moore, and D. Burger. 2012. Exploiting microarchitectural redundancy

for defect tolerance. In 2012 IEEE 30th International Conference on Computer Design (ICCD’12). 35–42.

DOI:http://dx.doi.org/10.1109/ICCD.2012.6378613

O. Sinnen. 2007. Task Scheduling for Parallel Systems (Wiley Series on Parallel and Distributed Computing).

Wiley-Interscience.

C. Wang, J. Wu, G. Jiang, and J. Sun. 2013. An efficient topology reconfiguration algorithm for NOC based multiprocessor arrays. In Proceedings of the 2013 IEEE 10th International Conference on High Performance

Computing and Communications, 2013 IEEE International Conference on Embedded and Ubiquitous

Computing (HPCC EUC’13). 873–880. DOI:http://dx.doi.org/10.1109/HPCC.and.EUC.2013.125

C. Yang and A. Orailoglu. 2007. Predictable execution adaptivity through embedding dynamic reconﬁg-

urability into static MPSoC schedules. In Proceedings of the 2007 5th IEEE/ACM/IFIP International

Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS’07). 15–20.

C. Yao, K. K. Saluja, and P. Ramanathan. 2011. Calibrating on-chip thermal sensors in integrated cir-

cuits: A design-for-calibration approach. J. Electronic Testing 27, 6 (2011), 711–721. DOI:http://dx.

doi.org/10.1007/s10836-011-5253-4

L. Zhang, Y. Han, Q. Xu, and X. Li. 2008. Defect tolerance in homogeneous manycore processors using core-

level redundancy with uniﬁed topology. In Proceedings of the Design, Automation and Test in Europe,

2008 (DATE’08). 891–896. DOI:http://dx.doi.org/10.1109/DATE.2008.4484787

L. Zhang, Y. Han, Q. Xu, X. W. Li, and H. Li. 2009. On topology reconﬁguration for defect-tolerant noc-based

homogeneous manycore systems. IEEE Trans. Very Large Scale Integration (VLSI) Syst. 17, 9 (Sept.

2009), 1173–1186. DOI:http://dx.doi.org/10.1109/TVLSI.2008.2002108

Received July 2016; revised February 2017; accepted February 2017

ACM Transactions on Embedded Computing Systems, Vol. 16, No. 4, Article 108, Publication date: May 2017.

Energy Efficient, Real-time and Reliable Task Deployment on NoC-based Multicores with DVFS

Conference Paper

Full-text available

Dec 2021

Task deployment plays an important role in the overall system performance, especially for complex architectures, including several cores with Dynamic Voltage and Frequency Scaling (DVFS) and Network-on-Chips (NoC). Task deployment affects not only the energy consumption but also the real-time response and reliability of the system. In this work, a task deployment approach is proposed to optimize the overall system energy consumption, including computation of the cores and communication of the NoC, under task reliability and real-time constraints. More precisely, the task deployment approach combines task allocation and scheduling, frequency assignment, task duplication, and multi-path data routing. The task deployment problem is formulated using mixed-integer non-linear programming. To find the optimal solution, the original problem is equivalently transformed to mixed-integer linear programming, and solved by state-of-the-art solvers. Furthermore, a decomposition-based heuristic, with low computational complexity, is proposed to deal with scalability. Finally, extended simulations evaluate the proposed methods.

High-performance application mapping in network-on-chip-based multicore systems

Article

Full-text available

May 2024
J SUPERCOMPUT

Md Farhadur Reza

The allocation of resources and scheduling of tasks, specifically mapping, in multicore systems on-chip (MCSoC), poses significant challenges. Tasks have diverse resource requirements and interact with each other, while network-on-chip (NoC)-based MCSoC consists of heterogeneous cores and communication networks. The heterogeneity of resources in MCSoC, along with the varying computational and communication demands of applications, makes mapping a complex optimization problem. We have mathematically modeled the mapping problem in NoC-based MCSoC using mixed-integer linear programming (MILP) with the objective of minimizing the execution time of applications. This model incorporates computation and communication capacity, energy budget constraints of MCSoC, and execution time requirements of applications. We further propose heuristics, including simulated annealing (SA) and genetic algorithm (GA), considering the capacity and budget constraints of MCSoC systems to accelerate applications and provide quick mapping solutions. Simulation results demonstrate that the performance from SA and GA heuristics is very close (within 10%) to the optimal solutions from MILP across various applications for 2D-Mesh NoCs with 16–100 cores. The energy consumption of SA and GA heuristics is also very close to the optimal solutions from MILP, with a few exceptions on small-scale 16-core NoC. Additionally, SA outperforms GA in most cases for all applications.

High-Performance Application Mapping in Network-on-Chip based Multicore Systems

Preprint

Full-text available

Feb 2024

Md Farhadur Reza

The allocation of resources and scheduling of tasks, specifically mapping, in multicoresystems on-chip (MCSoC), poses significant challenges. Tasks have diverse resourcerequirements and interact with each other, while network-on-chip (NoC)-based MCSoCconsists of heterogeneous cores and communication networks. The heterogeneity ofresources in MCSoC, along with the varying computational and communication demandsof applications, makes mapping a complex optimization problem. We have mathematicallymodeled the mapping problem in NoC-based MCSoC using mixed-integer linearprogramming (MILP) with the objective of minimizing the execution time of applications.This model incorporates computation and communication capacity, energy budget constraintsof MCSoC, and execution time requirements of applications. We further proposeheuristics, including simulated annealing (SA) and genetic algorithm (GA), consideringthe capacity and budget constraints of MCSoC systems to accelerate applications and providequick mapping solutions. Simulation results demonstrate that the performance fromSA and GA heuristics are very close (within 10%) to the optimal solutions from MILPacross various applications for 2D-mesh NoCs with 16 to 100 cores. The energy consumptionof SA and GA heuristics is also very close to the optimal solutions from MILP, witha few exceptions on small-scale 16-core NoC. Additionally, SA outperforms GA in mostcases for all applications.

Reliability-aware intelligent mapping based on reinforcement learning for networks-on-chips

Article

Full-text available

Jun 2022
J SUPERCOMPUT

Designing reliable Networks on Chip (NoCs) is critical, especially with the continuous scaling down of integrated circuit technology, which exposes NoCs to various types of faults. In this paper, a reliability-aware application mapping technique for improving the reliability of heterogeneous NoCs is proposed. It is based on a hybridization of the Multi-Objective Particle Swarp Optimization (MOPSO) algorithm and Reinforcement Learning (RL). At design time, MOPSO and RL perform the optimization of an initial mapping and the prediction of fault-tolerant remapping scenarios, respectively. To teach and to produce an intelligent agent capable of generating an optimal adaptive remapping scheme to address run-time permanent processing element (PE) failures. Two models of RL agents are trained, each based on a different mechanism of task migration: 1) step-based agent and 2) swap-based agent. Experiments were carried out to assess the performance of our innovative technique on various sizes of NoCs, using real benchmarks and varying the levels of heterogeneity and failure in the NoC. The results of the experiments reveal that using RL to solve the reliability problem in NoCs yields interesting results in terms of reliability, energy consumption, execution time, and cost migration.

Communication and aging aware application mapping for multicore based edge computing servers

Article

Full-text available

Mar 2022
CLUSTER COMPUT

Technology advancement in semiconductors enables integration of large number of cores on a single chip that leads to the design and development of Multi-Processor System on Chip (MPSoC). In Network-on-Chip (NoC) based MPSoCs dozens and even 100s of cores exist on single chip. To improve the energy efficiency of such systems, workload consolidation has been extensively used. NoC based multicore systems face various challenges due to energy efficient workload consolidation. Workload consolidation leads to imbalanced task mapping that increases the power density of the chip. Consequently, NoC based MPSoCs suffer from overheating and thermal issues leading to faster aging of certain processing cores, which results in lower average lifetime reliability of the system. The uneven aging of some cores exacerbates time constraints that may further affect lifetime reliability of critical systems, such as embedded medical cyber physical systems. Existing reliability aware task mapping algorithms attempt to increase lifetime of MPSoCs by avoiding hotspots that increase communication overhead leading to application performance degradation. Therefore, in this article we propose algorithmic solutions that optimize aging of the system while reducing the communication overhead. Experiments show that proposed communication and aging aware algorithm reduces the communication overhead while marginally sacrificing lifetime reliability compared with some of the baseline and state-of-the-art techniques.

A Survey of Software-Defined Networks-on-Chip: Motivations, Challenges and Opportunities

Article

Full-text available

Feb 2021

Current computing platforms encourage the integration of thousands of processing cores, and their interconnections, into a single chip. Mobile smartphones, IoT, embedded devices, desktops, and data centers use Many-Core Systems-on-Chip (SoCs) to exploit their compute power and parallelism to meet the dynamic workload requirements. Networks-on-Chip (NoCs) lead to scalable connectivity for diverse applications with distinct traffic patterns and data dependencies. However, when the system executes various applications in traditional NoCs—optimized and fixed at synthesis time—the interconnection nonconformity with the different applications’ requirements generates limitations in the performance. In the literature, NoC designs embraced the Software-Defined Networking (SDN) strategy to evolve into an adaptable interconnection solution for future chips. However, the works surveyed implement a partial Software-Defined Network-on-Chip (SDNoC) approach, leaving aside the SDN layered architecture that brings interoperability in conventional networking. This paper explores the SDNoC literature and classifies it regarding the desired SDN features that each work presents. Then, we described the challenges and opportunities detected from the literature survey. Moreover, we explain the motivation for an SDNoC approach, and we expose both SDN and SDNoC concepts and architectures. We observe that works in the literature employed an uncomplete layered SDNoC approach. This fact creates various fertile areas in the SDNoC architecture where researchers may contribute to Many-Core SoCs designs.

Sixer: A low-overhead, fully-distributed test scheme with guaranteed delivery of packets in networks-on-chip

Article

Mar 2023
MICROELECTRON RELIAB

Biswajit R Bhowmik

Statistical traffic pattern for mixed torus topology and pathfinder based traffic and thermal aware routing protocol on NoC

Article

Jul 2022
INTEGRATION

The network-on-chip (NoC) is an on-chip communication fabric extended from the system-on-chip (SoC). The NoC design suffers from high failure rates due to the problem of routing in the traffic conditioning. In this paper, a realistic traffic pattern is used to verify the proposed routing scheme of the NoC with the novel design topology. Here a topology named mixed-torus topology is designed with the combination of processing blocks and non-processing blocks, where each non-processing block holds information about the processing blocks associated with its torus block. The designed topology is presented with the encoder application model, and the set of tasks is mapped and scheduled to the processing blocks by the spider monkey optimization algorithm. Then the routing of data through the non-processing blocks is done by the pathfinder based traffic and thermal aware adaptive routing protocol (PFTTAR), which performs a path diversity phase and a path selection phase for better routing. The designed NoC is then treated with statistical traffic pattern generation for the analytical study of the proposed work and is implemented in Xilinx.

Intrinsic Random Optical Features of the Electronic Packages as Physical Unclonable Functions for Internet of Things Security

Article

Dec 2021

The increasing security threat is a serious challenge to the internet of things (IoT). Hardware-based security primitives are an essential and powerful way to protect IoT devices from various attacks. But most current security hardware is based on macrophysical features, which are usually produced by reproducible deterministic processes and can be copied by counterfeiters. Herein, a physical unclonable function (PUF) with high robustness based on the intrinsic random micro-/nanostructures of the electronic packages is proposed, thereby demonstrating a low-cost and label-free hardware security solution for IoT. Using the unique surface micropattern and the spatially coded laser scattering, the proposed PUFs can be used as anticounterfeiting labels, authentication tokens and cryptographic key generators. With the help of the proposed PUFs, the safety protection is still effective even when the attacker is able to get access to the database, because the secret keys are inherently hidden in the complex microscopic stochastic physical features of the PUFs and not in the database. The proposed package-enabled PUFs provide a promising and practical physical protection solution for IoT security.

Fault-Tolerant Application Mapping on Mesh-of-Tree based Network-on-Chip

Article

Jan 2021
J SYST ARCHITECT

Network-on-Chip (NoC) has been considered as an efficient communication infrastructure to support Multi-Processor System-on-Chip (MPSoC) design requirements for future generation computing. With the increased application requirements and rapid scale down in VLSI technology, the probability of forming thermal hotspots in the processing elements is relatively high which may lead to system failure. Therefore, efficient fault-tolerant techniques are required to improve the reliability of a system by addressing the failures that may occur at different component levels in NoC. This paper proposes fault-tolerant NoC design methodologies to address the core failures that may occur in an application. An Integer Linear Programming (ILP) based mathematical formulation and Particle Swarm Optimization (PSO) based evolutionary approach have been proposed to perform fault-tolerant mapping using spare cores onto the Mesh-of-Tree (MoT) network. In the event of core failures, spare cores are used to enhance system reliability. Most of the approaches have fixed the position of spare core in the networks while performing fault-tolerant application-mapping onto NoCs. In contrast to fixing the position of spare core in MoT networks, flexibility is provided by our approaches ILP and PSO to place the spare core in the network. We have experimented with multimedia applications and synthetic applications generated using TGFF tool in static and dynamic environments. In static environment the experiments are performed (a) by scaling the MoT network size with fixed router fault-percentage, (b) by varying the router fault-percentage in the MoT network, and (c) by considering multiple failed cores in the application. In dynamic environment the experiments are carried out using cycle-accurate NoC simulator and performance parameters namely network latency, throughput, and power consumption are analysed. 
The experimental results have shown significant improvements using our approach over the approaches proposed in the literature.

A spare link based reliable Network-on-Chip design

Conference Paper

Jul 2014

In this paper we have presented a reliable On-chip interconnection network design using spare links. It helps to mitigate the problem of fault chain formation due to failure of boundary links. The modified router design uses the redundant ports in boundary routers along with spare links for establishing connection with adjacent routers in case of link faults. This design modification on mesh based network along with proposed routing algorithm improves system reliability in case of single and multiple link failures. The performance evaluation in terms of network latency has also been improved compared to recent works with minimal area overhead.

Network-on-chip: The next generation of system-on-chip integration

Book

Jan 2014

Addresses the Challenges Associated with System-on-Chip Integration. Network-on-Chip: The Next Generation of System-on-Chip Integration examines the current issues restricting chip-on-chip communication efficiency, and explores Network-on-chip (NoC), a promising alternative that equips designers with the capability to produce a scalable, reusable, and high-performance communication backbone by allowing for the integration of a large number of cores on a single system-on-chip (SoC). This book provides a basic overview of topics associated with NoC-based design: communication infrastructure design, communication methodology, evaluation framework, and mapping of applications onto NoC. It details the design and evaluation of different proposed NoC structures, low-power techniques, signal integrity and reliability issues, application mapping, testing, and future trends. Utilizing examples of chips that have been implemented in industry and academia, this text presents the full architectural design of components verified through implementation in industrial CAD tools. It describes NoC research and developments, incorporates theoretical proofs strengthening the analysis procedures, and includes algorithms used in NoC design and synthesis. In addition, it considers other upcoming NoC issues, such as low-power NoC design, signal integrity issues, NoC testing, reconfiguration, synthesis, and 3-D NoC design.
This text comprises 12 chapters and covers: the evolution of NoC from SoC, with its research and developmental challenges; NoC protocols, elaborating flow control, available network topologies, routing mechanisms, fault tolerance, quality-of-service support, and the design of network interfaces; the router design strategies followed in NoCs; the evaluation mechanism of NoC architectures; the application mapping strategies followed in NoCs; low-power design techniques specifically followed in NoCs; the signal integrity and reliability issues of NoC; the details of NoC testing strategies reported so far; the problem of synthesizing application-specific NoCs; reconfigurable NoC design issues; and the direction of future research and development in the field of NoC. Network-on-Chip: The Next Generation of System-on-Chip Integration covers the basic topics, technology, and future trends relevant to NoC-based design, and can be used by engineers, students, researchers, and other industry professionals interested in computer architecture, embedded systems, and parallel/distributed systems.

Wiley Series on Parallel and Distributed Computing

Chapter

Sep 2006

Oliver Sinnen

Dynamic Task Mapping with Congestion Speculation for Reconfigurable Network-on-Chip

Article

Sep 2016

Network-on-Chip (NoC) has been proposed as a promising communication architecture to replace the dedicated interconnections and shared buses for future embedded system platforms. In such a parallel platform, mapping application tasks to the NoC is a key issue because it affects throughput significantly due to the problem of communication congestion. Increased communication latency, low system performance, and low resource utilization are some side effects of a bad mapping. Current mapping algorithms either do not consider link utilizations or consider only the current utilizations. Besides, to design an efficient NoC platform, both the mapping of tasks to computation nodes and the scheduling of communication should be taken into consideration. In this work, we propose an efficient algorithm for dynamic task mapping with congestion speculation (DTMCS) that not only includes the conventional application mapping, but also considers future traffic patterns based on the link utilization. The proposed algorithm can reduce overall congestion, instead of only improving the current packet blocking situation. Our experimental results demonstrate that, compared to the state-of-the-art congestion-aware Path Load algorithm, the proposed DTMCS algorithm can reduce average communication latency by up to 40.5%, while the maximal communication latency can be reduced by up to 67.7%.

Orion 2.0: A fast and accurate NoC power and area model for early-stage design space exploration

Article

Jan 2009

Necessary and Sufficient Conditions for Thermal Schedulability of Periodic Real-Time Tasks

Conference Paper

Jul 2014

With the growing need to address thermal issues in modern processing platforms, various performance throttling schemes have been proposed in the literature (DVFS, clock gating, etc.). In real-time systems such methods are often unacceptable, as they can result in potentially catastrophic deadline misses. As a result, real-time scheduling research has focused on developing algorithms that meet the compute deadline while satisfying power and thermal constraints. Basic bounds that determine whether a set of tasks can be scheduled were established in the 1970s based on the computation utilization of processing power, and no new results have been forthcoming that deal with thermal-effect-based bounds. In this paper we address the problem of thermal-constrained schedulability of tasks and derive necessary and sufficient conditions for thermal feasibility of periodic task sets for a unicore system. We then extend some of these results to a multi-core processing environment. We demonstrate the efficacy of our results through extensive simulations.

dynamic task mapping

Article

Jun 2015
J SYST ARCHITECT

The efficiency of Network-on-Chip (NoC) based multi-processor systems largely depends on the optimal placement of tasks onto processing elements (PEs). Although a number of task mapping heuristics have been proposed in the literature, selecting the best technique for a given environment remains a challenging problem, because the comparisons in the original study of each heuristic may have been conducted using different assumptions, environments, and models. In this study, we have conducted a detailed quantitative analysis of selected dynamic task mapping heuristics under the same set of assumptions, similar environment, and system models. Comparisons are conducted with varying network load, number of tasks, and network size for constantly running applications. Moreover, we propose an extension to the communication-aware packing based nearest neighbor (CPNN) algorithm that attempts to reduce communication overhead among the interdependent tasks. Furthermore, we have conducted formal verification and modeling of the proposed technique using high-level Petri nets. The experimental results indicate that the proposed mapping algorithm reduces communication cost, average hop count, and end-to-end latency as compared to CPNN, especially for large mesh NoCs. Moreover, the proposed scheme achieves up to 6% energy savings for smaller mesh NoCs. Further, the results of formal modeling indicate that the proposed model is workable and operates according to specifications.

Efficient Fault-Tolerant Topology Reconfiguration Using a Maximum Flow Algorithm

Article

May 2015

With an increasing number of processing elements (PEs) integrated on a single chip, fault-tolerant techniques are critical to ensure the reliability of such complex systems. In current reconfigurable architectures, redundant PEs are utilized for fault tolerance. In the presence of faulty PEs, the physical topologies of various chips may be different, so the concept of virtual topology from network embedding problem has been used to alleviate the burden for the operating systems. With limited hardware resources, how to reconfigure a system into the most effective virtual topology such that the maximum repair rate can be reached presents a significant challenge. In this article, a new approach using a maximum flow (MF) algorithm is proposed for an efficient topology reconfiguration in reconfigurable architectures. In this approach, topology reconfiguration is converted into a network flow problem by constructing a directed graph; the solution is then found by using the MF algorithm. This approach optimizes the use of spare PEs with minimal impacts on area, throughput, and delay, and thus it significantly improves the repair rate of faulty PEs. In addition, it achieves a polynomial reconfiguration time. Experimental results show that compared to previous methods, the MF approach increases the probability to repair faulty PEs by up to 50% using the same redundant resources. Compared to a fault-free system, the throughput only decreases by less than 2.5% and latency increases by less than 4%. To consider various types of PEs in a practical application, a cost factor is introduced into the MF algorithm. An enhanced approach using a minimum-cost MF algorithm is further shown to be efficient in the fault-tolerant reconfiguration of heterogeneous reconfigurable architectures.

Execution Trace--Driven Energy-Reliability Optimization for Multimedia MPSoCs

Article

May 2015

Multiprocessor systems-on-chip (MPSoCs) are becoming a popular design choice in current and future technology nodes to accommodate the heterogeneous computing demand of a multitude of applications enabled on these platforms. Streaming multimedia and other communication-centric applications constitute a significant fraction of the application space of these devices. The mapping of an application on an MPSoC is an NP-hard problem. This has attracted researchers to solve this problem both as stand-alone (best-effort) and in conjunction with other optimization objectives, such as energy and reliability. Most existing studies on energy-reliability joint optimization are static, that is, design-time based. These techniques fail to capture runtime variability such as resource unavailability and dynamism associated with application behaviors, which are typical of multimedia applications. The few studies that consider dynamic mapping of applications do not consider throughput degradation, which directly impacts user satisfaction. This article proposes a runtime technique to analyze the execution trace of an application modeled as Synchronous Data Flow Graphs (SDFGs) to determine its mapping on a multiprocessor system with heterogeneous processing units for different fault scenarios. Further, communication energy is minimized for each of these mappings while satisfying the throughput constraint. Experiments conducted with synthetic and real SDFGs demonstrate that the proposed technique achieves significant improvement with respect to the state-of-the-art approaches in terms of throughput and storage overhead with less than 20% energy overhead.

An Efficient Topology Reconfiguration Algorithm for NoC Based Multiprocessor Arrays

Conference Paper

Nov 2013

To realize the reliability of a high-performance multiprocessor system with a reconfigurable interconnect, there is a need to compute an interconnect topology that will allow for a high-throughput load distribution on top of the physical processor array. In this paper, we investigate the problem of topology reconfiguration for Network on Chip (NoC) based multiprocessor arrays with faulty processing elements (PEs). We propose two types of shift operations, i.e., the row bi-shift operation and the column shift operation, for redistributing fault-free PEs of a processor array in reconfiguration. We solve the topology reconfiguration problem by developing two efficient algorithms. The first algorithm, denoted as CRS, is able to generate a logical topology of desirable communication performance by alternately performing the two shift operations. The second algorithm revises the initial topology produced by CRS to further improve the communication performance, using tabu search techniques. Experimental results validate the efficiency of the proposed algorithms in comparison to previous approaches. For 16×16 physical arrays with 30% faulty PEs, the proposed approaches improve on existing algorithms by up to 39% in terms of message latency and congestion.
